Australian house price data

February 15, 2020 - Reading time: 4 minutes

Australia has a love affair with property. Some speculate that Australian property is the largest bubble in the history of capitalism. Our ratio of household debt to disposable income is among the highest in the world. This obsession with property has warped our national politics, distorting policy and influencing federal elections.

In Western Australia, and most notably in suburban Perth, prices have now been falling for over five years, from mid-2014 to the present. The graph below (source) neatly summarises this decline up to April 2019 (the decline has since continued).

I was curious to see the effect of this decline on the advertised asking price of properties within a handful of blue chip Perth suburbs. Sales data is easily obtainable, but it only records the final sale price of each property. With homes taking longer to sell and anecdotal evidence suggesting multiple price reductions prior to sale, I wanted to visualise just how long some properties were taking to sell, and how many times prices were reduced during sales campaigns.

Using a collection of advertised prices from public listings, I plotted each geolocated address on a map, with a table of asking price by date for that address. Each property is also linked to its Domain property profile page, allowing you to see whether the property eventually sold, and for how much (the Domain profile page does not always record all price fluctuations during a sales campaign). Here is an example of the results, showing a single property in Mt Lawley, and here is the Domain property profile page linked to from the property pin. In this one example, we can see that the asking price of this property was up to $360,000 more than the eventual sale price.


The output for a single Perth suburb, Mt Lawley, can be seen here.

Approach

The source data is stored in a lightweight relational database. For Mt Lawley, it consists of ~11,000 individual asking prices across ~3,300 individual addresses.

The HTML file is generated by a Python script that:

  • Uses the pandas read_sql_query method to read data into a data frame directly from a SQL query
  • Indexes the resulting data frame on address, lat, and lon fields using the set_index method
  • Groups prices by address in a dictionary
  • Uses Folium to visualise the price data on a leaflet.js map
  • Uses the Folium MarkerCluster plugin to group the data points and declutter the map as shown below
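
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the actual script: the SQLite backend, the asking_prices table, its column names, and the function names are all assumptions made for the example.

```python
import sqlite3

import pandas as pd

# Hypothetical schema: one row per advertised asking price observation.
QUERY = "SELECT address, lat, lon, date, price FROM asking_prices ORDER BY date"


def load_prices(conn):
    """Read the query result into a data frame indexed on address, lat and lon."""
    df = pd.read_sql_query(QUERY, conn)
    return df.set_index(["address", "lat", "lon"])


def group_by_address(df):
    """Map each (address, lat, lon) key to its list of (date, price) pairs."""
    grouped = {}
    for key, rows in df.groupby(level=["address", "lat", "lon"]):
        grouped[key] = list(zip(rows["date"], rows["price"]))
    return grouped


def build_map(grouped, centre, out_path):
    """Write an HTML map with one clustered marker per address."""
    # Imported here so the grouping step can be used without Folium installed.
    import folium
    from folium.plugins import MarkerCluster

    fmap = folium.Map(location=centre, zoom_start=15)
    cluster = MarkerCluster().add_to(fmap)
    for (address, lat, lon), history in grouped.items():
        # Popup content: a small table of asking price by date.
        rows = "".join(f"<tr><td>{d}</td><td>{p}</td></tr>" for d, p in history)
        html = f"<b>{address}</b><table>{rows}</table>"
        marker = folium.Marker(
            [float(lat), float(lon)],
            popup=folium.Popup(html, max_width=300),
        )
        marker.add_to(cluster)
    fmap.save(out_path)
```

With a database file at a hypothetical path such as prices.db, the pieces would be chained as build_map(group_by_address(load_prices(sqlite3.connect("prices.db"))), centre, "map.html"), where centre is a [lat, lon] pair for the suburb.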


Issues

The geolocation of properties is not always correct, with multiple properties sometimes being placed at the same coordinates. This is an issue with the source data, which can be cleansed prior to generating the visualisation.

The price data is reported in multiple formats, including:

  • 'Low to mid 1's' (an asking price range of ~$500,000!)
  • '1.5 mill'
  • 'High 800's'
  • 'Expressions of interest'

The source data is cleansed to convert these to actual numbers wherever possible, but not all asking prices can be converted to a number for comparison (!).
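
As an illustration of this kind of cleansing, here is a small sketch of a parser. It is not the cleansing code used for the map: the function name and the rules are assumptions, and it deliberately converts only unambiguous amounts, returning None for ranges and free text.

```python
import re

# Amounts expressed in millions, e.g. "1.5 mill" or "$1.2m".
_MILLION = re.compile(r"^\$?\s*(\d+(?:\.\d+)?)\s*(?:m|mil|mill|million)$", re.IGNORECASE)
# Plain dollar amounts, e.g. "$850,000" or "850000".
_DOLLARS = re.compile(r"^\$?\s*(\d{1,3}(?:,\d{3})+|\d+)$")


def parse_asking_price(text):
    """Return the asking price in whole dollars, or None if it is not a single number."""
    cleaned = text.strip()
    match = _MILLION.match(cleaned)
    if match:
        return round(float(match.group(1)) * 1_000_000)
    match = _DOLLARS.match(cleaned)
    if match:
        return int(match.group(1).replace(",", ""))
    # Ranges ("Low to mid 1's") and free text ("Expressions of interest")
    # are left unconverted rather than guessed at.
    return None


for raw in ["1.5 mill", "$850,000", "High 800's", "Expressions of interest"]:
    print(raw, "->", parse_asking_price(raw))
```

Returning None keeps unconvertible listings visible in the data, so they can still be shown as text on the map even though they cannot be compared numerically.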