A few weeks ago, the Geocoded National Address File (a.k.a. G-NAF) was made openly available by PSMA, an unlisted public company formed by the nine governments of Australia to collate and standardise, format and aggregate location data from each of the jurisdictions into authoritative location based national datasets. The reason this is such great news is nicely summarised on data.gov.au
G-NAF is one of the most ubiquitous and powerful spatial datasets. It contains more than 13 million Australian physical address records. The records include geocodes. These are latitude and longitude map coordinates.
If you're doing anything with geospatial or address data within Australia, this is going to be invaluable to you.
Loading the data into SQL Server
When you’ve downloaded the data, you’ll find that all of the data is available in PSV (Pipe Separated Value) format, along with a few SQL scripts that can create the tables and a view for the data. Being someone that wanted to get up and running to explore the data, I wrote a little F# script to load the data into SQL Server that I wanted to share. It fixes a couple of issues with the scripts, namely, the non-idempotency nature of the table creation script (it tries to drop tables that may not be there) and making the view creation script compatible with older versions of SQL Server.
open System open System.Data open System.Data.SqlClient open System.IO open System.Text.RegularExpressions let downloadDir = @"C:\Users\russ\Downloads\FEB16_GNAF+EULA_PipeSeparatedValue_20160222170142" let connectionString = @"Server=.;Database=gnaf;Integrated Security=true;" let baseDir = Path.Combine(downloadDir, @"FEB16_GNAF_PipeSeparatedValue_20160222170142\G-NAF") let tableScriptsDir = Path.Combine(baseDir, @"Extras\GNAF_TableCreation_Scripts") let createTables = Path.Combine(tableScriptsDir, "create_tables_sqlserver.sql") let constraints = Path.Combine(tableScriptsDir, "add_fk_constraints.sql") let createView = Path.Combine(baseDir, @"Extras\GNAF_View_Scripts\address_view.sql") let dataDir = Path.Combine(baseDir, @"G-NAF FEBRUARY 2016\Standard") // fix the non-idempotency of the table creation script :( let lines = File.ReadAllLines(createTables) for i = 0 to lines.Length - 1 do let current = lines.[i] let dropTableLine = Regex.Match(current, @"^DROP TABLE (?<table>.*?);$") if dropTableLine.Success then lines.[i] <- (sprintf "IF OBJECT_ID('%s', 'U') IS NOT NULL\r\n %s" dropTableLine.Groups.["table"].Value current) File.WriteAllLines(createTables, lines) let connection = new SqlConnection(connectionString) connection.Open() // fix compatibility of ADDRESS_VIEW :( let dropView = new SqlCommand("IF OBJECT_ID('ADDRESS_VIEW', 'V') IS NOT NULL\r\n DROP VIEW ADDRESS_VIEW;", connection) if dropView.ExecuteNonQuery() <> -1 then failwith "error dropping view" let viewLines = File.ReadAllLines(createView) if viewLines..StartsWith("CREATE OR REPLACE VIEW ADDRESS_VIEW") then viewLines. <- "CREATE VIEW ADDRESS_VIEW" File.WriteAllLines(createView, viewLines) for setupStep in [| createTables; constraints; createView |] do let command = new SqlCommand(File.ReadAllText(setupStep), connection) if command.ExecuteNonQuery() <> -1 then failwith (sprintf "Received failure return value for %s" setupStep) for file in Directory.EnumerateFiles(dataDir) do let fileInfo = FileInfo file let rMatch = Regex.Match(fileInfo.Name, "^(?<state>[^_]*)_(?<table>.*?)_psv.psv$") let state = rMatch.Groups.["state"].Value let table = rMatch.Groups.["table"].Value let bulkInsert = sprintf "BULK INSERT %s FROM '%s' WITH (FIELDTERMINATOR = '|', FIRSTROW = 2)" table fileInfo.FullName let command = new SqlCommand(bulkInsert, connection) command.CommandTimeout <- 300 let returnValue = command.ExecuteNonQuery() if returnValue = 0 then failwith (sprintf "no records inserted into %s for %s" table state) else Console.WriteLine (sprintf "inserted %i records into %s for %s" returnValue table state) Console.WriteLine "finished!" connection.Close()
Simply change the
connectionString to point to the download directory of the G-NAF data and the connection string for the database, respectively.