PSMA Geocoded National Address File made openly available

A few weeks ago, the Geocoded National Address File (a.k.a. G-NAF) was made openly available by PSMA, an unlisted public company formed by the nine governments of Australia to collate and standardise, format and aggregate location data from each of the jurisdictions into authoritative location based national datasets. The reason this is such great news is nicely summarised on data.gov.au

G-NAF is one of the most ubiquitous and powerful spatial datasets. It contains more than 13 million Australian physical address records. The records include geocodes. These are latitude and longitude map coordinates.

If you're doing anything with geospatial or address data within Australia, this is going to be invaluable to you.

Loading the data into SQL Server

When you’ve downloaded the data, you’ll find that all of the data is available in PSV (Pipe Separated Value) format, along with a few SQL scripts that can create the tables and a view for the data. Being someone that wanted to get up and running to explore the data, I wrote a little F# script to load the data into SQL Server that I wanted to share. It fixes a couple of issues with the scripts, namely, the non-idempotency nature of the table creation script (it tries to drop tables that may not be there) and making the view creation script compatible with older versions of SQL Server.

open System
open System.Data
open System.Data.SqlClient
open System.IO
open System.Text.RegularExpressions

let downloadDir = @"C:\Users\russ\Downloads\FEB16_GNAF+EULA_PipeSeparatedValue_20160222170142"
let connectionString = @"Server=.;Database=gnaf;Integrated Security=true;"

let baseDir = Path.Combine(downloadDir, @"FEB16_GNAF_PipeSeparatedValue_20160222170142\G-NAF")
let tableScriptsDir = Path.Combine(baseDir, @"Extras\GNAF_TableCreation_Scripts")
let createTables = Path.Combine(tableScriptsDir, "create_tables_sqlserver.sql")
let constraints = Path.Combine(tableScriptsDir, "add_fk_constraints.sql")
let createView = Path.Combine(baseDir, @"Extras\GNAF_View_Scripts\address_view.sql")
let dataDir = Path.Combine(baseDir, @"G-NAF FEBRUARY 2016\Standard")

// fix the non-idempotency of the table creation script :(
let lines = File.ReadAllLines(createTables)
for i = 0 to lines.Length - 1 do
    let current = lines.[i]
    let dropTableLine = Regex.Match(current, @"^DROP TABLE (?<table>.*?);$")
    if dropTableLine.Success
    then lines.[i] <- (sprintf "IF OBJECT_ID('%s', 'U') IS NOT NULL\r\n  %s" dropTableLine.Groups.["table"].Value current)
File.WriteAllLines(createTables, lines)

let connection = new SqlConnection(connectionString)
connection.Open()

// fix compatibility of ADDRESS_VIEW :(
let dropView = new SqlCommand("IF OBJECT_ID('ADDRESS_VIEW', 'V') IS NOT NULL\r\n  DROP VIEW ADDRESS_VIEW;", connection)
if dropView.ExecuteNonQuery() <> -1 then failwith "error dropping view"
let viewLines = File.ReadAllLines(createView)
if viewLines.[0].StartsWith("CREATE OR REPLACE VIEW ADDRESS_VIEW")
then 
    viewLines.[0] <- "CREATE VIEW ADDRESS_VIEW"
    File.WriteAllLines(createView, viewLines)

for setupStep in [| createTables; constraints; createView |] do
    let command = new SqlCommand(File.ReadAllText(setupStep), connection)
    if command.ExecuteNonQuery() <> -1 then failwith (sprintf "Received failure return value for %s" setupStep)

for file in Directory.EnumerateFiles(dataDir) do
    let fileInfo = FileInfo file
    let rMatch = Regex.Match(fileInfo.Name, "^(?<state>[^_]*)_(?<table>.*?)_psv.psv$")
    let state = rMatch.Groups.["state"].Value
    let table = rMatch.Groups.["table"].Value
    let bulkInsert = sprintf "BULK INSERT %s FROM '%s' WITH (FIELDTERMINATOR = '|', FIRSTROW = 2)" table fileInfo.FullName
    let command = new SqlCommand(bulkInsert, connection)
    command.CommandTimeout <- 300
    let returnValue = command.ExecuteNonQuery() 
    if returnValue = 0 then failwith (sprintf "no records inserted into %s for %s" table state)
    else Console.WriteLine (sprintf "inserted %i records into %s for %s" returnValue table state)

Console.WriteLine "finished!"
connection.Close()

Simply change the downloadDir and connectionString to point to the download directory of the G-NAF data and the connection string for the database, respectively.

Comments

comments powered by Disqus