HAG-GIS geocoding pipeline overview

HAG-GIS

Other

Digitising Scotland / University of St Andrews

2014–2016

Python, PostGIS, PostgreSQL, Open Source, Geocoding, NLP

Project Description

HAG-GIS (Historical Address Geocoder — GIS) is an open-source automated geocoding tool developed as part of the Digitising Scotland project. It was designed for the particular challenges posed by pre-standardised historical addresses in nineteenth- and twentieth-century Scottish vital event registers.

The tool tackles problems that standard geocoders are not designed to handle:

  • Historical street name variations and spelling inconsistencies
  • Abbreviations specific to Scottish civil registration practice
  • Address formats that predate the modern postcode system
  • Matching records to census enumeration districts rather than point locations
  • Handling ambiguity where multiple candidate locations exist
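As an illustration of the first two problems, abbreviation expansion can be sketched as a token-level lookup that runs before any matching. This is a minimal sketch, not HAG-GIS code: the `ABBREVIATIONS` table, the `normalise_address` function and the rule that "St" at the start of an address or after a house number means "Saint" (otherwise "Street") are all hypothetical. Spelling variants that survive this step are left to the fuzzy matching stage.

```python
import re
from typing import List

# Hypothetical abbreviation table; a real one would be compiled from the registers.
ABBREVIATIONS = {
    "rd": "road",
    "sq": "square",
    "ter": "terrace",
    "terr": "terrace",
    "pl": "place",
    "cres": "crescent",
    "bldgs": "buildings",
}


def normalise_address(raw: str) -> List[str]:
    """Lower-case, strip punctuation, tokenise and expand abbreviations."""
    # Replace anything that is not a lower-case letter, digit or whitespace with a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", raw.lower())
    tokens = cleaned.split()
    expanded = []
    for i, tok in enumerate(tokens):
        if tok == "st":
            # Assumed heuristic: "St" opening the address or following a house
            # number reads as "Saint" (14 St Andrew Sq); otherwise "Street" (High St).
            opens_name = i == 0 or tokens[i - 1].isdigit()
            expanded.append("saint" if opens_name else "street")
        else:
            expanded.append(ABBREVIATIONS.get(tok, tok))
    return expanded
```

For example, `normalise_address("14 St. Andrew Sq.")` returns `['14', 'saint', 'andrew', 'square']`, while `normalise_address("High St, Glasgow")` returns `['high', 'street', 'glasgow']`.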

The pipeline could process millions of records in batch mode, with configurable confidence thresholds and a log of uncertain matches for manual review.
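The threshold-and-review routing described above can be sketched as follows, assuming every record has already been given a candidate and a confidence score. The `route_batch` function, the tuple layout, the logger name and the 0.85 default threshold are illustrative assumptions, not values taken from HAG-GIS.

```python
import logging
from typing import Iterable, List, Optional, Tuple

# Hypothetical logger name; review entries could be sent to a file handler.
review_log = logging.getLogger("geocoder.review")

# (record_id, matched_place_id or None, confidence in [0, 1])
Scored = Tuple[str, Optional[str], float]


def route_batch(
    scored: Iterable[Scored], threshold: float = 0.85
) -> Tuple[List[Scored], List[Scored]]:
    """Split scored records into accepted matches and a manual-review queue."""
    accepted: List[Scored] = []
    review: List[Scored] = []
    for record_id, place_id, confidence in scored:
        if place_id is not None and confidence >= threshold:
            accepted.append((record_id, place_id, confidence))
        else:
            review.append((record_id, place_id, confidence))
            # Failed or low-confidence matches are logged so a person can check them.
            review_log.info(
                "review record=%s candidate=%s confidence=%.2f",
                record_id, place_id, confidence,
            )
    return accepted, review


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    batch = [("r1", "gaz:101", 0.93), ("r2", "gaz:205", 0.61), ("r3", None, 0.0)]
    accepted, review = route_batch(batch, threshold=0.85)
    print(accepted)  # [('r1', 'gaz:101', 0.93)]
```

Accepting any iterable lets records stream in from a database cursor; at the scale of millions of records, a production version would more likely write accepted rows out incrementally than hold both lists in memory.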

Technical Architecture

  1. Pre-processing: Address string normalisation, tokenisation, abbreviation expansion
  2. Candidate generation: Fuzzy matching against a reference gazetteer (PostGIS)
  3. Spatial disambiguation: Ranking candidates using temporal and spatial priors
  4. Output: Georeferenced records with match confidence scores
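Stages 2 to 4 can be illustrated together. The following is a minimal sketch under stated assumptions rather than the HAG-GIS implementation: an in-memory list stands in for the PostGIS gazetteer, `difflib.SequenceMatcher` substitutes for whatever fuzzy matcher the tool actually used, and the `GazetteerEntry` fields, the 0.2 down-weighting for names not attested in the record's year, the 2 km exponential distance decay and the share-of-total confidence score are all hypothetical choices. In a database-backed version, candidate generation might instead be a trigram similarity query using the `pg_trgm` extension, with the distance term computed by `ST_Distance`.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class GazetteerEntry:
    place_id: str
    name: str        # normalised street name
    x: float         # easting in metres (hypothetical projected CRS)
    y: float         # northing in metres
    valid_from: int  # first year the name is attested
    valid_to: int    # last year the name is attested


def candidates(name: str, gazetteer: List[GazetteerEntry], min_sim: float = 0.7):
    """Stage 2: fuzzy candidate generation (stand-in for a gazetteer query)."""
    out = []
    for entry in gazetteer:
        sim = SequenceMatcher(None, name, entry.name).ratio()
        if sim >= min_sim:
            out.append((entry, sim))
    return out


def score(entry: GazetteerEntry, sim: float, year: int,
          district_xy: Tuple[float, float], scale_m: float = 2000.0) -> float:
    """Stage 3: weight string similarity by temporal and spatial priors."""
    # Temporal prior: names not attested in the record's year are down-weighted.
    temporal = 1.0 if entry.valid_from <= year <= entry.valid_to else 0.2
    # Spatial prior: decays with distance from the registration district centre.
    dist = math.hypot(entry.x - district_xy[0], entry.y - district_xy[1])
    spatial = math.exp(-dist / scale_m)
    return sim * temporal * spatial


def geocode(name: str, year: int, district_xy: Tuple[float, float],
            gazetteer: List[GazetteerEntry]) -> Optional[Tuple[str, Tuple[float, float], float]]:
    """Stage 4: best candidate, its coordinates and a confidence in [0, 1]."""
    scored = [(e, score(e, s, year, district_xy)) for e, s in candidates(name, gazetteer)]
    if not scored:
        return None
    best, best_score = max(scored, key=lambda pair: pair[1])
    total = sum(s for _, s in scored)
    # Confidence is the best candidate's share of the total score, so two
    # equally plausible candidates each get roughly 0.5.
    confidence = best_score / total if total > 0 else 0.0
    return best.place_id, (best.x, best.y), confidence


if __name__ == "__main__":
    # Made-up gazetteer rows: two towns with a "High Street", plus a distractor.
    gazetteer = [
        GazetteerEntry("g1", "high street", 325000.0, 673000.0, 1800, 1950),
        GazetteerEntry("g2", "high street", 259000.0, 665000.0, 1800, 1950),
        GazetteerEntry("g3", "gallowgate", 260000.0, 665200.0, 1700, 1950),
    ]
    # A misspelt "High Street" registered near (259500, 665000) in 1881.
    print(geocode("hie street", 1881, (259500.0, 665000.0), gazetteer))
```

With these made-up rows, both "high street" entries pass the fuzzy filter despite the misspelling, and the spatial prior should select `g2`, which lies 500 m from the district centre, with a confidence close to 1. Multiplying the priors keeps the sketch short, but it means any single near-zero prior dominates the score.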

Skills Used

  • Python 2.7 / 3.x
  • PostGIS / PostgreSQL
  • Natural language processing (address parsing)
  • Fuzzy string matching
  • Large-scale batch data processing

Dr. Konstantinos Daras, Senior Research Fellow in Health Data Science and AI

University of Liverpool, Waterhouse Building, Block F, Liverpool, L69 3GF, UK

© 2026 Dr. Konstantinos Daras. All rights reserved.