HAG-GIS geocoding pipeline overview
HAG-GIS
Other | Digitising Scotland / University of St Andrews
2014–2016
Python, PostGIS, PostgreSQL, Open Source, Geocoding, NLP
Project Description
HAG-GIS (Historical Address Geocoder — GIS) is an open-source automated geocoding tool developed as part of the Digitising Scotland project. It was designed to geocode the pre-standardised historical addresses found in nineteenth- and twentieth-century Scottish vital event registers.
The tool addresses problems that standard geocoders cannot handle:
- Historical street name variations and spelling inconsistencies
- Abbreviations specific to Scottish civil registration practice
- Address formats that predate the modern postcode system
- Matching records to census enumeration districts rather than point locations
- Handling ambiguity where multiple candidate locations exist
The pipeline could process millions of records in batch mode, with configurable confidence thresholds and logging that flagged uncertain matches for manual review.
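As an illustration of how a confidence threshold might route matches, here is a minimal sketch rather than the project's actual code. The function name `route_matches`, the logger name, and the default threshold of 0.85 are all assumptions made for this example.

```python
import logging

# Hypothetical logger name; the original tool's logging setup is not documented here.
logger = logging.getLogger("geocoder.review")


def route_matches(matches, accept_threshold=0.85):
    """Split (record_id, score) pairs into accepted and needs-review lists.

    accept_threshold is an assumed, configurable cut-off: scores at or above
    it are accepted automatically, everything else is logged for manual review.
    """
    accepted, review = [], []
    for record_id, score in matches:
        if score >= accept_threshold:
            accepted.append(record_id)
        else:
            logger.info("manual review: record=%s score=%.2f", record_id, score)
            review.append(record_id)
    return accepted, review


accepted, review = route_matches([("a", 0.90), ("b", 0.50), ("c", 0.85)])
```

Raising the threshold sends more records to manual review in exchange for fewer false matches, so in practice the value would be tuned per dataset.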
Technical Architecture
- Pre-processing: Address string normalisation, tokenisation, abbreviation expansion
- Candidate generation: Fuzzy matching against a reference gazetteer (PostGIS)
- Spatial disambiguation: Ranking candidates using temporal and spatial priors
- Output: Georeferenced records with match confidence scores
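The stages above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the HAG-GIS implementation: the abbreviation table, the in-memory gazetteer (standing in for the PostGIS reference table), the field names, and the temporal weighting of 0.5 are all hypothetical, and the standard library's `difflib` is swapped in for whatever fuzzy matcher the tool used.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table. A real one would need context, since
# "St" can mean "Street" or "Saint".
ABBREVIATIONS = {"st": "street", "rd": "road", "pl": "place", "cres": "crescent"}


def normalise(address):
    """Lower-case, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def generate_candidates(address, gazetteer, min_score=0.6):
    """Return (score, entry) pairs whose names fuzzily match the address."""
    query = normalise(address)
    scored = []
    for entry in gazetteer:
        score = SequenceMatcher(None, query, normalise(entry["name"])).ratio()
        if score >= min_score:
            scored.append((score, entry))
    return scored


def rank(scored, year, off_period_weight=0.5):
    """Apply an assumed temporal prior: down-weight entries not attested in `year`."""
    ranked = []
    for score, entry in scored:
        in_period = entry["from"] <= year <= entry["to"]
        ranked.append((score if in_period else score * off_period_weight, entry))
    # Sort on the score only, so dict entries are never compared.
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)


# Toy gazetteer with invented entries, for illustration only.
GAZETTEER = [
    {"name": "High Street", "district": "ED-1", "from": 1850, "to": 1950},
    {"name": "Hill Street", "district": "ED-2", "from": 1900, "to": 1950},
]

best_score, best_entry = rank(generate_candidates("12 High St.", GAZETTEER), year=1880)[0]
```

Here the output is an enumeration district identifier with a confidence score rather than a point coordinate, mirroring the district-level matching described earlier.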
Skills Used
- Python 2.7 / 3.x
- PostGIS / PostgreSQL
- Natural language processing (address parsing)
- Fuzzy string matching
- Large-scale batch data processing