Dedupe Geocoder
Demonstration app to show how Dedupe might be used as a geocoder
Part of the Dedupe.io cloud service and open source toolset for de-duplicating and finding fuzzy matches in your data.
Setup
Install OS level dependencies:
- Python 3.4
- PostgreSQL 9.4+
Install app requirements
We recommend using virtualenv and virtualenvwrapper for working in a virtualized development environment. Read how to set up virtualenv.
Once you have virtualenvwrapper set up,
mkvirtualenv dedupe-geocoder
git clone https://github.com/datamade/dedupe-geocoder.git
cd dedupe-geocoder
pip install -r requirements.txt
cp geocoder/app_config.py.example geocoder/app_config.py
In app_config.py, put your Postgres user in DB_USER and password in DB_PW.
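As a rough illustration of how the two settings might be used, here is a minimal sketch of a config module that turns them into a PostgreSQL connection URL. Only DB_USER and DB_PW come from this README; DB_HOST, DB_PORT, DB_NAME, DB_CONN and build_db_url are hypothetical names chosen for the example, and the real app_config.py.example may be laid out differently.

```python
# Hypothetical sketch: only DB_USER and DB_PW are named in this README.
from urllib.parse import quote_plus

DB_USER = 'postgres'   # your Postgres user (placeholder value)
DB_PW = 'secret'       # your Postgres password (placeholder value)
DB_HOST = 'localhost'  # assumed, not named in this README
DB_PORT = '5432'       # assumed, PostgreSQL's default port
DB_NAME = 'geocoder'   # assumed to match the database created with createdb


def build_db_url(user, password, host, port, name):
    """Build a PostgreSQL connection URL, escaping special characters."""
    return 'postgresql://{}:{}@{}:{}/{}'.format(
        quote_plus(user), quote_plus(password), host, port, name)


DB_CONN = build_db_url(DB_USER, DB_PW, DB_HOST, DB_PORT, DB_NAME)
```

Escaping the user and password with quote_plus keeps characters such as @ or : in a password from breaking the URL.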
Afterwards, whenever you want to work on dedupe-geocoder,
workon dedupe-geocoder
Set up your database
Before we can run the website, we need to create a database.
createdb geocoder
Then, we run the loadAddresses.py script to download our data from the Cook
County data portal.
python loadAddresses.py --download --load_data
This command will take between 15 and 45 minutes, depending on your internet connection.
You can run loadAddresses.py again to get the latest data from Cook
County, add more training data, or create a table of block keys for dedupe to
use to match new records. Useful flags are:
--download Download fresh address data.
--load_data Load downloaded address data into database.
--train Add more training data and save settings file.
--block After training, create the block table used by dedupe for matching.
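To give a feel for what a table of block keys is for, here is a toy sketch of blocking. It is not how loadAddresses.py or dedupe builds the block table: dedupe learns its blocking rules from the training data, whereas the block_key function below is a hand-written rule invented for this example, and all names and addresses in it are hypothetical.

```python
from collections import defaultdict


def block_key(address):
    """Toy blocking key: house number plus first letter of the next token."""
    tokens = address.upper().split()
    if len(tokens) < 2:
        return address.upper()
    return tokens[0] + ':' + tokens[1][0]


def build_block_table(addresses):
    """Map each block key to the ids of the known addresses that share it."""
    table = defaultdict(list)
    for record_id, address in addresses.items():
        table[block_key(address)].append(record_id)
    return dict(table)


# Made-up records standing in for the loaded address data.
known = {
    1: '123 N Main St',
    2: '123 North Main Street',
    3: '456 W Oak Ave',
}
block_table = build_block_table(known)

# A new record is only compared against records that share its block key.
candidates = sorted(block_table.get(block_key('123 n. main st'), []))
```

Here the new address shares a key with records 1 and 2 only, so record 3 is never compared against it. Narrowing the comparisons this way is what makes matching a new record against a large address table practical.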
Running Dedupe Geocoder
To run locally:
workon dedupe-geocoder
python runserver.py
Then navigate to http://localhost:5000/ in your browser.
Team
- Eric van Zanten - developer
- Derek Eder - developer
- Forest Gregg - developer
- Cathy Deng - developer
Errors / Bugs
If something is not behaving intuitively, it is a bug, and should be reported. Report it here: https://github.com/datamade/dedupe-geocoder/issues
Note on Patches/Pull Requests
- Fork the project.
- Make your feature addition or bug fix.
- Commit, do not mess with rakefile, version, or history.
- Send a pull request. Bonus points for topic branches.
Copyright
Copyright (c) 2015 DataMade. Released under the MIT License.

