wireservice/csvkit merge request !202

Fuzzy join

Closed. Administrator requested to merge github/fork/lukerosiak/master into master on Feb 20, 2013.
Commits: 1, Changes: 8

Created by: lukerosiak

Don't know if you have any interest in adding this, but I've written a quick fuzzy-join CSV tool that uses Levenshtein distance to do closest-string matching. I use it when I have two CSVs containing names that are standardized differently and I want to reconcile them, pulling IDs from one spreadsheet into the other.
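To make the idea concrete, here is a minimal, standard-library-only sketch of a closest-string join. It is not the code in this merge request (which relies on fuzzywuzzy); it assumes plain Levenshtein edit distance on lowercased names, and the function names `levenshtein` and `fuzzy_join` as well as the sample rows are hypothetical. Each row of the left file is matched to the row of the master file whose name needs the fewest single-character edits, and the master row's fields (such as an ID column) are copied across.

```python
import csv
import io


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (classic dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_join(left_rows, right_rows, left_key, right_key):
    """Hypothetical helper: for each left row, attach the fields of the
    right row whose key has the smallest edit distance to the left key."""
    joined = []
    for row in left_rows:
        target = row[left_key].lower()
        best = min(right_rows,
                   key=lambda r: levenshtein(target, r[right_key].lower()))
        merged = dict(best)
        merged.update(row)  # left values win if column names clash
        joined.append(merged)
    return joined


if __name__ == "__main__":
    # Hypothetical sample data: names spelled differently in each file.
    left = list(csv.DictReader(io.StringIO("name\nJon Smith\nMary Jones\n")))
    master = list(csv.DictReader(io.StringIO(
        "id,full_name\n1,Jonathan Smith\n2,Mary B. Jones\n3,Peter Brown\n")))
    result = fuzzy_join(left, master, "name", "full_name")
    print([(r["name"], r["id"]) for r in result])
    # prints [('Jon Smith', '1'), ('Mary Jones', '2')]
```

This brute-force version compares every left name with every master name, so it is only practical for modest file sizes, and it always returns some match. A real tool would likely want a score threshold so that names with no plausible counterpart are flagged rather than silently joined.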

(This seems to be a common question on NICAR-L: how to reconcile names across two separate datasets. Google Refine only clusters similar names within a single dataset.)

This is the first time I've tried to merge code into a project, so apologies if something isn't right. I included a simple test (and CSVs in the examples directory) that does a fuzzy join on a list of all black lawmakers, pulling in the correct IDs and other information from a master list of all lawmakers that spells their names differently.

As noted in the Markdown file I've added, the tool depends on https://github.com/seatgeek/fuzzywuzzy, which I've added to requirements.txt.

Source branch: github/fork/lukerosiak/master