wireservice / csvkit / Issues / #200
Closed
Open
Issue created Jan 17, 2013 by Administrator (@root), Contributor

csvsql --or {ignore,replace}

Created by: shawnbot

I'm working on tools to import messy and sometimes redundant CSVs into a sqlite database, and csvsql mostly does what I want (and, FWIW, is just totally awesome). However, what I'd like to be able to do is create a unique key constraint on the table I'm inserting into and have my INSERT statements include an OR IGNORE clause to skip the duplicate rows. My initial thinking was that if csvsql could write SQL to stdout, I could just modify the output and pipe it back through sqlite, but #147 (closed) seems to suggest that this isn't possible.
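To make the desired behavior concrete, here is a minimal sketch using only Python's standard `sqlite3` and `csv` modules rather than csvsql itself. The table name `items`, its columns, and the sample CSV text are all made up for illustration; the point is only that a UNIQUE constraint combined with `INSERT OR IGNORE` silently skips the duplicate row.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: the last row repeats id 1.
CSV_TEXT = "id,name\n1,alpha\n2,beta\n1,alpha\n"


def load_ignoring_duplicates(conn, csv_text):
    """Load CSV rows into a table with a unique key, skipping duplicates."""
    # The UNIQUE constraint is what makes the OR IGNORE clause meaningful.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER, name TEXT, UNIQUE (id))"
    )
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # OR IGNORE drops any row that would violate the constraint
    # instead of aborting the whole import.
    conn.executemany(
        "INSERT OR IGNORE INTO items (id, name) VALUES (:id, :name)", rows
    )
    conn.commit()
    return conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
result = load_ignoring_duplicates(conn, CSV_TEXT)
print(result)
```

With a plain `INSERT`, the third row would raise `sqlite3.IntegrityError` and the import would stop there.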

That's why I'm suggesting the introduction of a new flag to csvsql:

csvsql --or {ignore,replace}

I'm not sure off the top of my head if SQLAlchemy provides a place for the OR clause, but if it does (or if the variations across the different APIs are minor), then I think it would be worth implementing, because every single potential use that I've found for csvsql has been stymied by unique key constraints.

For example, some open data is published daily in rolling intervals (e.g., one week), so importing data daily into a larger archive means knowing which swath of the data to ignore on import.
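The rolling-interval case can be sketched the same way. This is an illustration with the standard `sqlite3` module, not csvsql code: the `readings` table, the dates, and the `import_window` helper are hypothetical, and the `or_clause` argument simply mirrors the two values suggested for the flag. It shows how `ignore` keeps the first value seen for an overlapping day while `replace` keeps the latest one.

```python
import sqlite3


def make_db():
    """Create an in-memory archive keyed by day (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (day TEXT PRIMARY KEY, value INTEGER)")
    return conn


def import_window(conn, rows, or_clause):
    """Insert one published window, resolving key conflicts per or_clause."""
    if or_clause not in ("ignore", "replace"):
        raise ValueError("or_clause must be 'ignore' or 'replace'")
    # or_clause is validated above, so interpolating it into the SQL is safe.
    sql = f"INSERT OR {or_clause.upper()} INTO readings (day, value) VALUES (?, ?)"
    conn.executemany(sql, rows)
    conn.commit()


# Two overlapping windows; the second revises the value for 2013-01-15.
first_window = [("2013-01-14", 10), ("2013-01-15", 20)]
second_window = [("2013-01-15", 25), ("2013-01-16", 30)]

results = {}
for clause in ("ignore", "replace"):
    conn = make_db()
    import_window(conn, first_window, clause)
    import_window(conn, second_window, clause)
    results[clause] = conn.execute(
        "SELECT day, value FROM readings ORDER BY day"
    ).fetchall()

print(results["ignore"])
print(results["replace"])
```

Either way, the importer no longer needs to know in advance which part of the new window is already in the archive.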

I've got some more specific examples if you're interested. Let me know if I can help!
