wireservice / csvkit / Issues / #1160 (Closed)
Issue created Jan 24, 2022 by Administrator (@root)

csvsql: specify --query multiple times

Created by: badbunnyyy

Thank you for csvkit. I find it very useful.

I propose an enhancement to csvsql.py. It would be nice to be able to specify --query multiple times.

Please see the attached diff for my modifications. They work well for me and don't break anything.

Being able to specify multiple queries would make it possible to, e.g., preprocess the data with SQL commands from a first file, then run an ad-hoc query from the command line, followed by SQL commands from a second file that output the data in a specific way.
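To illustrate the kind of chain described above, here is a minimal sketch that runs several query strings in order against one in-memory SQLite database and returns only the rows of the last statement that produced a result set. It is not the attached diff and not csvkit code: `run_queries`, the `data` table, and the naive split on `;` are assumptions made for the example.

```python
import sqlite3


def run_queries(conn, queries):
    """Run each query string in order (hypothetical helper, not csvkit code).

    Returns the rows of the last statement that produced a result set.
    """
    rows = []
    for query in queries:
        # Naive split: assumes no semicolons inside string literals.
        for statement in query.split(';'):
            statement = statement.strip()
            if not statement:
                continue
            cursor = conn.execute(statement)
            # description is None for statements that return no rows.
            if cursor.description is not None:
                rows = cursor.fetchall()
    return rows


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (name TEXT, amount INTEGER)')
conn.executemany('INSERT INTO data VALUES (?, ?)',
                 [('a', 1), ('b', 2), ('c', 3)])

queries = [
    'DELETE FROM data WHERE amount < 2',            # stands in for the first file
    'UPDATE data SET amount = amount * 10',         # ad-hoc query from the command line
    'SELECT name, amount FROM data ORDER BY name',  # stands in for the second file
]
rows = run_queries(conn, queries)
print(rows)
```

Because all steps share one connection, the data is loaded once and each later query sees the changes made by the earlier ones.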

It would also be useful for dynamically building processing chains from a script and many other scenarios.

One alternative would be a wrapper script that gathers the SQL commands from various sources and feeds them into csvsql, either as a very long command line or as a temporary file. Both would be cumbersome and prone to breaking.

Another alternative would be to invoke csvsql multiple times in a pipeline. This would also be less convenient on the command line and less efficient, as each step would have to output the data and the next step would have to read it and recreate the database.

Please note that --query can already be specified multiple times. However, only the last query is processed and all previous queries are silently discarded, which is very unintuitive.
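The "last one wins" behaviour is what Python's `argparse` does with its default `store` action, and `action='append'` is one way to keep every occurrence. The sketch below assumes csvsql declares `--query` through `argparse` in this way; `parse_queries` and the `queries` destination name are hypothetical and not taken from csvsql.py.

```python
import argparse


def parse_queries(argv, action):
    """Parse --query with the given argparse action (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--query', dest='queries', action=action)
    return parser.parse_args(argv).queries


argv = ['--query', 'CREATE TABLE t (x)', '--query', 'SELECT * FROM t']

# Default 'store' action: each occurrence overwrites the previous one.
last_only = parse_queries(argv, 'store')

# 'append' action: every occurrence is collected into a list, in order.
all_queries = parse_queries(argv, 'append')

print(last_only)
print(all_queries)
```

With `store`, the first query is dropped without any warning; with `append`, the program receives both and can run them in the order given.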

csvsql.py.diff.txt

Thank you for considering my proposal.
