wireservice / csvkit / Issues / #280 (Closed)
Issue created May 28, 2014 by Administrator (@root), Contributor

Support date format hints

Created by: samcrawford

Excel outputs CSVs with dates in the user's locale, so a British user will see dates in DD/MM/YYYY format, whilst an American user will see MM/DD/YYYY.

Currently it appears that csvkit (although I've only really been using csvsql) always tries MM/DD/YYYY first, and only falls back to DD/MM/YYYY if that fails to parse.

The problem with this approach arises when you're using a DD/MM/YYYY formatted sheet and you have ambiguity in some dates. For example:

02/01/2014
31/12/2012

This will produce dates in the database of:

2014-02-01 (incorrectly parsed as MM/DD/YYYY)
2012-12-31

So you silently end up with a mixture of correct and incorrect dates, which is not ideal!
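
To make the failure mode concrete, here is a minimal sketch of the "try MM/DD/YYYY, then fall back to DD/MM/YYYY" behaviour described above, using only the standard library. It is not csvkit's actual code; the function name `parse_month_first` is made up for illustration.

```python
from datetime import datetime

def parse_month_first(value):
    """Hypothetical helper: try MM/DD/YYYY first, then DD/MM/YYYY."""
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"unparseable date: {value!r}")

# Both values come from a DD/MM/YYYY sheet.
for raw in ("02/01/2014", "31/12/2012"):
    print(raw, "->", parse_month_first(raw))
```

The first value is ambiguous and is accepted by the month-first format, so it is read as 1 February rather than 2 January. The second value cannot be month-first (there is no month 31), so it falls through to the day-first format and comes out right. No error is raised in either case.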

Ideally there'd be an option one could pass to csvkit programs to specify a preference list for parsing dates. It'd be overkill to specify every format (I think), so having some abbreviations would suffice. For example:

--date-formats "dmy,mdy"

If none of the formats in the list matched, the default date parsing scheme would then apply.
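
A rough sketch of how such a preference list could behave, assuming slash-separated dates. The `--date-formats` option does not exist in csvkit; the abbreviation table, the default order, and the function name below are all assumptions made for illustration.

```python
from datetime import datetime

# Assumed mapping from the proposed abbreviations to strptime formats.
ABBREVIATIONS = {"dmy": "%d/%m/%Y", "mdy": "%m/%d/%Y"}
# Assumed default order, mirroring the month-first behaviour described above.
DEFAULT_ORDER = ["mdy", "dmy"]

def parse_with_preferences(value, preference=""):
    """Try the user's preferred formats first, then the default order."""
    preferred = [p.strip() for p in preference.split(",") if p.strip()]
    unknown = [p for p in preferred if p not in ABBREVIATIONS]
    if unknown:
        raise ValueError(f"unknown date format abbreviation(s): {unknown}")
    # Defaults follow the preferences, skipping ones already listed.
    order = preferred + [a for a in DEFAULT_ORDER if a not in preferred]
    for abbreviation in order:
        try:
            return datetime.strptime(value, ABBREVIATIONS[abbreviation]).date()
        except ValueError:
            continue
    raise ValueError(f"unparseable date: {value!r}")

print(parse_with_preferences("02/01/2014", "dmy,mdy"))  # day-first wins
print(parse_with_preferences("02/01/2014"))             # default order
```

With `"dmy,mdy"` the ambiguous value is read as 2 January 2014, while with no preference it is read as 1 February 2014.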

An extension to this could be to infer the date format by examining all rows and selecting the format that successfully parses every one of them (of course, this is still no guarantee of success, as a file may contain only ambiguous dates).
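
The global inference idea could look something like the sketch below: keep every candidate format that parses all values in the column, and treat more than one survivor as "still ambiguous". The candidate table and function names are hypothetical, and this is only one way to do it.

```python
from datetime import datetime

# Assumed candidate formats, keyed by the abbreviations proposed above.
CANDIDATES = {"mdy": "%m/%d/%Y", "dmy": "%d/%m/%Y"}

def _parses(value, fmt):
    """Return True if value matches the strptime format fmt."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def infer_formats(values):
    """Return the abbreviations of all formats that parse every value."""
    return [
        name
        for name, fmt in CANDIDATES.items()
        if all(_parses(v, fmt) for v in values)
    ]

print(infer_formats(["02/01/2014", "31/12/2012"]))  # one survivor
print(infer_formats(["02/01/2014", "03/04/2012"]))  # both survive
```

In the first column, `31/12/2012` rules out month-first, so only `dmy` remains. In the second, every value is ambiguous and both formats survive, which is the case where a user-supplied hint would still be needed.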
