The Wayback Machine - https://web.archive.org/web/20201125090211/https://github.com/SheetJS/sheetjs/pull/1348
Add | (pipe) as a potential CSV cell separator. #1348

Open
wants to merge 4 commits into
base: master
from

Conversation


@amfine-soft-drault amfine-soft-drault commented Nov 9, 2018

Hi,
A number of financial file format standards use the pipe as a CSV cell separator.
It would be nice if you could accept this feature.
I used 0 as the weight, thinking it was the lowest priority (I hope I didn't misread the code).

Let me know if you need more info or some changes to accept the pull request.
Thanks,
David


@amfine-soft-drault amfine-soft-drault commented Oct 17, 2019

Hi guys,
Any chance this patch/feature could be approved?


@SheetJSDev SheetJSDev commented Oct 17, 2019

Even though Excel doesn't normally handle pipe separators, this PR is mostly ok. It's better to set the weights to 4/3/2/1.

Related questions:

  1. What standards use the pipe character? (pick a relatively important standard we can note in the README)

  2. Should ASCII 0x01, used as a delimiter in FIX, also be supported? (your opinion)

  3. Should ASCII 0x1F (field sep) and ASCII 0x1E (row sep) also be supported? (your opinion)
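
For readers following the weights discussion, here is a minimal sketch of count-based separator guessing with a weight used only to break ties. It is not the SheetJS implementation: the function name `guessSeparator`, the `sampleSize` parameter, and the mapping of 4/3/2/1 onto comma/tab/semicolon/pipe are all assumptions made for illustration.

```javascript
// Hypothetical candidate table: character code -> tie-break weight.
// The 4/3/2/1 values come from the review comment; which character
// gets which weight is an assumption of this sketch.
const CANDIDATE_WEIGHTS = new Map([
  [0x2c, 4], // ','
  [0x09, 3], // tab
  [0x3b, 2], // ';'
  [0x7c, 1], // '|'
]);

// Count candidate separators outside double-quoted fields in the first
// `sampleSize` characters, then pick the most frequent one. The weight
// only decides between candidates with equal counts.
function guessSeparator(text, sampleSize = 1024) {
  const counts = new Map();
  let inQuotes = false;
  const limit = Math.min(text.length, sampleSize);
  for (let i = 0; i < limit; i++) {
    const code = text.charCodeAt(i);
    if (code === 0x22) {
      inQuotes = !inQuotes; // toggle on every double quote
    } else if (!inQuotes && CANDIDATE_WEIGHTS.has(code)) {
      counts.set(code, (counts.get(code) || 0) + 1);
    }
  }
  let best = 0x2c; // default to comma when no candidate was seen
  let bestCount = 0;
  for (const [code, count] of counts) {
    const wins =
      count > bestCount ||
      (count === bestCount &&
        CANDIDATE_WEIGHTS.get(code) > CANDIDATE_WEIGHTS.get(best));
    if (wins) {
      best = code;
      bestCount = count;
    }
  }
  return String.fromCharCode(best);
}
```

Under this scheme, supporting 0x01 or 0x1F as field separators would only mean adding entries to the table; handling 0x1E as a row separator would need separate logic that this sketch does not cover.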

* upstream/master:
  TSV Files can start with tab characters
  bug: Remove white spaces due to html tags (#1622)
  fixing some typos in the documentation
  initial release of S [ci skip]
  Fix #1244
  Fix issue #1536
  version bump 0.15.1
  version bump 0.15.0: mini build
  version bump 0.14.5: XLS grind
  README Fix (fixes #1546)
  version bump 0.14.4
  travis config
  to_csv skipHidden corner case (fixes #1508)
  version bump 0.14.3: formula niggle (closes #1388)
  version bump 0.14.2: comment xml (fixes #1468)
  README use typed array (fixes #1362)
  version bump 0.14.1: AutoFilter issues

@amfine-soft-drault amfine-soft-drault commented Oct 17, 2019

Thanks for your reply.
My answers:

  • I just committed the 4/3/2/1 priority weights.
  • Examples of standards using the pipe character: EPT (European PRIIPs Template) and EMT (European MiFID Template), see https://findatex.eu/
    Please note that the pipe is not specified by those standards, but it is the de facto cell separator that the industry uses to exchange those files.
  • As for 0x01 and 0x1F as separators, I don't think I really have an opinion to offer: unless I'm mistaken, those are invisible characters, and I've never seen a use case in which they were used or needed (even tab-separated CSV files are, in my experience, really not that common).

@amfine-soft-drault amfine-soft-drault commented Oct 22, 2019

Hello again,
I found a surprising behaviour. I don't know whether it's related to my patch or not,
so here it is; maybe it will be meaningful to you.

When parsing (valid) CSV files with the pipe as separator, XLSX sometimes does not detect the pipe as the separator, even though it is by far the most frequent of the candidate separators.
If I replace all pipes with ; it detects the ; as separator.
If I replace all pipes with , it detects the , as separator.
I finally found what seems to cause the issue: it happens when the whole file does NOT contain any , or ; !!
If this is the case, the detection seems to fall back to an incorrect separator.
I tried a (dirty) workaround in my own code using XLSX:
I added an extra column containing a comma (replaced the first end-of-line with |, pipe-comma),
and then, hurray, the parsing actually uses the pipe as separator.
Meaning:

col 1|col 2|col 3
val 1|val 2|val 3

will fail to detect the pipe, whereas

col 1|col 2|col 3|fa,ke
val 1|val 2|val 3

would!
If any cell value on any line of the file contained a , or a ; the issue would not happen either
(this is what made it harder to narrow down what was wrong).

I tried to track the issue in the code but got lost :(
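
The workaround described above can be sketched as a small preprocessing step. This is only an illustration of the comment's description: the helper name `addDecoyColumn` and the default decoy text `|fa,ke` (taken from the example) are hypothetical, and the sketch does not call the library itself.

```javascript
// Append a decoy column containing a comma to the first line, so that
// the text contains at least one ',' while the pipe stays by far the
// most frequent separator.
function addDecoyColumn(text, decoy = "|fa,ke") {
  const eol = text.search(/\r?\n/); // index of the first end-of-line, or -1
  if (eol === -1) {
    return text + decoy; // single-line input: append at the end
  }
  return text.slice(0, eol) + decoy + text.slice(eol);
}
```

The result would then be handed to the parser in place of the original text. The decoy adds an extra header column, so the caller would need to drop it again after parsing.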

@SheetJSDev SheetJSDev force-pushed the SheetJS:master branch from 0786b99 to 3b589f0 Aug 12, 2020
@SheetJSDev SheetJSDev force-pushed the SheetJS:master branch from 4254ed4 to eec93b0 Oct 26, 2020