Topic: Improve CSV standard
rlrandallx
New Member
Posts: 2

Improve CSV standard (Date posted: 11/05/11 at 16:41:24)
Improve CSV standard format
Is CSV so perfect that the standard format cannot be improved upon? HTML and XML standards change every year, but CSV never seems to change. My biggest pet peeve is that Excel doesn't seem to know how wide the columns should be. The logical thing would be to read the first 100 records (or the whole file!) and take the length of the longest string as the column width.
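As a rough illustration of that auto-sizing idea (not how Excel actually works), here is a Python sketch built on the standard `csv` module. The helper name `guess_column_widths` is hypothetical, and widths are counted in characters rather than screen units.

```python
import csv
import io
from itertools import islice

def guess_column_widths(text, sample_rows=100):
    """Return one width per column: the longest value seen in the first
    `sample_rows` records (pass sample_rows=None to scan the whole file)."""
    reader = csv.reader(io.StringIO(text))
    widths = []
    for row in islice(reader, sample_rows):
        for i, field in enumerate(row):
            if i >= len(widths):
                widths.append(0)          # first time this column appears
            widths[i] = max(widths[i], len(field))
    return widths

sample = 'NAME,CITY\n"B. OBABA",Washington\n'
print(guess_column_widths(sample))  # [8, 10]
```

Ragged rows are tolerated: a column seen only in later rows simply gets its width appended when it first shows up.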
Better yet, CSV could CHANGE its standard (heaven forbid!) and recognize the first record as a special "header record", allowing for some new features. How about the following:
"NAME"{20}, "ADDRESS"{15}, "CITY"{20}, "STATE"{2}, "ZIP"{10}
"B. OBABA", "1600 Pennsylvania Ave.", "Washington", "DC", "12345-6789"
---------------------------------------------
Include the column widths in the header and avoid all the manual resizing inside Excel. After all, the width is a property of the DATA, not of the tool reading it.
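A reader for that proposed header could be quite small. The Python sketch below assumes the syntax exactly as proposed (a quoted name immediately followed by `{width}`); `parse_sized_header` is a hypothetical name, and names containing embedded quotes are not handled.

```python
import re

# One header field in the proposed syntax: a quoted name, then {width}.
HEADER_FIELD = re.compile(r'"([^"]*)"\{(\d+)\}')

def parse_sized_header(line):
    """Return [(name, width), ...] from a header such as "NAME"{20}, "CITY"{20}."""
    return [(name, int(width)) for name, width in HEADER_FIELD.findall(line)]

header = '"NAME"{20}, "ADDRESS"{15}, "CITY"{20}, "STATE"{2}, "ZIP"{10}'
print(parse_sized_header(header))
# [('NAME', 20), ('ADDRESS', 15), ('CITY', 20), ('STATE', 2), ('ZIP', 10)]
```

Because `findall` only looks for the name/width pairs, the separators between them (and a stray trailing comma) are ignored rather than validated.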
Maybe we could allow regex delimiters, e.g. "{.*,}", and include other metadata for each column. But that is probably going too far.
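The "{.*,}" example isn't fully specified, but the general idea of a regex delimiter can be sketched with Python's `re.split`. The pattern used here (a comma or semicolon with optional surrounding spaces) is a hypothetical choice, and `split_with_regex_delimiter` is an illustrative name.

```python
import re

def split_with_regex_delimiter(record, delimiter_pattern):
    """Split one record wherever `delimiter_pattern` matches.
    Note: this naive split ignores quoting, so a delimiter inside a
    quoted field would still split it."""
    return re.split(delimiter_pattern, record)

# Hypothetical delimiter: a comma or semicolon, with any surrounding spaces.
print(split_with_regex_delimiter("Washington ; DC,  12345", r"\s*[,;]\s*"))
# ['Washington', 'DC', '12345']
```

The quoting caveat in the docstring hints at why this may be "going too far": once the delimiter is a pattern, deciding what counts as an escaped or quoted delimiter gets much harder.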
-Robin
JRepici
Administrator
  
Posts: 328

Re: Improve CSV standard (Date posted: 11/08/11 at 12:16:47)
You're preaching to the choir!
Many of those concerns of yours also occurred to me while figuring out all the crazy quirks and subtle pitfalls of CSV for the article (it has been a while since there were any new reports, but I'm sure there are still more left to find in there).
At some point, after dealing with all those shortcomings of CSV, I decided to write a new format that would remain simple but include generalized functionality, exactly the kind of stuff you're talking about. Check out the article: CTX - Creativyst® Table Exchange Format
You'll recognize the \Q header for field types. Pretty close to your suggestion, eh? Actually, in your case you would use two CTX headers: \Q for SQL types and \L for field labels.
So:
\QVARCHAR(20)|VARCHAR(15)|VARCHAR(20)|VARCHAR(2)|VARCHAR(10)
\LName|Address|City|State|Zip
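For illustration only, here is how a program might pick up those two header lines, sketched in Python. Going by the example above, it assumes a header line is a backslash, a one-letter header code, then pipe-separated fields, and that all header lines come before the data. It does not decode escapes inside header fields, and a data record that happened to begin with an escape such as \n would be mistaken for a header. `read_ctx_headers` is a hypothetical name, not part of any published CTX library.

```python
def read_ctx_headers(lines):
    """Collect leading header lines (a backslash, a one-letter code, then
    pipe-separated fields) into a dict such as {'Q': [...], 'L': [...]}.
    Simplifications: reading stops at the first non-header line, and escape
    sequences inside header fields are not decoded."""
    headers = {}
    for line in lines:
        if len(line) < 2 or line[0] != "\\" or not line[1].isalpha():
            break                      # first data record reached
        headers[line[1]] = line[2:].split("|")
    return headers

example = [
    r"\QVARCHAR(20)|VARCHAR(15)|VARCHAR(20)|VARCHAR(2)|VARCHAR(10)",
    r"\LName|Address|City|State|Zip",
    "B. OBABA|1600 Pennsylvania Ave.|Washington|DC|12345-6789",
]
hdr = read_ctx_headers(example)
for label, sql_type in zip(hdr["L"], hdr["Q"]):
    print(f"{label}: {sql_type}")   # first line printed: Name: VARCHAR(20)
```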
You wouldn't really have to use SQL types to do what you want. You could simply use the \D (recommended display size) header, which also lets you specify how each field is aligned. You could add a maximum length for each field as well, using \X:
\LName|Address|City|State|Zip
\D20L|15L|20L|2|10R
\X30|30|30|4|15
Finally, the \D header would normally take fprintf()-style format strings; they are the recommended usage (I was just keeping it simple in the above example).
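One way a reader might apply the \D and \X headers is sketched below in Python. The parsing rules are assumptions made for illustration, not taken from the CTX specification: a simple \D entry is a width followed by an optional L or R, a missing letter is treated as left-aligned, an entry beginning with % is treated as a printf-style format string, and a value longer than its \X limit is rejected rather than truncated. `render_field` and `render_record` are hypothetical names.

```python
import re

SIMPLE_SPEC = re.compile(r"(\d+)([LR]?)")   # e.g. "20L", "10R", "2"

def render_field(value, d_spec, x_limit):
    """Render one field for display from its \\D spec, after checking it
    against its \\X maximum length."""
    if len(value) > int(x_limit):
        raise ValueError(f"{value!r} is longer than the \\X limit {x_limit}")
    if d_spec.startswith("%"):
        # printf-style spec such as "%-20s"; Python's %-operator follows
        # C printf conventions ("-" means left-align).
        return d_spec % value
    m = SIMPLE_SPEC.fullmatch(d_spec)
    if m is None:
        raise ValueError(f"unsupported \\D spec: {d_spec!r}")
    width, align = int(m.group(1)), m.group(2) or "L"   # assumed default: left
    # ljust/rjust pad short values but never truncate long ones.
    return value.ljust(width) if align == "L" else value.rjust(width)

def render_record(fields, d_header, x_header):
    """Pair each field with its \\D and \\X entries (pipe-separated)."""
    specs, limits = d_header.split("|"), x_header.split("|")
    return " ".join(render_field(v, d, x) for v, d, x in zip(fields, specs, limits))

row = ["B. OBABA", "1600 Pennsylvania Ave.", "Washington", "DC", "12345-6789"]
print(render_record(row, "20L|15L|20L|2|10R", "30|30|30|4|15"))
print(render_record(row, "%-20s|%-15s|%-20s|%2s|%10s", "30|30|30|4|15"))
```

Both calls print the same line, because "%-20s" and "20L" describe the same padding. The 22-character address overflows its recommended 15-column width in both cases; a printf precision such as "%-15.15s" would truncate it instead, as it would in C.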
Let me know what you think.
-djr
rlrandallx
New Member
Posts: 2

Re: Improve CSV standard (Date posted: 11/07/11 at 17:15:52)
Fantastic! Now, how do you get it accepted so that MS, Google, and other players include it in their software? Take it to IBM or Apple and show them how much it would save them?
I had thought of naming it "PSV", for "Pipe-Separated Variables". (No quotes would be needed unless a scan reveals extra pipes in a line.)
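That "quote only when a scan finds a pipe" rule is essentially what Python's standard `csv` module does when given `delimiter="|"` and `QUOTE_MINIMAL`, which quotes a field only if it contains the delimiter, the quote character, or a line-terminator character. A minimal sketch; `to_psv` is a hypothetical helper name, and this is not a definition of any PSV standard.

```python
import csv
import io

def to_psv(rows):
    """Write rows as pipe-separated text, quoting a field only when it
    contains the delimiter, the quote character, or a line terminator."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

print(to_psv([["Name", "Note"], ["B. OBABA", "likes a|b"]]), end="")
# Name|Note
# B. OBABA|"likes a|b"
```

Reading it back with `csv.reader(..., delimiter="|")` restores the original fields, quotes and all.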
-Robin
JRepici
Administrator
  
Posts: 328

Re: Improve CSV standard (Date posted: 11/08/11 at 13:31:04)
Ha! Yeah, pipes. Not sure why I put commas in my original post.
It's called the Creativyst® Table Exchange (CTX) format, btw (a good, descriptive name), and other than a requirement to give credit where credit is due, the functionality is freely offered. PSV is an existing format that is essentially just CSV with the commas replaced by pipes.
Also, CTX doesn't use quotes to enclose newlines (a source of many headaches in CSV). Instead, it uses the escape sequences \n and \r to represent those characters within records. In CTX, the newline character serves ONLY as a record delimiter, though a record can be broken across lines (e.g., to deal with transport limitations) using the line-extend escape \l (lower-case L).
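Because a raw newline always ends a record, reassembling records is a simple line-by-line pass. The Python sketch below assumes, as an illustration rather than a reading of the CTX specification, that \l appears at the very end of a physical line and that both the marker and the line break are dropped. It ignores the corner case where the backslash before the l is itself part of an escaped backslash. `join_extended_lines` is a hypothetical name.

```python
def join_extended_lines(physical_lines):
    """Reassemble logical records from physical lines. Assumption: a line
    ending in the two characters \\l continues on the next line, and the
    \\l itself is dropped."""
    records, pending = [], ""
    for line in physical_lines:
        if line.endswith("\\l"):
            pending += line[:-2]      # strip the marker, keep accumulating
        else:
            records.append(pending + line)
            pending = ""
    if pending:
        records.append(pending)       # input ended mid-record; keep what we have
    return records

lines = [r"B. OBABA|1600 Penn\l", r"sylvania Ave.|Washington", r"X|Y"]
print(join_extended_lines(lines))
```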
(Oops. P.S. I read right past your point there, sorry. Pipes are escaped with \p, so there's no need to use quotes for them either. Another source of quirks squashed.)
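Field-level escaping along these lines can be sketched in Python. The \n, \r, and \p escapes come from the posts above; how a literal backslash is written isn't stated here, so the doubled backslash in this sketch is a hypothetical choice. `escape_field` and `unescape_field` are illustrative names.

```python
# Assumed escape table: \n, \r and \p come from the thread; the doubled
# backslash is a hypothetical choice so a literal "\" survives a round trip.
ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "|": "\\p"}
UNESCAPES = {v[1]: k for k, v in ESCAPES.items()}

def escape_field(value):
    """Encode one field so it contains no pipes or line breaks."""
    return "".join(ESCAPES.get(ch, ch) for ch in value)

def unescape_field(text):
    """Decode one field; unknown escapes are kept verbatim."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in UNESCAPES:
            out.append(UNESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

fields = ["a|b", "line1\nline2", "C:\\temp"]
record = "|".join(escape_field(f) for f in fields)
print(record)            # a\pb|line1\nline2|C:\\temp
decoded = [unescape_field(f) for f in record.split("|")]
```

Since escaped fields never contain a raw pipe or newline, splitting the record on "|" is safe, and no quoting rules are needed at all.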
-djr