csv Parser Question - Handling of Double Quotes

Hello,

I am trying to read a csv file. I have the following functioning
code:

---- BEGIN ----
import csv

reader = csv.reader(open("test.csv", "rb"), delimiter=';')

for row in reader:
    print row
---- END ----

This code will successfully parse my csv file formatted as such:

"this";"is";"a";"test"

Resulting in an output of:

['this', 'is', 'a', 'test']

However, if I modify the csv to:

"t"h"is";"is";"a";"test"

The output changes to:

['th"is"', 'is', 'a', 'test']

My question is, can you change the behavior of the parser to only
remove quotes when they are next to the delimiter? I would like both
quotes around the h in the example above to remain, however it is
instead removing only the first two instances of quotes it runs across
and leaves the others.

The closest solution I have found is to add to the reader command
"escapechar='\\'" then manually add a single \ character before the
quotes I'd like to keep. But instead of writing something to add
those slashes before csv parsing I was wondering if the parser can
handle it instead.
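
For reference, the escapechar workaround described above looks roughly like the following. This is only a sketch: it is written for Python 3, it reads from an in-memory string instead of test.csv, and the helper name parse_with_escapes is made up for illustration.

```python
import csv
import io


def parse_with_escapes(text):
    """Parse ';'-delimited text, treating backslash as the escape character."""
    reader = csv.reader(io.StringIO(text), delimiter=';', escapechar='\\')
    return list(reader)


# The inner quotes have been escaped by hand before parsing.
sample = '"t\\"h\\"is";"is";"a";"test"\n'
rows = parse_with_escapes(sample)
print(rows)  # [['t"h"is', 'is', 'a', 'test']]
```

The drawback is the one already mentioned: something has to add those backslashes before the file reaches the parser.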

Thanks in advance for the help.
Mar 27 '08 #1
2 Replies


On Thu, 27 Mar 2008 17:37:33 -0300, Aaron Watters
<aa***********@gmail.com> wrote:
>> "this";"is";"a";"test"
>>
>> Resulting in an output of:
>>
>> ['this', 'is', 'a', 'test']
>>
>> However, if I modify the csv to:
>>
>> "t"h"is";"is";"a";"test"
>>
>> The output changes to:
>>
>> ['th"is"', 'is', 'a', 'test']
>
> I'd be tempted to say that this is a bug,
> except that I think the definition of "csv" is
> informal, so the "bug/feature" distinction
> cannot be exactly defined, unless I'm mistaken.

AFAIK, the csv module tries to mimic Excel behavior as closely as possible.
It has some test cases that look horrible, but that's what Excel does...
I'd try actually using Excel to see what happens.
Perhaps the behavior could be more configurable, like the codecs are.
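
In the meantime, one possible way to get the behavior asked about (strip only the quotes that sit next to a delimiter) is to switch quote handling off and remove the outer quotes afterwards. The following is a sketch under assumptions, not something the csv module offers as an option: it targets Python 3, and the helper names strip_outer_quotes and read_keeping_inner_quotes are invented for this example.

```python
import csv
import io


def strip_outer_quotes(field, quote='"'):
    """Remove one quote from each end of a field, leaving inner quotes alone."""
    if len(field) >= 2 and field.startswith(quote) and field.endswith(quote):
        return field[1:-1]
    return field


def read_keeping_inner_quotes(text, delimiter=';'):
    # QUOTE_NONE makes the reader treat quote characters as ordinary data,
    # so each field comes back exactly as it appears between delimiters.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    return [[strip_outer_quotes(field) for field in row] for row in reader]


rows = read_keeping_inner_quotes('"t"h"is";"is";"a";"test"\n')
print(rows)  # [['t"h"is', 'is', 'a', 'test']]
```

The price is that quoting no longer protects anything: a delimiter or a line break inside a quoted field will split that field, so this only suits data where that cannot happen.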

--
Gabriel Genellina

Mar 27 '08 #2

On Mar 27, 6:00 pm, John Machin <sjmac...@lexicon.net> wrote:
> ...The Python csv module emulates Excel in delivering garbage silently in
> cases when the expected serialisation protocol has (detectably) not
> been followed....

Fine, but I'd say the heuristic adopted produces
bizarre and surprising results in the illustrated case.
It's a matter of taste of course...
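
For anyone who would rather get an error than a silent guess, the csv module's strict dialect option appears to cover the illustrated case. A minimal sketch, assuming Python 3 and an in-memory string; the helper name parse_strict is made up here:

```python
import csv
import io


def parse_strict(text):
    """Parse ';'-delimited text, raising csv.Error on malformed quoting."""
    reader = csv.reader(io.StringIO(text), delimiter=';', strict=True)
    return list(reader)


try:
    parse_strict('"t"h"is";"is";"a";"test"\n')
except csv.Error as exc:
    # The stray character after the closing quote is reported, not absorbed.
    print("rejected:", exc)
```

This does not keep the inner quotes, it only makes the problem visible instead of returning a surprising field.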
-- Aaron Watters

===
http://www.xfeedme.com/nucular/pydis...=mondo+bizarre
Mar 28 '08 #3
