
split CSV fields

P: n/a
What is the simplest expression for splitting a CSV line with "-protected fields?

s='"123","a,b,\"c\"",5.640'
Nov 16 '06 #1
10 Replies


P: n/a

s.split(',');
robert wrote:
What is a most simple expression for splitting a CSV line with "-protected fields?

s='"123","a,b,\"c\"",5.640'
Nov 16 '06 #2

P: n/a
robert wrote:
What is a most simple expression for splitting a CSV line
with "-protected fields?

s='"123","a,b,\"c\"",5.640'
import csv

the preferred way is to read the file using that module. if you insist
on processing a single line, you can do

cols = list(csv.reader([string]))

</F>

Nov 16 '06 #3
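
For readers on Python 3, here is a minimal sketch of the single-line approach suggested above. The helper name `split_csv_line` and the sample line are assumptions made for illustration; they are not from the original post.

```python
import csv

def split_csv_line(line):
    """Split one CSV line into a list of fields using the default dialect."""
    # csv.reader expects an iterable of lines, so wrap the single line in a list
    # and take the first (and only) row it yields.
    return next(csv.reader([line]))

fields = split_csv_line('"123","a,b",5.640')
print(fields)
```

Note that every field comes back as a string, including the unquoted number.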

P: n/a
robert wrote:
What is a most simple expression for splitting a CSV line with "-protected
fields?

s='"123","a,b,\"c\"",5.640'
Use the csv-module. It should have a dialect for this, albeit I'm not 100%
sure if the escaping of the " is done properly from csv POV. Might be that
it requires excel-standard.

Diez
Nov 16 '06 #4
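
To see what the csv module itself considers the "excel-standard" way of protecting a quote inside a field, one option is to let `csv.writer` produce the line and read it back. This is only a sketch for Python 3; the sample row is an assumption chosen to match the fields the original poster appears to want.

```python
import csv
import io

row = ['123', 'a,b,"c"', '5.640']

# Write the row with the default ("excel") dialect into an in-memory buffer.
buf = io.StringIO()
csv.writer(buf).writerow(row)
line = buf.getvalue()
print(repr(line))  # embedded quotes are doubled rather than backslash-escaped

# Reading the same text back with the default dialect recovers the fields.
parsed = next(csv.reader([line]))
print(parsed)
```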

P: n/a
robert wrote:
What is a most simple expression for splitting a CSV line with "-protected
fields?

s='"123","a,b,\"c\"",5.640'
>>> import csv
>>> class mydialect(csv.excel):
...     escapechar = "\\"
...
>>> csv.reader(['"123","a,b,\\"c\\"",5.640'], dialect=mydialect).next()
['123', 'a,b,"c"', '5.640']

Peter

Nov 16 '06 #5
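
The session above is Python 2 (`.next()` on the reader). A rough Python 3 equivalent might look like the sketch below; the class name `BackslashDialect` is a hypothetical name chosen for illustration, and the input is written as a raw string so the backslashes really are part of the data.

```python
import csv

class BackslashDialect(csv.excel):
    # Treat a backslash as the escape character, so \" inside a quoted
    # field is read as a literal double quote.
    escapechar = "\\"

line = r'"123","a,b,\"c\"",5.640'

# In Python 3 the built-in next() replaces the reader's .next() method.
fields = next(csv.reader([line], dialect=BackslashDialect))
print(fields)
```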

P: n/a
Fredrik Lundh wrote:
robert wrote:
What is a most simple expression for splitting a CSV line
with "-protected fields?

s='"123","a,b,\"c\"",5.640'

import csv

the preferred way is to read the file using that module. if you insist
on processing a single line, you can do

cols = list(csv.reader([string]))

</F>
Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
(Intel)] on win32
>>> import csv
>>> s='"123","a,b,\"c\"",5.640'
>>> cols = list(csv.reader([s]))
>>> cols
[['123', 'a,b,c""', '5.640']]
# maybe we need a bit more:
>>> cols = list(csv.reader([s]))[0]
>>> cols
['123', 'a,b,c""', '5.640']

I'd guess that the OP is expecting 'a,b,"c"' for the second field.

Twiddling with the knobs doesn't appear to help:

>>> list(csv.reader([s], escapechar='\\'))[0]
['123', 'a,b,c""', '5.640']
>>> list(csv.reader([s], escapechar='\\', doublequote=False))[0]
['123', 'a,b,c""', '5.640']

Looks like a bug to me; AFAICT from the docs, the last attempt should
have worked.

Cheers,
John

Nov 16 '06 #6

P: n/a
John Machin wrote:
Looks like a bug to me; AFAICT from the docs, the last attempt should
have worked.
Given Peter Otten's post, looks like
(1) there's a bug in the "fmtparam" mechanism -- it's ignoring the
escapechar in my first twiddle, which should give the same result as
Peter's.
(2)
>>> csv.excel.doublequote
True
According to my reading of the docs:
"""
doublequote
Controls how instances of quotechar appearing inside a field should
themselves be quoted. When True, the character is doubled. When False,
the escapechar is used as a prefix to the quotechar. It defaults to
True.
"""
Peter's example should not have worked.

Nov 16 '06 #7
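
The difference described in the quoted documentation is perhaps easiest to observe on the writing side. The sketch below (Python 3) writes the same field with the default settings and with `doublequote=False` plus an `escapechar`; the helper `write_row` is a hypothetical convenience function, not part of the csv module.

```python
import csv
import io

def write_row(row, **fmtparams):
    """Return the text csv.writer emits for one row with the given options."""
    buf = io.StringIO()
    csv.writer(buf, **fmtparams).writerow(row)
    return buf.getvalue()

row = ['a,b,"c"']

# Default: doublequote is True, so embedded quotes are doubled.
doubled = write_row(row)

# doublequote=False: embedded quotes are prefixed with the escapechar instead.
escaped = write_row(row, doublequote=False, escapechar='\\')

print(repr(doubled))
print(repr(escaped))
```

This only illustrates the writer; how the reader treats the same two settings is what the thread goes on to question.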

P: n/a
John Machin wrote:
Peter's example should not have worked.
the documentation also mentions a "quoting" parameter that "controls
when quotes should be generated by the writer and recognised by the
reader.". not sure how that changes things.

anyway, it's either unclear documentation or a bug in the code. better
submit a bug report so someone can fix one of them.

</F>

Nov 16 '06 #8

P: n/a

John Machin wrote:
Peter's example should not have worked.
Doh. The OP's string was a raw string. I need some sleep.
Scrap bug #1!

>>> s=r'"123","a,b,\"c\"",5.640'
>>> list(csv.reader([s]))[0]
['123', 'a,b,\\c\\""', '5.640']
# What's that???
>>> list(csv.reader([s], escapechar='\\'))[0]
['123', 'a,b,"c"', '5.640']
>>> list(csv.reader([s], escapechar='\\', doublequote=False))[0]
['123', 'a,b,"c"', '5.640']

And there's still the problem with doublequote ....

Goodnight ...

Nov 16 '06 #9
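
The three reader calls from the session above can be reproduced on Python 3 with the sketch below; the variable names are assumptions for illustration. As far as I can tell, the odd-looking default result arises because, with no `escapechar` set, the backslash is an ordinary character and the quote that follows it ends the quoted part of the field, after which the remaining characters are taken literally.

```python
import csv

# Raw string: the two backslashes are really present in the data.
line = r'"123","a,b,\"c\"",5.640'

# No escapechar: the backslashes survive and the quotes are misread.
default = next(csv.reader([line]))

# With an escapechar, \" is read as a literal quote inside the field.
with_escape = next(csv.reader([line], escapechar='\\'))

# For this input, turning doublequote off gives the same fields.
no_doubling = next(csv.reader([line], escapechar='\\', doublequote=False))

print(default)
print(with_escape)
print(no_doubling)
```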

P: n/a
John Machin wrote:
>>> s='"123","a,b,\"c\"",5.640'
Note how I fixed the input:
>>> '"123","a,b,\"c\"",5.640'
'"123","a,b,"c"",5.640'
>>> '"123","a,b,\\"c\\"",5.640'
'"123","a,b,\\"c\\"",5.640'

Peter
Nov 16 '06 #10
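
The point about the input can be checked directly: in an ordinary Python string literal `\"` is just a double quote, so the backslashes never reach the csv module unless the literal is raw or the backslashes are doubled. A small sketch (the variable names are assumptions for illustration):

```python
# Ordinary literal: \" collapses to a plain double quote.
plain = '"123","a,b,\"c\"",5.640'

# Raw literal: the backslashes are kept as data.
raw = r'"123","a,b,\"c\"",5.640'

# Doubling the backslashes in an ordinary literal gives the same text as raw.
escaped = '"123","a,b,\\"c\\"",5.640'

print(plain)
print(raw)
print(raw == escaped)
```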

P: n/a

Fredrik Lundh wrote:
John Machin wrote:
Peter's example should not have worked.

the documentation also mentions a "quoting" parameter that "controls
when quotes should be generated by the writer and recognised by the
reader.". not sure how that changes things.

Hi Fredrik, I read that carefully -- "quoting" appears to have no
effect in this situation.

anyway, it's either unclear documentation or a bug in the code. better
submit a bug report so someone can fix one of them.

Tomorrow :-)
Cheers,
John

Nov 16 '06 #11
