Using the CSV module

Hi,

I've been playing with the CSV module for parsing a few files. A row
in a file looks like this:

some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n

so the lineterminator is \t\n and the delimiter is \t|\t. However, when
I subclass Dialect and try to set the delimiter to "\t|\t", it says
the delimiter can only be a single character.

I know it's an easy fix to just do .strip("\t") on the output I get,
but I was wondering:
a) Is there a better way of doing this while the file is actually
being parsed by the csv module?
b) Why are delimiters only allowed to be one character in length?

Many Thanks in advance
Nathan
May 9 '07 #1
1 Reply
On May 9, 6:40 pm, "Nathan Harmston" <ratchetg...@googlemail.com>
wrote:
> Hi,
>
> I've been playing with the CSV module for parsing a few files. A row
> in a file looks like this:
>
> some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n
>
> so the lineterminator is \t\n and the delimiter is \t|\t. However, when
> I subclass Dialect and try to set the delimiter to "\t|\t", it says
> the delimiter can only be a single character.
>
> I know it's an easy fix to just do .strip("\t") on the output I get,
> but I was wondering:
> a) Is there a better way of doing this while the file is actually
> being parsed by the csv module?

No; usually one would want at least to do .strip() on each field
anyway to remove *all* leading and trailing whitespace. Replacing
multiple whitespace characters with one space is often a good idea.
One may want to get fancier and ensure that NO-BREAK SPACE aka &nbsp;
(\xA0 in many encodings) is treated as whitespace.

So your gloriously redundant tabs vanish, for free.
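
A minimal sketch of that approach, assuming Python 3 and that "|" never appears inside a field: use "|" alone as the delimiter and clean each field afterwards. The helper name read_pipe_rows is made up for illustration. In Python 3, str.split() with no arguments already treats NO-BREAK SPACE as whitespace, so no special case is needed for it here.

```python
import csv
import io

def read_pipe_rows(stream):
    """Yield rows split on '|' with every field cleaned of stray whitespace.

    Hypothetical helper: assumes '|' never occurs inside a field.
    """
    for row in csv.reader(stream, delimiter="|"):
        # split() with no arguments splits on any run of whitespace
        # (tabs, spaces, and U+00A0 in Python 3), so joining with a single
        # space both strips the ends and collapses internal runs.
        yield [" ".join(field.split()) for field in row]

sample = "some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n"
rows = list(read_pipe_rows(io.StringIO(sample)))
print(rows)
```

The trailing tab before the newline is absorbed by the same clean-up, so no custom lineterminator is needed.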

> b) Why are delimiters only allowed to be one character in length?

Speed. The reader is a hand-crafted finite-state machine designed to
operate on a byte at a time. Allowing for variable-length delimiters
would increase the complexity and lower the speed -- for what gain?
How often does one see 2-byte or 3-byte delimiters?
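
For the rare file that really does need a multi-character delimiter, a rough sketch (assuming Python 3 and data with no quoting or embedded delimiters) is to bypass the csv module and use plain str.split. The function names below are invented for illustration; the first one only demonstrates that csv.reader refuses a delimiter longer than one character.

```python
import csv

def multichar_delimiter_rejected(delimiter):
    """Return True if csv.reader refuses the given delimiter."""
    try:
        csv.reader([], delimiter=delimiter)
    except (TypeError, csv.Error):
        # The csv module validates the delimiter when the reader is built.
        return True
    return False

def split_row(line, delimiter="\t|\t", terminator="\t\n"):
    """Split one raw line on a multi-character delimiter.

    Hypothetical helper: does no quote handling, unlike the csv module.
    """
    if line.endswith(terminator):
        line = line[: -len(terminator)]
    return line.split(delimiter)

print(multichar_delimiter_rejected("\t|\t"))
print(split_row("some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n"))
```

This trades away the csv module's handling of quoted fields, so it only suits files as simple as the one in the question.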

May 9 '07 #2
