csv module strangeness.

I'm trying to create a csv.reader object using a custom dialect.

The docs are a little terse, but I gather that I am supposed
to subclass csv.Dialect:

class dialect(csv.Dialect):
    pass

Now, the docs say that all of the attributes have reasonable
defaults, but instantiating the above gives:

Traceback (most recent call last):
  File "<stdin>", line 15, in ?
  File "/usr/local/lib/python2.4/csv.py", line 39, in __init__
    raise Error, "Dialect did not validate: %s" % ", ".join(errors)
_csv.Error: Dialect did not validate: delimiter character not set, quotechar not set, lineterminator not set, doublequote parameter must be True or False, skipinitialspace parameter must be True or False, quoting parameter not set

So I look at the source. The Dialect class is very simple,
and starts with:

class Dialect:
    _name = ""
    _valid = False
    # placeholders
    delimiter = None
    quotechar = None
    escapechar = None
    doublequote = None
    skipinitialspace = None
    lineterminator = None
    quoting = None

So, it's no wonder that it fails its validate() call.
The only thing that I can think of to do is to set
these on the class itself before instantiation:

###############################################
import csv

class dialect(csv.Dialect):
    pass

dialect.delimiter = "\t"
dialect.quotechar = '"'
dialect.lineterminator = "\n"
dialect.doublequote = True
dialect.skipinitialspace = True
dialect.quoting = csv.QUOTE_MINIMAL

d = dialect()

reader = csv.reader(open('list.csv'))
for row in reader:
    print row
###############################################

This runs, but the delimiter is still the comma.
When list.csv is comma delim, it works correctly,
but when list.csv has tab separated values, I
get back a single field with the entire line in
it.

I suppose I must be doing something horribly wrong.

Thanks,

Tobiah

--
Posted via a free Usenet account from http://www.teranews.com

Aug 30 '06 #1
Ok, I'm an idiot. I didn't even pass my dialect
object to the reader() call.

So now it works, but it is still strange about
the absent defaults.

Tobiah

Aug 30 '06 #2
In <44**********************@free.teranews.com>, tobiah wrote:
> So, it's no wonder that it fails its validate() call.
> The only thing that I can think of to do is to set
> these on the class itself before instantiation:
>
> ###############################################
> import csv
>
> class dialect(csv.Dialect):
>     pass
>
> dialect.delimiter = "\t"
> dialect.quotechar = '"'
> dialect.lineterminator = "\n"
> dialect.doublequote = True
> dialect.skipinitialspace = True
> dialect.quoting = csv.QUOTE_MINIMAL

That's possible, but why didn't you follow the way `csv.Dialect` sets its
class attributes?

class MyDialect(csv.Dialect):
    delimiter = '\t'
    lineterminator = '\n'
    # and so on…

Ciao,
Marc 'BlackJack' Rintsch
Aug 30 '06 #3
> That's possible, but why didn't you follow the way `csv.Dialect` sets its
> class attributes?
>
> class MyDialect(csv.Dialect):
>     delimiter = '\t'
>     lineterminator = '\n'
>     # and so on…

Because I'm hung over.


Aug 30 '06 #4
tobiah wrote:

> The docs are a little terse, but I gather that I am supposed
> to subclass csv.Dialect:
>
> class dialect(csv.Dialect):
>     pass
>
> Now, the docs say that all of the attributes have reasonable
> defaults, but instantiating the above gives:

you may be misreading the docs; the Dialect class has no values at all, and
must be subclassed (and the subclass must provide settings). The
easiest way to get reasonable defaults is to subclass an existing
dialect class, such as csv.excel:

class dialect(csv.excel):
    ...

> The only thing that I can think of to do is to set
> these on the class itself before instantiation:

the source code for the Dialect class that you posted shows how to set
class attributes; simply assign them inside the class statement!

class dialect(csv.excel):
    # like excel, but with a different delimiter
    delimiter = "|"

you must also remember to pass the dialect to the reader:

reader = csv.reader(open('list.csv'), dialect)
for row in reader:
    print row

note that you don't really have to create an instance; the reader
expects an object with a given set of attributes, and the class object
works as well as an instance of the same class.

</F>
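
For readers on Python 3, the approach described above can be sketched as follows. This is only a minimal sketch, not code from the thread: the class name `PipeDialect` and the sample data are made up for illustration, and `io.StringIO` stands in for `open('list.csv')`.

```python
import csv
import io


class PipeDialect(csv.excel):
    # Inherit every setting from csv.excel and override only the delimiter.
    delimiter = "|"


# Hypothetical sample data standing in for list.csv.
sample = io.StringIO("alpha|beta|gamma\r\none|two|three\r\n")

# Pass the dialect class itself as the second argument; no instance is needed.
rows = list(csv.reader(sample, PipeDialect))
for row in rows:
    print(row)
```

Leaving out the second argument reproduces the original symptom: the reader splits on commas and each line comes back as a single field.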

Aug 30 '06 #5
> you may be misreading the docs; the Dialect has no values at all, and
> must be subclassed (and the subclass must provide settings).

The docs clearly state what the defaults are, but they are not
in the code. It seems so clumsy to have to specify every one
of these, just to change the delimiter from comma to tab.

http://docs.python.org/lib/csv-fmt-params.html :

delimiter
    A one-character string used to separate fields. It defaults to ','.

doublequote
    Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar must be a one-character string which is used as a prefix to the quotechar. It defaults to True.

escapechar
    A one-character string used to escape the delimiter if quoting is set to QUOTE_NONE. It defaults to None.

lineterminator
    The string used to terminate lines in the CSV file. It defaults to '\r\n'.

quotechar
    A one-character string used to quote elements containing the delimiter or which start with the quotechar. It defaults to '"'.

quoting
    Controls when quotes should be generated by the writer. It can take on any of the QUOTE_* constants (see section 12.20.1) and defaults to QUOTE_MINIMAL.

skipinitialspace
    When True, whitespace immediately following the delimiter is ignored. The default is False.
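
The defaults listed above are not attributes of `csv.Dialect` itself, but they can be inspected on `csv.excel`, whose settings match the documented values. A minimal sketch for Python 3 (the dictionary name `defaults` is illustrative only):

```python
import csv

# Collect the formatting attributes from the built-in csv.excel dialect class.
defaults = {
    "delimiter": csv.excel.delimiter,
    "doublequote": csv.excel.doublequote,
    "escapechar": csv.excel.escapechar,  # inherited None placeholder
    "lineterminator": csv.excel.lineterminator,
    "quotechar": csv.excel.quotechar,
    "quoting": csv.excel.quoting,
    "skipinitialspace": csv.excel.skipinitialspace,
}

for name, value in defaults.items():
    print(name, repr(value))
```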


Aug 30 '06 #6
tobiah wrote:
> > you may be misreading the docs; the Dialect has no values at all, and
> > must be subclassed (and the subclass must provide settings).
>
> The docs clearly state what the defaults are, but they are not
> in the code. It seems so clumsy to have to specify every one
> of these, just to change the delimiter from comma to tab.
>
> http://docs.python.org/lib/csv-fmt-params.html :
>
> delimiter
> A one-character string used to separate fields. It defaults to ','.

Note that you need not bother with a dialect class just to change the
delimiter:

>>> import csv
>>> from cStringIO import StringIO
>>> instream = StringIO(
... "alpha\tbeta\tgamma\r\n"
... "one\ttoo\ttree\r\n")
>>> for row in csv.reader(instream, delimiter="\t"):
...     print row
...
['alpha', 'beta', 'gamma']
['one', 'too', 'tree']

Peter
Aug 30 '06 #7

tobiah> So now it works, but it is still strange about the absent
tobiah> defaults.

The csv.Dialect class is essentially pure abstract. Most of the time I
subclass csv.excel and just change the one or two things I need.

Skip
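
The behaviour discussed in this thread can be reproduced on Python 3 with a minimal sketch. The class names `Bare` and `Tabbed` are hypothetical, and the wording of the error message differs between Python versions, so it is printed here and not relied upon:

```python
import csv


class Bare(csv.Dialect):
    # No attributes set, so every setting keeps its None placeholder.
    pass


try:
    Bare()
    failed = False
except csv.Error as exc:
    # Instantiation validates the settings and rejects the placeholders.
    failed = True
    print("validation failed:", exc)


class Tabbed(csv.excel):
    # Subclassing csv.excel supplies all the remaining settings.
    delimiter = "\t"


tabbed = Tabbed()  # validates without error
print(repr(tabbed.delimiter), repr(tabbed.quotechar))
```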
Aug 30 '06 #8
> >>> for row in csv.reader(instream, delimiter="\t"):

Awesome. Thanks.


Aug 30 '06 #9
tobiah wrote:
> The docs clearly state what the defaults are, but they are not
> in the code. It seems so clumsy to have to specify every one
> of these, just to change the delimiter from comma to tab.

That particular case is handled by the built-in (but cunningly
concealed) 'excel-tab' class:

|>>> import csv
|>>> csv.list_dialects()
['excel-tab', 'excel']
|>>> td = csv.get_dialect('excel-tab')
|>>> dir(td)
['__doc__', '__init__', '__module__', '_name', '_valid', '_validate',
'delimiter', 'doublequote', 'escapechar', 'lineterminator',
'quotechar', 'quoting', 'skipinitialspace']
|>>> td.delimiter
'\t'

However, more generally, the docs also clearly state that "In addition
to, or instead of, the dialect parameter, the programmer can also
specify individual formatting parameters, which have the same names as
the attributes defined below for the Dialect class."

In practice, using a Dialect class would be a rather rare occurrence.
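
To illustrate the "in addition to, or instead of" wording, here is a minimal Python 3 sketch that names the registered 'excel-tab' dialect and overrides one individual formatting parameter by keyword. The sample data is invented for illustration:

```python
import csv
import io

# Hypothetical tab-separated data with a stray space after each tab.
sample = io.StringIO("alpha\t beta\t gamma\r\none\t two\t three\r\n")

# Name a registered dialect, then override a single setting by keyword.
reader = csv.reader(sample, "excel-tab", skipinitialspace=True)
rows = list(reader)
print(rows)
```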

E.g. here's the guts of the solution to the "fix a csv file by
rsplitting one column" problem, using the "quoting" attribute on the
assumption that the solution really needs those usually redundant
quotes:

import sys, csv

def fix(inf, outf, fixcol):
    wtr = csv.writer(outf, quoting=csv.QUOTE_ALL)
    for fields in csv.reader(inf):
        fields[fixcol:fixcol+1] = fields[fixcol].rsplit(None, 1)
        wtr.writerow(fields)

if __name__ == "__main__":
    av = sys.argv
    fix(open(av[1], 'rb'), open(av[2], 'wb'), int(av[3]))

HTH,
John
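
A Python 3 rendering of the same idea, as a minimal sketch: the `fix` function is unchanged in substance, while the in-memory streams and sample rows are assumptions made for illustration. With real files on Python 3, the csv module expects text-mode files opened with `newline=''` in place of the `'rb'` and `'wb'` modes used above.

```python
import csv
import io


def fix(inf, outf, fixcol):
    # Quote every output field, as in the original.
    wtr = csv.writer(outf, quoting=csv.QUOTE_ALL)
    for fields in csv.reader(inf):
        # Split the chosen column once, from the right, on whitespace,
        # and splice the resulting pieces back in place of that column.
        fields[fixcol:fixcol + 1] = fields[fixcol].rsplit(None, 1)
        wtr.writerow(fields)


# In-memory stand-ins for the input and output files (hypothetical data).
source = io.StringIO("1,John Smith,x\r\n2,Mary Ann Jones,y\r\n")
target = io.StringIO()
fix(source, target, 1)
print(target.getvalue())
```

Note that a field containing no whitespace stays a single column, so rows can end up with different lengths.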

Aug 30 '06 #10
> However, more generally, the docs also clearly state that "In addition
> to, or instead of, the dialect parameter, the programmer can also
> specify individual formatting parameters, which have the same names as
> the attributes defined below for the Dialect class."

I definitely missed that. Knowing that, I don't think I will ever need the Dialect
class, but I still think that the docs for the Dialect class are broken.


Aug 30 '06 #11

tobiah wrote:

> > However, more generally, the docs also clearly state that "In addition
> > to, or instead of, the dialect parameter, the programmer can also
> > specify individual formatting parameters, which have the same names as
> > the attributes defined below for the Dialect class."
>
> I definitely missed that. Knowing that, I don't think I will ever need the Dialect
> class, but I still think that the docs for the Dialect class are broken.

FWIW, I think the whole Dialect class idea is a baroque byzantine
over-elaborated unnecessity that also happens to suffer from poor docs.
[Exit, pursued by a bear]

Aug 30 '06 #12
tobiah wrote:

> > you may be misreading the docs; the Dialect has no values at all, and
> > must be subclassed (and the subclass must provide settings).
>
> The docs clearly state what the defaults are, but they are not
> in the code. It seems so clumsy to have to specify every one
> of these, just to change the delimiter from comma to tab.
>
> http://docs.python.org/lib/csv-fmt-params.html :

The "it defaults to" clauses should probably be seen in the context of
the two "the programmer can" sentences in the first paragraph on that
page; the default is what's used if you don't do that.

I agree that the first paragraph could need some work. Any volunteers?

</F>

Aug 31 '06 #13
> > The docs clearly state what the defaults are, but they are not
> > in the code. It seems so clumsy to have to specify every one of
> > these, just to change the delimiter from comma to tab.
> >
> > http://docs.python.org/lib/csv-fmt-params.html :
>
> The "it defaults to" clauses should probably be seen in the context of
> the two "the programmer can" sentences in the first paragraph on that
> page; the default is what's used if you don't do that.
>
> I agree that the first paragraph could need some work. Any volunteers?
>
> </F>

I agree with Henryk's evaluation, but if Dialect is to remain,
why not just fix all of the 'None' assignments in the class
definition to match the sensible defaults that are already in
the docs?


Aug 31 '06 #14

tobiah wrote:

> > </F>
>
> I agree with Henryk's evaluation

Henryk?? Have I missed a message in the thread, or has the effbot
metamorphosed into the aitchbot?

Aug 31 '06 #16
John Machin wrote:

> tobiah wrote:
> > > </F>
> > I agree with Henryk's evaluation
>
> Henryk?? Have I missed a message in the thread, or has the effbot
> metamorphosed into the aitchbot?

How strange. Either my client was whacked, or I was. I was
actually referring to your "baroque byzantine over-elaborated unnecessity"
comment. Henryk was from a later thread, I guess.


Aug 31 '06 #17
