Alternatives for the CSV module

-
I am going to make a program that reads files with different
csv-dialects. Sometimes the field-separator or line-separator can be
more than one character. The standard CSV module in Python 2.3 is not
a good solution, because it expects single characters.
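
A quick check of that limitation, as a sketch assuming a current Python 3 interpreter rather than 2.3; the helper name `accepts_delimiter` is made up for illustration. It only reports whether `csv.reader` will take a given delimiter string.

```python
import csv


def accepts_delimiter(delimiter):
    """Return True if csv.reader accepts this delimiter, else False."""
    try:
        # The dialect is validated when the reader is created,
        # so no input data is needed for the check.
        csv.reader([], delimiter=delimiter)
    except TypeError:
        return False
    return True


print(accepts_delimiter(","))   # a single character
print(accepts_delimiter("<>"))  # a two-character string
```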

Example of a file

"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤

Here the field delimiter is "<>" and the "line" terminator is "¤¤".
Fields can be enclosed in quotes, and a doubled quote ("") inside a
quoted field is treated as normal text.

This is not the only format the parser can expect. The format is given
to the program by the user, so the program should have no problems
parsing the text. An ideal solution would be a similar parser to the
standard CSV-parser, except that it accepts strings as delimiters.

I could always manipulate the input file and replace the delimiters by
single characters, but I would like a more generic solution.

SimpleParse (http://simpleparse.sourceforge.net/) looks like a good
alternative. It doesn't support Unicode, but most files can be
converted to ISO-8859-1 first.

Would SimpleParse be suitable for this purpose, or are there better
alternatives out there, like a more flexible CSV-parser?
Jul 18 '05 #1
ma***********@hotmail.com (-) writes:
I am going to make a program that reads files with different
csv-dialects. Sometimes the field-separator or line-separator can be
more than one character. The standard CSV module in Python 2.3 is not
a good solution, because it expects single characters.

[...]

Dunno if it's any good to you, but there's one called DSV.
John
Jul 18 '05 #2
I am going to make a program that reads files with different
csv-dialects. Sometimes the field-separator or line-separator can be
more than one character. The standard CSV module in Python 2.3 is not
a good solution, because it expects single characters.
Well, I might disagree with you there. By all reasonable accounts,
delimited files containing multi-character delimiters are not CSV files, at
least not as operationally defined by Excel (which I mention only because
it's probably the largest producer and consumer of such files).
Example of a file

"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤

Here the field delimiter is "<>" and the "line" terminator is "¤¤".
Fields can be enclosed in quotes, and a doubled quote ("") inside a
quoted field is treated as normal text.

This is not the only format the parser can expect. The format is
given to the program by the user, so the program should have no
problems parsing the text. An ideal solution would be a similar
parser to the standard CSV-parser, except that it accepts strings as
delimiters.

I could always manipulate the input file and replace the delimiters
by single characters, but I would like a more generic solution.


That's pretty generic. How about this (untested):

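class DelimitedFile:
    """Feed csv.reader one record at a time, with the multi-character
    field delimiter swapped for a single character."""

    def __init__(self, fname, ind=',', outd=',', interm='\n',
                 encoding='utf-8'):
        # newline='' leaves the file's own line endings untouched.
        with open(fname, encoding=encoding, newline='') as f:
            text = f.read()
        # csv.reader ignores the dialect's lineterminator when reading,
        # so split on the record terminator here; skip the empty tail.
        self.records = iter([r for r in text.split(interm) if r])
        self.ind = ind
        self.outd = outd

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.records).replace(self.ind, self.outd)

Use it like so:

import csv

class d(csv.excel):
    delimiter = '\001'

reader = csv.reader(DelimitedFile(fname, ind='<>', outd='\001',
                                  interm='¤¤'),
                    dialect=d)

for row in reader:
    print(row)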

The goal is of course to pick a delimiter that won't appear in the file,
hence the Ctrl-A ('\001').

Skip
Jul 18 '05 #3
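
The substitution trick above depends on the delimiter and terminator never appearing inside a quoted field. Where that can't be guaranteed, a small quote-aware splitter that takes strings as delimiters is one option. The following is only a minimal sketch under the format described in the original post (string field delimiter, string record terminator, doubled quote as a literal quote); `parse_delimited` and its parameter names are hypothetical, not part of the csv module or any other library.

```python
def parse_delimited(text, delimiter="<>", terminator="¤¤", quote='"'):
    """Split text into records of fields.

    Delimiter and terminator may be strings of any length. Inside a
    quoted field they are taken literally, and a doubled quote stands
    for one quote character.
    """
    records, fields, buf = [], [], []
    in_quotes = False
    i = 0
    while i < len(text):
        if in_quotes:
            if text.startswith(quote * 2, i):
                # Doubled quote: one literal quote character.
                buf.append(quote)
                i += 2 * len(quote)
            elif text.startswith(quote, i):
                # Closing quote.
                in_quotes = False
                i += len(quote)
            else:
                buf.append(text[i])
                i += 1
        elif text.startswith(quote, i):
            # Opening quote.
            in_quotes = True
            i += len(quote)
        elif text.startswith(delimiter, i):
            # End of field.
            fields.append("".join(buf))
            buf = []
            i += len(delimiter)
        elif text.startswith(terminator, i):
            # End of field and end of record.
            fields.append("".join(buf))
            records.append(fields)
            fields, buf = [], []
            i += len(terminator)
        else:
            buf.append(text[i])
            i += 1
    # Keep a final record that lacks a trailing terminator.
    if buf or fields:
        fields.append("".join(buf))
        records.append(fields)
    return records


sample = '"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤'
for record in parse_delimited(sample):
    print(record)
```

It walks the text one position at a time, which is slow for large files but keeps the quoting rules easy to follow; the delimiters are plain arguments, so a user-supplied format can be passed straight through.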
