
csv.Sniffer: wrong detection of the end of line delimiter

hello,

I'm using the standard csv module under Python 2.3 / 2.4 to read a CSV
file. The file is opened in binary mode, so the end-of-line
terminator is preserved.

It appears that csv.Sniffer forces the line terminator to be
'\r\n'. That's fine under Windows but wrong under Linux or
Macintosh.
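
A quick way to see this behaviour for yourself: the sketch below uses Python 3 (an assumption; the thread is about Python 2.3 / 2.4) and a made-up sample string whose lines all end in a bare '\n'. In the CPython versions I'm aware of, the sniffed dialect still reports '\r\n' as its line terminator.

```python
import csv

# Hypothetical sample: Unix-style data, every line ends with a bare '\n'.
sample = "name;age\nalice;30\nbob;25\n"

dialect = csv.Sniffer().sniff(sample)

# The sniffer reports a fixed terminator, not the one found in the sample.
print(repr(dialect.lineterminator))
```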

More about this line terminator: there is a potential bug in the
_guess_delimiter() method.
Its first line of code splits the data incorrectly:
    data = filter(None, data.split('\n'))
It doesn't take the real line terminator into account!
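
To illustrate the splitting problem: with old Macintosh-style data (lines ending in a bare '\r'), splitting on '\n' leaves everything in one chunk. This is a small Python 3 sketch with made-up sample data; the list() wrapper is only there because filter() returns an iterator in Python 3.

```python
# Hypothetical sample with '\r' line endings (old Macintosh style).
mac_sample = "a;b;c\rd;e;f\rg;h;i\r"

# Same expression as in _guess_delimiter(), wrapped in list() for Python 3.
lines = list(filter(None, mac_sample.split('\n')))
print(len(lines))    # 1: the three records are seen as a single line

# Splitting on the real terminator gives the expected three records.
records = list(filter(None, mac_sample.split('\r')))
print(len(records))  # 3
```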

Here is a patch (not a perfect one):
# ------- begin of patch -------
import csv

class PatchedSniffer(csv.Sniffer):

    def __init__(self):
        csv.Sniffer.__init__(self)

    def sniff(self, p_data, p_delimiters = None):
        # Let the stock Sniffer work, then replace its fixed '\r\n'.
        t_dialect = csv.Sniffer.sniff(self, p_data, p_delimiters)
        t_dialect.lineterminator = self._guessLineTerminator(p_data)
        return t_dialect

    def _guessLineTerminator(self, p_data):
        # '\r\n' is tested first because it contains both other candidates.
        for t_lineTerminator in ['\r\n', '\n', '\r']:
            if t_lineTerminator in p_data:
                return t_lineTerminator
        else:
            return '\r\n' # Windows default (Excel)

    def _formatDataForGuess(self, p_data):
        # Normalize the real terminator to '\n', which the base class expects.
        t_lineTerminator = self._guessLineTerminator(p_data)
        return '\n'.join(p_data.split(t_lineTerminator))

    def _guess_delimiter(self, p_data, p_delimiters):
        t_data = self._formatDataForGuess(p_data)

        (t_delimiter, t_skipInitialSpace) = \
            csv.Sniffer._guess_delimiter(self, t_data, p_delimiters)

        # Fall back to a tab if nothing was found but tabs are present.
        if t_delimiter == '' and '\t' in p_data:
            t_delimiter = '\t'

        return (t_delimiter, t_skipInitialSpace)
# ------- end of patch -------
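
For anyone who only needs the terminator detection, the same idea can be sketched as a standalone function. This is Python 3 and not part of the csv module; the name guess_line_terminator and the sample strings are mine. Like the patch, it is only a heuristic: it checks which terminator occurs at all, so a file with mixed endings, or with '\n' embedded in quoted fields, can fool it.

```python
def guess_line_terminator(data, default='\r\n'):
    # Hypothetical helper, mirroring _guessLineTerminator() from the patch.
    # '\r\n' is tested first because it contains both other candidates.
    for terminator in ('\r\n', '\n', '\r'):
        if terminator in data:
            return terminator
    # No terminator at all (e.g. a one-line sample): use the default.
    return default

print(repr(guess_line_terminator("a,b\r\nc,d\r\n")))  # '\r\n'
print(repr(guess_line_terminator("a,b\nc,d\n")))      # '\n'
print(repr(guess_line_terminator("a,b\rc,d\r")))      # '\r'
```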

Bye.
------- Laurent.

Dec 28 '05 #1
Laurent Laporte wrote:
> hello,
>
> I'm using csv standard module under Python 2.3 / 2.4 to read a CSV
> file. The file is opened in binary mode, so I keep the end of line
> terminator.

It's not advisable to open a file like a CSV, intended for use as text,
in binary mode.

> It appears that the csv.Sniffer force the line terminator to be
> '\r\n'. It's fine under Windows but wrong under Linux or
> Macintosh.

Perhaps you should try opening the file in text mode, as this will
normally end up giving you a "\n" terminator on all platforms: that's
what text mode is intended to ensure, and that's probably why the csv
module assumes that splitting on "\n" is safe.

> More about this line terminator: Potential bug in the
> _guess_delimiter() method.
> The first line of code does a wrong splitting:
> data = filter(None, data.split('\n'))
> It doesn't take care of the real line terminator!
> [...]

I suspect it's not supposed to be trying to!
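
The text-mode behaviour described above can be sketched as follows. This uses Python 3's io module (an assumption; the thread is about Python 2.3 / 2.4, where universal-newline mode 'U' would be the closest equivalent), and the byte string is made-up sample data. With newline=None the text layer translates '\r\n' and '\r' to '\n' on reading, whatever the platform.

```python
import io

# Hypothetical raw file content mixing all three terminator styles.
raw = b"a,b\r\nc,d\re,f\n"

# newline=None enables universal newlines: every terminator is read as '\n'.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None).read()
print(repr(text))

# After translation, splitting on '\n' finds all three records.
records = [line for line in text.split('\n') if line]
print(records)
```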

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Dec 29 '05 #2
In <ma***************************************@python. org>, Steve Holden
wrote:
> Laurent Laporte wrote:
>> I'm using csv standard module under Python 2.3 / 2.4 to read a CSV
>> file. The file is opened in binary mode, so I keep the end of line
>> terminator.
>
> It's not advisable to open a file like a CSV, intended for use as text,
> in binary mode.

But the docs "demand" this explicitly and all examples in the docs fulfill
that demand.

From http://docs.python.org/lib/csv-contents.html :

    If csvfile is a file object, it must be opened with the 'b' flag on
    platforms where that makes a difference.

I guess the reason is the same as for the "text" pickle format: if you don't
use binary mode, the file is not platform independent anymore because some
OSes "manipulate" the data in text mode.
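
For comparison, here is how the same concern looks in Python 3 (an assumption, since the thread predates it): there the csv documentation asks for files to be opened with newline='' rather than in binary mode, so that the text layer leaves line endings alone. The sketch uses io.StringIO and made-up data in place of a real file.

```python
import csv
import io

# Hypothetical data: '\r\n' terminators and a quoted field
# that contains an embedded line break.
raw = 'id,comment\r\n1,"first line\r\nsecond line"\r\n'

# newline='' disables newline translation, so the csv reader
# sees the data exactly as stored.
rows = list(csv.reader(io.StringIO(raw, newline='')))
print(rows)
```

The embedded line break inside the quoted field comes through unchanged, which is the kind of platform-independent behaviour the 'b' flag was meant to guarantee.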

Ciao,
Marc 'BlackJack' Rintsch
Dec 29 '05 #3
