_csv.Error: string with NUL bytes

Anyone have an idea of what I might do to fix this? I have googled and
can only find some random conversations about it that don't make
sense to me.

I am basically reading in a csv file to create an xml and get this
error.

I don't see any empty values in any fields or anything...

May 3 '07 #1
9 Replies


fscked wrote:
> Anyone have an idea of what I might do to fix this? I have googled and
> can only find some random conversations about it that don't make
> sense to me.
>
> I am basically reading in a csv file to create an xml and get this
> error.
>
> I don't see any empty values in any fields or anything...

You really should post some code and the actual traceback you
get for us to help. I suspect that you have an ill-formed record in
your CSV file. If you can't control that, you may have to write your
own CSV dialect parser.

-Larry
May 3 '07 #2

On May 3, 9:11 am, Larry Bates <larry.ba...@websafe.com> wrote:
> fscked wrote:
> > Anyone have an idea of what I might do to fix this? I have googled and
> > can only find some random conversations about it that don't make
> > sense to me.
> > I am basically reading in a csv file to create an xml and get this
> > error.
> > I don't see any empty values in any fields or anything...
>
> You really should post some code and the actual traceback you
> get for us to help. I suspect that you have an ill-formed record in
> your CSV file. If you can't control that, you may have to write your
> own CSV dialect parser.
>
> -Larry

Certainly, here is the code:

import os,sys
import csv
from elementtree.ElementTree import Element, SubElement, ElementTree

def indent(elem, level=0):
    i = "\n" + level*" "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + " "
        for elem in elem:
            indent(elem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

root = Element("{Boxes}boxes")
myfile = open('test.csv', 'rb')
csvreader = csv.reader(myfile)

for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
    mainbox = SubElement(root, "{Boxes}box")
    mainbox.attrib["city"] = city
    mainbox.attrib["country"] = country
    mainbox.attrib["phone"] = phone
    mainbox.attrib["address"] = address
    mainbox.attrib["name"] = name
    mainbox.attrib["pl_heartbeat"] = heartbeat
    mainbox.attrib["sw_ver"] = sw_ver
    mainbox.attrib["hw_ver"] = hw_ver
    mainbox.attrib["date_activated"] = activated
    mainbox.attrib["mac_address"] = mac
    mainbox.attrib["boxid"] = boxid

indent(root)

ElementTree(root).write('test.xml', encoding='UTF-8')

The traceback is as follows:

Traceback (most recent call last):
  File "createXMLPackage.py", line 35, in ?
    for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
_csv.Error: string with NUL bytes
Exit code: 1 , 0001h

May 3 '07 #3

In <11**********************@o5g2000hsb.googlegroups.com>, fscked wrote:
> The traceback is as follows:
>
> Traceback (most recent call last):
>   File "createXMLPackage.py", line 35, in ?
>     for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
> _csv.Error: string with NUL bytes
> Exit code: 1 , 0001h

As Larry said, this most likely means there are null bytes in the CSV file.

Ciao,
Marc 'BlackJack' Rintsch
May 3 '07 #4

On May 3, 9:29 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> In <1178209090.674787.202...@o5g2000hsb.googlegroups.com>, fscked wrote:
> > The traceback is as follows:
> > Traceback (most recent call last):
> >   File "createXMLPackage.py", line 35, in ?
> >     for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
> > _csv.Error: string with NUL bytes
> > Exit code: 1 , 0001h
>
> As Larry said, this most likely means there are null bytes in the CSV file.
>
> Ciao,
> Marc 'BlackJack' Rintsch

How would I go about identifying where it is?

May 3 '07 #5

On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > As Larry said, this most likely means there are null bytes in the CSV file.
> >
> > Ciao,
> > Marc 'BlackJack' Rintsch
>
> How would I go about identifying where it is?

A hex editor might be easiest.

You could also use Python:

print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
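
If the file is too large to eyeball, the same idea can report byte offsets instead of printing everything. This is only a sketch, written for Python 3 rather than the Python 2 used elsewhere in this thread; `find_nul_offsets` and the sample bytes are made-up names and data for illustration.

```python
def find_nul_offsets(data: bytes, limit: int = 10) -> list:
    """Return the byte offsets of the first `limit` NUL bytes in `data`."""
    offsets = []
    pos = data.find(b"\x00")
    while pos != -1 and len(offsets) < limit:
        offsets.append(pos)
        pos = data.find(b"\x00", pos + 1)
    return offsets


# Hypothetical sample: three ASCII characters stored as two bytes each.
sample = "89,".encode("utf-16-be")
print(find_nul_offsets(sample))
```

For a real file, read it in binary mode first, e.g. `find_nul_offsets(open("test.csv", "rb").read())`. A NUL at every other offset is a strong hint that the file is a 16-bit encoding rather than a damaged CSV.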

Dustin
May 3 '07 #6

On May 3, 10:12 am, dus...@v.igoro.us wrote:
> On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > As Larry said, this most likely means there are null bytes in the CSV file.
> > > Ciao,
> > > Marc 'BlackJack' Rintsch
> > How would I go about identifying where it is?
>
> A hex editor might be easiest.
>
> You could also use Python:
>
> print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
>
> Dustin

Hmm, interesting. If I run:

print open("test.csv").read().replace("\0", ">>>NUL<<<")

every single character gets a >>>NUL<<< between them...

What the heck does that mean?

Example, here is the first field in the csv

89114608511,

the above code produces:
>>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,
May 3 '07 #7

On Thu, May 03, 2007 at 10:28:34AM -0700, IA********@gmail.com wrote:
> On May 3, 10:12 am, dus...@v.igoro.us wrote:
> > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > As Larry said, this most likely means there are null bytes in the CSV file.
> > > > Ciao,
> > > > Marc 'BlackJack' Rintsch
> > > How would I go about identifying where it is?
> > A hex editor might be easiest.
> >
> > You could also use Python:
> >
> > print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
> >
> > Dustin
>
> Hmm, interesting if I run:
>
> print open("test.csv").read().replace("\0", ">>>NUL<<<")
>
> every single character gets a >>>NUL<<< between them...
>
> What the heck does that mean?
>
> Example, here is the first field in the csv
>
> 89114608511,
>
> the above code produces:
> >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,

I'm guessing that your file is in UTF-16, then -- Windows seems to do
that a lot. It kind of makes it *not* a CSV file, but oh well. Try

print open("test.csv", "rb").read().decode('utf-16').replace("\0", ">>>NUL<<<")

I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
way to get the CSV reader to handle such encoding without reading in the
whole file, decoding it, and setting up a StringIO file.

Dustin
May 3 '07 #8

du****@v.igoro.us wrote:
> I'm guessing that your file is in UTF-16, then -- Windows seems to do
> that a lot. It kind of makes it *not* a CSV file, but oh well. Try
>
> print open("test.csv", "rb").read().decode('utf-16').replace("\0", ">>>NUL<<<")
>
> I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
> way to get the CSV reader to handle such encoding without reading in the
> whole file, decoding it, and setting up a StringIO file.

Not pretty, but seems to work:

from __future__ import with_statement

import csv
import codecs

def recoding_reader(stream, from_encoding, args=(), kw={}):
    intermediate_encoding = "utf8"
    efrom = codecs.lookup(from_encoding)
    einter = codecs.lookup(intermediate_encoding)
    rstream = codecs.StreamRecoder(stream, einter.encode, efrom.decode,
                                   efrom.streamreader, einter.streamwriter)

    for row in csv.reader(rstream, *args, **kw):
        yield [unicode(column, intermediate_encoding) for column in row]

def main():
    file_encoding = "utf16"

    # generate sample data:
    data = u"\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n"
    with open("tmp.txt", "wb") as f:
        f.write(data.encode(file_encoding))

    # read it
    with open("tmp.txt", "rb") as f:
        for row in recoding_reader(f, file_encoding):
            print u" | ".join(row)

if __name__ == "__main__":
    main()

Data from the file is recoded to UTF-8, then passed to a csv.reader() whose
output is decoded to unicode.
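
For comparison, on Python 3 the csv module reads text rather than bytes, so the recoding step should not be needed. The following is a rough sketch under that assumption; `read_utf16_csv` is a hypothetical helper, and the `utf-16-be` default is only a guess that matches the NUL pattern reported earlier in this thread.

```python
import csv
import io


def read_utf16_csv(raw: bytes, encoding: str = "utf-16-be") -> list:
    """Decode `raw` incrementally and return its CSV rows as lists of str."""
    # newline="" leaves line-ending handling to the csv module.
    text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return list(csv.reader(text_stream))


# Sample data borrowed from the example above, encoded without a BOM.
sample = "\xe4hnlich,\xfcblich\r\nalpha,beta\r\n".encode("utf-16-be")
for row in read_utf16_csv(sample):
    print(" | ".join(row))
```

With a file on disk, `open("test.csv", encoding="utf-16-be", newline="")` can be passed to `csv.reader()` directly; the `BytesIO` wrapper is only there to keep the sketch self-contained.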

Peter

May 3 '07 #9

On May 4, 3:40 am, dus...@v.igoro.us wrote:
> On Thu, May 03, 2007 at 10:28:34AM -0700, IAmStar...@gmail.com wrote:
> > On May 3, 10:12 am, dus...@v.igoro.us wrote:
> > > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > > As Larry said, this most likely means there are null bytes in the CSV file.
> > > > > Ciao,
> > > > > Marc 'BlackJack' Rintsch
> > > > How would I go about identifying where it is?
> > > A hex editor might be easiest.
> > > You could also use Python:
> > > print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
> > > Dustin
> > Hmm, interesting if I run:
> > print open("test.csv").read().replace("\0", ">>>NUL<<<")
> > every single character gets a >>>NUL<<< between them...
> > What the heck does that mean?
> > Example, here is the first field in the csv
> > 89114608511,
> > the above code produces:
> > >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,
>
> I'm guessing that your file is in UTF-16, then -- Windows seems to do
> that a lot.

Do what a lot? Encode data in UTF-16xE without putting in a BOM or
telling the world in some other fashion what x is? Humans seem to do
that occasionally. When they use Windows software, the result is
highly likely to be encoded in UTF-16LE -- unless of course the human
deliberately chooses otherwise (e.g. the "Unicode bigendian" option in
NotePad's "Save As" dialogue). Further, the data is likely to have a
BOM prepended.

The above is consistent with BOM-free UTF-16BE.
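
The reasoning can be turned into a rough check: look for a BOM first, and failing that, see whether the NUL bytes sit at even or odd offsets. For mostly-ASCII text, UTF-16BE puts the NUL before each character (even offsets) and UTF-16LE puts it after (odd offsets). The sketch below is Python 3 and purely heuristic; `guess_utf16_variant` is a made-up name, and it will guess wrongly on text that is not predominantly ASCII.

```python
import codecs


def guess_utf16_variant(data: bytes) -> str:
    """Guess a codec name for data that looks like UTF-16 (heuristic only)."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic codec reads the BOM and picks the byte order itself.
        return "utf-16"
    even_nuls = data[0::2].count(b"\x00")  # NULs at offsets 0, 2, 4, ...
    odd_nuls = data[1::2].count(b"\x00")   # NULs at offsets 1, 3, 5, ...
    if even_nuls == 0 and odd_nuls == 0:
        return "unknown"
    return "utf-16-be" if even_nuls > odd_nuls else "utf-16-le"


# The first field from the thread, encoded the way the output suggests.
print(guess_utf16_variant("89114608511,".encode("utf-16-be")))
```

The returned name can then be passed as the `encoding` argument when opening the file, which avoids hard-coding a byte order that may differ from file to file.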

May 3 '07 #10
