Bytes | Software Development & Data Engineering Community

"UnicodeError: UTF-16 stream does not start with BOM"

I have a text file that contains Unicode data (say inp.txt).
I read the file using the following code:

    import codecs
    infile = codecs.open('C:\\tdata\\inp.txt','r','utf-16',errors='ignore')
    data = infile.readlines()

If I run the above code, it throws the following error:

    Traceback (most recent call last):
      File "C:\script\hypen\hyp.py", line 34, in ?
        data = infile.readlines()
      File "C:\Python24\lib\codecs.py", line 489, in readlines
        return self.reader.readlines(sizehint)
      File "C:\Python24\lib\codecs.py", line 404, in readlines
        data = self.read()
      File "C:\Python24\lib\codecs.py", line 293, in read
        newchars, decodedbytes = self.decode(data, self.errors)
      File "C:\Python24\lib\encodings\utf_16.py", line 49, in decode
        raise UnicodeError,"UTF-16 stream does not start with BOM"
    UnicodeError: UTF-16 stream does not start with BOM

But if I create a new file (I did this in Notepad on Win XP), paste the contents of 'inp.txt' into it, and save it as a text file (choosing the Unicode encoding, the same as inp.txt), the same code reads the new file without any problem. This seems weird. Is the Notepad-created file adding some magic characters of its own? :)

Can anyone help me with this? What could be the issue? Why does creating a new file and saving the contents into it work fine, while the original file still throws the error? (I have received 15 such localized files from clients, on which some processing has to be done, and I want to avoid manual copy/paste rework.) Any help appreciated...


Thanks,
anil
Jan 6 '09 #1
bvdet
2,851 Expert Mod 2GB
I found the information at this link helpful. Since you know your encoding is UTF-16, you may be able to use the string method decode() to read your data. Notepad adds a BOM (byte order mark) based on the encoding selected, which is why the file you re-saved in Notepad reads without the error.
Jan 6 '09 #2
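
For anyone landing here with the same error, here is a rough sketch of the decode() idea from the reply above. The generic 'utf-16' codec uses the BOM to work out the byte order, while the 'utf-16-le' and 'utf-16-be' codecs have the byte order built in and do not expect a BOM. The sketch reads the raw bytes, checks for a BOM, and falls back to an explicit byte order when there is none. Some assumptions to be aware of: it is written for Python 3 (the thread uses Python 2.4), the helper names decode_utf16 and read_utf16_lines are made up for illustration, and little-endian is only a guess at the byte order of the BOM-less files (common for files produced on Windows, but not guaranteed).

```python
import codecs


def decode_utf16(raw, default="utf-16-le"):
    """Decode UTF-16 bytes whether or not they start with a BOM.

    `default` is the byte order ASSUMED when no BOM is found.
    """
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # BOM present: the generic codec reads it to pick the byte
        # order and does not include it in the decoded text.
        return raw.decode("utf-16")
    # No BOM: the byte order has to be named explicitly.
    return raw.decode(default)


def read_utf16_lines(path, default="utf-16-le"):
    """Hypothetical replacement for codecs.open(...).readlines()."""
    with open(path, "rb") as f:  # binary mode: no implicit decoding
        raw = f.read()
    # keepends=True keeps line endings, like readlines() does
    return decode_utf16(raw, default).splitlines(True)
```

Usage would be something like data = read_utf16_lines('C:\\tdata\\inp.txt'). If the result comes out as garbage, the files are probably big-endian, so try default="utf-16-be" instead.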
