Bytes IT Community

"UnicodeError: UTF-16 stream does not start with BOM"

I have a text file (say inp.txt) that contains Unicode data.
I read the file using the following code:

    import codecs
    # Open the file through the generic UTF-16 codec (expects a BOM in Python 2.4)
    infile = codecs.open('C:\\tdata\\inp.txt', 'r', 'utf-16', errors='ignore')
    data = infile.readlines()
When I run the code above, it throws the following error:
    Traceback (most recent call last):
      File "C:\script\hypen\", line 34, in ?
        data = infile.readlines()
      File "C:\Python24\lib\", line 489, in readlines
        return self.reader.readlines(sizehint)
      File "C:\Python24\lib\", line 404, in readlines
        data =
      File "C:\Python24\lib\", line 293, in read
        newchars, decodedbytes = self.decode(data, self.errors)
      File "C:\Python24\lib\encodings\", line 49, in decode
        raise UnicodeError,"UTF-16 stream does not start with BOM"
    UnicodeError: UTF-16 stream does not start with BOM
However, if I create a new file (I did this in Notepad on Windows XP), copy and paste the contents of 'inp.txt' into it, and save it as a text file (choosing the Unicode encoding, the same one inp.txt uses), the same code reads the new file without any problem. This seems weird... has Notepad added some magic characters of its own? :)
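One way to check for such "magic characters" is to look at the first few bytes of each file. Below is a minimal Python 3 sketch (not from the original thread; the helper names `detect_bom` and `file_bom` are hypothetical) that reports which byte order mark (BOM), if any, a file starts with, using the BOM constants from the standard `codecs` module.

```python
import codecs

# Known byte order marks, longest first: BOM_UTF32_LE begins with the same
# two bytes as BOM_UTF16_LE, so it must be tested before the UTF-16 marks.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(raw):
    """Return the encoding implied by a BOM at the start of raw bytes, or None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


def file_bom(path):
    """Read the first four bytes of a file and report its BOM, if any."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

A file saved by Notepad with the "Unicode" encoding is UTF-16 little-endian and starts with the bytes FF FE, so `file_bom` would be expected to report `'utf-16-le'` for the copy; a result of `None` for the original inp.txt would indicate that it has no BOM at all.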

Can anyone help me with this? What could the issue be here? Why does creating a new file and saving the contents into it work fine, while the original file still throws the error? (I have received 15 such localized files from clients that need some processing, and I want to avoid redoing the copy/paste by hand for each one.) Any help appreciated...

Jan 6 '09 #1

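I found the information on this link helpful. Since you know your encoding is UTF-16, you may be able to read the raw bytes and use the string method decode() on them yourself. Notepad adds a BOM (byte order mark) at the start of the file based on the encoding selected, which is why the copy you saved from Notepad reads fine while the original, which apparently lacks a BOM, does not.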
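A minimal Python 3 sketch of the decode() approach, assuming (this is not stated in the thread) that any file without a BOM is UTF-16 little-endian, the byte order Windows tools usually write. The function names `decode_utf16` and `read_utf16_lines` are hypothetical. The key point is that the explicit `utf-16-le` / `utf-16-be` codecs do not look for a BOM, so they can decode a BOM-less file.

```python
import codecs


def decode_utf16(raw, default="utf-16-le"):
    """Decode UTF-16 bytes that may or may not begin with a BOM.

    With a BOM, the generic 'utf-16' codec takes the byte order from it
    and strips it. Without one, fall back to an explicit byte order
    (little-endian by default, an assumption about the input files).
    """
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return raw.decode("utf-16")
    return raw.decode(default)


def read_utf16_lines(path, default="utf-16-le"):
    """Read a whole file as bytes, decode it, and split it into lines.

    Line endings are kept, roughly like readlines(); note that
    str.splitlines also splits on a few extra separators such as
    form feed and U+2028.
    """
    with open(path, "rb") as f:
        return decode_utf16(f.read(), default).splitlines(True)
```

To fix all 15 files in one go instead of copying and pasting by hand, the decoded text could be written back with `open(path, "w", encoding="utf-16")`, whose encoder prepends a BOM in the platform's native byte order.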
Jan 6 '09 #2
