
"UnicodeError: UTF-16 stream does not start with BOM"

 
  #1  
January 6th, 2009, 07:49 AM
Newbie
 
Join Date: Sep 2008
Posts: 10

I have a text file (say inp.txt) that contains Unicode data.
I read the file using the following code:

    import codecs
    infile = codecs.open('C:\\tdata\\inp.txt', 'r', 'utf-16', errors='ignore')
    data = infile.readlines()

When I run the above code, it throws the following error:
    Traceback (most recent call last):
      File "C:\script\hypen\hyp.py", line 34, in ?
        data = infile.readlines()
      File "C:\Python24\lib\codecs.py", line 489, in readlines
        return self.reader.readlines(sizehint)
      File "C:\Python24\lib\codecs.py", line 404, in readlines
        data = self.read()
      File "C:\Python24\lib\codecs.py", line 293, in read
        newchars, decodedbytes = self.decode(data, self.errors)
      File "C:\Python24\lib\encodings\utf_16.py", line 49, in decode
        raise UnicodeError,"UTF-16 stream does not start with BOM"
    UnicodeError: UTF-16 stream does not start with BOM

But if I create a new file (I did this in Notepad on Win XP), copy and paste the contents of 'inp.txt' into it, and save it as a text file (choosing Unicode encoding, the same as inp.txt), the same code reads the new file without any problem. This seems weird... does a file created by Notepad get some magic characters of its own added? :)
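
One way to test the "magic chars" guess is to look at the first two bytes of each file. The sketch below is only an illustration: the helper name detect_utf16_bom is made up for this example, and it is written for a current Python rather than the Python 2.4 shown in the traceback. It reports whether the raw bytes start with a UTF-16 byte order mark (BOM).

```python
import codecs

def detect_utf16_bom(raw):
    """Return the byte order implied by a leading UTF-16 BOM, or None.

    detect_utf16_bom is a hypothetical helper name, not a library function.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return 'utf-16-le'
    if raw.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return 'utf-16-be'
    return None                               # no UTF-16 BOM at the start

# Usage sketch (path taken from the post above):
# with open('C:\\tdata\\inp.txt', 'rb') as f:
#     print(detect_utf16_bom(f.read(2)))
```

If this returns None for inp.txt but a byte order for the Notepad copy, the missing BOM would explain the difference between the two files.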

Can anyone help me with this? What could be the issue here? Why does creating a new file and saving the contents in it work fine, while the original file still throws the error? (I have received 15 such localized files from clients, on which some processing has to be done, and I want to avoid manual copy/paste rework.) Any help appreciated...


Thanks,
anil

Last edited by bvdet; January 6th, 2009 at 02:59 PM. Reason: Fixed code tag
  #2  
January 6th, 2009, 04:11 PM
bvdet
Moderator
 
Join Date: Oct 2006
Location: Nashville, TN
Posts: 1,434

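I found the information on this link helpful. Since you know your encoding is UTF-16, you may be able to read the raw data and decode it with the string method decode(). Notepad adds the BOM based on the encoding selected, which is why the re-saved copy works; going by the error message, the original file presumably lacks one.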
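
A minimal sketch of that idea, under some assumptions: the function name read_utf16_lines is made up for this example, the files are assumed to be little-endian UTF-16 when no BOM is present (common for files produced on Windows, but not guaranteed for the client files), and the code targets a current Python rather than Python 2.4. It reads the raw bytes, uses the BOM when there is one, and otherwise falls back to an explicit byte order.

```python
import codecs

def read_utf16_lines(path, fallback='utf-16-le'):
    """Read a UTF-16 text file and return its lines, tolerating a missing BOM.

    read_utf16_lines is a hypothetical helper; 'fallback' is the byte order
    assumed when the file has no BOM (little-endian here, as an assumption).
    """
    with open(path, 'rb') as f:
        raw = f.read()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # BOM present: the 'utf-16' codec picks the byte order and drops the BOM.
        text = raw.decode('utf-16')
    else:
        # No BOM: decode with the assumed byte order instead of failing.
        text = raw.decode(fallback)
    # Keep line endings, like readlines() does.
    return text.splitlines(True)

# Usage sketch (path taken from the original post):
# data = read_utf16_lines('C:\\tdata\\inp.txt')
```

If the decoded text comes out as unreadable characters, the fallback byte order is probably wrong for that file, and passing fallback='utf-16-be' is worth trying.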