
Utilizing unicode strings

Newbie
 
Join Date: Mar 2008
Posts: 10
#1: May 10 '08
OK, I am having some trouble handling Unicode strings. Say the file example.txt contains the word Ångström. When I put a u in front of '%s', I get the error below. Without the u, the text does not display properly.

    import re
    inputfile = file('C:/example.txt', 'r')
    inputfile = inputfile.read()
    patt = re.compile(r'(.*)')
    m = patt.search(inputfile)
    print u'%s' % m.group(1)

    Traceback (most recent call last):
      File "<pyshell#8>", line 1, in <module>
        print u'%s' % m.group(1)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 2: ordinal not in range(128)

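For reference, the error comes from mixing a byte string with a Unicode format string. In Python 2, u'%s' % m.group(1) silently decodes the byte string m.group(1) with the default ASCII codec, and that codec rejects any byte above 127. The sketch below performs that decode step explicitly so the difference is visible. It assumes the file holds UTF-8 bytes (the thread does not say which encoding example.txt uses), and the helper try_decode is hypothetical. The source is kept ASCII-only and avoids Python-3-only syntax, so it should run on Python 2.6+ as well as Python 3.

```python
from __future__ import print_function

# Bytes as they would appear on disk if example.txt were saved as UTF-8
# (assumption: the real file's encoding is not stated in the thread).
# These ten bytes spell the word with A-ring and o-umlaut.
raw = b'\xc3\x85ngstr\xc3\xb6m'


def try_decode(data, encoding):
    """Hypothetical helper: return decoded text, or None if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# ASCII only covers bytes 0-127, so this fails the same way the implicit
# conversion behind u'%s' % byte_string fails in Python 2.
ascii_result = try_decode(raw, 'ascii')

# Decoding explicitly with the file's actual encoding succeeds.
utf8_result = try_decode(raw, 'utf-8')

print(ascii_result is None)                  # True
print(utf8_result == u'\u00c5ngstr\u00f6m')  # True
```

If the file turns out to use a different encoding (for example Latin-1 or cp1252 on Windows), passing that name instead of 'utf-8' is what matters; the ASCII codec fails either way.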
Expert
 
Join Date: Sep 2007
Posts: 856
#2: May 10 '08

u%s does not mean a Unicode string. To print Unicode, read this article, which, while old, should still work.
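One common way to avoid the implicit ASCII decode is to decode while reading, so that m.group(1) is already Unicode. This is a minimal sketch, not necessarily the approach in the article referred to above. It assumes example.txt is UTF-8 and writes a temporary stand-in file instead of reading C:/example.txt. It uses io.open with an explicit encoding; codecs.open(path, 'r', 'utf-8') also returns Unicode, but io.open additionally translates '\r\n' line endings to '\n' on reading.

```python
from __future__ import print_function
import io
import os
import re
import tempfile

# Stand-in for C:/example.txt: a temporary file holding UTF-8 bytes
# (assumption: the real file is UTF-8; pass its actual encoding instead).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'\xc3\x85ngstr\xc3\xb6m\n')

# Opening with an explicit encoding decodes while reading,
# so read() returns Unicode text instead of a byte string.
with io.open(path, 'r', encoding='utf-8') as inputfile:
    text = inputfile.read()
os.remove(path)

# Same regex as in the original question.
patt = re.compile(r'(.*)')
m = patt.search(text)
word = m.group(1)  # already Unicode: no implicit ASCII decode is needed

print(word == u'\u00c5ngstr\u00f6m')  # True
```

Whether the word then displays correctly still depends on the console's own encoding, which is a separate issue from reading the file.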
Newbie
 
Join Date: Mar 2008
Posts: 10
#3: May 26 '08

Quote:

Originally Posted by Laharl

u%s does not mean a Unicode string. To print Unicode, read this article, which, while old, should still work.

So if I use

    import codecs
    outputFile = codecs.open('outputFile.txt', 'w', 'utf-8')

how do I then write a newline (\n)?
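With codecs.open, a newline is simply part of the Unicode string you write, e.g. outputFile.write(line + u'\n'). One catch, per the codecs documentation: codecs.open always opens the underlying file in binary mode, so '\n' is written as a bare LF byte with no '\r\n' translation on Windows. The sketch below writes the same lines both ways: with codecs.open, and with io.open(..., newline='\r\n') for a file that needs Windows line endings. The sample lines and the temporary path are illustrative only.

```python
from __future__ import print_function
import codecs
import io
import os
import shutil
import tempfile

lines = [u'\u00c5ngstr\u00f6m', u'second line']  # illustrative content
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'outputFile.txt')

# codecs.open: append u'\n' yourself; it is encoded and written unchanged.
with codecs.open(path, 'w', 'utf-8') as outputFile:
    for line in lines:
        outputFile.write(line + u'\n')

with open(path, 'rb') as f:
    lf_bytes = f.read()  # b'\xc3\x85ngstr\xc3\xb6m\nsecond line\n'

# io.open in text mode translates every '\n' written into the newline
# argument, so this produces Windows-style '\r\n' line endings.
with io.open(path, 'w', encoding='utf-8', newline='\r\n') as outputFile:
    for line in lines:
        outputFile.write(line + u'\n')

with open(path, 'rb') as f:
    crlf_bytes = f.read()  # b'\xc3\x85ngstr\xc3\xb6m\r\nsecond line\r\n'

shutil.rmtree(tmpdir)
```

If the file is only read back by Python or by editors that understand LF endings, the codecs.open version is enough; the newline argument matters mainly for tools that expect '\r\n'.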