
converting octal strings to unicode

I have several ascii files that contain '\ooo' strings which represent
the octal value for a character. I want to convert these files to
unicode, and I came up with the following script. But it seems to me
that there must be a much simpler way to do it. Could someone more
experienced suggest some improvements?

For example, I want to convert a file containing:

hello \326du

into a unicode file containing:

hello Ödu
----------8<---------------------------------------
#!/usr/bin/python

import re, string, sys

# Read all lines of the file named on the command line.
if len(sys.argv) > 1:
    file = open(sys.argv[1],'r')
    lines = file.readlines()
    file.close()
else:
    print "give a filename"
    sys.exit()

# Turn one regex match of '\ooo' into the corresponding unicode character.
def to_unichr(str):
    oct = string.atoi(str.group(1),8)
    return unichr(oct)

for line in lines:
    line = string.rstrip(unicode(line,'Latin-1'))
    if re.compile(r'\\\d\d\d').search(line):
        line = re.sub(r'\\(\d\d\d)', to_unichr, line)
    line = line.encode('utf-8')
    print line

----------8<---------------------------------------
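
For comparison, here is a minimal sketch of the same regex idea in a more compact form. It assumes Python 3 rather than the Python 2 used in the script above, and the names `OCTAL_ESCAPE` and `decode_octal_escapes` are made up for illustration. It also narrows the pattern to octal digits, because `\d\d\d` matches sequences such as `\389` as well, and base-8 conversion of those raises `ValueError`.

```python
import re

# A backslash followed by exactly three octal digits (0-7).
# (Hypothetical name; not from the original script.)
OCTAL_ESCAPE = re.compile(r'\\([0-7]{3})')


def decode_octal_escapes(text):
    # Replace each matched sequence with the character whose code point
    # is the octal value; everything else is left untouched.
    return OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


converted = decode_octal_escapes(r'hello \326du')
```

Here `converted` holds the text with `\326` replaced by U+00D6 (Ö). The separate `search` call before `re.sub` is not needed, since `sub` returns the input unchanged when nothing matches.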

Jul 18 '05 #1
2 Replies


On 23 Dec 2004 18:41:57 -0800, rumours say that fl************@gmail.com
might have written:
> I have several ascii files that contain '\ooo' strings which represent
> the octal value for a character. I want to convert these files to
> unicode, and I came up with the following script. But it seems to me
> that there must be a much simpler way to do it. Could someone more
> experienced suggest some improvements?


decoded_string = "\326du".decode("string_escape")
unicode_text = unicode(decoded_string, "latin-1")
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #2

On 23 Dec 2004 18:41:57 -0800, rumours say that fl************@gmail.com
might have written:
> I have several ascii files that contain '\ooo' strings which represent
> the octal value for a character. I want to convert these files to
> unicode, and I came up with the following script. But it seems to me
> that there must be a much simpler way to do it. Could someone more
> experienced suggest some improvements?


(hope I cancelled the previous off-by-one-backslash post...)

your_string = "\\326du"
decoded_string = your_string.decode("string_escape")
unicode_text = unicode(decoded_string, "latin-1")
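
The `string_escape` codec exists only in Python 2. As a rough sketch of the same codec-based idea on Python 3 (an assumption, since the thread itself is about Python 2), the `unicode_escape` codec can play both roles at once: it interprets the backslash escapes and decodes the remaining bytes as Latin-1. The function name `decode_escaped_bytes` is hypothetical.

```python
def decode_escaped_bytes(raw_bytes):
    # unicode_escape interprets backslash escapes such as \ooo and
    # decodes all other bytes as Latin-1, returning a str.
    return raw_bytes.decode('unicode_escape')


# The doubled backslash puts a literal backslash into the bytes,
# matching what would be read from the file in binary mode.
unicode_text = decode_escaped_bytes(b'hello \\326du')
utf8_bytes = unicode_text.encode('utf-8')
```

As with `string_escape`, this also interprets other escapes such as `\n` and `\x41` (and, unlike `string_escape`, `\u` escapes too), so it only fits input where every backslash sequence is meant to be decoded.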
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #3
