Interpreting string containing \u000a

Hi,

I have an ISO-8859-1 file containing things like
"Hello\u000d\u000aWorld", i.e. the character '\', followed by the
character 'u' and then '0', etc.

What is the easiest way to automatically translate these codes into
Unicode characters?

Thank you

Francis Girard
Jun 27 '08 #1
2 Replies

"Francis Girard" <fr**************@gmail.com> wrote:
> I have an ISO-8859-1 file containing things like
> "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
> character 'u' and then '0', etc.
>
> What is the easiest way to automatically translate these codes into
> Unicode characters?

>>> s = r"Hello\u000d\u000aWorld"
>>> print s
Hello\u000d\u000aWorld
>>> s.decode('iso-8859-1').decode('unicode-escape')
u'Hello\r\nWorld'
>>>

--
Duncan Booth http://kupuguy.blogspot.com
Jun 27 '08 #2
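
For readers on Python 3, where str no longer has a decode() method, the same idea can be sketched as below. This is only an illustration under stated assumptions: the data has already been read as bytes, and the helper name decode_escapes is made up for this example. The "unicode_escape" codec treats bytes that are not part of an escape sequence as Latin-1, which happens to match an ISO-8859-1 file. Note that it also interprets other backslash escapes such as \n or \x41, which may or may not be wanted.

```python
def decode_escapes(raw):
    """Turn literal \\uXXXX sequences in ISO-8859-1 bytes into characters.

    Hypothetical helper for illustration; not part of the original replies.
    """
    # Bytes outside escape sequences are decoded as Latin-1 by this codec.
    return raw.decode("unicode_escape")


# The doubled backslashes put a literal '\' followed by 'u' in the bytes.
sample = b"Hello\\u000d\\u000aWorld"
print(repr(decode_escapes(sample)))
```
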

Francis Girard wrote:
> I have an ISO-8859-1 file containing things like
> "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
> character 'u' and then '0', etc.
>
> What is the easiest way to automatically translate these codes into
> Unicode characters?

If the file really contains the escape sequences, use "unicode-escape" as the
encoding:

>>> "Hello\\u000d\\u000aWorld".decode("unicode-escape")
u'Hello\r\nWorld'

If it contains the raw bytes, use "iso-8859-1":

>>> "Hello\x0d\x0aWorld".decode("iso-8859-1")
u'Hello\r\nWorld'

Open the file with

codecs.open(filename, encoding=encoding_as_determined_above)

instead of the builtin open().

Peter
Jun 27 '08 #3
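
A possible Python 3 take on the file-reading step from the reply above, offered as a sketch rather than a definitive recipe. It assumes the file is small enough to read in one go and that it holds literal escape sequences; the function name read_escaped_file and the temporary sample file are invented for this example. Reading in binary mode and decoding the whole buffer at once sidesteps newline translation and any escape sequence being split across read chunks.

```python
import os
import tempfile


def read_escaped_file(path):
    """Read a file of ISO-8859-1 text with literal \\uXXXX escapes.

    Hypothetical helper for illustration only.
    """
    # Binary mode keeps the decoded \r\n intact (no newline translation).
    with open(path, "rb") as handle:
        return handle.read().decode("unicode_escape")


# Build a throwaway sample file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"Hello\\u000d\\u000aWorld")
    sample_path = tmp.name

try:
    result = read_escaped_file(sample_path)
finally:
    os.remove(sample_path)

print(repr(result))
```
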