Q: a simple(?) raw-utf-8 conversion to internal type unicode "\304\246\311\231\316\257\316\271\303\222"

Hi,

Apologies first, as I am not a unicode expert... indeed, the details
probably totally elude me. Notwithstanding: how can I convert a
byte string containing UTF-8 encoded data into a python unicode string?

cutdown example:
$ cat ./uc.py
#!/usr/bin/env python
imported="\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :-)" # xterm encoding if UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"+u"\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("

$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
German/ALCOR quoting: *test* AOK :-)
German/ALCOR quoting:
Traceback (most recent call last):
File "./uc.py", line 5, in <module>
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

The last print statement fails because "imported" is an ordinary 8-bit
string holding UTF-8 encoded bytes, and python doesn't know it! How do I
tell python that "imported" is actually already UTF-8 encoded?
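
The failure can be reproduced in isolation. This is only a minimal sketch, and it assumes Python 3 rather than the Python 2 used in uc.py: there, mixing bytes and text raises a TypeError instead of decoding implicitly, so the ASCII decode is requested explicitly to show the same error. The names `raw` and `message` are made up for the illustration.

```python
# First character of "imported": the two UTF-8 bytes for U+0126.
raw = b"\304\246"

message = None
try:
    # Python 2 attempted this decode implicitly during str + unicode.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

# The text matches the last line of the traceback in the post.
print(message)
```

Byte 0xc4 is above 127, so the ASCII codec rejects it at position 0.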

Cheers
NevilleDNZ

Jan 1 '07 #1
It was just TOO easy... After posting my message to google groups, I
re-read the posting there and found that google had pointed me to a
python-unicode tutorial:
http://www.reportlab.com/i18n/python..._tutorial.html - exercise one :-)

Gosh, sometimes a google is worth so much more than ₁₀¹⁰⁰!

Happy New Year
NevilleD

It works now:
$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
German/ALCOR quoting: *test* AOK :-)
German/ALCOR quoting: *ĦəίιÒ ώσŔĹĐ* FAILS :-(
nevilled@alfa:/root0/home/nevilled/Project/20 $ vi ./uc.py
nevilled@alfa:/root0/home/nevilled/Project/20 $ cat ./uc.py
#!/usr/bin/env python
imported=unicode("\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220","utf-8")
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :-)" # xterm encoding if UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}test\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","Just TOO easy :-)"

$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
German/ALCOR quoting: *test* AOK :-)
German/ALCOR quoting: *ĦəίιÒ ώσŔĹĐ* Just TOO easy :-)
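
For anyone reading this on a newer interpreter: the script above is Python 2, where unicode(s, "utf-8") does the decoding. A rough equivalent, assuming Python 3 (where the unicode() builtin no longer exists), is to call decode() on a bytes object. The names `raw`, `text`, `quote` and `line` are made up for this sketch; the byte values are the ones from uc.py.

```python
# The same UTF-8 bytes as in uc.py, written as a Python 3 bytes literal.
raw = b"\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"

# Tell python explicitly that the bytes are UTF-8 encoded text.
text = raw.decode("utf-8")

# Text can now be concatenated with other text without an implicit decode.
quote = "\N{RUNIC CROSS PUNCTUATION}"
line = quote + text + quote

# 10 two-byte characters plus one space.
print(len(raw), "bytes ->", len(text), "characters")  # 21 bytes -> 11 characters
```

In Python 2, imported.decode("utf-8") on the plain string should give the same result as unicode(imported, "utf-8").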

Jan 1 '07 #2
