Bytes | Software Development & Data Engineering Community

Q: a simple(?) raw-utf-8 conversion to internal type unicode "\304\246\311\231\316\257\316\271\303\222"

Hi,

Apologies first, as I am not a unicode expert... indeed the details
probably totally elude me. Notwithstanding: how can I convert a
binary string containing UTF-8 bytes into a python unicode string?

cutdown example:
$ cat ./uc.py
#!/usr/bin/env python
imported="\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :-)" # xterm encoding if UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"+"\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("

$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
German/ALCOR quoting: *test* AOK :-)
German/ALCOR quoting:
Traceback (most recent call last):
  File "./uc.py", line 5, in <module>
    print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

The last print statement fails because the ascii "imported" characters
are 8 bit encoded UTF-8 and don't know it! How do I tell "imported" that
it is actually already UTF-8 unicode?

Cheers
NevilleDNZ
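
The traceback can be reproduced in isolation. In Python 2, concatenating a byte string with a unicode string makes the interpreter decode the bytes implicitly with the default 'ascii' codec, and byte 0xc4 (octal \304) is outside the ASCII range. The sketch below performs that decode explicitly; the helper name try_ascii is hypothetical, and the b"..." literal assumes Python 2.6 or later (it also runs on Python 3).

```python
# Raw UTF-8 bytes from the post, written with the same octal escapes.
raw = b"\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"


def try_ascii(data):
    """Decode data as ASCII; return the error text, or None on success."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# The message names the same byte and position as the traceback above.
message = try_ascii(raw)
print(message)
```

The same error appears whether the decode is implicit (str + unicode in Python 2) or explicit, which suggests the bytes need to be decoded with the right codec before they are mixed with unicode strings.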

Jan 1 '07 #1
It was just TOO easy... After posting my message to Google Groups, I
re-read the posting there and found that Google had pointed me
to a python-unicode tutorial...
http://www.reportlab.com/i18n/python..._tutorial.html - exercise one :-)

Gosh, sometimes a google is worth so much more than ₁₀¹⁰⁰!

Happy New Year
NevilleD

It works now:
$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
German/ALCOR quoting: *test* AOK :-)
German/ALCOR quoting: *ĦəίιÒ ώσŔĹĐ* FAILS :-(
nevilled@alfa:/root0/home/nevilled/Project/20 $ vi ./uc.py
nevilled@alfa:/root0/home/nevilled/Project/20 $ cat ./uc.py
#!/usr/bin/env python
imported=unicode("\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220","utf-8")
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :-)" # xterm encoding if UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}test\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","Just TOO easy :-)"

$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
German/ALCOR quoting: *test* AOK :-)
German/ALCOR quoting: *ĦəίιÒ ώσŔĹĐ* Just TOO easy :-)
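
The fix comes down to decoding the raw bytes as UTF-8 before combining them with unicode strings. Here is a minimal sketch of that step, using the bytes .decode("utf-8") method instead of the unicode() constructor from the script (the two should be equivalent in Python 2). The variable names raw, text, cross and quoted are illustrative, and the code assumes Python 2.6+ or Python 3.3+.

```python
# Raw UTF-8 bytes from the post, written with the same octal escapes.
raw = b"\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"

# Decode explicitly: bytes in, unicode text out.
text = raw.decode("utf-8")

# Both operands are now unicode, so no implicit ASCII decode takes place.
cross = u"\N{RUNIC CROSS PUNCTUATION}"
quoted = cross + text + cross

# Report lengths rather than the text itself, so the example does not
# depend on the terminal's encoding.
print(len(raw), len(text), len(quoted))
```

The 21 raw bytes decode to 11 characters, since each non-ASCII character here takes two bytes in UTF-8 and the space takes one.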

Jan 1 '07 #2
