Bytes | Software Development & Data Engineering Community

Short questions wrt Python & Unicode

KvS
Hi all,

I've been reading about Unicode in general, and about using it in
Python in particular, since it turns out to be less straightforward
than I expected. I wanted to ask two questions:

1) I'm writing a program that interacts with the user through wxPython
(unicode build) and stores & retrieves data using PySQLite. As far as I
know now, both packages are capable of handling Python unicode objects
(wxPython returns the values of text controls etc. by default as Python
unicode objects and "TEXT" columns in PySQLite have unicode entries),
and since both interface with me through Python unicode objects I
should be able to pass the unicode objects each one generates to the
other's functions without any fear, right?

2) How do I get a representation of a unic. object in terms of Unicode
code points? repr() doesn't do that, it sometimes parses or encodes the
code points right:
>>> s=u"\u0040\u0166\u00e6"
>>> s
u'@\u0166\xe6'

(does this latter \xe6 have to do with the internal representation of
unic. objects, maybe with this UCS-2 encoding?)

Thanks in advance!

- Kees

Jun 9 '06 #1
On 9/06/2006 10:04 PM, KvS wrote:
2) How do I get a representation of a unic. object in terms of Unicode
code points? repr() doesn't do that, it sometimes parses or encodes the
code points right:

|>>> s=u"\u0040\u0166\u00e6"
|>>> s
u'@\u0166\xe6'
|>>> ' '.join('U+%04X' % ord(c) for c in s)
'U+0040 U+0166 U+00E6'

If you'd prefer it more Pythonic than unicode.orgic, adjust the format
string and separator to suit your taste.
(does this latter \xe6 have to do with the internal representation of
unic. objects, maybe with this UCS-2 encoding?)


|>>> u'\xe6' == u'\u00e6' == unichr(0xe6)
True
|>>> hex(ord(u'\u00e6'))
'0xe6'

U+nnnnnn is represented internally as the integer 0xnnnnnn -- except if
it won't fit, but you can pretend that surrogate pairs don't exist, for
the moment :-)

Cheers,
John

Jun 9 '06 #2
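
For anyone reading this later, here is a minimal sketch of the one-liner from #2 wrapped in a function, assuming Python 3 (the thread itself is Python 2, where unichr() plays the role of chr()). The name code_points is made up for illustration. The second half hints at what "won't fit" means: a code point above U+FFFF needs two 16-bit units (a surrogate pair) when stored as UTF-16.

```python
def code_points(text):
    """Return the code points of text in U+XXXX notation, space-separated."""
    return ' '.join('U+%04X' % ord(ch) for ch in text)


s = u"\u0040\u0166\u00e6"
print(code_points(s))  # U+0040 U+0166 U+00E6

# A code point beyond U+FFFF does not fit in a single 16-bit unit.
# Encoding it as big-endian UTF-16 yields two units: a surrogate pair.
astral = u"\U0001F600"
raw = astral.encode('utf-16-be')
units = [int.from_bytes(raw[i:i + 2], 'big') for i in range(0, len(raw), 2)]
print(['0x%04X' % u for u in units])  # ['0xD83D', '0xDE00']
```

Changing the format string (say, '0x%x') or the separator gives a more Python-looking listing, as suggested in #2.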
KvS wrote:
>>> s=u"\u0040\u0166\u00e6"
>>> s
u'@\u0166\xe6'

(does this latter \xe6 have to do with the internal representation of
unic. objects, maybe with this UCS-2 encoding?)


No, it's simply the shortest way to represent U+00E6 as a Python Unicode
string literal when limited to ASCII only.

</F>

Jun 9 '06 #3
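
The point made in #3 can be checked with a short sketch, assuming Python 3: there the built-in ascii() produces an ASCII-only representation much like the Python 2 repr() shown in this thread (Python 3's own repr() prints æ as-is). The escape form depends only on how large the code point is, not on any internal encoding.

```python
s = u"\u0040\u0166\u00e6"

# ascii() escapes each non-ASCII character with the shortest escape that fits:
# \xNN below U+0100, \uNNNN below U+10000, \UNNNNNNNN above that.
escaped = ascii(s)
print(escaped)               # '@\u0166\xe6'
print(ascii(u"\U0001F600"))  # '\U0001f600'

# All three spellings denote the same one-character string, U+00E6.
same = u"\xe6" == u"\u00e6" == chr(0xE6)
print(same)                  # True
```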
KvS

John Machin wrote:
On 9/06/2006 10:04 PM, KvS wrote:
2) How do I get a representation of a unic. object in terms of Unicode
code points? repr() doesn't do that, it sometimes parses or encodes the
code points right:

|>>> s=u"\u0040\u0166\u00e6"
|>>> s
u'@\u0166\xe6'

|>>> ' '.join('U+%04X' % ord(c) for c in s)
'U+0040 U+0166 U+00E6'

If you'd prefer it more Pythonic than unicode.orgic, adjust the format
string and separator to suit your taste.
(does this latter \xe6 have to do with the internal representation of
unic. objects, maybe with this UCS-2 encoding?)


|>>> u'\xe6' == u'\u00e6' == unichr(0xe6)
True
|>>> hex(ord(u'\u00e6'))
'0xe6'

U+nnnnnn is represented internally as the integer 0xnnnnnn -- except if
it won't fit, but you can pretend that surrogate pairs don't exist, for
the moment :-)

Cheers,
John


Thanks to you and Fredrik! What about q1? I know it's silly, since for
integers, e.g., one doesn't give such an issue any thought at all; it's
just that this business of en-/decodings etc. makes things a bit more
blurry to me. It should be the case that a package may do internally
(en-/decoding etc.) whatever it wants to represent/manipulate unicode
strings, but should always communicate with the outside world via the
interchangeable & uniform Python unicode object, right?

Jun 9 '06 #4
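
Question 1 went unanswered in the thread. One way to test the database half of it is a round trip like the sketch below. Assumptions: Python 3 and the standard-library sqlite3 module (the descendant of PySQLite) with an in-memory database; the table name notes and the sample string are made up, and the wxPython side is not exercised here, with a plain string literal standing in for a text-control value.

```python
import sqlite3

# Stands in for a value read from a GUI text control (assumption).
original = u"@\u0166\u00e6"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
# Pass the string as a bound parameter; no manual encoding step.
conn.execute("INSERT INTO notes (body) VALUES (?)", (original,))
(stored,) = conn.execute("SELECT body FROM notes").fetchone()
conn.close()

print(stored == original)              # True if the text survived unchanged
print(type(stored) is type(original))  # True if a text string came back
```

If both checks print True, the value came back as the same kind of text object that went in, which is the behaviour question 1 hopes for. It says nothing about what encoding the database uses internally.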
