On Fri, 12 Oct 2007 19:09:46 -0700, 7stud wrote:
> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
>> You mean literally!? Then of course I get A\xcc\x88 because that's what I
>> entered. In string literals in source code the backslash has a special
>> meaning but `raw_input()` does not "interpret" the input in any way.
> Then why don't I end up with the same situation as this:
> s = 'A\xcc\x88' #capital A with umlaut
> print s #displays capital A with umlaut
I don't get the question!? In string literals in source code the
backslash has a special meaning, like I wrote above. When Python compiles
the snippet above you end up with a string of three bytes: one with the
ASCII value of an 'A' and two bytes whose values you typed in
hexadecimal:
In [191]: s = 'A\xcc\x88'
In [192]: len(s)
Out[192]: 3
In [193]: map(ord, s)
Out[193]: [65, 204, 136]
In [194]: print s
Ä
The last one displays an Ä only if the receiving/displaying program
expects UTF-8 as encoding. Otherwise something other than an Ä would be
shown.
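To make the encoding point concrete, here is a small sketch of the same three bytes read under two different encodings. It is written for Python 3, where a bytes literal plays the role of the Python 2 string used in the session above; the variable names are only illustrative.

```python
# Python 3 sketch: a bytes literal stands in for the Python 2 str above.
data = b'A\xcc\x88'      # the compiler turns each \xNN escape into one byte

length = len(data)       # number of bytes
values = list(data)      # integer value of each byte

# The same bytes interpreted under two different encodings.
as_utf8 = data.decode('utf-8')      # 'A' plus U+0308 COMBINING DIAERESIS
as_latin1 = data.decode('latin-1')  # 'A', U+00CC and the control char U+0088

print(length, values)
print(ascii(as_utf8))
print(ascii(as_latin1))
```

A program that assumes UTF-8 renders the first result as an A with an umlaut; one that assumes Latin-1 shows an A followed by Ì and an unprintable control character.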
If you type in that text when asked by `raw_input()` then you get exactly
what you typed, because no Python source code is compiled and so nothing
interprets the backslashes:
In [195]: s = raw_input()
A\xcc\x88
In [196]: len(s)
Out[196]: 9
In [197]: map(ord, s)
Out[197]: [65, 92, 120, 99, 99, 92, 120, 56, 56]
In [198]: print s
A\xcc\x88
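If the typed text really should be treated like a string literal, the escapes have to be interpreted explicitly. The following is a minimal sketch in Python 3, using a raw string as a stand-in for what `raw_input()` returned; the variable names are hypothetical. In Python 2 the `string_escape` codec (`s.decode('string_escape')`) does the equivalent in one step.

```python
# Stand-in for the nine characters typed at the raw_input() prompt.
typed = r'A\xcc\x88'

typed_length = len(typed)            # nine characters, backslashes included
codes = [ord(c) for c in typed]      # one code per typed character

# Step 1: interpret the backslash escapes, as the compiler would for a literal.
unescaped = typed.encode('ascii').decode('unicode_escape')

# Step 2: map the code points U+0041, U+00CC, U+0088 back to single bytes.
raw_bytes = unescaped.encode('latin-1')

# Step 3: read those bytes as UTF-8 to get the intended text.
text = raw_bytes.decode('utf-8')

print(typed_length, codes)
print(raw_bytes, ascii(text))
```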
>>> And what is it that your keyboard enters to produce an 'a' with an
>>> umlaut?
>> *I* just hit the key. The one right next to the ö key. ;-)
> ...and what if you don't have an a-with-umlaut key?
I find other means to enter it. <Alt> + some magic number on the numeric
keypad in Windows, or <Compose>, <a>, <"> on Unix/Linux. Some text editors
offer special sequences too. If all else fails there are character map
programs that show all Unicode characters to choose from and copy'n'paste
them.
Ciao,
Marc 'BlackJack' Rintsch