Bytes | Software Development & Data Engineering Community

raw_input() and utf-8 formatted chars

s = 'A\xcc\x88' #capital A with umlaut
print s #displays capital A with umlaut

s = raw_input('Enter: ') #A\xcc\x88
print s #displays A\xcc\x88

print len(s) #9
It looks like the string I enter is being interpreted literally, as 9
separate characters, rather than as one UTF-8 encoded character. How do I
enter a capital A with an umlaut so that Python treats it as one character?

Oct 12 '07 #1
On Oct 12, 1:53 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> It looks like every character of the string I enter in utf-8 is being
> interpreted literally as 9 separate characters rather than one
> character. How do I enter a capital A with an umlaut so that python
> treats it as one character?
I don't know. This works for me:
>>> x = raw_input('Enter: ')
Enter:
>>> len(x)
1
>>>
I'm using Python 2.4 with Default Source Encoding set to None on
Windows XP SP2.

Mike

Oct 12 '07 #2
On Oct 12, 1:18 pm, kyoso...@gmail.com wrote:
> I don't know. This works for me:
> >>> x = raw_input('Enter: ')
> Enter:
> >>> len(x)
> 1
>
> I'm using Python 2.4 with Default Source Encoding set to None on
> Windows XP SP2.
>
> Mike
Yeah, but what happens when you enter A\xcc\x88? And what is it that
your keyboard enters to produce an 'a' with an umlaut?

Oct 12 '07 #3
On Fri, 12 Oct 2007 13:18:35 -0700, 7stud wrote:
> Yeah, but what happens when you enter A\xcc\x88?
You mean literally!? Then of course I get A\xcc\x88, because that's what I
entered. In string literals in source code the backslash has a special
meaning, but `raw_input()` does not "interpret" the input in any way.
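To make the point concrete, here is a minimal Python 3 sketch (in Python 3, `raw_input()` was renamed `input()`). It assumes that temporarily swapping `sys.stdin` for an `io.StringIO` is a fair stand-in for typing at the prompt; the helper name `read_typed_line` is hypothetical. The typed text arrives as nine plain characters, backslashes included, while the same text written as a source literal is compiled into three bytes.

```python
import io
import sys

def read_typed_line(typed):
    """Hypothetical helper: feed `typed` to input() as if typed at the prompt."""
    saved = sys.stdin
    sys.stdin = io.StringIO(typed + "\n")  # stand-in for the keyboard
    try:
        return input()  # Python 3's raw_input(): no escape processing at all
    finally:
        sys.stdin = saved

typed = read_typed_line(r"A\xcc\x88")  # user types: A, backslash, x, c, c, ...
literal = b"A\xcc\x88"                 # escapes interpreted at compile time

print(len(typed), [ord(c) for c in typed])  # 9 [65, 92, 120, 99, 99, 92, 120, 56, 56]
print(len(literal), list(literal))          # 3 [65, 204, 136]
```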
> And what is it that your keyboard enters to produce an 'a' with an umlaut?
*I* just hit the ä key. The one right next to the ö key. ;-)

Ciao,
Marc 'BlackJack' Rintsch
Oct 12 '07 #4
On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> You mean literally!? Then of course I get A\xcc\x88 because that's what I
> entered. In string literals in source code the backslash has a special
> meaning but `raw_input()` does not "interpret" the input in any way.
Then why don't I end up with the same situation as this:
s = 'A\xcc\x88' #capital A with umlaut
print s #displays capital A with umlaut
> > And what is it that your keyboard enters to produce an 'a' with an umlaut?
>
> *I* just hit the ä key. The one right next to the ö key. ;-)
...and what if you don't have an a-with-umlaut key?

Oct 13 '07 #5
On Fri, 12 Oct 2007 19:09:46 -0700, 7stud wrote:
> Then why don't I end up with the same situation as this:
> s = 'A\xcc\x88' #capital A with umlaut
> print s #displays capital A with umlaut
I don't get the question!? In string literals in source code the
backslash has a special meaning, like I wrote above. When Python compiles
the snippet above, you end up with a string of three bytes: one with the
ASCII value of an 'A', and two whose byte values you typed in
hexadecimal:

In [191]: s = 'A\xcc\x88'

In [192]: len(s)
Out[192]: 3

In [193]: map(ord, s)
Out[193]: [65, 204, 136]

In [194]: print s
Ä

The last step shows an Ä only because the program receiving and displaying
the output expects UTF-8. With any other encoding, something other than an
Ä would have been shown.
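One detail bears on the original question: the three bytes `A\xcc\x88` are not the single character U+00C4 (LATIN CAPITAL LETTER A WITH DIAERESIS). `\xcc\x88` is the UTF-8 encoding of U+0308 COMBINING DIAERESIS, so decoding yields two code points that a UTF-8 display merely renders as one Ä. This minimal Python 3 sketch uses the standard `unicodedata` module to show that NFC normalization composes them into the one-code-point form.

```python
import unicodedata

raw = b"A\xcc\x88"                # the bytes from the source literal
decomposed = raw.decode("utf-8")  # 'A' followed by U+0308 COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)  # precomposed U+00C4

print(len(raw), len(decomposed), len(composed))  # 3 2 1
print([unicodedata.name(c) for c in decomposed])
print(unicodedata.name(composed))  # LATIN CAPITAL LETTER A WITH DIAERESIS
print(composed.encode("utf-8"))    # b'\xc3\x84', the precomposed UTF-8 bytes
```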

If you type that text when prompted by `raw_input()`, you get exactly
what you typed, because no Python source code is being compiled:

In [195]: s = raw_input()
A\xcc\x88

In [196]: len(s)
Out[196]: 9

In [197]: map(ord, s)
Out[197]: [65, 92, 120, 99, 99, 92, 120, 56, 56]

In [198]: print s
A\xcc\x88
> > > And what is it that your keyboard enters to produce an 'a' with an
> > > umlaut?
> >
> > *I* just hit the ä key. The one right next to the ö key. ;-)
> ...and what if you don't have an a-with-umlaut key?
I find other means to enter it: <Alt> + some magic number on the numeric
keypad in Windows, or <Compose>, <a>, <"> on Unix/Linux. Some text editors
offer special key sequences too. If all else fails, there are character map
programs that show all Unicode characters, so you can pick one and
copy'n'paste it.
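In Python source there is also a keyboard-independent route: escape sequences that identify the code point by name or number. A minimal Python 3 sketch follows; `\N{...}`, `\u....` and `chr()` are standard Python 3 features (in Python 2 the same escapes work inside `u''` literals, and `unichr()` plays the role of `chr()`). Output is kept to ASCII so it prints on any terminal.

```python
import unicodedata

# Three spellings of the same precomposed character, U+00C4.
by_name = "\N{LATIN CAPITAL LETTER A WITH DIAERESIS}"
by_escape = "\u00c4"
by_number = chr(0xC4)

assert by_name == by_escape == by_number
print(len(by_name), hex(ord(by_name)))  # 1 0xc4

# The reverse direction: look a character up by its Unicode name.
small = unicodedata.lookup("LATIN SMALL LETTER A WITH DIAERESIS")
print(ascii(small))  # '\xe4'
```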

Ciao,
Marc 'BlackJack' Rintsch
Oct 13 '07 #6
On Oct 13, 3:09 am, 7stud <bbxx789_0...@yahoo.com> wrote:
> Then why don't I end up with the same situation as this:
> s = 'A\xcc\x88' #capital A with umlaut
> print s #displays capital A with umlaut
> > > And what is it that your keyboard enters to produce an 'a' with an umlaut?
> > *I* just hit the ä key. The one right next to the ö key. ;-)
>
> ...and what if you don't have an a-with-umlaut key?
raw_input() returns the string exactly as you entered it. You can turn the
escape sequences into the actual UTF-8 byte string with decode("string_escape"):

s = raw_input('Enter: ')       #you type A\xcc\x88 literally (9 characters)
s = s.decode("string_escape")  #now the 3-byte UTF-8 string 'A\xcc\x88'

It looks like your system already understands UTF-8, so when you print the
resulting byte string it is displayed as the Unicode character. Note that it
is still a 3-byte str, so len(s) is 3, not 1.
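The `string_escape` step ends with a byte string that Python still counts as three characters. Here is a minimal Python 3 sketch of carrying the typed escapes all the way to one character. The `string_escape` codec no longer exists in Python 3, so this swaps in the standard `unicode_escape` codec plus a Latin-1 round trip (Latin-1 maps code points 0 to 255 one-to-one onto byte values). The function name `parse_escaped_utf8` is hypothetical, and the sketch assumes the input is ASCII text whose `\xNN` escapes spell out UTF-8 bytes.

```python
import unicodedata

def parse_escaped_utf8(typed):
    r"""Hypothetical helper: turn typed text such as A\xcc\x88 into one str.

    Assumes `typed` is ASCII and its \xNN escapes spell out UTF-8 bytes.
    """
    # 1. Interpret the backslash escapes; each \xNN becomes code point U+00NN.
    unescaped = typed.encode("ascii").decode("unicode_escape")
    # 2. Latin-1 maps U+0000..U+00FF straight back to byte values 0..255.
    raw_bytes = unescaped.encode("latin-1")
    # 3. Decode the bytes as UTF-8, then compose base letter + combining mark.
    return unicodedata.normalize("NFC", raw_bytes.decode("utf-8"))

result = parse_escaped_utf8(r"A\xcc\x88")
print(len(result), hex(ord(result)))  # 1 0xc4
```

The final NFC step matters for this particular input: without it the result is 'A' plus U+0308, which is two characters even though it displays as one.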

Oct 13 '07 #7
