raw_input() and utf-8 formatted chars

s = 'A\xcc\x88' #capital A with umlaut
print s #displays capital A with umlaut

s = raw_input('Enter: ') #A\xcc\x88
print s #displays A\xcc\x88

print len(s) #9
It looks like every character of the string I enter in utf-8 is being
interpreted literally as 9 separate characters rather than one
character. How do I enter a capital A with an umlaut so that python
treats it as one character?

Oct 12 '07 #1
On Oct 12, 1:53 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> s = 'A\xcc\x88' #capital A with umlaut
> print s #displays capital A with umlaut
>
> s = raw_input('Enter: ') #A\xcc\x88
> print s #displays A\xcc\x88
>
> print len(s) #9
>
> It looks like every character of the string I enter in utf-8 is being
> interpreted literally as 9 separate characters rather than one
> character. How do I enter a capital A with an umlaut so that python
> treats it as one character?
I don't know. This works for me:
>>> x = raw_input('Enter: ')
Enter: ä
>>> len(x)
1
>>>
I'm using Python 2.4 with Default Source Encoding set to None on
Windows XP SP2.
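
One possible explanation for the length of 1, sketched in Python 3 (an assumption; the thread itself uses Python 2, where `raw_input()` returns the bytes the console sends): on a console with a single-byte code page, ä arrives as one byte, while a UTF-8 terminal would send two. The code pages below are illustrative guesses, not something stated in this post.

```python
# How many bytes does 'ä' (U+00E4) occupy in different encodings?
# cp850 and cp1252 are assumed examples of single-byte Windows code pages.
char = '\u00e4'  # ä

byte_lengths = {enc: len(char.encode(enc)) for enc in ('cp850', 'cp1252', 'utf-8')}

for enc, n in byte_lengths.items():
    print(enc, n)
```

If that assumption holds, the same keystroke would give `len(x) == 2` on a terminal that sends UTF-8.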

Mike

Oct 12 '07 #2
On Oct 12, 1:18 pm, kyoso...@gmail.com wrote:
> On Oct 12, 1:53 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> > s = 'A\xcc\x88' #capital A with umlaut
> > print s #displays capital A with umlaut
> > s = raw_input('Enter: ') #A\xcc\x88
> > print s #displays A\xcc\x88
> > print len(s) #9
> > It looks like every character of the string I enter in utf-8 is being
> > interpreted literally as 9 separate characters rather than one
> > character. How do I enter a capital A with an umlaut so that python
> > treats it as one character?
>
> I don't know. This works for me:
> >>> x = raw_input('Enter: ')
> Enter: ä
> >>> len(x)
> 1
>
> I'm using Python 2.4 with Default Source Encoding set to None on
> Windows XP SP2.
>
> Mike
Yeah, but what happens when you enter A\xcc\x88? And what is it that
your keyboard enters to produce an 'a' with an umlaut?

Oct 12 '07 #3
On Fri, 12 Oct 2007 13:18:35 -0700, 7stud wrote:
> On Oct 12, 1:18 pm, kyoso...@gmail.com wrote:
> > On Oct 12, 1:53 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> > > s = 'A\xcc\x88' #capital A with umlaut
> > > print s #displays capital A with umlaut
> > > s = raw_input('Enter: ') #A\xcc\x88
> > > print s #displays A\xcc\x88
> > > print len(s) #9
> > > It looks like every character of the string I enter in utf-8 is being
> > > interpreted literally as 9 separate characters rather than one
> > > character. How do I enter a capital A with an umlaut so that python
> > > treats it as one character?
> >
> > I don't know. This works for me:
> > >>> x = raw_input('Enter: ')
> > Enter: ä
> > >>> len(x)
> > 1
> >
> > I'm using Python 2.4 with Default Source Encoding set to None on
> > Windows XP SP2.
> >
> > Mike
>
> Yeah, but what happens when you enter A\xcc\x88?
You mean literally!? Then of course I get A\xcc\x88 because that's what I
entered. In string literals in source code the backslash has a special
meaning but `raw_input()` does not "interpret" the input in any way.
> And what is it that your keyboard enters to produce an 'a' with an umlaut?
*I* just hit the ä key. The one right next to the ö key. ;-)

Ciao,
Marc 'BlackJack' Rintsch
Oct 12 '07 #4
On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> You mean literally!? Then of course I get A\xcc\x88 because that's what I
> entered. In string literals in source code the backslash has a special
> meaning but `raw_input()` does not "interpret" the input in any way.
Then why don't I end up with the same situation as this:
s = 'A\xcc\x88' #capital A with umlaut
print s #displays capital A with umlaut
> > And what is it that your keyboard enters to produce an 'a' with an umlaut?
>
> *I* just hit the ä key. The one right next to the ö key. ;-)
...and what if you don't have an a-with-umlaut key?

Oct 13 '07 #5
On Fri, 12 Oct 2007 19:09:46 -0700, 7stud wrote:
> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> > You mean literally!? Then of course I get A\xcc\x88 because that's what I
> > entered. In string literals in source code the backslash has a special
> > meaning but `raw_input()` does not "interpret" the input in any way.
>
> Then why don't I end up with the same situation as this:
> s = 'A\xcc\x88' #capital A with umlaut
> print s #displays capital A with umlaut
I don't get the question!? In string literals in source code the
backslash has a special meaning, like I wrote above. When Python compiles
that above snippet you end up with a string of three bytes, one with the
ASCII value of an 'A' and two bytes where you typed in the byte value in
hexadecimal:

In [191]: s = 'A\xcc\x88'

In [192]: len(s)
Out[192]: 3

In [193]: map(ord, s)
Out[193]: [65, 204, 136]

In [194]: print s
Ä

The last works this way only if the receiving/displaying program expected
UTF-8 as encoding. Otherwise something other than an Ä would have been
shown.
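
To make the dependence on the display encoding concrete, here is a minimal sketch in Python 3 (an assumption; the session above is Python 2, where `str` is a byte string). It decodes the same three bytes the way a UTF-8 terminal would and the way a Latin-1 terminal would; Latin-1 is only an example of "something other than UTF-8".

```python
raw = b'A\xcc\x88'  # the same three bytes as the literal above

as_utf8 = raw.decode('utf-8')      # 'A' followed by U+0308 COMBINING DIAERESIS
as_latin1 = raw.decode('latin-1')  # 'A', 'Ì' (U+00CC) and the control code U+0088

print(len(raw), list(raw))
print(len(as_utf8), [hex(ord(c)) for c in as_utf8])
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])
```

Note that even the UTF-8 reading is two code points, a plain A plus a combining mark, which the terminal renders as one glyph.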

If you type in that text when asked by `raw_input()` then you get exactly
what you typed because there is no Python source code compiled:

In [195]: s = raw_input()
A\xcc\x88

In [196]: len(s)
Out[196]: 9

In [197]: map(ord, s)
Out[197]: [65, 92, 120, 99, 99, 92, 120, 56, 56]

In [198]: print s
A\xcc\x88
> > > And what is it that your keyboard enters to produce an 'a' with an
> > > umlaut?
> >
> > *I* just hit the ä key. The one right next to the ö key. ;-)
> ...and what if you don't have an a-with-umlaut key?
I find other means to enter it. <Alt> + some magic number on the numeric
keypad in Windows, or <Compose>, <a>, <"> on Unix/Linux. Some text editors
offer special sequences too. If all fails there are character map
programs that show all unicode characters to choose from and copy'n'paste
them.

Ciao,
Marc 'BlackJack' Rintsch
Oct 13 '07 #6
On Oct 13, 3:09 am, 7stud <bbxx789_0...@yahoo.com> wrote:
> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> > You mean literally!? Then of course I get A\xcc\x88 because that's what I
> > entered. In string literals in source code the backslash has a special
> > meaning but `raw_input()` does not "interpret" the input in any way.
>
> Then why don't I end up with the same situation as this:
> s = 'A\xcc\x88' #capital A with umlaut
> print s #displays capital A with umlaut
> > > And what is it that your keyboard enters to produce an 'a' with an umlaut?
> > *I* just hit the ä key. The one right next to the ö key. ;-)
>
> ...and what if you don't have an a-with-umlaut key?
raw_input() returns the string exactly as you entered it. You can
decode that into the actual UTF-8 string with decode("string_escape"):

s = raw_input('Enter: ') #A\xcc\x88
s = s.decode("string_escape")

It looks like your system already understands UTF-8 and will decode
the UTF-8 string you print to the Unicode character.
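
For readers on Python 3, where the `string_escape` codec no longer exists, the same idea can be sketched as follows. This is an illustration under stated assumptions, not the method from this post: `typed` is a hypothetical stand-in for what `input()` would return, and the final normalization step is an extra that addresses the original "one character" question. Even after the escapes are interpreted, the result is three bytes, and decoding them gives two code points until they are composed.

```python
import unicodedata

typed = r'A\xcc\x88'  # stands in for the nine characters typed at the prompt

# Interpret the backslash escapes, then recover the raw bytes they describe.
raw = typed.encode('ascii').decode('unicode_escape').encode('latin-1')

# Decode those bytes as UTF-8: 'A' plus U+0308 COMBINING DIAERESIS.
text = raw.decode('utf-8')

# NFC normalization composes the pair into the single code point U+00C4.
composed = unicodedata.normalize('NFC', text)

print(len(typed), len(raw), len(text), len(composed))
```

The `encode('ascii')` step raises `UnicodeEncodeError` if the typed text already contains non-ASCII characters, so this sketch only covers input made of ASCII characters and backslash escapes.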

Oct 13 '07 #7
