raw_input() and utf-8 formatted chars

s = 'A\xcc\x88' #capital A with umlaut
print s #displays capital A with umlaut

s = raw_input('Enter: ') #A\xcc\x88
print s #displays A\xcc\x88

print len(s) #9

It looks like every character of the string I enter in utf-8 is being
interpreted literally as 9 separate characters rather than one
character. How do I enter a capital A with an umlaut so that python
treats it as one character?

Oct 12 '07 #1
On Oct 12, 1:53 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> s = 'A\xcc\x88' #capital A with umlaut
> print s #displays capital A with umlaut
>
> s = raw_input('Enter: ') #A\xcc\x88
> print s #displays A\xcc\x88
>
> print len(s) #9
>
> It looks like every character of the string I enter in utf-8 is being
> interpreted literally as 9 separate characters rather than one
> character. How do I enter a capital A with an umlaut so that python
> treats it as one character?

I don't know. This works for me:

>>> x = raw_input('Enter: ')
Enter: ä
>>> len(x)
1
>>>

I'm using Python 2.4 with Default Source Encoding set to None on
Windows XP SP2.

Mike

Oct 12 '07 #2
On Oct 12, 1:18 pm, kyoso...@gmail.com wrote:
> On Oct 12, 1:53 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> > s = 'A\xcc\x88' #capital A with umlaut
> > print s #displays capital A with umlaut
> > s = raw_input('Enter: ') #A\xcc\x88
> > print s #displays A\xcc\x88
> > print len(s) #9
> > It looks like every character of the string I enter in utf-8 is being
> > interpreted literally as 9 separate characters rather than one
> > character. How do I enter a capital A with an umlaut so that python
> > treats it as one character?
>
> I don't know. This works for me:
> >>> x = raw_input('Enter: ')
> Enter: ä
> >>> len(x)
> 1
>
> I'm using Python 2.4 with Default Source Encoding set to None on
> Windows XP SP2.
>
> Mike

Yeah, but what happens when you enter A\xcc\x88? And what is it that
your keyboard enters to produce an 'a' with an umlaut?

Oct 12 '07 #3
On Fri, 12 Oct 2007 13:18:35 -0700, 7stud wrote:
> On Oct 12, 1:18 pm, kyoso...@gmail.com wrote:
> > On Oct 12, 1:53 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> > > s = 'A\xcc\x88' #capital A with umlaut
> > > print s #displays capital A with umlaut
> > > s = raw_input('Enter: ') #A\xcc\x88
> > > print s #displays A\xcc\x88
> > > print len(s) #9
> > > It looks like every character of the string I enter in utf-8 is being
> > > interpreted literally as 9 separate characters rather than one
> > > character. How do I enter a capital A with an umlaut so that python
> > > treats it as one character?
> >
> > I don't know. This works for me:
> > >>> x = raw_input('Enter: ')
> > Enter: ä
> > >>> len(x)
> > 1
> >
> > I'm using Python 2.4 with Default Source Encoding set to None on
> > Windows XP SP2.
> >
> > Mike
>
> Yeah, but what happens when you enter A\xcc\x88?

You mean literally!? Then of course I get A\xcc\x88 because that's what I
entered. In string literals in source code the backslash has a special
meaning but `raw_input()` does not "interpret" the input in any way.

> And what is it that your keyboard enters to produce an 'a' with an umlaut?

*I* just hit the ä key. The one right next to the ö key. ;-)

Ciao,
Marc 'BlackJack' Rintsch
Oct 12 '07 #4
On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> You mean literally!? Then of course I get A\xcc\x88 because that's what I
> entered. In string literals in source code the backslash has a special
> meaning but `raw_input()` does not "interpret" the input in any way.

Then why don't I end up with the same situation as this:

s = 'A\xcc\x88' #capital A with umlaut
print s #displays capital A with umlaut

> > And what is it that your keyboard enters to produce an 'a' with an umlaut?
>
> *I* just hit the ä key. The one right next to the ö key. ;-)

...and what if you don't have an a-with-umlaut key?

Oct 13 '07 #5
On Fri, 12 Oct 2007 19:09:46 -0700, 7stud wrote:
> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> > You mean literally!? Then of course I get A\xcc\x88 because that's what I
> > entered. In string literals in source code the backslash has a special
> > meaning but `raw_input()` does not "interpret" the input in any way.
>
> Then why don't I end up with the same situation as this:
> s = 'A\xcc\x88' #capital A with umlaut
> print s #displays capital A with umlaut

I don't get the question!? In string literals in source code the
backslash has a special meaning, like I wrote above. When Python compiles
that above snippet you end up with a string of three bytes, one with the
ASCII value of an 'A' and two bytes where you typed in the byte value in
hexadecimal:

In [191]: s = 'A\xcc\x88'

In [192]: len(s)
Out[192]: 3

In [193]: map(ord, s)
Out[193]: [65, 204, 136]

In [194]: print s
Ä

The last works this way only if the receiving/displaying program expected
UTF-8 as encoding. Otherwise something other than an Ä would have been
shown.
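
As a rough illustration of that dependence on the encoding, here is a small sketch (not part of the original session, and written for Python 3, where the three bytes have to be spelled as a bytes literal). It decodes the same bytes once as UTF-8 and once as Latin-1; the variable names are only for illustration:

```python
import unicodedata

raw = b'A\xcc\x88'  # the same three bytes as in the session above

# Interpreted as UTF-8: 'A' followed by U+0308 COMBINING DIAERESIS.
as_utf8 = raw.decode('utf-8')

# Interpreted as Latin-1: every byte becomes its own character,
# so this is 'A', U+00CC and the control character U+0088.
as_latin1 = raw.decode('latin-1')

# NFC normalization composes 'A' + combining diaeresis into one code point.
composed = unicodedata.normalize('NFC', as_utf8)

print(len(raw), len(as_utf8), len(as_latin1), len(composed))
print([unicodedata.name(ch) for ch in as_utf8])
```

Note that even the correctly decoded text is two code points (a letter plus a combining mark); it only becomes a single character after normalization.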

If you type in that text when asked by `raw_input()` then you get exactly
what you typed because there is no Python source code compiled:

In [195]: s = raw_input()
A\xcc\x88

In [196]: len(s)
Out[196]: 9

In [197]: map(ord, s)
Out[197]: [65, 92, 120, 99, 99, 92, 120, 56, 56]

In [198]: print s
A\xcc\x88

> > > And what is it that your keyboard enters to produce an 'a' with an
> > > umlaut?
> >
> > *I* just hit the ä key. The one right next to the ö key. ;-)
>
> ...and what if you don't have an a-with-umlaut key?

I find other means to enter it. <Alt> + some magic number on the numeric
keypad in Windows, or <Compose>, <a>, <"> on Unix/Linux. Some text editors
offer special sequences too. If all else fails there are character map
programs that show all Unicode characters to choose from and copy'n'paste
them.

Ciao,
Marc 'BlackJack' Rintsch
Oct 13 '07 #6
On Oct 13, 3:09 am, 7stud <bbxx789_0...@yahoo.com> wrote:
> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> > You mean literally!? Then of course I get A\xcc\x88 because that's what I
> > entered. In string literals in source code the backslash has a special
> > meaning but `raw_input()` does not "interpret" the input in any way.
>
> Then why don't I end up with the same situation as this:
> s = 'A\xcc\x88' #capital A with umlaut
> print s #displays capital A with umlaut
>
> > > And what is it that your keyboard enters to produce an 'a' with an umlaut?
> >
> > *I* just hit the ä key. The one right next to the ö key. ;-)
>
> ...and what if you don't have an a-with-umlaut key?

raw_input() returns the string exactly as you entered it. You can
decode that into the actual UTF-8 string with decode("string_escape"):

s = raw_input('Enter: ') #A\xcc\x88
s = s.decode("string_escape")

It looks like your system already understands UTF-8 and will decode
the UTF-8 string you print to the Unicode character.
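
For anyone reading this on Python 3, where neither raw_input() nor the "string_escape" codec exists, the following is a rough sketch of the same idea and not a drop-in replacement. The helper name unescape_typed_bytes is made up for illustration, and the Latin-1 round trip assumes the typed text contains only ASCII characters and \xNN-style escapes:

```python
import unicodedata


def unescape_typed_bytes(typed):
    """Hypothetical helper: turn typed escape sequences into real bytes."""
    # 'unicode_escape' interprets the backslash sequences; the Latin-1
    # round trip maps each resulting code point 0-255 back onto one byte.
    return typed.encode('latin-1').decode('unicode_escape').encode('latin-1')


typed = r'A\xcc\x88'              # stands in for what input() would return
raw = unescape_typed_bytes(typed)  # three bytes: 0x41, 0xCC, 0x88
text = raw.decode('utf-8')         # 'A' + U+0308 COMBINING DIAERESIS
single = unicodedata.normalize('NFC', text)  # composed into one code point

print(len(typed), len(raw), len(text), len(single))
```

The last step matters for the original question: the decoded text is still a letter plus a combining mark, and only the NFC-normalized form has length 1.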

Oct 13 '07 #7
