Bytes | Software Development & Data Engineering Community

Python 3.0b2 cannot map '\u012b'

Hello,

I am using Python 3.0b2.
I have an XML file that has the unicode character '\u012b' in it,
which, when parsed, causes a UnicodeEncodeError:

'charmap' codec can't encode character '\u012b' in position 26:
character maps to <undefined>

This happens even when I assign this character to a variable in the
interactive interpreter and display it:

Python 3.0b2 (r30b2:65106, Jul 18 2008, 18:44:17) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '\u012b'
>>> s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python30\lib\io.py", line 1428, in write
    b = encoder.encode(s)
  File "C:\Python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in position 1: character maps to <undefined>

Is this a known issue, or am I doing something wrong?
Here is a link to the XML file. The character is on line 600, character 54:

http://rubyquiz.com/SongLibrary.xml.gz
Aug 31 '08 #1
josh logan <de************@gmail.com> wrote:
> Is this a known issue, or am I doing something wrong?
Both. U+012B is the Latin lower-case i with macron (i with a bar instead
of a dot). That character does not exist in the 8-bit character set CP437,
which you are trying to use.

If you choose an encoding that includes i-with-macron, then it will
work. UTF-8 would be a good choice; among the 8-bit character sets,
ISO-8859-10 includes it.
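A minimal sketch, assuming a modern Python 3 rather than the 3.0b2 in the thread, that checks which encodings can represent U+012B. The list of codec names is an illustrative choice; each is a standard Python codec name, and `encodings_for` is a hypothetical helper:

```python
from typing import Dict, List, Optional

# Illustrative list of standard Python codec names to try.
CANDIDATES = ["cp437", "cp1252", "iso8859_10", "utf-8"]


def encodings_for(char: str, encodings: List[str]) -> Dict[str, Optional[bytes]]:
    """Map each encoding to the bytes for char, or None if it has no mapping."""
    result: Dict[str, Optional[bytes]] = {}
    for enc in encodings:
        try:
            result[enc] = char.encode(enc)
        except UnicodeEncodeError:
            # Same failure as in the traceback: "character maps to <undefined>".
            result[enc] = None
    return result


if __name__ == "__main__":
    for enc, data in encodings_for("\u012b", CANDIDATES).items():
        print(enc, data)
```

Both cp437 and cp1252 come back as None; UTF-8 needs two bytes, while ISO-8859-10 needs one.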
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Sep 1 '08 #2


Tim Roberts wrote:
> Both. U+012B is the Latin lower-case i with macron (i with a bar instead
> of a dot). That character does not exist in the 8-bit character set CP437,
> which you are trying to use.
>
> If you choose an encoding that includes i-with-macron, then it will
> work. UTF-8 would be a good choice; among the 8-bit character sets,
> ISO-8859-10 includes it.
I doubt the OP 'chose' cp437. Why does Python use cp437 even when the
default encoding is utf-8?

On WinXP:
>>> sys.getdefaultencoding()
'utf-8'
>>> s='\u012b'
>>> s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python30\lib\io.py", line 1428, in write
    b = encoder.encode(s)
  File "C:\Program Files\Python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in position 1: character maps to <undefined>

To put it another way, how can one 'choose' utf-8 for display to screen?

Using IDLE, display works fine.

IDLE 3.0b2
>>> s='\u012b'
>>> s
'ī'  # i macron
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

I ran across this in a different context and mentioned it on the bug
tracker, but the Windows interpreter seems broken here.

I will send this in UTF-8 so the i-macron will hopefully show up.

tjr

Sep 1 '08 #3
On Mon, 01 Sep 2008 02:27:54 -0400, Terry Reedy wrote:
> I doubt the OP 'chose' cp437. Why does Python use cp437 even when the
> default encoding is utf-8?

Most likely because Python figured out that the terminal expects cp437.
What does `sys.stdout.encoding` say?
> To put it another way, how can one 'choose' utf-8 for display to screen?
If the terminal expects cp437 then displaying utf-8 might give some
problems.
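To make Marc's point concrete: the interpreter-wide default (`sys.getdefaultencoding()`) and the encoding of the console stream (`sys.stdout.encoding`) are separate settings, and only the latter decides whether echoing a string succeeds. A minimal sketch for a modern Python 3; `can_display` is a hypothetical helper, not a standard function:

```python
import sys


def can_display(text: str, encoding: str) -> bool:
    """Return True if every character of text has a mapping in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Interpreter-wide default string encoding (always 'utf-8' in Python 3).
    print("default encoding:", sys.getdefaultencoding())
    # Encoding of the console stream; the interactive echo writes through it.
    print("stdout encoding: ", sys.stdout.encoding)
    print("i-macron displayable:", can_display("\u012b", sys.stdout.encoding or "ascii"))
```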

Ciao,
Marc 'BlackJack' Rintsch
Sep 1 '08 #4
On Sep 1, 8:19 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> Most likely because Python figured out that the terminal expects cp437.
> What does `sys.stdout.encoding` say?
So it is not a problem with the program, but a problem when I print
it out: sys.stdout.encoding does say cp437.

Now, when I don't print anything out, the program hangs. I will try
this again and let the board know the results.

Thanks for all of your insight.
Sep 1 '08 #5


Marc 'BlackJack' Rintsch wrote:
> Most likely because Python figured out that the terminal expects cp437.
> What does `sys.stdout.encoding` say?
The interpreter in the command prompt window says CP437.
The IDLE Window says 'cp1252', and it handles the character fine.
Given that Windows OS can handle the character, why is Python/Command
Prompt limiting output?

The IDLE window displays characters it cannot show (like surrogate pairs)
as boxes. But if I cut '[][]' (4 chars) and paste it into Firefox, I get
3 chars: '[]' boxes with some digits inside instead of empty ones. It is
really confusing when every window on 'unicode-based' Windows handles a
different subset. Is this the fault of Windows or of Python and IDLE
(those two being more limited than Firefox)?
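On surrogate pairs: a character outside the BMP is one code point but occupies two 16-bit units in UTF-16, the form the Windows wide-character APIs use, so programs that count units and programs that count code points can disagree about how many characters were copied. A small sketch; `utf16_code_units` is a hypothetical helper. It assumes Python 3.3 or later, where `len()` counts code points; narrow builds such as the Windows 3.0 builds counted UTF-16 units instead:

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit units text occupies in UTF-16 (no BOM)."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # i-macron is in the BMP; U+1D11E (musical G clef) is outside it.
    for sample in ["\u012b", "\U0001d11e"]:
        print(ascii(sample), "code points:", len(sample),
              "UTF-16 units:", utf16_code_units(sample))
```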
>> To put it another way, how can one 'choose' utf-8 for display to screen?
>
> If the terminal expects cp437 then displaying utf-8 might give some
> problems.
My screen displays whatever Windows tells the graphics card to tell the
screen to display. In OpenOffice, I can select a unicode font that
displays at least everything in the Basic Multilingual Plane (BMP).

Terry Jan Reedy

Sep 1 '08 #6
On Mon, 01 Sep 2008 14:25:01 -0400, Terry Reedy wrote:
> The interpreter in the command prompt window says CP437. The IDLE Window
> says 'cp1252', and it handles the character fine. Given that Windows OS
> can handle the character, why is Python/Command Prompt limiting output?

The Windows command prompt expects cp437 because that's what old DOS
programs print to it.
> The IDLE window displays characters it cannot show (like surrogate pairs)
> as boxes. But if I cut '[][]' (4 chars) and paste it into Firefox, I get
> 3 chars: '[]' boxes with some digits inside instead of empty ones. It is
> really confusing when every window on 'unicode-based' Windows handles a
> different subset.
That's because it is not 'unicode-based'. Communication between those
programs has to be done with bytes, so the sender has to encode unicode
characters in the encoding the receiver expects.
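A sketch of what "the sender has to encode in the encoding the receiver expects" means in practice: the same UTF-8 bytes, read back under other assumptions, turn into different text (for example ī becomes Ä« under cp1252). `reinterpret` is a hypothetical helper; `errors="replace"` keeps undecodable bytes from raising, and `ascii()` keeps the demo printable even on a cp437 console:

```python
from typing import Dict, List

# The UTF-8 bytes a sender would produce for i-macron: b'\xc4\xab'.
DATA = "\u012b".encode("utf-8")


def reinterpret(raw: bytes, encodings: List[str]) -> Dict[str, str]:
    """Decode the same bytes the way receivers expecting each encoding would."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}


if __name__ == "__main__":
    for enc, text in reinterpret(DATA, ["utf-8", "cp1252", "cp437"]).items():
        # ascii() escapes non-ASCII characters so this prints on any console.
        print(enc, ascii(text))
```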
> Is this the fault of Windows or of Python and IDLE (those two being
> more limited than Firefox)?
It's nobody's fault. That's simply how the encoding stuff works.
>> To put it another way, how can one 'choose' utf-8 for display to
>> screen?
>
> If the terminal expects cp437 then displaying utf-8 might give some
> problems.

> My screen displays whatever Windows tells the graphics card to tell the
> screen to display.
But the terminal gets bytes and expects them to be cp437 encoded
characters and not utf-8. So you can't send whatever unicode character
you want, at least not without changing the encoding of the terminal.
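On the Python side, the encoding of the standard streams can be chosen with the `PYTHONIOENCODING` environment variable, which accepts either `encoding` or `encoding:errors`. A sketch that runs a child interpreter and inspects the raw bytes it writes; it only changes what Python sends, and the terminal still has to expect that encoding. It assumes Python 3.5 or later for `subprocess.run`, and `child_stdout_bytes` is a hypothetical helper:

```python
import os
import subprocess
import sys


def child_stdout_bytes(code: str, io_encoding: str) -> bytes:
    """Run code in a child interpreter with PYTHONIOENCODING set; return raw stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    snippet = "print('\\u012b')"
    # UTF-8 bytes for i-macron, plus the platform's line ending.
    print(child_stdout_bytes(snippet, "utf-8"))
    # "encoding:errors" form: cp437 cannot map it, so it is written as an escape.
    print(child_stdout_bytes(snippet, "cp437:backslashreplace"))
```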
> In OpenOffice, I can select a unicode font that displays at least
> everything in the Basic Multilingual Plane (BMP).
But OOo works with unicode internally, so there's no communication with
outside programs involved here.

Ciao,
Marc 'BlackJack' Rintsch
Sep 1 '08 #7


Marc 'BlackJack' Rintsch wrote:

First, thank you for the informative responses.
> The Windows command prompt expects cp437 because that's what old DOS
> programs print to it.
Grrr. When the interpreter runs, it opens the command prompt window
with Python running, and the window closes when Python exits, so there
are no other programs involved. I don't suppose there is any way to tell
Command Prompt to accept something better.
> But OOo works with unicode internally, so there's no communication with
> outside programs involved here.
Python 3 uses unicode internally also, but I gather Command Prompt is an
outside program used as a quick substitute for coding a plain window
with MFC, for instance.

----------------------
I did some experiments.

I added the /u flag after cmd.exe in the Command Prompt shortcut and
changed the font to Lucida Console (which people on the web say handles
unicode).

I opened the prompt window and entered 'chcp 1252', the same code page as
IDLE, then started Python 3.
>>> import sys
>>> sys.stdout.encoding
'cp1252'
>>> '\u012b'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python30\lib\io.py", line 1428, in write
    b = encoder.encode(s)
  File "C:\Program Files\Python30\lib\encodings\cp1252.py", <etc>

Same with the raster font.

chcp 65001, which supposedly is UTF-8, disables all output. Perhaps
Python does not recognize it as a synonym for UTF-8.
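Whether Python recognizes a code page name can be checked directly with `codecs.lookup`, which raises `LookupError` for unknown names. A sketch; `canonical_codec` is a hypothetical helper. The answer for cp65001 depends on the Python version and platform: older releases such as 3.0 did not know the name, while recent releases treat it as an alias for UTF-8:

```python
import codecs
from typing import Optional


def canonical_codec(name: str) -> Optional[str]:
    """Return Python's canonical codec name for name, or None if it is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


if __name__ == "__main__":
    # The cp65001 result varies with the Python version and platform.
    for name in ["cp437", "cp1252", "utf8", "cp65001"]:
        print(name, "->", canonical_codec(name))
```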

The same on IDLE (with code page 1252) gives i macron (bar on top). So
something other than just the code page is going on.

I tried a second time and instead got "'\u012b'" and no error. Hooray,
I thought, but when I closed the window and tried again the same way, as
best I know, I got the same error as before. Code page 65001 also worked
once and then did not. Python does notice the code page change.

tjr

Sep 2 '08 #8
Terry Reedy wrote:
>> If the terminal expects cp437 then displaying utf-8 might give some
>> problems.
> My screen displays whatever Windows tells the graphics card to tell
> the screen to display. In OpenOffice, I can select a unicode font
> that displays at least everything in the Basic Multilingual Plane (BMP).
It would appear that the Windows port of Python is simply not
switching the Win32 console into Unicode mode or using the Unicode
APIs. (If this holds true, it could be a leftover from the Windows
95/98/ME days, I suppose...)

<http://en.wikipedia.org/wiki/Win32_console>

As a workaround, for the time being, you might want to try something
similar to what is described in the thread "Changing the (codec) error
handler for the stdout/stderr streams in Python 3.0".

The approach described there will not let you print characters
outside the code page 437 repertoire; any such characters will still
need to be substituted with something else. But at least this
substitution should happen automatically, i.e. you can keep using the
normal print() function the normal way, even for the fancier
characters, and your program will no longer crash.
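The thread mentioned here is not reproduced, so this is only a sketch of the general idea, assuming a modern Python 3: wrap the byte stream in an `io.TextIOWrapper` whose error handler is `backslashreplace`, so unencodable characters come out as `\u` escapes instead of raising. `safe_text_stream` is a hypothetical name, and a `BytesIO` stands in for the console:

```python
import io


def safe_text_stream(binary: io.BufferedIOBase, encoding: str,
                     errors: str = "backslashreplace") -> io.TextIOWrapper:
    """Wrap a byte stream so unencodable characters are escaped, not fatal."""
    # newline="\n" turns off newline translation, so the bytes match on every OS.
    return io.TextIOWrapper(binary, encoding=encoding, errors=errors, newline="\n")


if __name__ == "__main__":
    buf = io.BytesIO()  # stands in for the console's byte stream
    out = safe_text_stream(buf, "cp437")
    print("i-macron:", "\u012b", file=out)
    out.flush()
    print(buf.getvalue())  # b'i-macron: \\u012b\n'
```

For the real console the same wrapper can be put around `sys.stdout.buffer`; on Python 3.7 and later, `sys.stdout.reconfigure(errors="backslashreplace")` changes the handler in place.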

It would be nice to see proper Unicode Win32 console support in Python,
of course, if at all possible.

--
znark

Sep 2 '08 #9
