string.encode on HP-UX

Richard Townsend

Using Python-2.3.4 on HP-UX11i, the following code:

import locale
loc = locale.setlocale(locale.LC_ALL)
print 'locale =', loc
loc = locale.nl_langinfo(locale.CODESET)
print 'locale =', loc
print 'hello'.encode(loc, 'replace')

produces:

locale = C C C C C C
locale = roman8
Traceback (most recent call last):
  File "test_locale.py", line 13, in ?
    print 'hello'.encode(loc, 'replace')
LookupError: unknown encoding: roman8

[The same code on SUSE 9.1 doesn't raise an exception].

Should I be able to pass the value returned by nl_langinfo() to the
string.encode call?

Similar code is used by wxGlade and this exception prevents it from
running.

Does anybody know how to fix this on HP-UX?
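
One way to avoid the crash is to check the codeset against Python's codec registry before using it. The sketch below assumes a current Python 3 interpreter rather than the 2.3.4 used above; the helper name `usable_codeset` and the ASCII fallback are illustrative choices, not something wxGlade is known to do.

```python
import codecs
import locale


def usable_codeset(fallback="ascii"):
    """Return the locale's codeset if Python has a codec for it, else `fallback`."""
    try:
        name = locale.nl_langinfo(locale.CODESET)
    except AttributeError:
        # nl_langinfo is not provided on every platform.
        return fallback
    try:
        codecs.lookup(name)  # raises LookupError for names Python does not know
    except LookupError:
        return fallback
    return name


encoded = "hello".encode(usable_codeset(), "replace")
```

With a codeset name Python does not recognise, the helper returns the fallback instead of letting LookupError escape.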

--
Richard Townsend

Jul 18 '05 #1

Michael Hudson

Richard Townsend <ri******@edshk.demon.co.uk> writes:

Using Python-2.3.4 on HP-UX11i, the following code:

import locale
loc = locale.setlocale(locale.LC_ALL)
print 'locale =', loc
loc = locale.nl_langinfo(locale.CODESET)
print 'locale =', loc
print 'hello'.encode(loc, 'replace')

produces:

locale = C C C C C C
locale = roman8
Traceback (most recent call last):
  File "test_locale.py", line 13, in ?
    print 'hello'.encode(loc, 'replace')
LookupError: unknown encoding: roman8

[The same code on SUSE 9.1 doesn't raise an exception].

Should I be able to pass the value returned by nl_langinfo() to the
string.encode call?

What is roman8? If it's some hp-ux specific thingy, I guess the
solution is to teach Python what to do with it. If it's just HP's
name for iso-8859-1 or something then this is easy (mucking with
encodings.aliases).

If it's some custom encoding, finding out what unicode codepoint each
octet maps to and writing a codec a la macroman can't be impossibly
hard.
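
A rough sketch of what such a table-driven codec looks like, assuming a current Python 3 interpreter. The table here is a toy: the lower half is ASCII and the upper half is left undefined except for one made-up entry, so it is not the Roman-8 mapping. `toy_decode` and `toy_encode` are hypothetical names.

```python
import codecs

# Toy decoding table: one Unicode character per byte value 0x00-0xFF.
# U+FFFE marks a byte as undefined, the convention used by the stdlib charmap codecs.
_table = [chr(i) for i in range(128)] + ["\ufffe"] * 128
_table[0xE9] = "\u00e9"  # illustrative entry only, NOT taken from the Roman-8 table
DECODING_TABLE = "".join(_table)

# The reverse mapping is derived from the decoding table.
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def toy_decode(data, errors="strict"):
    """Decode bytes with the toy table; returns (text, bytes_consumed)."""
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def toy_encode(text, errors="strict"):
    """Encode text with the toy table; returns (bytes, chars_consumed)."""
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)
```

A real codec would fill in all 256 entries from the vendor's published table and wrap these two functions in a `codecs.CodecInfo`.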

I guess a patch would be welcome either way.

Cheers,
mwh

--
Windows 2000: Smaller cow. Just as much crap.
-- Jim's pedigree of operating systems, asr

Jul 18 '05 #2

Christopher T King

On Wed, 21 Jul 2004, Richard Townsend wrote:

Using Python-2.3.4 on HP-UX11i, the following code:

import locale
loc = locale.setlocale(locale.LC_ALL)
print 'locale =', loc
loc = locale.nl_langinfo(locale.CODESET)
print 'locale =', loc
print 'hello'.encode(loc, 'replace')

produces:

locale = C C C C C C
locale = roman8
Traceback (most recent call last):
  File "test_locale.py", line 13, in ?
    print 'hello'.encode(loc, 'replace')
LookupError: unknown encoding: roman8

[The same code on SUSE 9.1 doesn't raise an exception].

My guess is roman8 is HP-UX's version of latin_1. Setting an alias fixes
this:

>>> import encodings
>>> encodings.aliases.aliases['roman8']='latin_1'
>>> 'hello'.encode('roman8')
'hello'

You can add those first two lines to a sitecustomize.py file, located
somewhere in your Python path (generally ~/site-packages/ or
/usr/local/lib/python2.X/ should work).

Jul 18 '05 #3

Michael Piotrowski

Christopher T King <sq******@WPI.EDU> writes:

My guess is roman8 is HP-UX's version of latin_1. Setting an alias fixes
this:

No, HP Roman-8 is *not* the same as Latin 1 (ISO 8859-1). You can
find a table at, e.g., <http://www.kostis.net/charsets/hproman8.htm>.

--
Michael Piotrowski, M.A. <mx*@dynalabs.de>
Public key at <http://www.dynalabs.de/mxp/pubkey.txt>

Jul 18 '05 #4

Christopher T King

On Wed, 21 Jul 2004, Michael Piotrowski wrote:

Christopher T King <sq******@WPI.EDU> writes:
My guess is roman8 is HP-UX's version of latin_1. Setting an alias fixes
this:

No, HP Roman-8 is *not* the same as Latin 1 (ISO 8859-1). You can
find a table at, e.g., <http://www.kostis.net/charsets/hproman8.htm>.

Well, the first 128 characters are the same. I'd say that's close enough,
right?
;)

Jul 18 '05 #5

Michael Piotrowski

Christopher T King <sq******@WPI.EDU> writes:

On Wed, 21 Jul 2004, Michael Piotrowski wrote:
Christopher T King <sq******@WPI.EDU> writes:
My guess is roman8 is HP-UX's version of latin_1. Setting an alias fixes
this:

No, HP Roman-8 is *not* the same as Latin 1 (ISO 8859-1). You can
find a table at, e.g., <http://www.kostis.net/charsets/hproman8.htm>.

Well, the first 128 characters are the same. I'd say that's close enough,
right?
;)

For Americans, perhaps ;-)

--
Michael Piotrowski, M.A. <mx*@dynalabs.de>
Public key at <http://www.dynalabs.de/mxp/pubkey.txt>

Jul 18 '05 #6

Richard Townsend

Hi Christopher,

Thanks for your suggestion; however, it produces two problems for me.

1. If I execute the code in the interpreter, it still fails like this:

>>> import encodings
>>> encodings.aliases.aliases['roman8']='latin_1'
>>> 'hello'.encode('roman8')
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in -toplevel-
    'hello'.encode('roman8')
LookupError: unknown encoding: roman8

2. If I put the code in site-packages/sitecustomize.py, it fails like
this:

capulet:home/richardt > python -v
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /opt/python/lib/python2.3/site.pyc matches /opt/python/lib/python2.3/site.py
import site # precompiled from /opt/python/lib/python2.3/site.pyc
# /opt/python/lib/python2.3/os.pyc matches /opt/python/lib/python2.3/os.py
import os # precompiled from /opt/python/lib/python2.3/os.pyc
import posix # builtin
# /opt/python/lib/python2.3/posixpath.pyc matches /opt/python/lib/python2.3/posixpath.py
import posixpath # precompiled from /opt/python/lib/python2.3/posixpath.pyc
# /opt/python/lib/python2.3/stat.pyc matches /opt/python/lib/python2.3/stat.py
import stat # precompiled from /opt/python/lib/python2.3/stat.pyc
# /opt/python/lib/python2.3/UserDict.pyc matches /opt/python/lib/python2.3/UserDict.py
import UserDict # precompiled from /opt/python/lib/python2.3/UserDict.pyc
# /opt/python/lib/python2.3/copy_reg.pyc matches /opt/python/lib/python2.3/copy_reg.py
import copy_reg # precompiled from /opt/python/lib/python2.3/copy_reg.pyc
# /opt/python/lib/python2.3/types.pyc matches /opt/python/lib/python2.3/types.py
import types # precompiled from /opt/python/lib/python2.3/types.pyc
# /opt/python/lib/python2.3/site-packages/sitecustomize.pyc matches /opt/python/lib/python2.3/site-packages/sitecustomize.py
import sitecustomize # precompiled from /opt/python/lib/python2.3/site-packages/sitecustomize.pyc
import encodings # directory /opt/python/lib/python2.3/encodings
# /opt/python/lib/python2.3/encodings/__init__.pyc matches /opt/python/lib/python2.3/encodings/__init__.py
import encodings # precompiled from /opt/python/lib/python2.3/encodings/__init__.pyc
# /opt/python/lib/python2.3/codecs.pyc matches /opt/python/lib/python2.3/codecs.py
import codecs # precompiled from /opt/python/lib/python2.3/codecs.pyc
import _codecs # builtin
'import site' failed; traceback:
Traceback (most recent call last):
  File "/opt/python/lib/python2.3/site.py", line 355, in ?
    import sitecustomize
  File "/opt/python/lib/python2.3/site-packages/sitecustomize.py", line 7, in ?
    encodings.aliases.aliases['roman8']='latin_1'
AttributeError: 'module' object has no attribute 'aliases'
# /opt/python/lib/python2.3/warnings.pyc matches /opt/python/lib/python2.3/warnings.py
import warnings # precompiled from /opt/python/lib/python2.3/warnings.pyc
# /opt/python/lib/python2.3/linecache.pyc matches /opt/python/lib/python2.3/linecache.py
import linecache # precompiled from /opt/python/lib/python2.3/linecache.pyc
# /opt/python/lib/python2.3/encodings/aliases.pyc matches /opt/python/lib/python2.3/encodings/aliases.py
import encodings.aliases # precompiled from /opt/python/lib/python2.3/encodings/aliases.pyc
Python 2.3.4 (#3, May 28 2004, 13:24:19) [C] on hp-ux11
Type "help", "copyright", "credits" or "license" for more information.

--
Richard Townsend

Jul 18 '05 #7

Richard Townsend

Further, if I put the following in sitecustomize.py:

import encodings
print dir(encodings)

I get:

['CodecRegistryError', '__builtins__', '__doc__', '__file__',
'__name__', '__path__', '_cache', '_import_tail',
'_norm_encoding_map', '_unknown', 'codecs', 'exceptions',
'normalize_encoding', 'search_function', 'types']

Notice there is no 'aliases' attribute.

But if I then run this interactively:

import encodings
dir(encodings)

['CodecRegistryError', '__builtins__', '__doc__', '__file__',
'__name__', '__path__', '_cache', '_import_tail',
'_norm_encoding_map', '_unknown', 'aliases', 'codecs', 'exceptions',
'normalize_encoding', 'search_function', 'types']

then the 'aliases' attribute is there.
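
The difference is consistent with `aliases` being a submodule of the `encodings` package: it only shows up as an attribute once something has imported `encodings.aliases`, and the `python -v` trace suggests that happens after sitecustomize.py has already run. A minimal sketch of importing it explicitly, written for a current Python 3 interpreter; the alias name `hypothetical_vendor8` is made up for illustration, and pointing it at ASCII is only right for the 7-bit range.

```python
import encodings.aliases  # bind the submodule explicitly instead of relying on a later import

# Hypothetical alias name, chosen so the example does not clash with a real codec.
ALIAS = "hypothetical_vendor8"

# Keys in the alias table are normalized names: lowercase, with underscores.
encodings.aliases.aliases[ALIAS] = "ascii"

encoded = "hello".encode(ALIAS)
```

An alias added after a failed lookup of the same name may have no effect, since the `encodings` search function also caches misses.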
--
Richard Townsend

Jul 18 '05 #8

Christopher T King

On Thu, 22 Jul 2004, Richard Townsend wrote:

Further, if I put the following in sitecustomize.py:

import encodings
print dir(encodings)

I get:

['CodecRegistryError', '__builtins__', '__doc__', '__file__',
'__name__', '__path__', '_cache', '_import_tail',
'_norm_encoding_map', '_unknown', 'codecs', 'exceptions',
'normalize_encoding', 'search_function', 'types']

Notice there is no 'aliases' attribute.

Oops, I had only tested it at the prompt :P I had assumed sitecustomize.py
was run after everything was set up.

This code uses a more defined interface for altering the codecs registry,
uses 'ascii' instead of 'latin_1' (to prevent some confusion), and
I've actually tested it in sitecustomize.py:

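import codecs

def roman8(n):
    # Search function: answer only for 'roman8', decline every other name.
    if n=='roman8':
        # Stopgap: treat roman8 as plain ASCII.
        return codecs.lookup('ascii')

codecs.register(roman8)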

This achieves the same effect as the aliases trick (which I'm surprised
didn't work for you at the prompt), but is less tricksy and should
therefore work better.
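
The same search-function approach, sketched for a current Python 3 interpreter with a quick check that the registration took effect. The codec name `hypothetical_vendor8b` is invented for the example, and mapping it to ASCII remains a stopgap rather than a real codec.

```python
import codecs

# Hypothetical codec name for illustration; not a real encoding.
STOPGAP_NAME = "hypothetical_vendor8b"


def stopgap_search(name):
    """Codec search function: map STOPGAP_NAME to ASCII, decline all other names."""
    if name == STOPGAP_NAME:
        return codecs.lookup("ascii")
    return None  # let the remaining search functions handle it


codecs.register(stopgap_search)

info = codecs.lookup(STOPGAP_NAME)            # resolves through stopgap_search
plain = "hello".encode(STOPGAP_NAME, "replace")
lossy = "caf\u00e9".encode(STOPGAP_NAME, "replace")  # non-ASCII characters are replaced
```

Anything outside the 7-bit range is replaced rather than mapped, so this only silences the exception.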

Jul 18 '05 #9

Richard Brodie

"Christopher T King" <sq******@WPI.EDU> wrote in message
news:Pi**************************************@ccc2 .wpi.edu...

This achieves the same effect as the aliases trick (which I'm surprised
didn't work for you at the prompt), but is less tricksy and should
therefore work better.

I've put what should be a proper codec as a patch on SF. Really needs
testing on HP-UX though....

Jul 18 '05 #10

Richard Townsend

In article <Pi**************************************@ccc2.wpi .edu>,
Christopher T King <sq******@WPI.EDU> writes

Oops, I had only tested it at the prompt :P I had assumed sitecustomize.py
was run after everything was set up.

This code uses a more defined interface for altering the codecs registry,
uses 'ascii' instead of 'latin_1' (to prevent some confusion), and
I've actually tested it in sitecustomize.py:

import codecs

def roman8(n):
    if n=='roman8':
        return codecs.lookup('ascii')

codecs.register(roman8)

This achieves the same effect as the aliases trick (which I'm surprised
didn't work for you at the prompt), but is less tricksy and should
therefore work better.

Hi Christopher,

Thanks for your new suggestion. I have tested it on HP-UX and it doesn't
raise the exception anymore.

regards,
Richard

--
Richard Townsend

Jul 18 '05 #11

Richard Townsend

In article <cd********@newton.cc.rl.ac.uk>, Richard Brodie
<R.******@rl.ac.uk> writes

I've put what should be a proper codec as a patch on SF. Really needs
testing on HP-UX though....

Hi Richard,

I copied your hp_roman.py file to ../lib/python2.3/encodings and added
the line

'roman8' : 'hp_roman'

to aliases.py and string.encode('roman8') now runs without raising an
exception.

I called string.printable.encode('roman8') and the returned string
matches string.printable.

Are there any other tests you want me to do with this on HP-UX?
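
Since string.printable only covers the ASCII range, one further check is a byte-by-byte round trip over all 256 values. A sketch for a current Python 3 interpreter; `roundtrip_failures` is a hypothetical helper, and the codec name is passed in so nothing is assumed about how the patched codec is installed.

```python
def roundtrip_failures(codec_name):
    """Return the byte values that do not survive decode followed by encode."""
    failures = []
    for value in range(256):
        raw = bytes([value])
        try:
            text = raw.decode(codec_name)
        except UnicodeDecodeError:
            # The codec leaves this byte undefined; nothing to round-trip.
            continue
        if text.encode(codec_name) != raw:
            failures.append(value)
    return failures
```

An empty list means every defined byte maps to a character that encodes back to the same byte; it does not show that the characters are the right ones, which still needs a comparison against the published table.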

--
Richard Townsend

Jul 18 '05 #12

Martin v. Löwis

Richard Townsend wrote:

Should I be able to pass the value returned by nl_langinfo() to the
string.encode call?

I believe all of "yes", "no", and "perhaps not" are valid answers. Yes,
it is intentional that the strings returned by nl_langinfo are
understood as codec names. However, the string is returned from the OS,
and the codec is provided by Python, so it is perhaps not accepted.

But no, you should never ever invoke string.encode with a character
encoding. Instead, you should use string.decode to use encodings in
a meaningful way. It is an unfortunate "feature" that string.encode
is available and does "something".
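
In Python 2, calling encode on a byte string first decodes it implicitly with the default (ASCII) codec, which is why it appears to work for 'hello'. A short sketch of the two directions, written for a current Python 3 interpreter, where bytes only offer decode and text only offers encode; `to_text` and `to_bytes` are hypothetical helper names.

```python
def to_text(raw, codec_name, errors="replace"):
    """Bytes from the outside world -> text: this is the decode direction."""
    return raw.decode(codec_name, errors)


def to_bytes(text, codec_name, errors="replace"):
    """Text -> bytes for output: this is the encode direction."""
    return text.encode(codec_name, errors)


text = to_text(b"caf\xe9", "latin_1")  # byte 0xE9 becomes U+00E9
data = to_bytes(text, "ascii")         # U+00E9 has no ASCII form and is replaced
```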

Regards,
Martin

Jul 18 '05 #13
