string.encode on HP-UX

Using Python-2.3.4 on HP-UX11i, the following code:

import locale
loc = locale.setlocale(locale.LC_ALL)
print 'locale =', loc
loc = locale.nl_langinfo(locale.CODESET)
print 'locale =', loc
print 'hello'.encode(loc, 'replace')

produces:

locale = C C C C C C
locale = roman8
Traceback (most recent call last):
File "test_locale.py", line 13, in ?
print 'hello'.encode(loc, 'replace')
LookupError: unknown encoding: roman8
[The same code on SUSE 9.1 doesn't raise an exception].

Should I be able to pass the value returned by nl_langinfo() to the
string.encode call?

Similar code is used by wxGlade and this exception prevents it from
running.

Does anybody know how to fix this on HP-UX?
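
One way to sidestep the LookupError is to check whether Python actually has a codec for the name the OS reports before using it. The sketch below is written for a current Python 3, not the 2.3.4 used above. The helper names `codec_or_fallback` and `locale_codec` and the ASCII fallback are assumptions made for illustration, not anything wxGlade does.

```python
import codecs
import locale


def codec_or_fallback(name, fallback="ascii"):
    """Return name if Python has a codec registered for it, else fallback."""
    try:
        codecs.lookup(name)
    except LookupError:
        return fallback
    return name


def locale_codec(fallback="ascii"):
    """Return a usable codec name for the current locale's codeset."""
    # nl_langinfo is not available on every platform, so fetch it defensively.
    nl_langinfo = getattr(locale, "nl_langinfo", None)
    if nl_langinfo is None:
        return fallback
    return codec_or_fallback(nl_langinfo(locale.CODESET), fallback)


# Encode with whichever codec turned out to be usable.
encoded = "hello".encode(locale_codec(), "replace")
```

Falling back to ASCII loses any non-ASCII characters (they become `?` with `'replace'`), so this only hides the problem; a real codec for the reported name is still the proper fix.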

--
Richard Townsend
Jul 18 '05 #1
Richard Townsend <ri******@edshk.demon.co.uk> writes:
Using Python-2.3.4 on HP-UX11i, the following code:

import locale
loc = locale.setlocale(locale.LC_ALL)
print 'locale =', loc
loc = locale.nl_langinfo(locale.CODESET)
print 'locale =', loc
print 'hello'.encode(loc, 'replace')

produces:

locale = C C C C C C
locale = roman8
Traceback (most recent call last):
File "test_locale.py", line 13, in ?
print 'hello'.encode(loc, 'replace')
LookupError: unknown encoding: roman8
[The same code on SUSE 9.1 doesn't raise an exception].

Should I be able to pass the value returned by nl_langinfo() to the
string.encode call?


What is roman8? If it's some hp-ux specific thingy, I guess the
solution is to teach Python what to do with it. If it's just HP's
name for iso-8859-1 or something then this is easy (mucking with
encodings.aliases).

If it's some custom encoding, finding out what unicode codepoint each
octet maps to and writing a codec a la macroman can't be impossibly
hard.

I guess a patch would be welcome either way.
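
A rough sketch of what such a table-driven codec could look like on a current Python 3. The table below is a toy and deliberately NOT the real HP Roman-8 mapping: only the ASCII half is filled in, one byte is mapped for illustration, and the codec name `example_toy_charmap` is made up. A real codec would take its 256 entries from a published Roman-8 table.

```python
import codecs

# Toy table, NOT the real HP Roman-8 mapping: bytes 0x00-0x7F are ASCII,
# byte 0x80 is mapped to U+20AC purely for illustration, and the remaining
# 127 bytes are marked undefined with U+FFFE.
DECODING_TABLE = (
    "".join(chr(i) for i in range(128))
    + "\u20ac"
    + "\ufffe" * 127
)
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def toy_encode(text, errors="strict"):
    """Encode str with the toy table; returns (bytes, characters consumed)."""
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def toy_decode(data, errors="strict"):
    """Decode bytes with the toy table; returns (str, bytes consumed)."""
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def toy_search(name):
    """Codec search function that answers only for the made-up name."""
    if name == "example_toy_charmap":
        return codecs.CodecInfo(toy_encode, toy_decode, name=name)
    return None  # let other search functions handle everything else


codecs.register(toy_search)
```

With the codec registered, `"hi\u20ac".encode("example_toy_charmap")` goes through `toy_encode`, and bytes that the table leaves undefined raise `UnicodeDecodeError` when decoded in strict mode.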

Cheers,
mwh

--
Windows 2000: Smaller cow. Just as much crap.
-- Jim's pedigree of operating systems, asr
Jul 18 '05 #2
On Wed, 21 Jul 2004, Richard Townsend wrote:
Using Python-2.3.4 on HP-UX11i, the following code:

import locale
loc = locale.setlocale(locale.LC_ALL)
print 'locale =', loc
loc = locale.nl_langinfo(locale.CODESET)
print 'locale =', loc
print 'hello'.encode(loc, 'replace')

produces:

locale = C C C C C C
locale = roman8
Traceback (most recent call last):
File "test_locale.py", line 13, in ?
print 'hello'.encode(loc, 'replace')
LookupError: unknown encoding: roman8

[The same code on SUSE 9.1 doesn't raise an exception].


My guess is roman8 is HP-UX's version of latin_1. Setting an alias fixes
this:
>>> import encodings
>>> encodings.aliases.aliases['roman8']='latin_1'
>>> 'hello'.encode('roman8')
'hello'

You can add those first two lines to a sitecustomize.py file, located
somewhere in your Python path (generally ~/site-packages/ or
/usr/local/lib/python2.X/ should work).

Jul 18 '05 #3
Christopher T King <sq******@WPI.EDU> writes:
My guess is roman8 is HP-UX's version of latin_1. Setting an alias fixes
this:


No, HP Roman-8 is *not* the same as Latin 1 (ISO 8859-1). You can
find a table at, e.g., <http://www.kostis.net/charsets/hproman8.htm>.

--
Michael Piotrowski, M.A. <mx*@dynalabs.de>
Public key at <http://www.dynalabs.de/mxp/pubkey.txt>
Jul 18 '05 #4
On Wed, 21 Jul 2004, Michael Piotrowski wrote:
Christopher T King <sq******@WPI.EDU> writes:
My guess is roman8 is HP-UX's version of latin_1. Setting an alias fixes
this:


No, HP Roman-8 is *not* the same as Latin 1 (ISO 8859-1). You can
find a table at, e.g., <http://www.kostis.net/charsets/hproman8.htm>.


Well, the first 128 characters are the same. I'd say that's close enough,
right?
;)

Jul 18 '05 #5
Christopher T King <sq******@WPI.EDU> writes:
On Wed, 21 Jul 2004, Michael Piotrowski wrote:
Christopher T King <sq******@WPI.EDU> writes:
My guess is roman8 is HP-UX's version of latin_1. Setting an alias fixes
this:


No, HP Roman-8 is *not* the same as Latin 1 (ISO 8859-1). You can
find a table at, e.g., <http://www.kostis.net/charsets/hproman8.htm>.


Well, the first 128 characters are the same. I'd say that's close enough,
right?
;)


For Americans, perhaps ;-)
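
The difference is easy to see on a current Python 3, whose standard library includes a codec named `hp_roman8` (an assumption about the interpreter used to run this sketch; it says nothing about the Python 2.3.4 installation discussed in the thread). The helper name `encode_both` is made up for the example.

```python
def encode_both(text):
    """Return the text encoded as (HP Roman-8, Latin-1) byte strings."""
    return text.encode("hp_roman8"), text.encode("latin_1")


# Pure ASCII text gives the same bytes under both encodings.
ascii_r8, ascii_l1 = encode_both("hello")

# An accented letter does not: the two encodings place it at different bytes.
accent_r8, accent_l1 = encode_both("\u00e9")  # e with acute accent
```

So an alias to Latin-1 is harmless for ASCII-only text and silently wrong for anything else.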

--
Michael Piotrowski, M.A. <mx*@dynalabs.de>
Public key at <http://www.dynalabs.de/mxp/pubkey.txt>
Jul 18 '05 #6
Hi Christopher,

Thanks for your suggestion, however it produces two problems for me.

1. If I execute the code in the interpreter, it still fails like this:
>>> import encodings
>>> encodings.aliases.aliases['roman8']='latin_1'
>>> 'hello'.encode('roman8')

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in -toplevel-
    'hello'.encode('roman8')
LookupError: unknown encoding: roman8

2. If I put the code in site-packages/sitecustomize.py, it fails like
this:

capulet:home/richardt > python -v
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /opt/python/lib/python2.3/site.pyc matches /opt/python/lib/python2.3/site.py
import site # precompiled from /opt/python/lib/python2.3/site.pyc
# /opt/python/lib/python2.3/os.pyc matches /opt/python/lib/python2.3/os.py
import os # precompiled from /opt/python/lib/python2.3/os.pyc
import posix # builtin
# /opt/python/lib/python2.3/posixpath.pyc matches /opt/python/lib/python2.3/posixpath.py
import posixpath # precompiled from /opt/python/lib/python2.3/posixpath.pyc
# /opt/python/lib/python2.3/stat.pyc matches /opt/python/lib/python2.3/stat.py
import stat # precompiled from /opt/python/lib/python2.3/stat.pyc
# /opt/python/lib/python2.3/UserDict.pyc matches /opt/python/lib/python2.3/UserDict.py
import UserDict # precompiled from /opt/python/lib/python2.3/UserDict.pyc
# /opt/python/lib/python2.3/copy_reg.pyc matches /opt/python/lib/python2.3/copy_reg.py
import copy_reg # precompiled from /opt/python/lib/python2.3/copy_reg.pyc
# /opt/python/lib/python2.3/types.pyc matches /opt/python/lib/python2.3/types.py
import types # precompiled from /opt/python/lib/python2.3/types.pyc
# /opt/python/lib/python2.3/site-packages/sitecustomize.pyc matches /opt/python/lib/python2.3/site-packages/sitecustomize.py
import sitecustomize # precompiled from /opt/python/lib/python2.3/site-packages/sitecustomize.pyc
import encodings # directory /opt/python/lib/python2.3/encodings
# /opt/python/lib/python2.3/encodings/__init__.pyc matches /opt/python/lib/python2.3/encodings/__init__.py
import encodings # precompiled from /opt/python/lib/python2.3/encodings/__init__.pyc
# /opt/python/lib/python2.3/codecs.pyc matches /opt/python/lib/python2.3/codecs.py
import codecs # precompiled from /opt/python/lib/python2.3/codecs.pyc
import _codecs # builtin
'import site' failed; traceback:
Traceback (most recent call last):
  File "/opt/python/lib/python2.3/site.py", line 355, in ?
    import sitecustomize
  File "/opt/python/lib/python2.3/site-packages/sitecustomize.py", line 7, in ?
    encodings.aliases.aliases['roman8']='latin_1'
AttributeError: 'module' object has no attribute 'aliases'
# /opt/python/lib/python2.3/warnings.pyc matches /opt/python/lib/python2.3/warnings.py
import warnings # precompiled from /opt/python/lib/python2.3/warnings.pyc
# /opt/python/lib/python2.3/linecache.pyc matches /opt/python/lib/python2.3/linecache.py
import linecache # precompiled from /opt/python/lib/python2.3/linecache.pyc
# /opt/python/lib/python2.3/encodings/aliases.pyc matches /opt/python/lib/python2.3/encodings/aliases.py
import encodings.aliases # precompiled from /opt/python/lib/python2.3/encodings/aliases.pyc
Python 2.3.4 (#3, May 28 2004, 13:24:19) [C] on hp-ux11
Type "help", "copyright", "credits" or "license" for more information.
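
A possible explanation for the first failure (an inference, not something confirmed in this thread): the `encodings` package remembers failed lookups, so an alias added after a name has already been looked up unsuccessfully is never consulted. The `pyshell` in the traceback suggests IDLE, which may well have tried the locale's encoding at startup. The sketch below reproduces the effect on a current CPython 3 with a made-up name, `example_late_alias`; the caching is an implementation detail and could differ between versions.

```python
import codecs
import encodings.aliases


def lookup_works(name):
    """Return True if codecs.lookup() can resolve the given codec name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


NAME = "example_late_alias"  # hypothetical name that Python does not know

before = lookup_works(NAME)                  # first lookup: a miss
encodings.aliases.aliases[NAME] = "latin_1"  # alias added after the miss
after = lookup_works(NAME)                   # expected to fail again (cached miss)
```

If this is what happened, adding the alias before anything looks up `roman8`, or registering a search function with `codecs.register`, avoids the stale miss.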

--
Richard Townsend
Jul 18 '05 #7
Further, if I put the following in sitecustomize.py:

import encodings
print dir(encodings)

I get:

['CodecRegistryError', '__builtins__', '__doc__', '__file__',
'__name__', '__path__', '_cache', '_import_tail',
'_norm_encoding_map', '_unknown', 'codecs', 'exceptions',
'normalize_encoding', 'search_function', 'types']

Notice there is no 'aliases' attribute.

But if I then run this interactively:
>>> import encodings
>>> dir(encodings)

['CodecRegistryError', '__builtins__', '__doc__', '__file__',
'__name__', '__path__', '_cache', '_import_tail',
'_norm_encoding_map', '_unknown', 'aliases', 'codecs', 'exceptions',
'normalize_encoding', 'search_function', 'types']

then the 'aliases' attribute is there.
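
The two listings are consistent with `import encodings` not loading the `aliases` submodule by itself: at the prompt something has already imported it, whereas in sitecustomize.py nothing has yet (the `python -v` output earlier shows `encodings.aliases` being imported only after the failure). Importing the submodule explicitly removes the dependence on import order. A minimal sketch for a current Python 3, using a made-up alias `example_alias_r8` and pointing it at Latin-1 purely for illustration, since Roman-8 is not Latin-1:

```python
# Import the submodule explicitly instead of relying on "import encodings".
import encodings.aliases

# Register the alias before the name is looked up for the first time.
encodings.aliases.aliases["example_alias_r8"] = "latin_1"

encoded = "hello".encode("example_alias_r8")
```

The alias key is written in lower case with underscores here, which is the form the lookup machinery works with after normalising a name.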
--
Richard Townsend
Jul 18 '05 #8
On Thu, 22 Jul 2004, Richard Townsend wrote:
Further, if I put the following in sitecustomize.py:

import encodings
print dir(encodings)

I get:

['CodecRegistryError', '__builtins__', '__doc__', '__file__',
'__name__', '__path__', '_cache', '_import_tail',
'_norm_encoding_map', '_unknown', 'codecs', 'exceptions',
'normalize_encoding', 'search_function', 'types']

Notice there is no 'aliases' attribute.


Oops, I had only tested it at the prompt :P I had assumed sitecustomize.py
was run after everything was set up.

This code uses a more defined interface for altering the codecs registry,
uses 'ascii' instead of 'latin_1' (to prevent some confusion), and
I've actually tested it in sitecustomize.py:

import codecs

def roman8(n):
    if n=='roman8':
        return codecs.lookup('ascii')

codecs.register(roman8)

This achieves the same effect as the aliases trick (which I'm surprised
didn't work for you at the prompt), but is less tricksy and should
therefore work better.
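
The same idea extends to a small table of stand-in names. This is a sketch for a current Python 3, not a tested HP-UX fix: the name `example_search_r8`, the `FALLBACKS` table and the function name are all made up, and mapping to ASCII means every non-ASCII character is either rejected or replaced.

```python
import codecs

# Hypothetical names standing in for OS-specific codeset names. The targets
# are stand-ins as well; in practice each should be a genuinely equivalent codec.
FALLBACKS = {
    "example_search_r8": "ascii",
}


def fallback_search(name):
    """Codec search function: resolve the stand-in names, ignore the rest."""
    target = FALLBACKS.get(name)
    if target is None:
        return None  # not ours; let other search functions try
    return codecs.lookup(target)


codecs.register(fallback_search)

ok = "hello".encode("example_search_r8")
lossy = "caf\u00e9".encode("example_search_r8", "replace")  # accent is lost
```

Returning `None` for unknown names matters: a search function that raises, or answers for every name, would interfere with lookups of other codecs.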

Jul 18 '05 #9

"Christopher T King" <sq******@WPI.EDU> wrote in message
news:Pi**************************************@ccc2.wpi.edu...
This achieves the same effect as the aliases trick (which I'm surprised
didn't work for you at the prompt), but is less tricksy and should
therefore work better.


I've put what should be a proper codec as a patch on SF. Really needs
testing on HP-UX though....
Jul 18 '05 #10
In article <Pi**************************************@ccc2.wpi.edu>,
Christopher T King <sq******@WPI.EDU> writes
Oops, I had only tested it at the prompt :P I had assumed sitecustomize.py
was run after everything was set up.

This code uses a more defined interface for altering the codecs registry,
uses 'ascii' instead of 'latin_1' (to prevent some confusion), and
I've actually tested it in sitecustomize.py:

import codecs

def roman8(n):
    if n=='roman8':
        return codecs.lookup('ascii')

codecs.register(roman8)

This achieves the same effect as the aliases trick (which I'm surprised
didn't work for you at the prompt), but is less tricksy and should
therefore work better.


Hi Christopher,

Thanks for your new suggestion. I have tested it on HP-UX and it doesn't
raise the exception anymore.

regards,
Richard

--
Richard Townsend
Jul 18 '05 #11
In article <cd********@newton.cc.rl.ac.uk>, Richard Brodie
<R.******@rl.ac.uk> writes


I've put what should be a proper codec as a patch on SF. Really needs
testing on HP-UX though....


Hi Richard,

I copied your hp_roman.py file to ../lib/python2.3/encodings and added
the line

'roman8' : 'hp_roman'

to aliases.py and string.encode('roman8') now runs without raising an
exception.

I called string.printable.encode('roman8') and the returned string
matches string.printable.

Are there any other tests you want me to do with this on HP-UX?
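
One further check that goes beyond `string.printable` would be a byte-level round trip over all 256 values, since the upper half is where Roman-8 differs from ASCII. A sketch for a current Python 3 follows; the helper name `roundtrip_failures` is made up, and it is exercised against `latin_1` here so that it does not assume the patched codec is installed.

```python
def roundtrip_failures(encoding):
    """Return the byte values that do not survive decode followed by encode."""
    failures = []
    for value in range(256):
        raw = bytes([value])
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # byte is undefined in this encoding, nothing to compare
        if text.encode(encoding) != raw:
            failures.append(value)
    return failures


latin1_failures = roundtrip_failures("latin_1")
```

An empty list means every defined byte maps to a character that encodes back to the same byte. It does not prove that each byte maps to the right character; that still needs a comparison against a published Roman-8 table.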

--
Richard Townsend
Jul 18 '05 #12
Richard Townsend wrote:
Should I be able to pass the value returned by nl_langinfo() to the
string.encode call?


I believe all of "yes", "no", and "perhaps not" are valid answers. Yes,
it is intentional that the strings returned by nl_langinfo are
understood as codec names. However, the string is returned from the OS,
and the codec is provided by Python, so it is perhaps not accepted.

But no, you should never ever invoke string.encode with a character
encoding. Instead, you should use string.decode to use encodings in
a meaningful way. It is an unfortunate "feature" that string.encode
is available and does "something".
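
To illustrate the direction of the two operations: in Python 2, calling `.encode` on a byte string first decodes it implicitly with the default (ASCII) codec, which is the "something" referred to above. Python 3 makes the distinction explicit, because bytes only have `decode` and text only has `encode`. A small sketch for Python 3, with a made-up helper name `transcode`:

```python
def transcode(raw, source, target, errors="strict"):
    """Decode bytes from the source encoding, then encode them to the target."""
    text = raw.decode(source)           # bytes -> text: the source encoding matters here
    return text.encode(target, errors)  # text -> bytes in the target encoding


# Latin-1 bytes for "cafe" with an acute accent, converted to UTF-8.
utf8_bytes = transcode(b"caf\xe9", "latin_1", "utf-8")
```

Seen this way, the codeset reported by `nl_langinfo()` describes how to decode bytes coming from the terminal or the file system, and how to encode text going back out to them.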

Regards,
Martin
Jul 18 '05 #13
