Bytes IT Community

system(...) and unicode

Hi,

I'm seeing the following error:

...
system(cmd)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 57: ordinal not in range(128)

and I think I vaguely understand what's going on: "cmd" is constructed
to include a file name that is UTF-8 encoded (I think so, since it shows
accents when I "ls" the file; this is on a recent SUSE Linux with
Python 2.4.2). So I guess I need to specify the encoding used, right?
But (1) I don't know how to do this; (2) this string came from the
filesystem in the first place, so how come it isn't managed in an
internally consistent way? and (3) I have no explicit Unicode strings
in my program.
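
For anyone reading along who wants to reproduce the failure: in Python 2, handing a unicode object to system() makes the interpreter encode it implicitly with the default 'ascii' codec, which is where the error comes from. Below is a minimal sketch of that same encoding step, written so it also runs on Python 3, where the conversion has to be spelled out. The command string and file name are made up for illustration, and try_ascii is a hypothetical helper.

```python
# Hypothetical command containing a non-ASCII character (u'\xe3').
cmd = u"ls /tmp/S\xe3o_Paulo.txt"

def try_ascii(text):
    """Encode text as ASCII; return the bytes, or the exception on failure."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc

result = try_ascii(cmd)
# For the command above this is a UnicodeEncodeError whose .start attribute
# gives the position of the offending character.
print(type(result).__name__, result)
```

The position in the message is simply the index of the first character that has no ASCII representation.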

Looking at the docs (the Unicode HOWTO) it seems like maybe I need to do
system(cmd.encode(...))
but how do I know which encoding to use, and what if cmd isn't a unicode
string (I didn't make it so!)? I could force an encoding as in the Unicode
HOWTO ("filename.decode(encoding)"), but that seems to already be
happening (or is it not? Am I wrong in assuming that?).

So can someone help me or point me to some more detailed instructions,
please? At the command line "locale" says en_GB.UTF-8, but I'd like this
code to work whatever the locale is, if that makes sense.

Sorry for being stupid,
Andrew

May 22 '06 #1
3 Replies


Hmmm. After reading
http://kofoto.rosdahl.net/trac/wiki/UnicodeInPython I tried:

system(cmd.encode(getfilesystemencoding()))

which works (nothing else changed). But that seems odd - is this a bug
(the asymmetry - I read files with os.listdir with no explicit unicode
handling, but need to do something explicitly on output - seems wrong),
or am I going to be bitten by other errors later?
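
A rough sketch of that workaround, which also covers the earlier worry about cmd not always being a unicode string: encode only when the value is text, and pass byte strings through untouched. to_command_bytes is a hypothetical helper, not something in the os module, and the file name is invented. It is written so it runs on Python 3 as well; on Python 2 the check would be against unicode/str rather than str/bytes.

```python
import sys

def to_command_bytes(cmd, encoding=None):
    """Return cmd as a byte string, encoding text if necessary.

    Hypothetical helper: defaults to sys.getfilesystemencoding(), as in
    the call that worked above.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    if isinstance(cmd, bytes):
        # Already a byte string, e.g. built from byte-string file names.
        return cmd
    return cmd.encode(encoding)

# An explicit encoding is passed here so the result does not depend on
# the locale of whoever runs the example.
encoded = to_command_bytes(u"ls S\xe3o.txt", "utf-8")
print(repr(encoded))
```

On Python 2 the result is what would be handed to system(). On Python 3, os.system accepts text directly and does the filesystem encoding itself, so a helper like this is only needed where bytes are explicitly wanted.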

Thanks,
Andrew

May 22 '06 #2

an****@acooke.org wrote:
> Hmmm. After reading
> http://kofoto.rosdahl.net/trac/wiki/UnicodeInPython I tried:
>
> system(cmd.encode(getfilesystemencoding()))
>
> which works (nothing else changed). But that seems odd - is this a bug
> (the asymmetry - I read files with os.listdir with no explicit unicode
> handling, but need to do something explicitly on output - seems wrong),
> or am I going to be bitten by other errors later?


Whether or not listdir returns a Unicode string depends on whether you
pass a Unicode string as the directory name. So if you change the
directory name to be a byte string, the file name should be a byte
string, too.
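
A small sketch of this rule, for illustration only. It is written for Python 3, where the two kinds of string are str (text) and bytes; in Python 2.4 the corresponding pair is unicode and str. The file name is made up, listdir_types is a hypothetical helper, and os.fsencode does not exist in Python 2.

```python
import os
import tempfile

def listdir_types(directory):
    """Return the set of type names that os.listdir yields for directory."""
    return {type(name).__name__ for name in os.listdir(directory)}

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_types = listdir_types(tmp)               # text directory name in
    byte_types = listdir_types(os.fsencode(tmp))  # byte directory name in

print(text_types, byte_types)
```

The type of the directory name decides the type of every file name that comes back, so the asymmetry seen earlier starts at the listdir call rather than at system().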

And yes, it would be desirable to enhance system() to support Unicode
strings; contributions in that direction are welcome (although one
should then also support exec*(), spawn*(), popen*(), and the subprocess
module).

Regards,
Martin
May 22 '06 #3

The impression I got from the link I gave was that exec et al already
had the appropriate unicode support; system seems to be the exception.

Anyway, thanks for the info - that directory name is coming from a DOM
call, and I'm pretty sure it's returning Unicode, so that makes sense.
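
To illustrate how a DOM value ends up steering the whole chain, here is a sketch that assumes xml.dom.minidom (the thread does not say which DOM implementation is in use) and an invented document with a "path" attribute. It runs on Python 3; the attribute value comes back as text, and converting it to bytes once, at the boundary, keeps the listing in bytes too.

```python
import os
import tempfile
from xml.dom import minidom
from xml.sax.saxutils import quoteattr

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # Hypothetical document; quoteattr adds the quotes and escapes the path.
    xml_text = "<config><dir path=%s/></config>" % quoteattr(tmp)
    doc = minidom.parseString(xml_text)
    dirname = doc.getElementsByTagName("dir")[0].getAttribute("path")

    # The DOM hands back text, so listdir returns text names ...
    names_from_text = os.listdir(dirname)
    # ... unless the directory name is converted to bytes first.
    names_from_bytes = os.listdir(os.fsencode(dirname))

print(type(dirname).__name__, names_from_text, names_from_bytes)
```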

Andrew

May 22 '06 #4
