Bytes | Software Development & Data Engineering Community

How to get Python to default to UTF8

I'm developing a cgi-bin application that must be unicode sensitive. I'm
striving for a UTF8 implementation. I'm running python 2.3 on a development
machine (windows xp) and a server (windows xp server). Both environments are
running Apache 2.2 with the same configuration file.

The problem is this. On my development machine I get the following unicode
error:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 4-6: invalid
data
args = ('utf8', 'adem\xe3\xa1s', 4, 7, 'invalid data')
encoding = 'utf8'
end = 7
object = 'adem\xe3\xa1s'
reason = 'invalid data'
start = 4

On my server, running exactly the same python code, I see the following
unicode error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 4:
ordinal not in range(128)
args = ('ascii', 'adem\xe3\xa1s', 4, 5, 'ordinal not in range(128)')
encoding = 'ascii'
end = 5
object = 'adem\xe3\xa1s'
reason = 'ordinal not in range(128)'
start = 4

Note the differences in the encoding -- on the development machine it's utf8
but on the server it's ascii.
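
For reference, both failures can be reproduced from the byte string in the tracebacks. This is only a minimal sketch, written in Python 3 syntax (where the 8-bit `str` of Python 2.3 corresponds to `bytes`); the helper name `describe_decode_failure` is made up for illustration. It suggests that the data is rejected by either codec, so the two machines differ only in which codec the conversion ended up using.

```python
def describe_decode_failure(raw, encoding):
    """Try to decode raw bytes; report which codec rejected them, and where."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc exposes the same fields that the tracebacks list.
        return (exc.encoding, exc.start, exc.reason)
    return None  # the bytes decoded cleanly


raw = b"adem\xe3\xa1s"  # the byte string reported on both machines

print(describe_decode_failure(raw, "ascii"))
print(describe_decode_failure(raw, "utf-8"))
```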

I was under the impression that Python assumed ascii encoding by default.
I'm wondering how did my development machine get to be utf8? And since my
python code is the same on both machines, what is it about my configuration
that could be causing a difference in default encoding? I checked site.py on
both machines and both files default to ASCII, so I assume it's something
else.

Thanks in advance.
Dec 22 '07 #1
weheh wrote:
> I'm developing a cgi-bin application that must be unicode sensitive. I'm
> striving for a UTF8 implementation. I'm running python 2.3 on a development
> machine (windows xp) and a server (windows xp server). Both environments are
> running Apache 2.2 with the same configuration file.
>
> The problem is this. On my development machine I get the following unicode
> error:
>
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 4-6: invalid
> data
> args = ('utf8', 'adem\xe3\xa1s', 4, 7, 'invalid data')
> encoding = 'utf8'
> end = 7
> object = 'adem\xe3\xa1s'
> reason = 'invalid data'
> start = 4

Could be that sys.stdin.encoding differs between the setups.

*Where* do you get this exception? In the database layer? When the
script is trying to read things from a file? When it's trying to output
things? Somewhere else?

</F>

Dec 22 '07 #2
Hi Fredrik,

Thanks again for your feedback. I am much obliged.

Indeed, I am forced to be extremely rigorous about decoding on the way in
and encoding on the way out everywhere in my program, just as you say. Your
advice is excellent and concurs with other sources of unicode expertise.
Following this approach is the only thing that has made it possible for me
to get my program to work.

However, the situation is still unacceptable to me because I often make
mistakes and it is easy for me to miss places where encoding is necessary. I
rely on testing to find my faults. On my development environment, I get no
error message and it seems that everything works perfectly. However, once
ported to the server, I see a crash. But this is too late a stage to catch
the error since the app is already live.

I assume that the default encoding that you mention shouldn't ever be
changed is stored in the site.py file. I've checked this file and it's set
to ascii in both machines (development and server). I haven't touched
site.py. However, a week or so ago, following the advice of someone I read
on the web, I did create a file in my cgi-bin directory called something
like site-config.py, wherein encoding was set to utf8. I ran my program a
few times, but then reading elsewhere that the site-config.py approach was
outmoded, I decided to remove this file. I'm wondering whether it made a
permanent change somewhere in the bowels of python while I wasn't looking?

Can you elaborate on where to look to see what stdin/stdout encodings are
set to? All inputs are coming at my app either via html forms or input
files. All output goes either to the browser via html or to an output file.

> to fix this, figure out from where you got the encoded (8-bit) string, and
> make sure you decode it properly on the way in. only use Unicode strings
> on the "inside".
>
> (Python does have two encoding defaults; there's a default encoding that
> *shouldn't* ever be changed from the "ascii" default, and there's also a
> stdin/stdout encoding that's correctly set if you run the code in an
> ordinary terminal window. if you get your data from anywhere else, you
> cannot trust any of these, so you should do your own decoding on the way
> in, and encoding things on the way out).
>
> </F>
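
The "decode on the way in, encode on the way out" advice can be sketched roughly as follows. This is an illustration in Python 3 syntax, not code from the thread: the function names `decode_input`, `encode_output`, and `shout` are hypothetical, and UTF-8 is assumed as the wire encoding because that is what the application is aiming for.

```python
def decode_input(raw, encoding="utf-8"):
    """Turn incoming bytes into text at the boundary; fails loudly on bad data."""
    return raw.decode(encoding)


def encode_output(text, encoding="utf-8"):
    """Turn text back into bytes just before it leaves the program."""
    return text.encode(encoding)


def shout(text):
    """Stand-in for 'inside' logic: it only ever sees text, never raw bytes."""
    return text.upper()


incoming = b"adem\xc3\xa1s"  # UTF-8 bytes, e.g. from a form field or a file
outgoing = encode_output(shout(decode_input(incoming)))
print(outgoing)
```

Keeping the two conversions in named functions makes a missed conversion easier to spot: any raw byte string that reaches the inner logic without passing through `decode_input` is a bug by construction.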

Dec 23 '07 #3
> However, the situation is still unacceptable to me because I often make
> mistakes and it is easy for me to miss places where encoding is necessary. I
> rely on testing to find my faults. On my development environment, I get no
> error message and it seems that everything works perfectly. However, once
> ported to the server, I see a crash. But this is too late a stage to catch
> the error since the app is already live.

If you want to check whether there is indeed no place where you forgot
to properly .encode, you can set the default encoding on your
development machine to "undefined" (see site.py). This will give you an
exception whenever the default encoding is invoked, even if the encoding
would have succeeded under the default default encoding (i.e. "ascii").

Such a setting should not be applied to a production environment.
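
The reason this trick works is that the "undefined" codec refuses every conversion, including ones that "ascii" would accept. The sketch below only demonstrates that codec behaviour, in Python 3 syntax; it does not change any interpreter default, and the helper name `uses_codec_ok` is hypothetical.

```python
import codecs


def uses_codec_ok(raw, encoding):
    """Return True if raw decodes under encoding, False if the codec refuses."""
    try:
        codecs.decode(raw, encoding)
    except UnicodeError:
        return False
    return True


plain = b"hello"  # pure ASCII, so the "ascii" codec accepts it

print(uses_codec_ok(plain, "ascii"))
print(uses_codec_ok(plain, "undefined"))
```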
> Can you elaborate on where to look to see what stdin/stdout encodings are
> set to?

Just print out sys.stdin.encoding and sys.stdout.encoding. Or were you
asking for the precise source in the interpreter that sets them?

> All inputs are coming at my app either via html forms or input
> files. All output goes either to the browser via html or to an output file.

Then sys.stdout.encoding will not be set to anything.
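
A small diagnostic along these lines could be run on both machines to compare them. This is a sketch in Python 3 syntax; the function name `encoding_report` is made up, and writing to stderr is an assumption chosen so that the report does not end up in the CGI response body.

```python
import sys


def encoding_report():
    """Collect the encodings the interpreter currently assumes."""
    return {
        "default": sys.getdefaultencoding(),
        # A stream can be missing, or carry no encoding, when it is not a terminal.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in sorted(encoding_report().items()):
    # stderr keeps the diagnostic out of the page that is sent to the browser.
    sys.stderr.write("%-8s %r\n" % (name, value))
```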

Regards,
Martin
Dec 23 '07 #4
weheh wrote:
> Hi Fredrik,
>
> Thanks again for your feedback. I am much obliged.

Bear in mind that in Python, ASCII currently means ASCII, values
0..127. Type "str" will accept values > 127. However, the default
conversion from "str" to "unicode" requires true ASCII values, in
0..127. So if you take in data from some source which might have
a byte value > 127, the default conversion to Unicode won't work.

There are conversion functions for specifying the meaning of
values 128..255, (the input might be "latin1" encoding, for
example), or ignoring unexpected characters, or converting them
to "?".
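
The three options can be illustrated with the standard codec error handlers. This is a sketch in Python 3 syntax with a made-up sample byte string. One detail to be aware of: when decoding, the "replace" handler substitutes U+FFFD rather than "?"; the "?" substitution happens when encoding.

```python
raw = b"adem\xe1s"  # one byte above 127; in Latin-1 this byte is an accented "a"

as_latin1 = raw.decode("latin-1")                # give bytes 128..255 a meaning
ignored = raw.decode("ascii", "ignore")          # drop the unexpected byte
replaced = raw.decode("ascii", "replace")        # substitute U+FFFD for it
question = as_latin1.encode("ascii", "replace")  # on encoding, substitute "?"

print(repr(as_latin1), repr(ignored), repr(replaced), question)
```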

John Nagle
Dec 24 '07 #5
