How to display unicode with the CGI module?

Hi!

I am using the built-in Python web server (CGIHTTPServer) to serve
pages via CGI.
The problem I am having is that I get an error while trying to display
Unicode UTF-8 characters via a Python CGI script.

The error goes like this: "UnicodeEncodeError: 'ascii' codec can't
encode character u'\u026a' in position 12: ordinal not in range(128)".

My question is: (1) how and (2) where do I set the encoding for the
page?

I have tried adding <meta http-equiv="content-type" content="text/html;
charset=utf-8" />, but this does not seem to help, as this is an
instruction for the browser, not for the web server and/or CGI script.

Do I have to set the encoding in the server script? Or in the Python
CGI script?

The data that I want to display comes from a sqlite3 database and is
already in Unicode format.

The webserver script looks like this:

    import CGIHTTPServer, BaseHTTPServer

    # Serve the current directory on port 8080; scripts under cgi-bin/ run as CGI.
    httpd = BaseHTTPServer.HTTPServer(('', 8080),
                                      CGIHTTPServer.CGIHTTPRequestHandler)
    httpd.serve_forever()

A simplified version of my Python CGI script would be:
    # -*- coding: utf-8 -*-
    import cgi

    # A CGI response starts with a header line, then a blank line.
    print "Content-Type: text/html"
    print

    print "<html>"
    print " <body>"
    print   "my UTF8 string: Français 日本語 Español Português Română"
    print " </body>"
    print "</html>"

Where and what do I need to add to these scripts to get proper display
of UTF8 content?
Nov 25 '07 #1
On Sat, 24 Nov 2007 15:58:56 -0800, coldpizza wrote:
> The problem I am having is that I get an error while trying to display
> Unicode UTF-8 characters via a Python CGI script.
>
> The error goes like this: "UnicodeEncodeError: 'ascii' codec can't
> encode character u'\u026a' in position 12: ordinal not in range(128)".

Unicode != UTF-8. You are not trying to send a UTF-8 encoded byte string
but a *unicode string*. That's not possible. If unicode strings are to
"leave" your program, they must be encoded into byte strings. If you don't
do this explicitly, Python tries to encode them as ASCII and fails if there's
anything non-ASCII in the string. The `encode()` method is your friend.
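
A minimal sketch of that advice, assuming Python 3 syntax (the scripts in this thread are Python 2) and a hypothetical helper name `render_page`: build the whole response as text, encode it to UTF-8 bytes once, and name the same charset in the Content-Type header.

```python
def render_page(text):
    """Hypothetical helper: return a full CGI response as UTF-8 bytes."""
    # The header names the charset so the browser decodes the body correctly.
    header = u"Content-Type: text/html; charset=utf-8\r\n\r\n"
    body = u"<html>\n <body>\n  my UTF8 string: %s\n </body>\n</html>\n" % text
    # Encode explicitly instead of letting Python fall back to ASCII.
    return (header + body).encode("utf-8")


sample = u"Fran\u00e7ais \u65e5\u672c\u8a9e \u026a"
response = render_page(sample)

# Encoding the same text as ASCII fails in the way the original post describes.
try:
    sample.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

In a CGI script the bytes would then go to standard output unchanged: `sys.stdout.write(response)` on Python 2, or `sys.stdout.buffer.write(response)` on Python 3.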

Ciao,
Marc 'BlackJack' Rintsch
Nov 25 '07 #2
Marc 'BlackJack' Rintsch wrote:
> On Sat, 24 Nov 2007 15:58:56 -0800, coldpizza wrote:
>> The problem I am having is that I get an error while trying to display
>> Unicode UTF-8 characters via a Python CGI script.
>>
>> The error goes like this: "UnicodeEncodeError: 'ascii' codec can't
>> encode character u'\u026a' in position 12: ordinal not in range(128)".
>
> Unicode != UTF-8. You are not trying to send a UTF-8 encoded byte string
> but a *unicode string*.

Just to expand on this... It helps to think of "unicode objects" and
"strings" as separate types (which they are). So there is no such thing
as a "unicode string", and you always need to think about when to
encode() your unicode objects. However, this will change in py3k...
What's the new rule of thumb?

cheers
Paul

Nov 25 '07 #3
> Unicode != UTF-8.
> ....
> `encode()` method is your friend.

Thanks a lot for help!

I am always confused as to which one to use: encode() or decode(); I
have initially tried decode() and it did not work.

It is funny that encode() and decode() omit the name of the other
encoding (Unicode ucs2?), which makes it far less readable than a
s.recode('ucs2','utf8').
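
A rough sketch of how the two methods fit together, assuming a hypothetical `recode` helper (no such method exists on Python strings): decode() turns bytes in a named encoding into Unicode text, encode() turns Unicode text into bytes in a named encoding, so converting between two encodings is a decode followed by an encode.

```python
def recode(data, source_encoding, target_encoding):
    """Hypothetical helper: convert bytes from one encoding to another."""
    text = data.decode(source_encoding)   # bytes -> Unicode text
    return text.encode(target_encoding)   # Unicode text -> bytes


latin1_bytes = u"Fran\u00e7ais".encode("latin-1")
utf8_bytes = recode(latin1_bytes, "latin-1", "utf-8")
```

Only the external encoding is ever named, because the Unicode side is not a byte encoding from the program's point of view.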

Another weird thing is that by default Python converts internal
Unicode to ASCII. Will it be the same in Py3k?
Nov 25 '07 #4
On Sun, 25 Nov 2007 13:02:26 -0800, coldpizza wrote:
> It is funny that encode() and decode() omit the name of the other
> encoding (Unicode ucs2?), which makes it far less readable than a
> s.recode('ucs2','utf8').

The internal encoding/representation of a "string" of Unicode characters
is considered an implementation detail and is in fact not always the same
(e.g. a CPython build parameter selects UCS2 or UCS4, and it might be
something else in other implementations).
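
A small sketch for inspecting that build choice, assuming the hypothetical helper name `describe_unicode_build`: `sys.maxunicode` is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) build. From Python 3.3 on the value is always 0x10FFFF, because the narrow/wide distinction was removed.

```python
import sys


def describe_unicode_build():
    """Hypothetical helper: report the Unicode build from sys.maxunicode."""
    if sys.maxunicode == 0xFFFF:
        return "narrow (UCS2)"
    return "wide (UCS4)"


build = describe_unicode_build()
```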

See the 'Py_UNICODE' paragraph in:
<http://docs.python.org/api/unicodeObjects.html>
--
JanC
Nov 26 '07 #5
paul wrote:
> However, this will change in py3k...,
> what's the new rule of thumb?

In py3k, the str type will be what unicode is now, and there
will be a new type called bytes for holding binary data --
including text in some external encoding. These two types
will not be compatible.

At the lowest level, reading a file will return bytes, which
then have to be decoded to produce a (unicode) str, and a str
will have to be encoded into bytes before being written to a
file.
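
A minimal sketch of that lowest level as it works in Python 3, assuming in-memory `io.BytesIO` objects stand in for real binary files: reading yields bytes, which are decoded to str, and the str is encoded again before being written.

```python
import io

# A binary "file" holding UTF-8 encoded text.
source = io.BytesIO(u"Espa\u00f1ol".encode("utf-8"))

data = source.read()          # reading a binary file returns bytes
text = data.decode("utf-8")   # decode bytes to get a (Unicode) str

sink = io.BytesIO()
sink.write(text.upper().encode("utf-8"))  # encode str before writing
```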

There will be wrappers for text files that perform the
decoding and encoding automatically, but they will need to
be set up to use a specified encoding if you're dealing
with anything other than ASCII. (It may be possible to
set up a system-wide default, I'm not sure.)
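
As a sketch of such a wrapper in Python 3, assuming a throwaway temporary file: `io.open` (the same function as the built-in `open` in Python 3) accepts an `encoding` argument and then performs the encoding and decoding itself.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Text mode with an explicit encoding: str goes in, UTF-8 bytes land on disk.
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"Rom\u00e2n\u0103")

# Reading back through the wrapper decodes automatically.
with io.open(path, "r", encoding="utf-8") as handle:
    round_tripped = handle.read()

# Binary mode shows the encoded bytes that were actually stored.
with io.open(path, "rb") as handle:
    raw_bytes = handle.read()
```

When `encoding` is omitted, the wrapper falls back to a platform-dependent default, so naming it explicitly is the safer habit.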

So you won't be able to get away with ignoring encoding
issues in py3k. On the plus side, it should all be handled
in a much more consistent and less error-prone way. If
you mistakenly try to use encoded data as though it were
decoded data or vice versa, you'll get a type error.
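
A short sketch of that type error, assuming Python 3 and a hypothetical function name `try_mixing`: concatenating str and bytes raises TypeError, whereas Python 2 would attempt an implicit ASCII conversion.

```python
def try_mixing():
    """Hypothetical helper: mix decoded text with encoded bytes."""
    try:
        u"caf\u00e9" + b" au lait"   # str + bytes is rejected in Python 3
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


mix_result = try_mixing()
```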

--
Greg
Nov 26 '07 #6
greg wrote:
> paul wrote:
>> However, this will change in py3k...,
>> what's the new rule of thumb?
>
> [snip]
> So you won't be able to get away with ignoring encoding
> issues in py3k. On the plus side, it should all be handled
> in a much more consistent and less error-prone way. If
> you mistakenly try to use encoded data as though it were
> decoded data or vice versa, you'll get a type error.

Thanks for your detailed answer. In fact, having encode() only for <str>
and decode() only for <bytes> will simplify things a lot. I guess the implicit
encode() of <str> when using print() will stay, but having utf-8 as the
new default encoding will reduce the number of UnicodeErrors. You'll get
weird characters instead ;)
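
As a sketch of how the implicit encoding in print() can be pinned down in Python 3, assuming an in-memory buffer stands in for standard output: wrapping a byte stream in `io.TextIOWrapper` with an explicit encoding makes print() emit UTF-8 whatever the locale says.

```python
import io

byte_sink = io.BytesIO()
# newline="\n" keeps the line ending identical on every platform.
utf8_stream = io.TextIOWrapper(byte_sink, encoding="utf-8", newline="\n")

print(u"\u026a", file=utf8_stream)   # the character from the original error
utf8_stream.flush()

written = byte_sink.getvalue()
```

In a real script, `sys.stdout.buffer` would take the place of `byte_sink`.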

cheers
Paul

Nov 26 '07 #7
