Internationalised email subjects

I am writing a simple email program in Python that will send out
emails containing Chinese characters in the message subject and body.
I am not having any trouble getting the email body displayed correctly
in Chinese inside the email client, however the email subject and
sender name (which are also in Chinese) are garbled and are not
displayed correctly in the email client.

Here is the code snippet:

writer = MimeWriter.MimeWriter(out)
headers = {"From": senderName + ' <' + senderEmail + '>',
           "To": recipientEmail,
           "Reply-to": senderEmail}

writer.addheader("Subject", subject)
writer.addheader("MIME-Version", "1.0")
writer.addheader('From', headers['From'])
writer.addheader('To', headers['To'])
writer.addheader('Reply-to', headers['Reply-to'])

I'm quite new to Python (and programming in general) and am having a
hard time wrapping my head around the internationalization functions
of Python, so was hoping someone could point me in the right
direction. Is there a different method I need to use in order for
the sender name and subject to be displayed correctly? Is there an
extra step I am missing? Some sample code would be very helpful.

Thanks!

Jun 20 '07 #1
From:
http://docs.python.org/lib/module-email.header.html

>>> from email.message import Message
>>> from email.header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print msg.as_string()
Subject: =?iso-8859-1?q?p=F6stal?=

/Martin
Jun 20 '07 #2
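For readers on a current interpreter, the same idea can be sketched in Python 3, where text is already Unicode and Header takes a str plus the charset to use on the wire. The subject text and the choice of UTF-8 below are illustrative assumptions, not something from the posts above; a charset name such as 'gb2312' can be passed in the same position if the installation ships that codec.

```python
from email.header import Header, decode_header
from email.message import Message

# Illustrative subject (an assumption): "你好" means "hello".
subject = "你好"

msg = Message()
header = Header(subject, "utf-8")  # charset used for the encoded header
msg["Subject"] = header

# The encoded form contains only ASCII, so it is safe in a mail header.
encoded = header.encode()
print(msg.as_string())

# Round trip: decode_header() returns a list of (bytes, charset) pairs.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)
```

Decoding the header again is a cheap way to confirm that nothing was lost before the message is handed to an SMTP server.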
Thanks Martin, I actually have read that page before. The part that
confuses me is the line:

h = Header('p\xf6stal', 'iso-8859-1')

I have tried using:

h = Header(' ', 'GB2312')

but when I run the code, I get the following error:

UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3:
illegal multibyte sequence

Is there something I need to do in order to encode the Chinese
characters into the GB2312 character set?

Jun 20 '07 #3
Seems some characters are missing from my last post. The line that
says:

h = Header(' ', 'GB2312')

should say:

h = Header(' ', 'GB2312')
Jun 21 '07 #4
That's really strange. The Chinese characters I am inputting into the
post are not being displayed. Basically, what I am doing is this:

h = Header('(Some Chinese characters inserted here)', 'GB2312')

And when I run this code, I receive the following error message:

UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3:
illegal multibyte sequence

Any idea what I may be doing wrong? How do I convert Chinese
characters into something like p\xf6stal in the original code posted
by Martin? Can someone point me in the right direction? I'm not even
sure what class/method to look into for this.

Jun 21 '07 #5
On Thu, 21 Jun 2007 06:23:43 -0300, <bu*******@gmail.com> wrote:
> That's really strange. The Chinese characters I am inputting into the
> post are not being displayed. Basically, what I am doing is this:
>
> h = Header('(Some Chinese characters inserted here)', 'GB2312')
>
> And when I run this code, I receive the following error message:
>
> UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3:
> illegal multibyte sequence

If you execute: print "some Chinese characters", do you get the right
results?
Are you sure your system is using gb2312? In case you don't know and don't
trust autodetection, try something like this:

py> from unicodedata import *
py> name("á".decode("latin-1"))
'NO-BREAK SPACE'
py> name("á".decode("cp850"))
'LATIN SMALL LETTER A WITH ACUTE'

The first attempt shows the wrong name, so my console *cannot* be using
latin-1. With cp850 I got the right results, so it *might* be cp850 (it
may also be another encoding that happens to match this single character).
Further tests may reveal that it is actually cp850.
You should try the same test with "some Chinese characters" and see
whether your encoding is actually gb2312.

--
Gabriel Genellina

Jun 21 '07 #6
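The encoding check described in the post above can be sketched for Python 3, where the input has to be a bytes object because str is already decoded. The helper name describe is hypothetical; the byte 0xA0 is how cp850 stores the letter used in that console session.

```python
import unicodedata


def describe(raw, encoding):
    """Decode raw bytes with a candidate encoding and name each character.

    Returns None when the bytes are not valid in that encoding.
    """
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return None
    return [unicodedata.name(ch, "<unnamed>") for ch in text]


raw = b"\xa0"  # the single byte seen by the console in the example

latin1_names = describe(raw, "latin-1")
cp850_names = describe(raw, "cp850")
gb2312_names = describe(raw, "gb2312")

for label, names in [("latin-1", latin1_names),
                     ("cp850", cp850_names),
                     ("gb2312", gb2312_names)]:
    print(label, names)
```

A candidate encoding that raises an error or produces implausible character names can be ruled out; one that produces the expected names is only a possible match, for the reason given in the post.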
On 6/21/07, bu*******@gmail.com <bu*******@gmail.com> wrote:
> That's really strange. The Chinese characters I am inputting into the
> post are not being displayed. Basically, what I am doing is this:

You're not sending your email in UTF-8 (or another encoding that would
permit Chinese characters). Your email header shows:

Content-Type: text/plain; charset="us-ascii"

You probably need to reconfigure your mail client to send Chinese characters.

--
Evan Klitzke <ev**@yelp.com>
Jun 21 '07 #7
bu*******@gmail.com writes:
> Seems some characters are missing from my last post. The line that
> says:
>
> h = Header(' ', 'GB2312')
>
> should say:
>
> h = Header(' ', 'GB2312')

Your message has this field in the header:

Content-Type: text/plain; charset="us-ascii"

which is why the non-ASCII characters don't appear. This is the fault
of Google's charset munging.

Please, people who use Google for mail and Usenet, kick them until
they present "utf-8" as the default encoding, instead of downgrading
to "us-ascii".

--
\ "I lost a button-hole." -- Steven Wright |
`\ |
_o__) |
Ben Finney
Jun 21 '07 #8
On 6/21/07, Ben Finney <bi****************@benfinney.id.au> wrote:
> bu*******@gmail.com writes:
>> Seems some characters are missing from my last post. The line that
>> says:
>>
>> h = Header(' ', 'GB2312')
>>
>> should say:
>>
>> h = Header(' ', 'GB2312')
>
> Your message has this field in the header:
>
> Content-Type: text/plain; charset="us-ascii"
>
> which is why the non-ASCII characters don't appear. This is the fault
> of Google's charset munging.
>
> Please, people who use Google for mail and Usenet, kick them until
> they present "utf-8" as the default encoding, instead of downgrading
> to "us-ascii".

Ironically, you're sending out us-ascii encoded emails as well. Like
it or not, 7-bit ASCII is the standard for SMTP, so it's a reasonable
default character encoding to send MIME encoded messages in -- and
it's trivial to change the outgoing character set to UTF-8 in
Gmail/Google Apps.

--
Evan Klitzke <ev**@yelp.com>
Jun 22 '07 #9
"Evan Klitzke" <ev**@yelp.com> writes:
> Ironically, you're sending out us-ascii encoded emails as well.

Yes, because (a) I was replying to a message already in that encoding,
and (b) that encoding was sufficient to encode all the characters in
my message.

The original poster, by contrast, says that he posted a message with
Chinese characters, and that message was munged by Google to the
"us-ascii" charset.

--
\ "It is seldom that liberty of any kind is lost all at once." |
`\ -- David Hume |
_o__) |
Ben Finney
Jun 22 '07 #10
bu*******@gmail.com wrote:
> That's really strange. The Chinese characters I am inputting into the
> post are not being displayed. Basically, what I am doing is this:
>
> h = Header('(Some Chinese characters inserted here)', 'GB2312')

What encoding do "Some Chinese characters" have at that point?

1. Don't try this at the interactive prompt. It will completely confuse
   you. Instead, use IDLE.
2. In IDLE, put
   # -*- coding: utf-8 -*-
   at the top of the source code file.
3. Write the header as a Unicode string, i.e. with a u prefix.
4. Explicitly encode it, such as

h = Header(u'(Some Chinese characters inserted here)'.encode('GB2312'),
           'GB2312')

If you are *not* inserting the characters from the Python source
code directly, go back to my original question: What are the
characters encoded in?

HTH,
Martin
Jun 22 '07 #11
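The question of what the characters are encoded in becomes explicit in Python 3, where bytes and text are separate types. A minimal sketch, assuming the database driver hands back raw GB2312 bytes (many drivers already return str, in which case the decode step is unnecessary): decode once to text, then let the email package encode the subject and the sender name. The sample name, the address and the UTF-8 header charset are placeholders chosen for illustration.

```python
from email.header import Header, decode_header
from email.message import Message
from email.utils import formataddr, parseaddr

# Stand-ins for values read from the database (assumptions).
raw_subject = "你好".encode("gb2312")
raw_sender_name = "张三".encode("gb2312")
sender_email = "sender@example.com"

# Step 1: turn the GB2312 bytes into text exactly once.
subject = raw_subject.decode("gb2312")
sender_name = raw_sender_name.decode("gb2312")

# Step 2: let the email package encode the non-ASCII parts.
msg = Message()
msg["Subject"] = Header(subject, "utf-8")
msg["From"] = formataddr((sender_name, sender_email), charset="utf-8")

# Check: the display name decodes back to the original text.
encoded_name, address = parseaddr(msg["From"])
name_bytes, name_charset = decode_header(encoded_name)[0]
round_tripped_name = name_bytes.decode(name_charset)
```

Only the display name is encoded; the address inside the angle brackets stays as plain ASCII, which is what mail servers expect.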
Thanks Martin,

The "Some Chinese characters" are loaded from a MySQL table and are
encoded in GB2312 format.

I've added the following line at the top of the code:

# -*- coding: GB2312 -*-

I've also added the following line into the code:

h = Header(subject.encode('GB2312'), 'GB2312')

Note that the 'subject' variable consists of GB2312 encoded text, so I
am not sure if it is necessary to call the subject.encode('GB2312')
method. When I try to execute this code, I get the following error:

File "/home/web88/html/app/test.py", line 17,
in Header(subject.encode('GB2312'), 'GB2312')
LookupError: unknown encoding: GB2312

Any idea what may be wrong?
Jun 22 '07 #12
Thanks Richie,

I've tried removing the encode('GB2312') line, so the code looks like
this:

h = Header(subject, 'GB2312')

However, this line still causes the following error message:

Traceback (most recent call last):
File "/home/web88/html/app/sendmail.py", line 314, in
h = Header(subject, 'GB2312')
File "/usr/lib/python2.2/email/Header.py", line 188, in __init__
self.append(s, charset, errors)
File "/usr/lib/python2.2/email/Header.py", line 272, in append
ustr = unicode(s, incodec, errors)
LookupError: unknown encoding: gb2312 )

Any ideas?

Jun 22 '07 #13
On Fri, 22 Jun 2007 06:49:22 -0300, <bu*******@gmail.com> wrote:
> I've tried removing the encode('GB2312') line, so the code looks like
> this:
>
> h = Header(subject, 'GB2312')
>
> However, this line still causes the following error message:
>
> Traceback (most recent call last):
> File "/home/web88/html/app/sendmail.py", line 314, in
> h = Header(subject, 'GB2312')
> File "/usr/lib/python2.2/email/Header.py", line 188, in __init__
> self.append(s, charset, errors)
> File "/usr/lib/python2.2/email/Header.py", line 272, in append
> ustr = unicode(s, incodec, errors)
> LookupError: unknown encoding: gb2312 )

It appears that you don't have the gb2312 codec - maybe it was not
available with your rather old Python version (2.2). Upgrading to a newer
version may help.

--
Gabriel Genellina

Jun 23 '07 #14
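Whether a given interpreter ships a codec can be checked directly before any mail code runs. A small sketch using the standard codecs module; the helper name codec_available is hypothetical.

```python
import codecs


def codec_available(name):
    """Return True if this interpreter can look up the named codec."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


has_gb2312 = codec_available("gb2312")
has_bogus = codec_available("no-such-codec-xyz")
print("gb2312 available:", has_gb2312)
```

Running this on the hosting machine, rather than on a development box, shows the same LookupError condition that the traceback above reports.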
I'm an idiot! Gabriel, you're right! Turns out the ISP was running
Python 2.3, which has known issues with the GB2312 codec. They've
upgraded to 2.4 and now everything runs smoothly!

Jun 25 '07 #15
