Email headers and non-ASCII characters

Hello, everyone...

I'm trying to send an email to people with non-ASCII characters in their
names. A recipient's address may look like:

"Jörg Nørgens" <joerg@nowhere>

My example code:

=================================
import smtplib
from email.MIMEText import MIMEText
from email.Header import Header

def sendmail(sender, recipient, body, subject):
    message = MIMEText(body)
    message['Subject'] = Header(subject, 'iso-8859-1')
    message['From'] = Header(sender, 'iso-8859-1')
    message['To'] = Header(recipient, 'iso-8859-1')

    # Connect to the default SMTP server (localhost) and send
    s = smtplib.SMTP()
    s.connect()
    s.sendmail(sender, recipient, message.as_string())
    s.close()
=================================

However the Header() method encodes the whole expression in ISO-8859-1:

=?iso-8859-1?q?=22J=C3=B6rg_N=C3=B8rgens=22_=3Cjoerg=40nowhere=3E?=

However I had expected something like:

"=?utf-8?q?J=C3=B6rg?= =?utf-8?q?_N=C3=B8rgens?=" <joerg@nowhere>

Of course my mail transfer agent is not happy with the first string
although I see that Header() is just doing its job. I'm looking for a way
though to encode just the non-ASCII parts like any mail client does. Does
anyone have a recipe on how to do that? Or is there a method in
the "email" module of the standard library that does what I need? Or
should I split by regular expression to extract the email address
beforehand? Or a list comprehension to just look for non-ASCII characters
and Header() them? Sounds dirty.

Hints welcome.

Regards
Christoph
Nov 23 '06 #1
4 Replies


Christoph Haas wrote:
> Of course my mail transfer agent is not happy with the first string

Why "of course"? But it seems that you are passing the Header object a
UTF-8 encoded string, not a Latin-1 encoded one.

You are telling the header the encoding, not asking it to encode.
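
A minimal sketch of that distinction, assuming Python 3 (where the module is spelled `email.header` rather than the `email.Header` used in this thread). The shortened name value is only for illustration. When Header() is handed bytes, it trusts the charset label and decodes the bytes with it, so UTF-8 bytes labelled as ISO-8859-1 come out as the `=C3=B6` sequence seen in the original post:

```python
from email.header import Header

name = "Jörg"  # illustrative value, shortened from the address in the thread

# Text input: Header encodes the string to the declared charset itself.
correct = Header(name, "iso-8859-1").encode()

# Bytes input: the charset argument only *labels* the bytes. These are
# UTF-8 bytes, so labelling them ISO-8859-1 reproduces the thread's mistake.
mislabelled = Header(name.encode("utf-8"), "iso-8859-1").encode()

print(correct)      # =?iso-8859-1?q?J=F6rg?=
print(mislabelled)  # =?iso-8859-1?q?J=C3=B6rg?=
```

The second result is a valid encoded word, but it decodes to two Latin-1 characters instead of one "ö".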

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
Nov 23 '06 #2

On Thursday 23 November 2006 16:31, Max M wrote:
>> Of course my mail transfer agent is not happy with the first string
>
> Why "of course"?

Because my MTA doesn't care about MIME. It just transports the email. And
it expects an email address in <...> but doesn't decode =?iso...? strings.

> But it seems that you are passing the Header object a
> utf-8 encoded string, not a latin-1 encoded.
> You are telling the header the encoding. Not asking it to encode.

Uhm, okay. Let's see:

u'"Jörg Nørgens" <joerg@nowhere>'.encode('latin-1')

='"J\xc3\xb6rg N\xc3\xb8rgens" <joerg@nowhere>'

So far so good. Now run Header() on it:

='=?utf-8?b?IkrDtnJnIE7DuHJnZW5zIiA8am9lcmdAbm93aGVyZT4=?= '

Still nothing like <...> in it and my MTA is unhappy again. What am I
missing? Doesn't anyone know how mail clients handle that encoding?
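
For what it's worth, mail clients generally encode only the display name and leave the address inside <...> as plain ASCII. A rough sketch of that idea, assuming Python 3 (`email.header` and `email.utils` instead of the Python 2 module names used in this thread). The helper name `encode_address` is made up for illustration and is not part of the standard library:

```python
from email.header import Header
from email.utils import formataddr, parseaddr


def encode_address(address, charset="iso-8859-1"):
    """Return address with only the display name RFC 2047-encoded.

    Hypothetical helper: the part in <...> is left untouched so that
    an MTA can still read it.
    """
    name, addr = parseaddr(address)
    if not name:
        # Bare address without a display name: nothing to encode.
        return addr
    encoded_name = Header(name, charset).encode()
    return formataddr((encoded_name, addr))


result = encode_address('"Jörg Nørgens" <joerg@nowhere>')
print(result)  # =?iso-8859-1?q?J=F6rg_N=F8rgens?= <joerg@nowhere>
```

The returned string can be assigned to `message['To']` directly instead of wrapping the whole address in Header().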

Desperately,
Christoph
Nov 24 '06 #3

Christoph Haas wrote:
> Of course my mail transfer agent is not happy with the first string
> although I see that Header() is just doing its job. I'm looking for a way
> though to encode just the non-ASCII parts like any mail client does. Does
> anyone have a recipe on how to do that? Or is there a method in
> the "email" module of the standard library that does what I need? Or
> should I split by regular expression to extract the email address
> beforehand? Or a list comprehension to just look for non-ASCII characters
> and Header() them? Sounds dirty.

Why dirty?

from email.Header import Header
from itertools import groupby

h = Header()
addr = u'"Jörg Nørgens" <joerg@nowhere>'

def is_ascii(char):
    return ord(char) < 128

# Split the address into runs of ASCII / non-ASCII characters
# and append each run to the header separately.
for ascii, group in groupby(addr, is_ascii):
    h.append(''.join(group), "latin-1")

print h
=>
"J =?iso-8859-1?q?=F6?= rg N =?iso-8859-1?q?=F8?= rgens"
<joerg@nowhere>

-- Leo

Nov 24 '06 #4

Christoph Haas wrote:
> Still nothing like <...> in it and my MTA is unhappy again. What am I
> missing? Doesn't anyone know how mail clients handle that encoding?

>>> address = u'"Jörg Nørgens" <joerg@nowhere>'.encode('latin-1')
>>> address
'"J\xf6rg N\xf8rgens" <joerg@nowhere>'
>>> from email.Header import Header
>>> hdr = str(Header(address, 'latin-1'))
>>> hdr
'=?iso-8859-1?q?=22J=F6rg_N=F8rgens=22_=3Cjoerg=40nowhere=3E?= '

Is this not correct?

At least roundtripping works:

>>> from email.Header import decode_header
>>> encoded, coding = decode_header(hdr)[0]
>>> encoded, coding
('"J\xf6rg N\xf8rgens" <joerg@nowhere>', 'iso-8859-1')
>>> encoded.decode(coding)
u'"J\xf6rg N\xf8rgens" <joerg@nowhere>'

And parsing the address works too.

>>> from email.Utils import parseaddr
>>> parseaddr(encoded.decode(coding))
(u'J\xf6rg N\xf8rgens', u'joerg@nowhere')
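
For readers on Python 3, a rough equivalent of the round trip above might look like this. It is a sketch under the assumption that the modules are spelled `email.header` and `email.utils`, and it uses `make_header` to turn the decoded parts back into text instead of calling `.decode()` on a byte string:

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

# The fully encoded header from the session above (whole address encoded).
hdr = "=?iso-8859-1?q?=22J=F6rg_N=F8rgens=22_=3Cjoerg=40nowhere=3E?="

# decode_header() returns (bytes, charset) pairs; make_header() reassembles
# them into a Header object whose str() is the decoded text.
decoded = str(make_header(decode_header(hdr)))

# Once decoded, the address parses into display name and addr-spec.
name, addr = parseaddr(decoded)

print(decoded)  # "Jörg Nørgens" <joerg@nowhere>
print(name)     # Jörg Nørgens
print(addr)     # joerg@nowhere
```

This only shows that the encoded form is decodable by a MIME-aware client. It does not help software that looks for a literal <...> in the raw header.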
--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
Nov 24 '06 #5
