Bytes IT Community

email header decoding fails

It seems that the decode_header function in email.Header fails when
the string is in the following form,

'=?gb2312?Q?=D0=C7=C8=FC?=(revised)'

That is, a non-encoded string follows the encoded string without
any whitespace. In this case, the decode_header function treats the whole
string as non-encoded. Is there a workaround for this problem?

Thanks.
Apr 10 '08 #1
2 Replies


On Thu, 10 Apr 2008 05:45:41 -0300, ZeeGeek <Ze*****@gmail.com> wrote:
> On Apr 10, 4:31 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
>> On Wed, 09 Apr 2008 23:12:00 -0300, ZeeGeek <ZeeG...@gmail.com> wrote:
>>> It seems that the decode_header function in email.Header fails when
>>> the string is in the following form,
>>> '=?gb2312?Q?=D0=C7=C8=FC?=(revised)'
>> An 'encoded-word' that appears within a
>> 'phrase' MUST be separated from any adjacent 'word', 'text' or
>> 'special' by 'linear-white-space'.
>
> Thank you very much, Gabriel.

The above only explains why decode_header refuses to decode it, and why that
is not a bug. But if you actually have to deal with those malformed headers,
some heuristics may help. For example, if you *know* your mails typically
specify the gb2312 or iso-8859-1 encoding, you may look for things that look
like the example above and "fix" them.
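
A minimal sketch of such a heuristic, assuming the only charsets you expect are gb2312 and iso-8859-1. The names KNOWN_CHARSETS and fix_known_charset_words are made up for illustration and are not part of the email package. It only touches encoded-words in those charsets that are glued to neighbouring text, and leaves everything else alone:

```python
import re

# Assumption: the charsets your mail typically uses. Adjust to taste.
KNOWN_CHARSETS = ("gb2312", "iso-8859-1")

# "=?charset?Q-or-B?encoded-text?=" restricted to the known charsets.
_ENCODED_WORD = r"=\?(?:%s)\?[QqBb]\?[^?\s]*\?=" % "|".join(
    re.escape(name) for name in KNOWN_CHARSETS
)
# Encoded-word immediately followed by a non-whitespace character.
_MISSING_AFTER = re.compile(r"(%s)(?=\S)" % _ENCODED_WORD, re.IGNORECASE)
# Encoded-word immediately preceded by a non-whitespace character.
_MISSING_BEFORE = re.compile(r"(?<=\S)(%s)" % _ENCODED_WORD, re.IGNORECASE)


def fix_known_charset_words(header):
    """Insert the missing space around glued encoded-words (hypothetical helper)."""
    header = _MISSING_AFTER.sub(r"\1 ", header)
    return _MISSING_BEFORE.sub(r" \1", header)
```

The repaired string can then be handed to decode_header as usual. Headers that are already well formed, or that use a charset outside the list, come back unchanged.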

--
Gabriel Genellina

Apr 10 '08 #2

On Apr 10, 5:18 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> On Thu, 10 Apr 2008 05:45:41 -0300, ZeeGeek <ZeeG...@gmail.com> wrote:
>> On Apr 10, 4:31 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
>>> On Wed, 09 Apr 2008 23:12:00 -0300, ZeeGeek <ZeeG...@gmail.com> wrote:
>>>> It seems that the decode_header function in email.Header fails when
>>>> the string is in the following form,
>>>> '=?gb2312?Q?=D0=C7=C8=FC?=(revised)'
>>> An 'encoded-word' that appears within a
>>> 'phrase' MUST be separated from any adjacent 'word', 'text' or
>>> 'special' by 'linear-white-space'.
>> Thank you very much, Gabriel.
>
> The above only explains why decode_header refuses to decode it, and why that
> is not a bug. But if you actually have to deal with those malformed headers,
> some heuristics may help. For example, if you *know* your mails typically
> specify the gb2312 or iso-8859-1 encoding, you may look for things that look
> like the example above and "fix" them.

Right now I'm using re.sub(r'(=\?([^\?]*\?){3}=)', r' \1 ', orig_string)
to detect every encoded string and place an extra space before and
after it. The whole string is then compliant with the standard, and
decode_header can decode it properly.
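
Spelled out as a small script, the workaround might look like the sketch below. It assumes Python 3, where the module is named email.header and decode_header returns (bytes-or-str, charset) pairs. The helper names pad_encoded_words and decode_padded are made up for illustration.

```python
import re
from email.header import decode_header

# The pattern from the post: "=?" then three "?"-terminated fields, then "=".
ENCODED_WORD = re.compile(r'(=\?([^\?]*\?){3}=)')


def pad_encoded_words(orig_string):
    """Put a space before and after every encoded-word."""
    return ENCODED_WORD.sub(r' \1 ', orig_string)


def decode_padded(orig_string):
    """Pad the header, decode it, and join the pieces into one str."""
    parts = []
    for chunk, charset in decode_header(pad_encoded_words(orig_string)):
        if isinstance(chunk, bytes):
            # Unencoded chunks carry no charset; treat them as ASCII.
            chunk = chunk.decode(charset or 'ascii', 'replace')
        parts.append(chunk)
    # Padding can leave stray whitespace at either end.
    return ''.join(parts).strip()


if __name__ == '__main__':
    print(decode_padded('=?gb2312?Q?=D0=C7=C8=FC?=(revised)'))
```

Note that the padding is applied to every encoded-word, including ones that were already delimited correctly. The extra spaces between text and an encoded-word then show up in the decoded result, so a stricter variant would only pad where whitespace is actually missing.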
Apr 11 '08 #3
