
email header decoding fails

ZeeGeek
Guest
 
Posts: n/a
#1: Apr 10 '08
It seems that the decode_header function in email.Header fails when
the string is in the following form,

'=?gb2312?Q?=D0=C7=C8=FC?=(revised)'

That is, a non-encoded string follows the encoded string without any
whitespace in between. In this case, decode_header treats the whole
string as non-encoded. Is there a workaround for this problem?
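
A minimal way to reproduce the case, sketched for Python 3, where the module is spelled email.header (email.Header is the older Python 2 name). The exact split that decode_header returns may differ between Python versions, so no expected output is shown here:

```python
from email.header import decode_header

# Header value from the post: an encoded-word followed directly by
# plain text, with no whitespace between the two.
raw = "=?gb2312?Q?=D0=C7=C8=FC?=(revised)"

# decode_header returns a list of (value, charset) pairs.
parts = decode_header(raw)
for value, charset in parts:
    print(repr(value), charset)
```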

Thanks.

Gabriel Genellina
Guest
 
Posts: n/a
#2: Apr 10 '08

re: email header decoding fails


On Thu, 10 Apr 2008 05:45:41 -0300, ZeeGeek <ZeeGeek@gmail.com> wrote:
> On Apr 10, 4:31 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
>> On Wed, 09 Apr 2008 23:12:00 -0300, ZeeGeek <ZeeG...@gmail.com> wrote:
>>> It seems that the decode_header function in email.Header fails when
>>> the string is in the following form,
>>>
>>> '=?gb2312?Q?=D0=C7=C8=FC?=(revised)'
>>
>> An 'encoded-word' that appears within a
>> 'phrase' MUST be separated from any adjacent 'word', 'text' or
>> 'special' by 'linear-white-space'.
>
> Thank you very much, Gabriel.

The above just says "why" decode_header refuses to decode it, and why it's
not a bug. But if you actually have to deal with those malformed headers,
some heuristics may help. For example, if you *know* your mails typically
specify gb2312 encoding, or iso-8859-1, you may look for things that look
like the example above and "fix" them.

--
Gabriel Genellina

ZeeGeek
Guest
 
Posts: n/a
#3: Apr 11 '08

re: email header decoding fails


On Apr 10, 5:18 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> On Thu, 10 Apr 2008 05:45:41 -0300, ZeeGeek <ZeeG...@gmail.com> wrote:
>> On Apr 10, 4:31 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
>>> On Wed, 09 Apr 2008 23:12:00 -0300, ZeeGeek <ZeeG...@gmail.com> wrote:
>>>> It seems that the decode_header function in email.Header fails when
>>>> the string is in the following form,
>>>>
>>>> '=?gb2312?Q?=D0=C7=C8=FC?=(revised)'
>>>
>>> An 'encoded-word' that appears within a
>>> 'phrase' MUST be separated from any adjacent 'word', 'text' or
>>> 'special' by 'linear-white-space'.
>>
>> Thank you very much, Gabriel.
>
> The above just says "why" decode_header refuses to decode it, and why it's
> not a bug. But if you actually have to deal with those malformed headers,
> some heuristics may help. For example, if you *know* your mails typically
> specify gb2312 encoding, or iso-8859-1, you may look for things that look
> like the example above and "fix" them.

Right now what I'm doing is to use
re.sub(r'(=\?([^\?]*\?){3}=)', r' \1 ', orig_string)
to detect every encoded string and place an extra space before and
after it. The whole string then complies with the standard, and
decode_header can decode it properly.
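
The workaround above can be sketched as a small helper, assuming Python 3 and email.header. The names ENCODED_WORD, pad_encoded_words and decode_lenient are made up for this illustration and are not part of the email package. The regex is the one from the post; the final whitespace collapse is an extra step added here to tidy up the padding.

```python
import re
from email.header import decode_header

# Pattern from the post: "=?", then three "?"-terminated fields
# (charset, encoding, encoded text), then the closing "=".
ENCODED_WORD = re.compile(r'(=\?([^\?]*\?){3}=)')


def pad_encoded_words(header):
    """Surround every encoded-word with a space on each side."""
    return ENCODED_WORD.sub(r' \1 ', header)


def decode_lenient(header):
    """Pad encoded-words, decode them, and return one text string.

    Hypothetical helper: an unknown charset name would still raise
    LookupError, and runs of whitespace are collapsed to one space.
    """
    chunks = []
    for value, charset in decode_header(pad_encoded_words(header)):
        if isinstance(value, bytes):
            # Undeclared charset: fall back to ASCII, replacing bad bytes.
            value = value.decode(charset or 'ascii', errors='replace')
        chunks.append(value)
    # Collapse the padding (and any other whitespace runs).
    return ' '.join(''.join(chunks).split())


raw = '=?gb2312?Q?=D0=C7=C8=FC?=(revised)'
print(decode_lenient(raw))
```

Note that the padding leaves a space between the decoded word and "(revised)" that the original header did not have, so this trades exact spacing for a readable result.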