Stuart D. Gathman:
Content-Type: image/pjpeg; name="Jim&&Jill"
What IE apparently gets is:
[('image/pjpeg', ''), ('name', '"Jim&&Jill"')]
Is this a bug (in the email package, I mean - obviously IE is buggy)?
Do I have to write my own custom param parsing routines to handle this?
BTW, I verified this in 2.3.
Looks like the Content-Type syntax is defined in
http://www.faqs.org/rfcs/rfc2045.html
5.1. Syntax of the Content-Type Header Field

    content := "Content-Type" ":" type "/" subtype
               *(";" parameter)

    parameter := attribute "=" value

    value := token / quoted-string

    token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                or tspecials>

    tspecials :=  "(" / ")" / "<" / ">" / "@" /
                  "," / ";" / ":" / "\" / <">
                  "/" / "[" / "]" / "?" / "="
                  ; Must be in quoted-string,
                  ; to use within parameter values
So the ";" must be in a quoted string. That's defined in
RFC 822,
http://www.faqs.org/rfcs/rfc822.html
(now obsolete)
    quoted-string = <"> *(qtext/quoted-pair) <">

    qtext = <any CHAR excepting <">,     ; => may be folded
             "\" & CR, and including
             linear-white-space>

    CHAR = <any ASCII character>
The ';' is in CHAR and is not <">, "\" or CR, so it's in qtext,
so it's allowed inside a quoted-string, and therefore in a value,
without extra interpretation.
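To make that reading of the grammar concrete, here is a rough sketch of the
two alternatives for "value" in Python. The names TSPECIALS, is_token and
is_quoted_string are mine, not anything from the email package, and the
quoted-string pattern is slightly looser than the RFC (it does not reject
non-ASCII characters in qtext).

```python
import re

# tspecials from RFC 2045 section 5.1
TSPECIALS = '()<>@,;:\\"/[]?='

def is_token(s):
    """True if s is a token: 1 or more US-ASCII chars, excluding
    SPACE, CTLs and tspecials."""
    return bool(s) and all(32 < ord(c) < 127 and c not in TSPECIALS
                           for c in s)

# quoted-string per RFC 822: qtext is anything except <">, "\" and CR;
# a quoted-pair is "\" followed by any ASCII character.
QUOTED_STRING = re.compile(r'"(?:[^"\\\r]|\\[\x00-\x7f])*"')

def is_quoted_string(s):
    """True if the whole of s is a single quoted-string."""
    return QUOTED_STRING.fullmatch(s) is not None
```

With these, is_token('a;b') is false but is_quoted_string('"a;b"') is true,
which is the point: a ';' is fine as long as it sits inside the quotes.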
It looks like RFC 2822 (the updated version of 822) at
http://www.faqs.org/rfcs/rfc2822.html agrees.
So I think it's a bug in the email module's parser.
The actual bug is in email/Parser.py with
# Regular expression used to split header parameters. BAW: this may be too
# simple. It isn't strictly RFC 2045 (section 5.1) compliant, but it catches
# most headers found in the wild. We may eventually need a full fledged
# parser eventually.
paramre = re.compile(r'\s*;\s*')
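To see why that pattern is a problem, here is a small sketch that applies
the same regexp to a header value with a ';' inside the quotes. The sample
value is made up for illustration; it is not the header from the original
report.

```python
import re

# Same pattern as in email/Parser.py, as quoted above.
paramre = re.compile(r'\s*;\s*')

# Hypothetical header value with a ';' inside a quoted-string.
value = 'image/pjpeg; name="a;b.jpg"'

# The split knows nothing about quoting, so the quoted value is
# cut in two at the embedded ';'.
parts = paramre.split(value)
print(parts)
```

The name parameter comes back as two pieces, 'name="a' and 'b.jpg"', instead
of one.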
A quick scan of the code suggests that it isn't a quick fix (e.g.,
it's not just a matter of tweaking that regexp).
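If you need something in the meantime, the splitting step on its own can be
done by hand. This is only a sketch of a quote-aware splitter, under the
assumption that all you want is the list of "attribute=value" pieces; it is
not a patch for email/Parser.py, and split_params is a name I made up.

```python
def split_params(value):
    """Split a header value on ';' characters that are outside
    quoted-strings. Backslash escapes inside quotes are kept as-is."""
    parts = []
    current = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            # Character after a backslash inside quotes: take it literally.
            current.append(ch)
            escaped = False
        elif in_quotes and ch == '\\':
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == ';' and not in_quotes:
            # Parameter separator: finish the current piece.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    return parts
```

For example, split_params('image/pjpeg; name="a;b.jpg"') keeps the quoted
value in one piece. It does not unquote values or handle comments, so treat
it as a starting point only.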
Could you file a bug report against it?
Andrew
da***@dalkescientific.com