
How do I Extract Attachment from Newsgroup Message

I'm parsing NNTP messages that have XML file attachments. How can I
extract the encoded text back into a file? I looked for a solution
with mimetools (the way I'd approach it for email) but found nothing.

Here's a long snippet of the message:
>>> n.article('116431')
('220 116431 <D8*******@news.ap.org> article', '116431',
'<D8*******@news.ap.org>', ['MIME-Version: 1.0',
'Message-ID: <D8*******@news.ap.org>', 'Content-Type: Multipart/Mixed;',
' boundary="------------Boundary-00=_A5NJCP3FX6Y5BI3BH890"',
'Date: Thu, 24 May 2007 07:41:34 -0400 (EDT)', 'From: Newsclip <ne******@ap.org>',
'Path: newsclip.ap.org!flounder.ap.org!flounder',
'Newsgroups: ap.spanish.online,ap.spanish.online.business',
'Keywords: MUN ECO PETROLEO PRECIOS', 'Subject: MUN ECO PETROLEO PRECIOS', 'Summary: ',
'Lines: 108', 'Xref: newsclip.ap.org ap.spanish.online:938298 ap.spanish.online.business:116431',
'', '', '--------------Boundary-00=_A5NJCP3FX6Y5BI3BH890', 'Content-Type: Text/Plain',
'Content-Transfer-Encoding: 8bit', 'Content-Description: text, unencoded', '',
'(AP) Precios del crudo se mueven sin rumbo claro', 'Por GEORGE JAHN', 'VIENA', 'Los precios

.... (truncated for length) ...

'', '___', '', 'Editores: Derrick Ho, periodista de la AP en Singapur, contribuy\xf3 con esta informaci\xf3n.',
'', '', '--------------Boundary-00=_A5NJCP3FX6Y5BI3BH890', 'Content-Type: Text/Xml',
'Content-Transfer-Encoding: base64', 'Content-Description: text, base64 encoded', '',
'PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIG5pdGYgU1lT',
'VEVNICJuaXRmLmR0ZCI+CjxuaXRmPgogPGhlYWQ+CiAgPG1ldGEgbmFtZT0iYXAtdHJhbnNyZWYi',
'IGNvbnRlbnQ9IlNQMTQ3MiIvPgogIDxtZXRhIG5hbWU9ImFwLW9yaWdpbiIgY29udGVudD0ic3Bh',
'bm9sIi8+CiAgPG1ldGEgbmFtZT0iYXAtc2VsZWN0b3IiIGNvbn

May 31 '07 #1


On May 31, 8:54 am, "snewma...@gmail.com" <snewma...@gmail.com> wrote:
> I'm parsing NNTP messages that have XML file attachments. How can I
> extract the encoded text back into a file? I looked for a solution
> with mimetools (the way I'd approach it for email) but found nothing.

This looks like what you might be looking for:
http://mail.python.org/pipermail/pyt...ne/265018.html

Not sure if you'll need this or not, but here's some info on
encoding/decoding files:
http://www.jorendorff.com/articles/unicode/python.html
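If you only need the attachment bytes, a minimal sketch, assuming Python 3 and only the standard base64 module, is to join the base64 lines of the Text/Xml part and decode them. It uses the first three base64 lines from the question (the fourth is cut off mid-line in the post, so it is left out); the helper name `decode_attachment` is hypothetical.

```python
import base64

# First three base64 lines of the Text/Xml part, copied from the question.
# The fourth line is truncated in the post, so it is omitted here.
encoded_lines = [
    'PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIG5pdGYgU1lT',
    'VEVNICJuaXRmLmR0ZCI+CjxuaXRmPgogPGhlYWQ+CiAgPG1ldGEgbmFtZT0iYXAtdHJhbnNyZWYi',
    'IGNvbnRlbnQ9IlNQMTQ3MiIvPgogIDxtZXRhIG5hbWU9ImFwLW9yaWdpbiIgY29udGVudD0ic3Bh',
]


def decode_attachment(lines):
    """Hypothetical helper: join the base64 lines of one part, decode to bytes."""
    return base64.b64decode(''.join(lines))


xml_bytes = decode_attachment(encoded_lines)
# The XML declaration says encoding="UTF-8", so decode the bytes with that codec.
xml_text = xml_bytes.decode('utf-8')
print(xml_text)

# To get the attachment back into a file, write the raw bytes in binary mode:
# with open('attachment.xml', 'wb') as f:
#     f.write(xml_bytes)
```

Writing `xml_bytes` rather than `xml_text` keeps the file byte-for-byte identical to what was encoded, so the UTF-8 declaration in the file stays accurate.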

There are lots of ways to parse XML. I use the minidom module (xml.dom.minidom) myself.
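Once the attachment is decoded, a sketch of reading it with `xml.dom.minidom` might look like this. The document below is a hypothetical, complete stand-in modeled on the start of the decoded attachment (the real one is truncated in the post), and collecting the `<meta>` name/content pairs is just an example of what you might pull out.

```python
from xml.dom import minidom

# Hypothetical, self-contained NITF-like document for illustration only.
xml_bytes = b'''<?xml version="1.0" encoding="UTF-8"?>
<nitf>
 <head>
  <meta name="ap-transref" content="SP1472"/>
  <meta name="ap-origin" content="spanol"/>
 </head>
</nitf>'''

# parseString accepts bytes and honours the encoding in the XML declaration.
doc = minidom.parseString(xml_bytes)

# Map each <meta> element's name attribute to its content attribute.
meta = {m.getAttribute('name'): m.getAttribute('content')
        for m in doc.getElementsByTagName('meta')}
print(meta)
```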

Mike

May 31 '07 #2

> I looked for a solution
> with mimetools (the way I'd approach it for email) but found nothing.
> ....
> <D8*******@news.ap.org>', 'Content-Type: Multipart/Mixed;',
> ' boundary="------------Boundary-00=_A5NJCP3FX6Y5BI3BH890"', 'Date: Thu,
> ....

Playing with

data = n.article('116431')[3]

and email.message_from_string, there seems to be a problem with the
Content-Type header being split across two lines. I was able to get a
multipart message by using

msg = email.message_from_string('\n'.join(data).replace(';\n', ';'))

(and adding an ending boundary to your sample data).
This is a bit hackish: it would also join any line in the message body
that ends with a semicolon onto the line after it (no warranties
expressed or implied, etc.).
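A sketch of the whole extraction along these lines, assuming Python 3's email package (mimetools is Python 2 only) and a list of str lines like the one in the question. One deliberate change from the one-liner: the ';' re-joining is applied only to the header block, before the first empty line, so body lines ending in a semicolon are left alone. The function names and the sample data (a trimmed copy of the posted article with a closing boundary added) are illustrative, not from the thread.

```python
import email


def parse_article_lines(lines):
    """Rebuild a MIME message from a list of article lines (no line endings).

    Re-joins a header line ending in ';' with the line after it, but only in
    the header block (everything before the first empty line).
    """
    try:
        end_of_headers = lines.index('')
    except ValueError:  # no blank line at all: treat everything as headers
        end_of_headers = len(lines)
    headers = '\n'.join(lines[:end_of_headers]).replace(';\n', ';')
    return email.message_from_string('\n'.join([headers] + lines[end_of_headers:]))


def extract_parts(msg, content_type='text/xml'):
    """Return the decoded payload (bytes) of every part with this content type."""
    payloads = []
    for part in msg.walk():
        # get_content_type() lower-cases, so 'Text/Xml' matches 'text/xml'.
        if part.get_content_type() == content_type:
            # decode=True undoes the Content-Transfer-Encoding (base64 here).
            payloads.append(part.get_payload(decode=True))
    return payloads


def save_parts(payloads, basename='attachment'):
    """Hypothetical helper: write each payload to basename-N.xml, return names."""
    names = []
    for i, data in enumerate(payloads, 1):
        name = '%s-%d.xml' % (basename, i)
        with open(name, 'wb') as f:  # binary mode keeps the bytes unchanged
            f.write(data)
        names.append(name)
    return names


# Trimmed copy of the posted article, with a closing delimiter added by hand.
DELIMITER = '--------------Boundary-00=_A5NJCP3FX6Y5BI3BH890'
sample_lines = [
    'MIME-Version: 1.0',
    'Content-Type: Multipart/Mixed;',
    ' boundary="------------Boundary-00=_A5NJCP3FX6Y5BI3BH890"',
    'Subject: MUN ECO PETROLEO PRECIOS',
    '',
    '',
    DELIMITER,
    'Content-Type: Text/Plain',
    'Content-Transfer-Encoding: 8bit',
    '',
    '(AP) Precios del crudo se mueven sin rumbo claro',
    '',
    DELIMITER,
    'Content-Type: Text/Xml',
    'Content-Transfer-Encoding: base64',
    '',
    'PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIG5pdGYgU1lT',
    'VEVNICJuaXRmLmR0ZCI+CjxuaXRmPgogPGhlYWQ+CiAgPG1ldGEgbmFtZT0iYXAtdHJhbnNyZWYi',
    'IGNvbnRlbnQ9IlNQMTQ3MiIvPgogIDxtZXRhIG5hbWU9ImFwLW9yaWdpbiIgY29udGVudD0ic3Bh',
    '',
    DELIMITER + '--',  # closing delimiter
]

msg = parse_article_lines(sample_lines)
xml_parts = extract_parts(msg)
print(len(xml_parts), xml_parts[0][:38])
```

With Python 2's nntplib, as in the thread, the lines would be `n.article('116431')[3]`. In Python 3's nntplib (removed from the standard library in 3.13), `NNTP.article()` returns `(response, info)` and `info.lines` is a list of bytes, so each line would need decoding first (for example with `line.decode('latin-1')`) before calling `parse_article_lines`.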

Hope this helps,
-Dave
May 31 '07 #3
