Bytes | Software Development & Data Engineering Community

How do I Extract an Attachment from a Newsgroup Message?

I'm parsing NNTP messages that have XML file attachments. How can I
extract the encoded text back into a file? I looked for a solution
with mimetools (the way I'd approach it for email) but found nothing.

Here's a long snippet of the message:
>>> n.article('116431')
('220 116431 <D8*******@news.ap.org> article', '116431',
'<D8*******@news.ap.org>', ['MIME-Version: 1.0', 'Message-ID:
<D8*******@news.ap.org>', 'Content-Type: Multipart/Mixed;', '
boundary="------------Boundary-00=_A5NJCP3FX6Y5BI3BH890"', 'Date: Thu,
24 May 2007 07:41:34 -0400 (EDT)', 'From: Newsclip <ne******@ap.org>',
'Path: newsclip.ap.org!flounder.ap.org!flounder', 'Newsgroups:
ap.spanish.online,ap.spanish.online.business', 'Keywords: MUN ECO
PETROLEO PRECIOS', 'Subject: MUN ECO PETROLEO PRECIOS', 'Summary: ',
'Lines: 108', 'Xref: newsclip.ap.org ap.spanish.online:938298
ap.spanish.online.business:116431', '', '',
'--------------Boundary-00=_A5NJCP3FX6Y5BI3BH890', 'Content-Type: Text/Plain',
'Content-Transfer-Encoding: 8bit', 'Content-Description: text,
unencoded', '', '(AP) Precios del crudo se mueven sin rumbo claro',
'Por GEORGE JAHN', 'VIENA', 'Los precios

.... (truncated for length) ...

'', '___', '', 'Editores: Derrick Ho, periodista de la AP en Singapur,
contribuy\xf3 con esta informaci\xf3n.', '', '',
'--------------Boundary-00=_A5NJCP3FX6Y5BI3BH890', 'Content-Type: Text/Xml',
'Content-Transfer-Encoding: base64', 'Content-Description: text, base64
encoded', '',
'PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIG5pdGYgU1lT',
'VEVNICJuaXRmLmR0ZCI+CjxuaXRmPgogPGhlYWQ+CiAgPG1ldGEgbmFtZT0iYXAtdHJhbnNyZWYi',
'IGNvbnRlbnQ9IlNQMTQ3MiIvPgogIDxtZXRhIG5hbWU9ImFwLW9yaWdpbiIgY29udGVudD0ic3Bh',
'bm9sIi8+CiAgPG1ldGEgbmFtZT0iYXAtc2VsZWN0b3IiIGNvbn

May 31 '07 #1
On May 31, 8:54 am, "snewma...@gmail.com" <snewma...@gmail.com> wrote:
> I'm parsing NNTP messages that have XML file attachments. How can I
> extract the encoded text back into a file?
....

This looks like what you might be looking for:
http://mail.python.org/pipermail/pyt...ne/265018.html

Not sure if you'll need this or not, but here's some info on encoding/
decoding files:
http://www.jorendorff.com/articles/unicode/python.html
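
Getting the attachment back is first a base64 step, which the standard library's base64 module handles; turning the resulting bytes into text is a separate step that uses the charset named in the XML declaration. A minimal sketch using the first two base64 lines quoted in the question (with the spaces the forum's line wrapping inserted removed); the variable names are illustrative only.

```python
import base64

# First two base64 lines of the Text/Xml part quoted in the question
# (wrapping spaces removed; each full line is 76 characters).
b64_lines = [
    'PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIG5pdGYgU1lT',
    'VEVNICJuaXRmLmR0ZCI+CjxuaXRmPgogPGhlYWQ+CiAgPG1ldGEgbmFtZT0iYXAtdHJhbnNyZWYi',
]

# Line breaks carry no data in base64, so join the lines before decoding.
raw = base64.b64decode(''.join(b64_lines))

# The bytes are UTF-8, as the XML declaration inside them says.
text = raw.decode('utf-8')
print(text)
```

This prints the XML declaration, the `<!DOCTYPE nitf SYSTEM "nitf.dtd">` line and the opening `<nitf>` and `<head>` tags. To save the attachment, writing `raw` to a file opened in binary mode keeps the bytes exactly as sent.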

There are lots of ways to parse XML; I use the minidom module myself.
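
A minimal sketch with xml.dom.minidom. The document below is hypothetical: it is assembled from the elements that decode from the start of the quoted base64 (the ap-transref and ap-origin meta tags), and the closing tags are assumed because the rest of the attachment was not posted.

```python
from xml.dom import minidom

# Hypothetical, completed version of the decoded attachment; only the first
# few elements were visible in the thread, so the closing tags are assumed.
xml_bytes = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
             b'<nitf>\n <head>\n'
             b'  <meta name="ap-transref" content="SP1472"/>\n'
             b'  <meta name="ap-origin" content="spanol"/>\n'
             b' </head>\n</nitf>\n')

# parseString accepts bytes and honours the encoding in the XML declaration.
doc = minidom.parseString(xml_bytes)

# Collect <meta name=... content=...> pairs into a dict.
meta = {m.getAttribute('name'): m.getAttribute('content')
        for m in doc.getElementsByTagName('meta')}
print(meta)  # {'ap-transref': 'SP1472', 'ap-origin': 'spanol'}

doc.unlink()  # break the DOM's internal reference cycles when done
```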

Mike

May 31 '07 #2
> I looked for a solution
> with mimetools (the way I'd approach it for email) but found nothing.
....
> <D8*******@news.ap.org>', 'Content-Type: Multipart/Mixed;', '
> boundary="------------Boundary-00=_A5NJCP3FX6Y5BI3BH890"', 'Date: Thu,
....

Playing with

data = n.article('116431')[3]

and email.message_from_string, there seems to be a problem with the
Content-Type header being split across two lines. I was able to get a
multipart message by using

msg = email.message_from_string('\n'.join(data).replace(';\n', ';'))

(and adding an ending boundary to your sample data).
This is a bit hackish and could cause problems if any line in the
message body ends with a semicolon (no warranties expressed or
implied, etc.)
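
A minimal Python 3 sketch of a variant that leaves the body alone, under stated assumptions: the article is a list of str lines without line endings (like element [3] of the nntplib result above), and the only damage is a header continuation line that lost its leading whitespace. The helper names article_to_message, xml_attachments and save_xml_attachments, and the attachment-N.xml file names, are hypothetical. mimetools no longer exists in Python 3, so the standard email package does the MIME work: walk() visits each part and get_payload(decode=True) undoes the base64. nntplib itself was removed in Python 3.13; in earlier Python 3 versions its article() returns (response, info), and info.lines holds bytes, which would need decoding (latin-1 is one lossless choice) before being passed in.

```python
import email
import re

# Assumption: header field names consist of letters, digits and hyphens,
# e.g. "Content-Type" or "Xref".
HEADER_LINE = re.compile(r'[A-Za-z0-9-]+:')


def article_to_message(lines):
    """Build an email Message from article lines (str, no line endings).

    Only the top-level header block (everything before the first blank
    line) is repaired: a line there that is neither a new "Name:" field nor
    an indented continuation is treated as a continuation that lost its
    leading whitespace. Body lines, including ones ending in ';', are
    passed through unchanged.
    """
    try:
        end = lines.index('')
    except ValueError:
        end = len(lines)
    headers = []
    for line in lines[:end]:
        if headers and not line[:1].isspace() and not HEADER_LINE.match(line):
            headers.append(' ' + line)  # re-fold it under the previous header
        else:
            headers.append(line)
    return email.message_from_string('\n'.join(headers + lines[end:]))


def xml_attachments(msg):
    """Return the decoded bytes of every text/xml part.

    get_content_type() lower-cases "Text/Xml", and get_payload(decode=True)
    undoes the part's Content-Transfer-Encoding (base64 here).
    """
    return [part.get_payload(decode=True)
            for part in msg.walk()
            if part.get_content_type() == 'text/xml']


def save_xml_attachments(msg, prefix='attachment'):
    """Write each XML part to <prefix>-N.xml and return the file names."""
    names = []
    for n, data in enumerate(xml_attachments(msg), 1):
        name = '%s-%d.xml' % (prefix, n)
        with open(name, 'wb') as fh:
            fh.write(data)  # raw bytes; the XML declaration names the encoding
        names.append(name)
    return names
```

Something like save_xml_attachments(article_to_message(lines)) would then write each XML part to disk as raw bytes. A wrapped header line that happens to start with its own "name:" text would still be read as a new field, so this is a heuristic rather than a full repair.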

Hope this helps,
-Dave
May 31 '07 #3


Similar topics

3
by: Selen
I would like to be able to extract a BLOB from the database (SqlServer) and pass it to a browser without writing it to a file. (The BLOBs are Word docs, MS Project docs, and Excel spreadsheets....
2
by: Chris Kane
We have written a class that enumerates the items in a WSS list and then attempts to open the attachment for each item. We have written two classes, one to impersonate a user and read in the list...
2
by: CTDev Team
Hi, We are using Exchange Server 5.5, and have applications written in VB6 and C# that read and process emails. We are experiencing intermittent errors similar to C# Application ...
1
by: JohnRHarlow
Hi: I am looking for advice on the best way to set up a process to read incoming emails (from a normal unix mailbox on the same host) containing a gzipped telemetry attachment. I'd like the...
7
by: erikcw
Hi all, I'm trying to extract a zip file (containing an XML file) from an email so I can process it. But I'm running up against some brick walls. I've been googling and reading all afternoon, and...
1
by: suis
Hi Everybody, I have a big doubt about how to read metadata in a specific file type like .MSG. Anyway, thanks to the URL below; Edanmo has done a great job on that. ...
1
by: Diversity Man
My problem is when I send an attachment to a created E-Mail file that is not a Microsoft Word attachment it still attempts to open the attachment in Word. How can I correct this problem? -- JW
1
by: Edwin.Madari
from each line separate out url and request parts. split the request into key-value pairs, use urllib to unquote key-value pairs......as shown below... import urllib line = "GET...
7
by: Ben
Hi I am looking for a way to extract an icon from a .exe file and save it as an icon, not a bitmap or jpeg, to a file? The code below extracts the icon but only as a bitmap PictureBox1.Image =...
1
by: CloudSolutions
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G
I am starting a new IT consulting business and it's been a while since I set up a new website. Is WordPress still the best web-based software for hosting a 5-page website? The webpages will be...
0
by: isladogs
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: ryjfgjl
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur
How do I turn on JavaScript on a villaon, callus and itel keypad mobile phone?
0
by: ryjfgjl
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto
Hi All, I am Emmanuel Katto from Uganda. I want to ask what challenges you've faced while migrating a website to the cloud. Please let me know. Thanks! Emmanuel
0
by: BarryA
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...