Bytes | Software Development & Data Engineering Community

Python: Email and Header Parsing: Some Help

Hi,
I have written this small piece of code. I am brand
new to Python. I asked some people for help, but
unfortunately not many did.
Here is the code I have:

import email
import os
import sys
fread = open('email_message', 'r')
msg = email.message_from_file(fread)
print msg
#fwrite = open('output', 'w')
#fwrite.write(msg)

This way I am able to print the entire email message
to stdout. The program generates an error if I try
to write the output to a file: it says the argument
(here msg) should be a string, not an instance
as it is here. How do I write the message to another
file, then?

2. I have many headers in the email message:

To:
From:
X-Received:
X-Priority:
Subject:
etc.
I want to parse the headers and the message body
separately. Does anyone have example code that uses
the Parser?
Also, I want to remove the redundant words and all HTML
tags. Any advice on that?
I saw some examples using HTMLGen, but I don't have
HTMLGen with Python on my machine. I have Python
2.3.3 on my machine.

All help is greatly appreciated.
Thanks
Dont


Jul 18 '05 #1
"dont bother" <do************ *@yahoo.com> wrote in message
news:ma******** *************** **************@ python.org...
> I want to parse the headers separately and message
> separately. Does anyone have example code to deal
> with Parser?

Here is a spam cleaner that I run several times a day. My ISP runs Symantec
on its end and tags suspect e-mails with virus-scan headers. This program
looks for those tags and automatically deletes any e-mails infected with Klez or Swen.
import poplib, re

# Change these to your needs
POPHOST = "pop-server.austin.rr.com"
POPUSER = "xyzzy"
POPPASS = "ajsdlfjslfkj"

# regular expressions for extracting header data
re_from = re.compile( "^From: (.*)" )
re_to = re.compile( "^To: (.*)" )
re_subject = re.compile( "^Subject: (.*)" )
re_virusresult = re.compile( "^X-Virus-Scan-Result: (.*)" )

def showMessage( msgHdr ):
    out = ( msgHdr["msgnum"], msgHdr["From"], msgHdr["Subject"],
            msgHdr["Virus"] )
    print "%3d. %-30.30s %-24.24s %-24.24s" % out

def scanMailboxMsgs():
    "scan the mailbox and delete messages tagged with Swen or Klez"
    global deleteCount
    connected = False   # defined up front so the quit() check is safe if login fails

    try:
        # log in to mail box
        pop = poplib.POP3(POPHOST)
        pop.user(POPUSER)
        pop.pass_(POPPASS)
        connected = True

        # retrieve msg headers
        msgCount, msgTotalSize = pop.stat()

        emptyHdr = {
            "From" : "",
            "To" : "",
            "Subject" : "",
            "Virus" : "none"
            }
        matchREs = [
            ( re_from, "From" ),
            ( re_to, "To" ),
            ( re_subject, "Subject" ),
            ( re_virusresult, "Virus" )
            ]

        # for each message, check the header info
        for n in range( msgCount ):
            msgnum = n+1   # msg nums are 1-based, not 0-based

            # Retrieve the message header only (0 lines of body)
            response, headerLines, octets = pop.top(msgnum, 0)

            hdr = emptyHdr.copy()
            hdr["msgnum"] = msgnum
            hdr["size"] = octets
            for line in headerLines:
                for reExpr, hdrField in matchREs:
                    match = reExpr.match( line )
                    if match:
                        hdr[ hdrField ] = match.group(1).strip('"')

            # auto-delete any msgs that had the W32.Swen or W32.Klez virus
            if "W32.Swen" in hdr["Virus"] or "W32.Klez" in hdr["Virus"]:
                showMessage( hdr )
                pop.dele(msgnum)   # only marked here; removed when quit() is sent
                deleteCount += 1

    except poplib.error_proto, detail:
        print "POP3 error:", detail

    if connected:
        pop.quit()

# ============= main script ===============
deleteCount = 0
scanMailboxMsgs()
print "Deleted", deleteCount, "messages"

raw_input( "Press <return> to continue" )
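The script above is Python 2 and matches each header with a per-line regular expression, which misses the continuation lines of a folded header. As a sketch only, assuming Python 3 (where poplib.POP3.top() returns the header lines as bytes objects), the header-extraction step could hand those lines to the standard library's email.parser.BytesHeaderParser instead. The helper names summarize_headers, is_infected and _field and the sample header lines are illustrative, not part of the original script. The POP3 login and dele() calls are left out so the logic can be tried without a mail server.

```python
from email.parser import BytesHeaderParser

# Virus names the script above auto-deletes.
VIRUS_TAGS = ("W32.Swen", "W32.Klez")


def _field(msg, name, default):
    """Hypothetical helper: fetch one header, unfolding continuation lines."""
    value = msg.get(name)
    if value is None:
        return default
    # Joining the whitespace-split words unfolds the header and
    # collapses runs of whitespace into single spaces.
    return " ".join(str(value).split())


def summarize_headers(header_lines):
    """Build a dict like the script's hdr from the header lines that
    poplib.POP3.top(n, 0) returns (bytes objects in Python 3)."""
    raw = b"\r\n".join(header_lines) + b"\r\n"
    msg = BytesHeaderParser().parsebytes(raw)
    return {
        "From": _field(msg, "From", ""),
        "To": _field(msg, "To", ""),
        "Subject": _field(msg, "Subject", ""),
        "Virus": _field(msg, "X-Virus-Scan-Result", "none"),
    }


def is_infected(hdr):
    """True if the virus-scan header names one of VIRUS_TAGS."""
    return any(tag in hdr["Virus"] for tag in VIRUS_TAGS)


# Illustrative header lines, including a Subject folded over two lines.
sample_lines = [
    b"From: someone@example.com",
    b"To: xyzzy@example.com",
    b"Subject: Important",
    b" security update",
    b"X-Virus-Scan-Result: Repaired W32.Swen@mm",
]
hdr = summarize_headers(sample_lines)
print(hdr["Subject"], is_infected(hdr))
```

In a Python 3 port, summarize_headers(headerLines) could stand in for the matchREs loop, with the login, dele() and quit() logic left as it is.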
Jul 18 '05 #2
At some point, dont bother <do************ *@yahoo.com> wrote:
> Hi,
> I have written this small piece of code. I am a brand
> new player of Python. I had asked some people for
> help, unfortunately not many helped.
> Here is the code I have:
>
> import email
> import os
> import sys
> fread = open('email_message', 'r')
> msg=email.message_from_file(fread)
> print msg
> #fwrite = open('output','w')
> #fwrite.write(msg)
>
> This way I am able to print the entire email message
> on the stdout. The program generates an error If I try
> to write the output to a file-- It says the argument
> (here msg) should be a string but not as an instance
> like here. How to write the message to another file
> then?
msg here isn't a string; it's an email.Message object. The print
statement works because print calls str() on the objects passed to it.

You want
fwrite = open('output', 'w')
fwrite.write( msg.as_string() )
fwrite.close()

I didn't use str(msg) here, as that defaults to
msg.as_string(unixfrom=True). It depends on whether you want the
'From <whoosit>' line at the top (which you do if you're writing an
mbox).
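Here is a minimal sketch of the same fix in Python 3. It uses a small made-up message instead of the poster's email_message file, and the envelope line set with set_unixfrom() is likewise hypothetical. One difference from Python 2: in Python 3, str(msg) is equivalent to msg.as_string(), so the 'From ' envelope line is only written when you ask for it with unixfrom=True. The sketch produces both forms, writes the plain one to a file, and reads it back.

```python
import email
import os
import tempfile

# Hypothetical sample message standing in for the 'email_message' file.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "\n"
    "Hi Bob.\n"
)

msg = email.message_from_string(RAW)
# Hypothetical mbox-style envelope line, only emitted with unixfrom=True.
msg.set_unixfrom("From alice@example.com Thu Jan  1 00:00:00 2004")

# msg is a Message object, not a str, so convert it before writing.
text = msg.as_string()                    # headers + body, no envelope line
mbox_text = msg.as_string(unixfrom=True)  # envelope line first, as in an mbox

path = os.path.join(tempfile.mkdtemp(), "output")
with open(path, "w") as fwrite:   # leaving the with-block closes and flushes the file
    fwrite.write(text)

# Read the file back to confirm the message round-trips.
with open(path) as fread:
    copy = email.message_from_file(fread)

print(copy["Subject"])
```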
> 2. I have so many headers in the email message
>
> To:
> From:
> X Received:
> X Priority:
> Subject:
> etc etc.
> I want to parse the headers separtely and message
> separately. Does anyone has an example code to deal
> with Parser?
I'm not sure what you want: email.message_from_file produces a Message
object, which already splits out the headers from the body. You can
then iterate over the headers. For example, to strip out the non-standard
extension headers (those starting with 'X-'):

for hdr in msg.keys():
    if hdr.startswith('X-'):
        del msg[hdr]
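A minimal Python 3 sketch of splitting a parsed message into its headers and its body, using a made-up message with the header names from the question. items() gives the headers as (name, value) pairs, and get_payload() gives the body of a non-multipart message. The X- headers are then dropped as in the loop above, compared case-insensitively here because header names are case-insensitive. For multipart messages, msg.walk() visits each part. If only the headers are needed, email.parser.HeaderParser leaves the body's MIME structure unparsed.

```python
import email

# Hypothetical message using the header names from the question.
RAW = (
    "To: bob@example.com\n"
    "From: alice@example.com\n"
    "X-Received: from mx.example.com\n"
    "X-Priority: 3\n"
    "Subject: status report\n"
    "\n"
    "All systems nominal.\n"
)

msg = email.message_from_string(RAW)

# Headers: items() returns a new list of (name, value) pairs in file order.
headers = msg.items()

# Body: for a non-multipart message, get_payload() returns the body text.
body = msg.get_payload()

# Drop the X- headers. keys() returns a new list, so deleting while
# looping over it is safe; names are compared case-insensitively.
for name in msg.keys():
    if name.lower().startswith("x-"):
        del msg[name]

for name, value in headers:
    print(f"{name}: {value}")
print("---")
print(body, end="")
```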
> Also I want to remove the redundant words and all html
> tags. Any advise on that?
> I saw some examples using HTMLGen But I dont have
> HTMLGen with python on my machine. I have Python
> 2.3.3. on my machine.


HTMLGen won't help, as it generates HTML (hence the name...). To
strip out the HTML tags, a regular expression is probably
sufficient. Otherwise, have a look at HTMLParser (in the standard library).
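Both suggestions, sketched in Python 3, where the HTMLParser module lives at html.parser. The regular-expression version is the quick one. It deletes anything between < and >, leaves entities such as &amp; untouched, and can be fooled by a > inside an attribute value. The parser-based version keeps only the text the parser reports through handle_data(), with character references decoded because convert_charrefs defaults to True. The function names and the sample string are illustrative only.

```python
import re
from html.parser import HTMLParser

# Quick-and-dirty: delete anything that looks like a tag.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_regex(html):
    """Remove tag-like spans; entities such as &amp; are left as-is."""
    return TAG_RE.sub("", html)


class TextExtractor(HTMLParser):
    """Collect only the text between tags (comments are not collected)."""

    def __init__(self):
        super().__init__()   # convert_charrefs=True decodes &amp; etc.
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags_parser(html):
    """Feed the HTML through TextExtractor and join the text pieces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()           # flush any buffered trailing text
    return "".join(parser.chunks)


sample = "<p>Buy <b>now</b> &amp; save!</p>"
print(strip_tags_regex(sample))
print(strip_tags_parser(sample))
```

Note that the contents of <script> and <style> elements are still reported through handle_data(), so a real spam filter would want to skip those separately.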

--
David M. Cooke
cookedm(at)physics(dot)mcmaster(dot)ca
Jul 18 '05 #3
> HTMLGen won't work, as that generates HTML (hence the name...). To
> strip out the HTML tags, probably a regular expression would be
> sufficient. Otherwise, have a look at HTMLParser (in the standard library).


To strip out HTML, use sgmllib:
http://flangy.com/dev/python/striphtml.html

- Josiah
Jul 18 '05 #4
