email parsing

This is my first Python script. I don't know whether it uses the
modules correctly, but it works fine, in particular with Russian code
pages and MIME-formatted messages, including quoted-printable and
base64 encoded ones.

I would be grateful for any comments on this script, good or bad.
import email
import mailbox
from email.Header import decode_header
from email.Header import make_header
import string
import sys

outEnc="cp866"
infile=sys.argv[1]

subStrObrez = []
subStrObrez.append("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
subStrObrez.append("""~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To UNSUBSCRIBE from this forum, send an email to:""")
subStrObrez.append("~~~~~~~~~~~~~~~~~~")

# Cut the Yahoo info at the end of the message
def obrez(strMsg):
    for s in subStrObrez:
        n = string.rfind(strMsg, s)
        if n != -1:
            return strMsg[0:n]
    return strMsg

# Convert a message header: decode each MIME-encoded part to unicode
def my_get_header(hdr):
    str2 = ""
    for val, encoding in decode_header(hdr):
        if encoding:
            str2 = str2 + val.decode(encoding) + " "
        else:
            str2 = str2 + val + " "
    return str2

# Process the message
def proc(msg):
    print 'From : ' + my_get_header(msg['From']).encode(outEnc)
    print 'To : ' + my_get_header(msg['To']).encode(outEnc)
    print 'Subject: ' + my_get_header(msg['Subject']).encode(outEnc)
    print

    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                if part.get_content_charset():
                    print obrez(part.get_payload(None, True).decode(part.get_content_charset()).encode(outEnc))
                else:
                    print obrez(part.get_payload(None, True))

    else:
        if msg.get_content_type() == "text/plain":
            if msg.get_content_charset():
                print obrez((msg.get_payload(None, True)).decode(msg.get_content_charset())).encode(outEnc)
            else:
                print obrez(msg.get_payload(None, True))
        else:
            if msg.get_content_type() == "text/html":
                if msg.get_content_charset():
                    print (msg.get_payload(None, True)).decode(msg.get_content_charset()).encode(outEnc)
                else:
                    print msg.get_payload(None, True)
####################################################################################
# The main program

f = open(infile, "rb")
m1 = mailbox.UnixMailbox(f)

RubLst = []
RubLst.append(["[contestru]", "FOTSTR"])
RubLst.append(["[russiandx]", "FORUDX"])

for msg in mailbox.UnixMailbox(f, email.message_from_file):
    for rub in RubLst:
        if string.find(my_get_header(msg['Subject']), rub[0]) != -1:
            print "SB " + rub[1] + "@FORUM < INET"
            print my_get_header(msg['Subject']).encode(outEnc)
            print
            proc(msg)
            print
            print "powered by Python"
            print "/EX"
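
For readers on Python 3, where `mailbox.UnixMailbox`, `email.Header` and the `print` statement no longer exist, the header decoding, body extraction and footer trimming steps of the script above could be sketched roughly as follows. This is only a sketch, not the original author's code: the helper names `decode_mime_header`, `plain_text_parts`, `strip_footer` and `make_sample`, the ASCII fallback with replacement characters, and the sample message are all assumptions made for illustration.

```python
import email
from email.header import Header, decode_header
from email.mime.text import MIMEText

# Footer marker modelled on the last subStrObrez entry (assumption: 18 tildes)
FOOTER_MARKERS = ["~" * 18]


def decode_mime_header(value):
    """Decode an RFC 2047 encoded header value into str (hypothetical helper)."""
    if value is None:
        return ""
    parts = []
    for val, charset in decode_header(value):
        if isinstance(val, bytes):
            # Assumption: fall back to ASCII when no charset is declared
            parts.append(val.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(val)
    return "".join(parts)


def plain_text_parts(msg):
    """Yield the decoded text of every text/plain part of a message."""
    # walk() also yields the message itself when it is not multipart
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        raw = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        charset = part.get_content_charset() or "ascii"
        yield raw.decode(charset, errors="replace")


def strip_footer(text, markers=FOOTER_MARKERS):
    """Cut everything from the last occurrence of the first matching marker."""
    for marker in markers:
        pos = text.rfind(marker)
        if pos != -1:
            return text[:pos]
    return text


def make_sample():
    """Build a made-up KOI8-R, base64-encoded message for experimenting."""
    body = ("Привет, мир!\n" + "~" * 18 +
            "\nTo UNSUBSCRIBE from this forum, send an email to:\n")
    msg = MIMEText(body, "plain", "koi8-r")
    msg["Subject"] = Header("[russiandx] Привет", "koi8-r")
    return msg.as_bytes()
```

A message parsed with `email.message_from_bytes(make_sample())` can then be passed to `decode_mime_header(msg["Subject"])` and `plain_text_parts(msg)`. To read a whole mbox file, iterating over `mailbox.mbox(path)` takes the place of `mailbox.UnixMailbox`.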
Aug 27 '08 #1