Parsing Raw email data

Chrisatnetronix
I currently have email piped to a PHP script, and I need to separate each message into these variables:

$from
$subject
$message


I then need to insert those variables into a MySQL database.

I wrote a PHP parser script that does OK, but it only seems to work with mail sent from:

Thunderbird email client
Comcast webmail

If I use Outlook, the message includes a bunch of what looks like encryption code,

and Yahoo and Gmail make the message include a bunch of things like this:

--000e0cd24e8a6d9440048d2e4f27
Content-Type: text/plain; charset=ISO-8859-1

messtest

--000e0cd24e8a6d9440048d2e4f27
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable


If the message is "messtest", that's all I need in $message, not the other stuff.
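
The extra material is probably not encryption. Each part of a multipart message carries its own headers, and a part may be stored in a transfer encoding such as quoted-printable (which the sample above declares) or base64. Assuming that is what Outlook is sending, which this post does not confirm, a rough sketch for decoding one part could look like this. decodeBody() is a hypothetical helper name, not part of the script in this thread.

```php
<?php
// Hypothetical helper: undo the Content-Transfer-Encoding of one body part.
// $encoding is the value of that part's Content-Transfer-Encoding header.
function decodeBody(string $body, string $encoding): string
{
    switch (strtolower(trim($encoding))) {
        case 'base64':
            // Remove line breaks first, then decode strictly.
            $decoded = base64_decode(preg_replace('/\s+/', '', $body), true);
            // Fall back to the raw body if it was not valid base64.
            return $decoded === false ? $body : $decoded;
        case 'quoted-printable':
            // Turns sequences such as =3D back into "=" and joins soft line breaks.
            return quoted_printable_decode($body);
        default:
            // 7bit, 8bit, binary, or no header: nothing to undo.
            return $body;
    }
}

echo decodeBody("bWVzc3Rlc3Q=\r\n", 'base64'), "\n"; // messtest
```

Decoding the part this way avoids having to strip =3D sequences by hand afterwards.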

The same goes for $from. I need just:

test@gmail.com (example output)

I instead get:

tester <test@gmail.com>
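
For the $from problem, one possible approach is to take whatever sits inside the angle brackets and then validate it. This is only a minimal sketch: extractAddress() is a made-up helper name, and it does not try to cover every address form the mail standards allow.

```php
<?php
// Hypothetical helper: reduce a From header value to the bare address.
function extractAddress(string $from): string
{
    $from = trim($from);
    // "Name <user@host>" form: keep what is inside the trailing angle brackets.
    if (preg_match('/<([^<>]+)>\s*$/', $from, $m)) {
        $from = trim($m[1]);
    }
    // Return an empty string if what remains does not look like an address.
    return filter_var($from, FILTER_VALIDATE_EMAIL) !== false ? $from : '';
}

echo extractAddress('tester <test@gmail.com>'), "\n"; // test@gmail.com
echo extractAddress('test@gmail.com'), "\n";          // test@gmail.com
```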


It also seems the contents of each variable vary between mail servers.

I need a universal parser so I can get the variables I need no matter which mail server the sender uses.

Here is the code I wrote to parse the message. I'm testing it by writing the result to a text file so I can view it.

#!/usr/bin/php -q
<?php
// Read the whole piped message from stdin.
$fp = fopen("php://stdin", "r");
$email = "";
while (!feof($fp)) {
    $email .= fgets($fp, 1024);
}
fclose($fp);

// Split the message into lines.
$lines = explode("\n", $email);

// Start with empty variables.
$from = "";
$subject = "";
$headers = "";
$message = "";
$splittingheaders = true;

for ($i = 0; $i < count($lines); $i++) {
    if ($splittingheaders) {
        // This is a header line.
        $headers .= $lines[$i] . "\n";
        // Look out for the headers we care about.
        // trim() drops the trailing "\r" left behind when lines end in CRLF.
        if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
            $subject = trim($matches[1]);
        }
        if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
            $from = trim($matches[1]);
        }
    } else {
        // Not a header, so part of the message body.
        $message .= $lines[$i] . "\n";
    }
    if (trim($lines[$i]) == "") {
        // First empty line: the header section has ended.
        $splittingheaders = false;
    }
}

// Write the message to a file.
// emails.txt is chmod 777
$out = fopen("emails.txt", "a+");
fwrite($out, $message);
fclose($out);
?>
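
Two things the loop above does not handle are folded headers (a long Subject continued on the next line, starting with a space or tab) and CRLF line endings. As a rough sketch, the header/body split could be done in one pass like this. splitMessage() is a hypothetical name, and keeping only the last value of a repeated header is a simplification chosen for illustration.

```php
<?php
// Hypothetical helper: split a raw message into unfolded headers and a body.
// Returns [array of lowercase header name => value, body string].
function splitMessage(string $raw): array
{
    // Normalise CRLF and bare CR line endings to LF.
    $raw = str_replace(["\r\n", "\r"], "\n", $raw);
    // The first blank line separates the headers from the body.
    $parts = explode("\n\n", $raw, 2);
    $head = $parts[0];
    $body = $parts[1] ?? '';
    // Unfold continuation lines (lines that start with a space or tab).
    $head = preg_replace('/\n[ \t]+/', ' ', $head);

    $headers = [];
    foreach (explode("\n", $head) as $line) {
        if (strpos($line, ':') === false) {
            continue; // not a "Name: value" line
        }
        [$name, $value] = explode(':', $line, 2);
        // Later duplicates (for example Received) overwrite earlier ones.
        $headers[strtolower(trim($name))] = trim($value);
    }
    return [$headers, $body];
}

$raw = "From: tester <test@gmail.com>\r\nSubject: a long\r\n subject\r\n\r\nmesstest\r\n";
[$headers, $body] = splitMessage($raw);
echo $headers['subject'], "\n"; // a long subject
echo $body;                     // messtest
```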
Attached Files
File Type: zip itupdate.zip (590 Bytes, 167 views)
Aug 7 '10 #1
Here is a solution that fixes 99 percent of the parsing issues.

It works and has been tested with AOL, Yahoo, MSN, Gmail, Outlook, and Thunderbird.

The only one that still leaves some raw code is Gmail, and it only shows this in the message:

--0016367658309708d3048d428b34
Content-Type: text/plain; charset=ISO-8859-1

No others show anything extra.

Simply add this code after the parsing loop, but before the part of my code above that writes to the text file (and write $cleanmessage to the file instead of $message).

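// Look for a quoted multipart boundary in the headers.
$boundaryfulltext = "";
if (preg_match("/boundary=\".*?\"/i", $headers, $boundary)) {
    $boundaryfulltext = $boundary[0];
}

if ($boundaryfulltext != "") {
    // Strip boundary=" and the closing quote to get the bare boundary string.
    $find = array("/boundary=\"/i", "/\"/i");
    $boundarytext = preg_replace($find, "", $boundaryfulltext);
    // Take the first part that follows a boundary marker.
    $splitmessage = explode("--" . $boundarytext, $message);
    $fullmessage = isset($splitmessage[1]) ? ltrim($splitmessage[1]) : "";
    // Skip the part's own headers: keep everything from the first blank line on.
    $rest = preg_match('/\n\n(.*)/is', $fullmessage, $splitmore) ? $splitmore[0] : $fullmessage;

    if (substr(ltrim($rest), 0, 2) == "--") {
        // Empty part: keep the leading newline so the cleanup below removes the marker.
        $actualmessage = $rest;
    } else {
        $actualmessage = ltrim($rest);
    }
} else {
    // No boundary found: treat the whole body as the message.
    $actualmessage = ltrim($message);
}

// Drop everything from the next boundary marker on, plus quoted-printable leftovers.
$clean = array("/\n--.*/is", "/=3D\n.*/s");
$cleanmessage = trim(preg_replace($clean, "", $actualmessage));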
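
The boundary regex above only matches a quoted value (boundary="..."). A likely reason for the Gmail leftovers, though this is an assumption and not something confirmed in this thread, is that the boundary arrives unquoted, so the multipart branch never runs. Here is a minimal sketch that accepts both forms and returns the text/plain part. plainTextPart() is a made-up name, and the sketch ignores nested multipart messages and transfer encodings.

```php
<?php
// Hypothetical helper: return the text/plain part of a multipart body,
// or the whole body when the headers declare no boundary.
function plainTextPart(string $headers, string $body): string
{
    // Accept both boundary="abc" and boundary=abc.
    if (!preg_match('/boundary\s*=\s*(?:"([^"]+)"|([^\s;]+))/i', $headers, $m)) {
        return trim($body);
    }
    $boundary = $m[1] !== '' ? $m[1] : $m[2];

    foreach (explode('--' . $boundary, $body) as $part) {
        // Each part has its own headers, then a blank line, then the content.
        $pieces = preg_split('/\r?\n\r?\n/', ltrim($part, "\r\n"), 2);
        if (count($pieces) === 2 && stripos($pieces[0], 'text/plain') !== false) {
            return trim($pieces[1]);
        }
    }
    // No text/plain part found: fall back to the raw body.
    return trim($body);
}

$headers = "Content-Type: multipart/alternative; boundary=000e0cd24e8a6d9440048d2e4f27\n";
$body = "--000e0cd24e8a6d9440048d2e4f27\n"
      . "Content-Type: text/plain; charset=ISO-8859-1\n\n"
      . "messtest\n\n"
      . "--000e0cd24e8a6d9440048d2e4f27\n"
      . "Content-Type: text/html; charset=ISO-8859-1\n\n"
      . "<p>messtest</p>\n"
      . "--000e0cd24e8a6d9440048d2e4f27--\n";
echo plainTextPart($headers, $body), "\n"; // messtest
```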
After that you can add your MySQL insert code or whatever you like.
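
For the database step, here is a hedged sketch using PDO with a prepared statement, so that quotes in the subject or body cannot break the query. (The old mysql_* functions were removed in PHP 7.) The helper name storeEmail(), the table name emails, its columns sender, subject, and message, and the connection details in the comments are all assumptions for illustration; adjust them to your own schema.

```php
<?php
// Hypothetical helper: store one parsed email using a prepared statement.
// The table "emails" and its column names are assumed, not from this thread.
function storeEmail(PDO $pdo, string $from, string $subject, string $message): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO emails (sender, subject, message) VALUES (?, ?, ?)'
    );
    // Values are bound as parameters, never concatenated into the SQL text.
    $stmt->execute([$from, $subject, $message]);
}

// Usage, with placeholder connection details:
// $pdo = new PDO('mysql:host=localhost;dbname=mail;charset=utf8mb4', 'user', 'pass');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// storeEmail($pdo, $from, $subject, $cleanmessage);
```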


I must admit parsing raw email universally ain't easy.....
Aug 7 '10 #2
