Hi,
I've seen some postings on this topic, but none that match my exact
problem. I'm reading in a large mail message as a string. The string
contains an XML attachment that I need to parse out and remove from
the message once processed. I have to do this on the string itself,
without using any CDO libraries. My problem is that there's normally a
large PDF in the message, so the string I read in is massive, and I
don't know whether the XML is at the start, middle or end of it. My
regex is as follows:
Regex rXMLPart = new Regex(
@"(?<Start>.*)(?<Middle>Content-Type:[^.*?]text\/xml.*?finaldistributeinformation.*?\<\/distributionList\>)(?<End>.*)",
RegexOptions.IgnoreCase |
RegexOptions.Singleline |
RegexOptions.IgnorePatternWhitespace);
and a sample of the string is:
-----------------------
Message-ID: <00****************************@csfb.csgroup.com >
From: "Test" <te**@test.com>
To: <>
Subject: This is a test subject
Date: Thu, 2 Sep 2004 16:58:12 +0100
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_0005_01C4910E.083D9600"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1409
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409
This is a multi-part message in MIME format.
------=_NextPart_000_0005_01C4910E.083D9600
Content-Type: multipart/alternative;
boundary="----=_NextPart_001_0006_01C4910E.083D9600"
------=_NextPart_001_0006_01C4910E.083D9600
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
This is some body text.
-mark.
------=_NextPart_001_0006_01C4910E.083D9600
Content-Type: text/html;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Diso-8859-1">
<META content=3D"MSHTML 6.00.2800.1458" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial size=3D2>This is some body text.</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>***</FONT></DIV></BODY></HTML>
------=_NextPart_000_0005_01C4910E.083D9600
Content-Type: text/xml;
name="DO_NOT_DELETE_EMAIL_ATTACHMENT.XML"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="DO_NOT_DELETE_EMAIL_ATTACHMENT.XML"
<?xml version="1.0" encoding="UTF-8"?>
<distributionList>
</distributionList>
------=_NextPart_000_0005_01C4910E.083D9600
Content-Type: application/pdf;
name="Reader.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="Reader.pdf"
JVBERi0xLjUNJeLjz9MNCjkxOTUgMCBvYmo8PC9IWzQzMzk2IDM5MzJdL0xpbmVhcml6ZWQgMS9F
IDEyMjMzNi9MIDE1NTUzMDcvTiAxNzkvTyA5MTk5L1QgMTM3MTM2Mz4+DWVuZG9iag0gICAgICAg
IA14cmVmDTkxOTUgMzYNMDAwMDAwMDAxNiAwMDAwMCBuDQowMDAwMDQ3Njc5IDAwMDAwIG4NCjAw
MDAwNDMzOTYgMDAwMDAgbg0KMDAwMDA0NzkzNSAwMDAwMCBuDQowMDAwMDQ3OTk5IDAwMDAwIG4N
CjAwMDAwNDgyNzYgMDAwMDAgbg0KMDAwMDA0ODMyNyAwMDAwMCBuDQowMDAwMDQ4NjMwIDAwMDAw
IG4NCjAwMDAwNTM0ODAgMDAwMDAgbg0KMDAwMDA1MzUxNiAwMDAwMCBuDQowMDAwMDUzOTUyIDAw
.......
------------------------
I've cut the string short, but that is the gist of it. If I run the
regex against this sample string it all works fine, but when the string
is really large (with the rest of the PDF included) the match hangs:
Match mXMLPersonalisation = rXMLPart.Match(data);
Could anyone suggest a better way to do this? I need to get the first
part and the last part and join them, thus removing the XML part.
I also need to work on the XML to create the new messages,
i.e.
string sStartPartOfEmailMessage =
mXMLPersonalisation.Groups["Start"].ToString();
string sXMLPartOfMessage =
mXMLPersonalisation.Groups["Middle"].ToString();
string sEndPartOfEmailMessage =
mXMLPersonalisation.Groups["End"].ToString();
SendXMLEmail(sStartPartOfEmailMessage, sXMLPartOfMessage,
sEndPartOfEmailMessage);
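For comparison, the same three-way split can be sketched without the leading and trailing .* groups, which are the likely cause of the heavy backtracking on a very large string. This is only a minimal sketch under assumptions: MailSplitter and TrySplitXmlPart are hypothetical names, the header is assumed to appear literally as "Content-Type: text/xml", the StringComparison overloads of IndexOf need .NET 2.0 or later, and the "finaldistributeinformation" check from the regex is left out. Unlike the greedy Start group, it picks the first text/xml part rather than the last.

```csharp
using System;

public static class MailSplitter
{
    // Hypothetical helper: splits the raw message around the first text/xml part.
    // Returns false when the header or the closing tag cannot be found.
    public static bool TrySplitXmlPart(string data,
        out string start, out string middle, out string end)
    {
        start = middle = end = string.Empty;

        const string header = "Content-Type: text/xml";  // assumed literal spelling
        const string closeTag = "</distributionList>";

        // Locate the start of the XML part's headers.
        int headerPos = data.IndexOf(header, StringComparison.OrdinalIgnoreCase);
        if (headerPos < 0) return false;

        // Locate the closing tag, searching only from the header onwards.
        int closePos = data.IndexOf(closeTag, headerPos, StringComparison.OrdinalIgnoreCase);
        if (closePos < 0) return false;

        int middleEnd = closePos + closeTag.Length;
        start = data.Substring(0, headerPos);                       // everything before the XML part
        middle = data.Substring(headerPos, middleEnd - headerPos);  // headers + XML body
        end = data.Substring(middleEnd);                            // everything after it
        return true;
    }
}
```

If the call returns true, the three strings could be passed on in place of the regex groups, e.g. SendXMLEmail(start, middle, end). Each IndexOf is a single forward scan, so the cost should grow roughly linearly with the size of the message.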
Any help would be much appreciated.
-mark.