Finding links in email

I'd like to find URLs inside of an email message. If there is anything
between the <a></a>, I'd like to also get that and associate it with the URL
in the <a> tag.

Content between the <a></a> might be text or an image. The <img> tag will
also have a URL, which I need to get. It will be associated with the <a>
tag's URL. When I say associate, I'll just store the two with some
relational ID into a database.

Besides brute regular expression parsing of the text, is there a better way
to extract this content?

Thanks,
Brett
Nov 21 '05 #1
Brett,

In my opinion the nicest, though also the most difficult, way is MSHTML; it
covers the DOM completely.
(Do not set an import for it when you use it.)

mshtml
http://msdn.microsoft.com/library/de...ng/hosting.asp

I hope this helps a little.

Cor
Nov 21 '05 #2
For parsing the HTML file:

MSHTML Reference
<URL:http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/mshtml/reference/reference.asp>

- or -

.NET Html Agility Pack: How to use malformed HTML just like it was
well-formed XML...
<URL:http://blogs.msdn.com/smourier/archive/2003/06/04/8265.aspx>

Download:

<URL:http://www.codefluent.com/smourier/download/htmlagilitypack.zip>
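To illustrate the parser-based approach, here is a minimal sketch. It uses Python's standard `html.parser` module rather than MSHTML or the Html Agility Pack, so it only shows the general idea of walking the tags instead of pattern-matching the text. The names `LinkCollector` and `extract_links` are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect each <a href> together with the text and <img src> inside it."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished records, one per <a> element
        self._current = None  # record for the <a> that is currently open

    def _finish(self):
        # Store the open link record, if there is one.
        if self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._finish()  # tolerate a missing </a> before the next link
            self._current = {"href": attrs["href"], "text": "", "images": []}
        elif tag == "img" and self._current is not None and attrs.get("src"):
            self._current["images"].append(attrs["src"])

    def handle_data(self, data):
        # Text only counts while an <a> element is open.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._finish()

    def close(self):
        super().close()
        self._finish()  # keep a link that was never closed


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Each record holds the link URL, the link text, and any image URLs found inside the same <a> element, which is the association that would then be stored in the database.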

If the file read is in XHTML format, you can use the classes contained in
the 'System.Xml' namespace for reading information from the file.
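As a rough sketch of the XML route, the example below uses Python's `xml.etree.ElementTree` as a stand-in for the 'System.Xml' classes. It assumes the message body is well-formed XHTML that declares the standard XHTML namespace; the function name `links_from_xhtml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard XHTML namespace; "x" is just a local prefix for the queries below.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def links_from_xhtml(xhtml):
    """Return (href, text, image_urls) for every <a href> in an XHTML string."""
    root = ET.fromstring(xhtml)  # raises ParseError if the input is not well-formed
    results = []
    for a in root.iterfind(".//x:a[@href]", XHTML_NS):
        text = "".join(a.itertext()).strip()
        images = [img.get("src") for img in a.iterfind(".//x:img[@src]", XHTML_NS)]
        results.append((a.get("href"), text, images))
    return results
```

This only works when the input really is well-formed; ordinary HTML mail will usually fail to parse this way.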

As you already said, regular expressions can be used to do what you are
trying to achieve:

.NET Framework Developer's Guide -- Example: Scanning for 'HREF's
<URL:http://msdn.microsoft.com/library/en-us/cpguide/html/cpconexamplescanningforhrefs.asp>
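For comparison, a minimal sketch of the regular-expression approach, written in Python. The pattern is an assumption made for this example and is not taken from the linked article. It accepts double-quoted, single-quoted, and unquoted attribute values.

```python
import re

# Matches href="...", href='...' or an unquoted href=value, case-insensitively.
HREF_RE = re.compile(
    r"""href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def scan_hrefs(text):
    """Return every href value found in text, in order of appearance."""
    # Exactly one of the three groups is set for each match.
    return [next(g for g in m.groups() if g is not None)
            for m in HREF_RE.finditer(text)]
```

Note that this finds only the URLs. It does not say which text or image sits between a given <a> and </a>, so the association would need further patterns or a parser.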

--
M S Herfried K. Wagner
M V P <URL:http://dotnet.mvps.org/>
V B <URL:http://dotnet.mvps.org/dotnet/faqs/>

Nov 21 '05 #3
Herfried,

That is the link I was searching for; I could not find it.

I have now saved it in my HKW system.

Thanks

Cor
Nov 21 '05 #4
I don't see how MSHTML, without about as much work as regular expressions,
is going to extract URLs and the text between <a> and </a>, along with that
association.

Thanks,
Brett
Nov 21 '05 #5
