Finding links in email - Visual Basic .NET

Brett

I'd like to find URLs inside of an email message. If there is anything
between the <a></a>, I'd like to also get that and associate it with the URL
in the <a> tag.

Content between the <a></a> might be text or an image. The <img> tag will
also have a URL, which I need to get. It will be associated with the <a>
tag's URL. When I say associate, I mean I'll just store the two in a
database, linked by some relational ID.

Besides brute-force regular expression parsing of the text, is there a better
way to extract this content?

Thanks,
Brett

Nov 21 '05 #1

Cor Ligthert

Brett,

In my opinion the nicest, though most difficult, way is MSHTML; it covers
the DOM completely.
(Do not add an Imports statement for it when you use it.)

mshtml
http://msdn.microsoft.com/library/de...ng/hosting.asp

I hope this helps a little bit.

Cor

Nov 21 '05 #2

Herfried K. Wagner [MVP]

For parsing the HTML file:

MSHTML Reference
<URL:http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/mshtml/reference/reference.asp>

- or -

.NET Html Agility Pack: How to use malformed HTML just like it was
well-formed XML...
<URL:http://blogs.msdn.com/smourier/archive/2003/06/04/8265.aspx>

Download:

<URL:http://www.codefluent.com/smourier/download/htmlagilitypack.zip>

If the file being read is in XHTML format, you can use the classes in the
'System.Xml' namespace to read information from it.
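
For the XHTML case, here is a minimal sketch of what the System.Xml route
might look like. It assumes the message body is well-formed XHTML with no
undeclared entities such as &nbsp; (XDocument.Parse throws an XmlException
otherwise, and many real-world HTML emails will trigger that). It uses
System.Xml.Linq, which arrived with .NET 3.5, after this thread was written.
The LinkInfo class and ExtractLinks function are hypothetical names chosen
for illustration. Each <a href> yields one record holding the URL, the
anchor's text, and the src of the first nested <img>, if there is one.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

' Illustrative record; the names are hypothetical, not from any library.
Public Class LinkInfo
    Public Property Url As String
    Public Property Text As String
    Public Property ImageUrl As String ' Nothing when the anchor holds no <img>
End Class

Public Module XhtmlLinkExtractor
    Public Function ExtractLinks(xhtml As String) As List(Of LinkInfo)
        ' Throws System.Xml.XmlException if the markup is not well-formed XML.
        Dim doc As XDocument = XDocument.Parse(xhtml)
        Dim results As New List(Of LinkInfo)()

        ' Match on the local name so this works whether or not the
        ' XHTML namespace (http://www.w3.org/1999/xhtml) is declared.
        For Each anchor As XElement In doc.Descendants().Where(Function(e) e.Name.LocalName = "a")
            Dim href As XAttribute = anchor.Attribute("href")
            If href Is Nothing Then Continue For ' e.g. <a name="top">

            ' The first <img> nested anywhere inside the anchor, if any.
            Dim img As XElement = anchor.Descendants().FirstOrDefault(Function(e) e.Name.LocalName = "img")
            Dim src As XAttribute = If(img Is Nothing, Nothing, img.Attribute("src"))

            ' anchor.Value concatenates all descendant text, without tags.
            results.Add(New LinkInfo With {
                .Url = href.Value,
                .Text = anchor.Value.Trim(),
                .ImageUrl = If(src Is Nothing, Nothing, src.Value)
            })
        Next
        Return results
    End Function
End Module
```

Because the DOM already nests the <img> under its <a>, the association comes
from the tree structure itself rather than from a pattern.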

As you already said, regular expressions can be used to achieve what you are
trying to do:

.NET Framework Developer's Guide -- Example: Scanning for 'HREF's
<URL:http://msdn.microsoft.com/library/en-us/cpguide/html/cpconexamplescanningforhrefs.asp>
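
A minimal regex sketch along the same lines follows. It is hand-written here,
not the MSDN sample linked above, and it assumes reasonably simple markup: it
accepts double-quoted, single-quoted, and unquoted href/src values, but does
not handle nested anchors, HTML comments, or a '>' inside an attribute value.
LinkInfo and ExtractLinks are hypothetical names for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Illustrative record; the names are hypothetical, not from any library.
Public Class LinkInfo
    Public Property Url As String
    Public Property Text As String
    Public Property ImageUrl As String ' Nothing when the anchor holds no <img>
End Class

Public Module RegexLinkExtractor
    ' <a ... href=...>inner</a>; .NET allows the same group name ("url")
    ' in each alternative, so whichever quoting style matched fills it.
    Private ReadOnly AnchorPattern As New Regex(
        "<a\b[^>]*?\bhref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))[^>]*>(?<inner>.*?)</a\s*>",
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' The src attribute of an <img> tag, with the same three quoting styles.
    Private ReadOnly ImgPattern As New Regex(
        "<img\b[^>]*?\bsrc\s*=\s*(?:""(?<src>[^""]*)""|'(?<src>[^']*)'|(?<src>[^\s>]+))",
        RegexOptions.IgnoreCase)

    ' Any tag; used to reduce the anchor's inner HTML to plain text.
    Private ReadOnly TagPattern As New Regex("<[^>]+>")

    Public Function ExtractLinks(html As String) As List(Of LinkInfo)
        Dim results As New List(Of LinkInfo)()
        For Each m As Match In AnchorPattern.Matches(html)
            Dim inner As String = m.Groups("inner").Value
            Dim img As Match = ImgPattern.Match(inner)
            results.Add(New LinkInfo With {
                .Url = m.Groups("url").Value,
                .Text = TagPattern.Replace(inner, "").Trim(),
                .ImageUrl = If(img.Success, img.Groups("src").Value, Nothing)
            })
        Next
        Return results
    End Function
End Module
```

The lazy (?<inner>.*?) stops at the first </a>, which is what makes the
text and image inside each anchor line up with that anchor's URL; it is also
why nested anchors would be mis-paired.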

--
M S Herfried K. Wagner
M V P <URL:http://dotnet.mvps.org/>
V B <URL:http://dotnet.mvps.org/dotnet/faqs/>

Nov 21 '05 #3

Cor Ligthert

Herfried,

That was the link I was searching for; I could not find it.

I have now saved it in my HKW system.

Thanks

Cor

Nov 21 '05 #4

Brett

I don't see how MSHTML is going to extract the URLs and the text between <a>
and </a>, along with that association, without about as much work as
regular expressions.

Thanks,
Brett

Nov 21 '05 #5

Similar topics

Finding Script Directory

by: Fuzzyman

What's the best, cross platform, way of finding out the directory a script is run from ? I've googled a bit, but can't get a clear answer. On sys.argv the docs say : argv is the script name...

Python

Search Engines Finding ASP Pages?!?!?

by: Deforgel

Hello, Can anyone help me, I have a website using a SQL database that is called by ASP pages and I want the search engines to crawl through and index each of the items in the database. I've...

ASP / Active Server Pages

finding broken links using FrontPage automation

by: talyabn

Hi, I'm trying to invoke the 'Broken Hyperlinks' option in the FrontPage application. The problem is that I get all the links in a given HTML page instead of getting only the broken links. ...

HTML / CSS

Finding Links that are not in Links Window

by: John Baker

Hi: I wish to remove a field from a record, however when I try I am told that I cant remove it until I remove links to other records, and that the links will be showin in the links will show up...

Microsoft Access / VBA

Crypto is out of business! Need help finding another locking system.

by: Lauren Wilson

Hi Folks, We've been using Crypto ++32 to control licensed access to our widely distributed Access 2K app. Unfortunately, Sampson Multimedia appears to be out of business. Does anyone out...

Microsoft Access / VBA

Finding Hyperlinkcolumn

by: tshad

I have a Datagrid with a column: <asp:HyperLinkColumn DataTextField="JobTitle" DataNavigateUrlField="PositionID" DataNavigateUrlFormatString="AddNewPositions.aspx?PositionID={0}"...

ASP.NET

Finding Broken Link in WebSite

by: sristhrashguy

Hi everyone, i want .net(VB or C#) code for finding broken links in a website. The requirement is that the user will be able to type the url in a text box so once the button is...

.NET Framework

Finding Bottlenecks

by: Nick

Hi there, I have a website that functions fine locally, but when published to the server it develops a bottleneck during loading some of the pages. Basically what happens is the page loads to...

ASP.NET

Finding the instance reference of an object

by: Astley Le Jasper

Sorry for the numpty question ... How do you find the reference name of an object? So if i have this bob = modulename.objectname() how do i find that the name is 'bob'

Python

Finding all <A> elements on a page

by: BobRoyAce

There is a web page that has a bunch of links on it (i.e. "<A HREF=..."). I am trying to automate clicking on one in a WebBrowser control. However when I execute the following code: Dim...

Visual Basic .NET

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

How to build RAID in BIOS?

by: Hystou

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General