Extract HTML + Reg Ex

Ori
Hi,

I have some HTML text that I need to parse in order to extract data from
it.

My HTML contains a table with a few rows and two columns. I want to
extract the data from the 2nd column in the most efficient way, using a
regular expression rather than the "indexOf" function of String.

Thanks,

Ori.

Here is the HTML table:

<table BORDER="1" CELLSPACING="0" CELLPADDING="1">
<tr>
<td>Licensee Name</td>
<td BGCOLOR="#ffffcc">JOHN Doo</td>
</tr>
<tr>
<td><a HREF=>Primary Status</a></td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>License Number</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td><a >License Type</a></td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>Header</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>Address</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>City State State Zip </td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
</table>
Nov 15 '05 #1
Hi!

Try this:

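// Requires: using System.Text.RegularExpressions;
// First split the HTML into table rows
// (arrLines[0] is whatever precedes the first <tr>)
string[] arrLines = Regex.Split(strContent, @"<tr.*?>",
    RegexOptions.IgnoreCase);

// Go through each row
foreach (string strLine in arrLines)
{
    // Split the row into cells: strCol[0] is the text before the first <td>,
    // strCol[1] is the first column and strCol[2] is the second column
    string[] strCol = Regex.Split(strLine, @"<td.*?>",
        RegexOptions.IgnoreCase);
    // Skip fragments without two cells (e.g. the text before the first <tr>)
    if (strCol.Length < 3) continue;
    // Remove the remaining HTML tags (</td>, </tr>, ...) and trim whitespace
    string strValue = Regex.Replace(strCol[2], @"<[^>]*>", "").Trim();
    // Show the second column
    MessageBox.Show(strValue);
}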
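As an alternative to splitting, the second cell of each row could also be captured directly with a single pattern and Regex.Matches. This is only a sketch: the class name SecondColumnExtractor, the method ExtractSecondColumn and the group name "value" are made up for illustration, and it assumes every row looks like the table in the question (exactly two <td> cells per row, both closed with </td>, no nested tables).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SecondColumnExtractor
{
    // Hypothetical helper: returns the text of the second <td> of every <tr>.
    public static List<string> ExtractSecondColumn(string html)
    {
        var values = new List<string>();

        // Match a row, skip over its first cell, capture the second cell as "value".
        // Singleline lets '.' match newlines, since the cells sit on separate lines.
        var rowPattern = new Regex(
            @"<tr\b[^>]*>\s*<td\b[^>]*>.*?</td>\s*<td\b[^>]*>(?<value>.*?)</td>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        foreach (Match row in rowPattern.Matches(html))
        {
            // Strip any tags nested inside the cell and trim whitespace.
            string text = Regex.Replace(row.Groups["value"].Value, @"<[^>]*>", "");
            values.Add(text.Trim());
        }

        return values;
    }
}
```

Calling SecondColumnExtractor.ExtractSecondColumn(strContent) on the table from the question should give one string per row. For HTML that is less regular than this, a real HTML parser is likely to be more robust than any regular expression.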
Hope that's what you want!

Greetings

Matthias

Nov 15 '05 #2
