469,643 Members | 1,847 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,643 developers. It's quick & easy.

Extract HTML + Reg Ex

Ori
Hi,

I have a HTML text which I need to parse in order to extract data from
it.

My html contain a table contains few rows and two columns. I want to
extract the data from the 2nd column in the most efficient way (using
Reg Ex.) either than using the "indexOf" function of String.

Thanks,

Ori.

Here is the HTML table:

<table BORDER="1" CELLSPACING="0" CELLPADDING="1">
<tr>
<td>Licensee Name</td>
<td BGCOLOR="#ffffcc">JOHN Doo</td>
</tr>
<tr>
<td><a HREF=>Primary Status</a></td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>License Number</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td><a >License Type</a></td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>Header</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>Address</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>City State State Zip </td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
</table>
Nov 15 '05 #1
1 3273
Hi!

Try this:

// First split the HTML into Table Lines
string[] arrLines = Regex.Split(strContent, @"<tr.*?>",
RegexOptions.IgnoreCase);

// Go through each line
forearch (string strLine in arrLines)
{
// Split into Rows Array
string[] strCol = Regex.Split(strLine, @"<td.*?>",
RegexOptions.IgnoreCase);
// Remove HTML Tags?
strCol[1] = Regex.Replace(strCol[1], @"<[^>]*>", "");
// second Column
MessageBox.Show(strCol[1]);
}
Hope thats what you want!

Greetings

Matthias

or*******@hotmail.com (Ori) wrote in news:b431a203.0402111057.442f4545
@posting.google.com:
Hi,

I have a HTML text which I need to parse in order to extract data from
it.

My html contain a table contains few rows and two columns. I want to
extract the data from the 2nd column in the most efficient way (using
Reg Ex.) either than using the "indexOf" function of String.

Thanks,

Ori.

Here is the HTML table:

<table BORDER="1" CELLSPACING="0" CELLPADDING="1">
<tr>
<td>Licensee Name</td>
<td BGCOLOR="#ffffcc">JOHN Doo</td>
</tr>
<tr>
<td><a HREF=>Primary Status</a></td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>License Number</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td><a >License Type</a></td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>Header</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>Address</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>City State State Zip </td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
</table>


Nov 15 '05 #2

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

3 posts views Thread by Phong Ho | last post: by
1 post views Thread by Tim Smith | last post: by
10 posts views Thread by mark4 | last post: by
3 posts views Thread by rahman | last post: by
9 posts views Thread by flit | last post: by
1 post views Thread by rcamarda | last post: by
reply views Thread by gheharukoh7 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.