
Extract HTML + Reg Ex

Ori
Hi,

I have some HTML text that I need to parse in order to extract data from
it.

My HTML contains a table with a few rows and two columns. I want to
extract the data from the 2nd column in the most efficient way (using
a regular expression) rather than using the "indexOf" function of String.

Thanks,

Ori.

Here is the HTML table:

<table BORDER="1" CELLSPACING="0" CELLPADDING="1">
<tr>
<td>Licensee Name</td>
<td BGCOLOR="#ffffcc">JOHN Doo</td>
</tr>
<tr>
<td><a HREF=>Primary Status</a></td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>License Number</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td><a >License Type</a></td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>Header</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>Address</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>City State State Zip </td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
</table>
Nov 15 '05 #1
1 Reply


Hi!

Try this:

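// First split the HTML into table rows
string[] arrLines = Regex.Split(strContent, @"<tr.*?>",
    RegexOptions.IgnoreCase);

// Go through each row
foreach (string strLine in arrLines)
{
    // Split the row into cells; strCol[0] is the text before the first <td>,
    // strCol[1] is the first column and strCol[2] is the second column
    string[] strCol = Regex.Split(strLine, @"<td.*?>",
        RegexOptions.IgnoreCase);
    // Skip fragments without a second cell (e.g. the text before the first <tr>)
    if (strCol.Length < 3) continue;
    // Remove the remaining HTML tags and surrounding whitespace
    string strValue = Regex.Replace(strCol[2], @"<[^>]*>", "").Trim();
    // Second column
    MessageBox.Show(strValue);
}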
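
As an alternative to splitting, the second cell of each row can be captured directly with a single pattern and Regex.Matches. The following is only a sketch: the class name SecondColumnExtractor and the method name Extract are made up for illustration, and it assumes every row has exactly two <td> cells and that there are no nested tables. For HTML that is less regular than the sample, a real HTML parser is likely to be more robust than any regular expression.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original reply.
public static class SecondColumnExtractor
{
    // One row: skip the first <td>...</td>, then capture the inner HTML
    // of the second cell in the named group "value".
    // Singleline lets '.' cross line breaks inside a cell.
    private static readonly Regex RowPattern = new Regex(
        @"<tr\b[^>]*>\s*<td\b[^>]*>.*?</td>\s*<td\b[^>]*>(?<value>.*?)</td>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any remaining tag inside the captured cell (e.g. <a ...> or <b>).
    private static readonly Regex TagPattern = new Regex(@"<[^>]*>");

    // Returns the text of the second column, one entry per table row.
    // Assumes two cells per row and no nested tables.
    public static List<string> Extract(string html)
    {
        var values = new List<string>();
        foreach (Match row in RowPattern.Matches(html))
        {
            string cell = TagPattern.Replace(row.Groups["value"].Value, "");
            values.Add(cell.Trim());
        }
        return values;
    }
}
```

Calling SecondColumnExtractor.Extract(strContent) on the table from the question should give one entry per row, starting with "JOHN Doo". Compiling the two patterns once as static fields avoids re-parsing them on every call, which matters if many pages are processed.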
Hope that's what you want!

Greetings

Matthias

Nov 15 '05 #2
