Regex Matching Question

Consider this excerpt from some HTML. (This is a copy from View->Source,
except for the comment)

<TABLE WIDTH=100% CELLPADDING=0 CELLSPACING=0 border=0>
<?xml version="1.0" encoding="UTF-16"?>
<!-- need to extract whatever is here -->
</TABLE>

I need to extract all the HTML that would be in the <!-- need to extract
whatever is here --> section. So I did the following.

1. Retrieve the HTML into a string variable
Interesting observation: when I look at the contents of the string, every
double quote has been escaped, so they all show as \" instead of "

2. Remove carriage returns and newlines from the string
ResultHtml = ResultHtml.Replace("\r", string.Empty);
ResultHtml = ResultHtml.Replace("\n", string.Empty);

3. Use a Regex to try and find a match

string sFind = "<TABLE WIDTH=100% CELLPADDING=0 CELLSPACING=0 border=0><?xml version=\"1.0\" encoding=\"UTF-16\"?>" + ((.|\n)*?) + "</TABLE>";
Regex rx = new Regex(sFind,
RegexOptions.IgnoreCase|RegexOptions.IgnorePatternWhitespace);
Match m1 = rx.Match(ResultHtml);
if (m1.Success)
// do something
I never get a match. I tried this with some simpler HTML, and the regex
works fine for retrieving what is between two table tags.

I also tried stripping all double quotes from ResultHtml, and then trying:

string sFind = "<TABLE WIDTH=100% CELLPADDING=0 CELLSPACING=0 border=0><?xml version=1.0 encoding=UTF-16?>" + ((.|\n)*?) + "</TABLE>";

Still no match..

The string in my HTML which I'm trying to match exists exactly as in sFind.

Any idea?

Nov 18 '05 #1
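
A likely reason the pattern in the question never matches is that the literal HTML is handed to the regex engine unescaped. In `<?xml` and `"?>` the `?` acts as a quantifier, each `.` is a wildcard, and RegexOptions.IgnorePatternWhitespace drops unescaped spaces from the pattern, so `TABLE WIDTH` is compiled as `TABLEWIDTH`. The `\"` seen when inspecting the string is most likely just how the debugger displays a quote; the string itself holds a plain `"`. The sketch below is an assumption about how this could be handled, not code from the thread: it passes the literal parts through Regex.Escape and captures what lies between them. The names LiteralPrefixDemo, Prefix, MatchesUnescaped and ExtractEscaped are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the thread: contrasts an unescaped literal
// prefix with one passed through Regex.Escape.
public static class LiteralPrefixDemo
{
    // The markup that precedes the wanted text, as a verbatim string
    // (inside @"..." a doubled "" stands for one quote character).
    public const string Prefix =
        @"<TABLE WIDTH=100% CELLPADDING=0 CELLSPACING=0 border=0><?xml version=""1.0"" encoding=""UTF-16""?>";

    // Singleline lets '.' match '\n' as well, so newlines need not be stripped.
    private const RegexOptions Options =
        RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Unescaped: "<?" means "an optional <", so the literal "<?xml" cannot match.
    public static bool MatchesUnescaped(string html)
    {
        return Regex.IsMatch(html, Prefix + "(.*?)</TABLE>", Options);
    }

    // Escaped: Regex.Escape turns metacharacters (and whitespace) into literals.
    // Returns the text between the prefix and </TABLE>, or "" when nothing matches.
    public static string ExtractEscaped(string html)
    {
        string pattern = Regex.Escape(Prefix) + "(.*?)" + Regex.Escape("</TABLE>");
        Match m = Regex.Match(html, pattern, Options);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Because Regex.Escape also escapes spaces, the escaped pattern keeps working if RegexOptions.IgnorePatternWhitespace is added back. For anything beyond a fixed, known prefix like this one, an HTML parser is usually a safer choice than a regex.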
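George, try this:

using System.Text.RegularExpressions;

// Matches the text between "<!--" and "-->", excluding the delimiters.
// Note: Multiline only changes the meaning of ^ and $. If the comment can
// span several lines, RegexOptions.Singleline is needed so '.' matches '\n'.
Regex regex = new Regex(
    @"(?<=.*?<!--\s*)(.*?)(?=\s*-->)",
    RegexOptions.IgnoreCase
    | RegexOptions.Multiline
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
);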
Alexey

Nov 18 '05 #2
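
The reply gives the pattern but not how to read the result. A minimal sketch of one way to use it follows, assuming only the first comment in the string is wanted. The pattern and options are the ones from the reply; the class and method names (CommentExtractor, TryGetFirstComment) and the call to Trim are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern suggested in the reply.
public static class CommentExtractor
{
    // Lookbehind and lookahead keep "<!--" and "-->" out of the match itself.
    private static readonly Regex CommentBody = new Regex(
        @"(?<=.*?<!--\s*)(.*?)(?=\s*-->)",
        RegexOptions.IgnoreCase
        | RegexOptions.Multiline
        | RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Compiled);

    // Sets body to the trimmed text of the first HTML comment and returns true,
    // or sets body to "" and returns false when no comment is found.
    public static bool TryGetFirstComment(string html, out string body)
    {
        Match m = CommentBody.Match(html);
        // Trim is applied because the match may still carry whitespace
        // that directly follows "<!--".
        body = m.Success ? m.Value.Trim() : string.Empty;
        return m.Success;
    }
}
```

With the excerpt from the question in `html`, `CommentExtractor.TryGetFirstComment(html, out string body)` should leave the comment text in `body`. Since the options do not include RegexOptions.Singleline, this version only finds comments that open and close on the same line, unless line breaks are removed first as in step 2 of the question.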
