Regex: Capturing HTML

I am trying to strip the outermost HTML tag by capturing it with a regex
and then using the string Replace function to replace it with an empty
string. While stepping through the code, the Regex match returns the entire
input string, although testing the same pattern in The Regulator returns
just what I want.

What am I doing wrong here?
***********************************************
Regex regX;
RegexOptions options = (RegexOptions.Multiline | RegexOptions.IgnoreCase);
Match rMatch;
string sX, sTag;
string regexOpening = "(?:^)(<[a-zA-Z]*\\s*[a-zA-Z0-9=\'\" ]*>).*$";
string regexClosing = "(<\\s*/[a-zA-Z0-9]*\\s*>)\\s*$\r\n";
//code removed for clarity: return a datareader here...
{
    while (r.Read()) {
        sX = r["HeaderHTML"].ToString().Trim();
        regX = new Regex(regexOpening, options);
        rMatch = regX.Match(sX);
        sTag = rMatch.Value.ToString(); //this returns the entire string!!
        sX = sX.Replace(sTag, "");

Some sample input:
1. <TH colspan=2 align="left"><IMG src="Images/KCbanner_header.jpg"
width="800" height="90"></TH>

Nov 17 '05 #1
Here's an easy one for you:

(?i)(?s)(?:<html>)(.*)(?=</html>)

This matches the <html> tag (case-insensitively) and everything after it, up
to but not including the last </html> tag. The body (everything after the
<html> tag) is captured in Group 1.

string HtmlDocument = "..."; // whatever it is
Regex r = new Regex(@"(?i)(?s)(?:<html>)(.*)(?=</html>)");
return r.Match(HtmlDocument).Groups[1].Value;
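
Applied to the code in the question, the same idea (read the capture group rather than the whole match) might look like the sketch below. It assumes the question's opening-tag pattern is used unchanged. OuterTagDemo and StripOpeningTag are hypothetical names introduced only for illustration, and the sample string is the one from the question with its line break written as \n. Match.Value covers everything the pattern consumed, including the trailing .*$, which would explain why it looks like the entire string; Groups[1] holds only the tag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OuterTagDemo
{
    // Opening-tag pattern copied from the question, unchanged.
    const string OpeningPattern = "(?:^)(<[a-zA-Z]*\\s*[a-zA-Z0-9=\'\" ]*>).*$";

    const RegexOptions Options = RegexOptions.Multiline | RegexOptions.IgnoreCase;

    // Hypothetical helper: returns the input with the first captured
    // opening tag removed, or the input unchanged if nothing matched.
    public static string StripOpeningTag(string html)
    {
        Match m = Regex.Match(html, OpeningPattern, Options);
        if (!m.Success)
        {
            return html;
        }

        // Groups[1] is the parenthesised tag; m.Value would also
        // include whatever the trailing ".*$" consumed.
        Group tag = m.Groups[1];
        return html.Remove(tag.Index, tag.Length);
    }

    public static void Main()
    {
        string sample =
            "<TH colspan=2 align=\"left\"><IMG src=\"Images/KCbanner_header.jpg\"\n" +
            "width=\"800\" height=\"90\"></TH>";

        Match m = Regex.Match(sample, OpeningPattern, Options);
        Console.WriteLine("Whole match: " + m.Value);
        Console.WriteLine("Group 1:     " + m.Groups[1].Value);
        Console.WriteLine("Stripped:    " + StripOpeningTag(sample));
    }
}
```

Removing by the group's Index and Length, instead of calling Replace with the tag text, only touches the matched occurrence; Replace would also delete any identical tag that appears later in the string.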
--
HTH,

Kevin Spencer
Microsoft MVP
.NET Developer
Ambiguity has a certain quality to it.

Nov 17 '05 #2
