Bytes | Software Development & Data Engineering Community

Regex question

Hi all, I was trying to parse some HTML and ran into trouble with
some regex processing (which I have never done before).

What I am trying to do is get the content between two tags, including any
HTML code. I can do something like this:
"<a>([\w\s]*)</a>" on "<a>Not cool</a><a>Absolutely not</a>", but that obviously
only gets plain text content, not HTML tags. Could someone enlighten me on
which regex to use to get the results
"<really>Really not<cool/><at>all</at>" and "Absolutely not" from the string
"<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>...<tag3><a>Absolutely not</a></tag3></tag>"?
(Note that I can't use XPath, since I'm not sure whether the site is XHTML
compliant, and the example above is not XML.)
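To see why the first pattern only returns plain text: the character class `[\w\s]` matches only letters, digits, underscores and whitespace, so it can never cross a `<`, `>` or `/`. The sketch below uses Python's `re` module as a stand-in (an assumption; the thread itself appears to concern .NET regexes) and runs the pattern on both sample strings.

```python
import re

# Pattern from the question: only [\w\s] characters are allowed
# between <a> and </a>, so any markup inside the anchor breaks the match.
plain_text_only = re.compile(r"<a>([\w\s]*)</a>")

simple = "<a>Not cool</a><a>Absolutely not</a>"
nested = ("<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>"
          "...<tag3><a>Absolutely not</a></tag3></tag>")

# Both anchors hold only plain text, so both are found.
print(plain_text_only.findall(simple))  # ['Not cool', 'Absolutely not']

# The first anchor holds markup ('<', '>', '/'), which the class rejects,
# so only the second anchor matches.
print(plain_text_only.findall(nested))  # ['Absolutely not']
```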

Should I process the content twice, or give up on the regex approach in favor
of plain 'string index' parsing?
Thanks in advance
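For comparison, here is a minimal sketch of the 'string index' alternative mentioned in the question, written in Python. It assumes the tags appear literally as `<a>` and `</a>` (no attributes, same case) and are never nested; the helper name `contents_between` is hypothetical.

```python
def contents_between(text, open_tag="<a>", close_tag="</a>"):
    """Return every substring between open_tag and the next close_tag.

    Hypothetical helper: assumes tags are not nested and are written
    exactly as given (no attributes, same case).
    """
    results = []
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            break  # no more opening tags
        start += len(open_tag)
        end = text.find(close_tag, start)
        if end == -1:
            break  # unclosed tag: stop rather than guess
        results.append(text[start:end])
        pos = end + len(close_tag)
    return results


sample = ("<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>"
          "...<tag3><a>Absolutely not</a></tag3></tag>")
print(contents_between(sample))
# ['<really>Really not<cool/><at>all</at>', 'Absolutely not']
```

Searching for the next `</a>` after each `<a>` is exactly the "stop at the first closing tag" behavior that a non-greedy regex gives, just spelled out by hand.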
Jul 19 '05 #1
remy,

How about <a>(?<1>.+?)</a>?
Ron
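Ron's pattern uses the .NET-style group syntax `(?<1>...)`, which Python's `re` module does not accept, so this port (an assumption about how one might translate it) uses a plain capturing group, which is also group 1. The important part is the lazy quantifier `.+?`, which stops at the first `</a>` instead of the last one; the greedy version is shown for contrast.

```python
import re

sample = ("<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>"
          "...<tag3><a>Absolutely not</a></tag3></tag>")

# Lazy: .+? grows one character at a time and stops at the first </a>.
lazy = re.compile(r"<a>(.+?)</a>")
# Greedy: .+ runs to the end of the string, then backs off to the last </a>.
greedy = re.compile(r"<a>(.+)</a>")

print(lazy.findall(sample))
# ['<really>Really not<cool/><at>all</at>', 'Absolutely not']
print(greedy.findall(sample))
# ['<really>Really not<cool/><at>all</at></a></tag2>...<tag3><a>Absolutely not']
```

Because the lazy match ends at the first `</a>`, an `<a>` nested inside another `<a>` would be cut short; HTML does not allow nested anchors, so this is usually acceptable for this particular tag.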
"remy rakic" <li****@spamhole.com> wrote in message
news:ea**************@TK2MSFTNGP11.phx.gbl...

Jul 19 '05 #2
Aaah, the non-greedy option; now I know what it is used for. Thanks, Ron, it
works like a charm!
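One caveat with the non-greedy pattern: by default `.` does not match a newline, so if the HTML inside an anchor spans several lines, the match fails unless single-line mode is turned on (`re.DOTALL` in Python; the .NET counterpart is `RegexOptions.Singleline`). A small Python sketch, using a made-up variant of the sample with line breaks inside the anchors:

```python
import re

# Hypothetical variant of the sample with a line break inside each anchor.
wrapped = ("<tag><tag2><a><really>Really\nnot<cool/><at>all</at></a></tag2>"
           "...<tag3><a>Absolutely\nnot</a></tag3></tag>")

pattern = r"<a>(.+?)</a>"

# Without DOTALL, '.' cannot cross '\n', so neither anchor matches.
print(re.findall(pattern, wrapped))  # []
# With DOTALL, '.' also matches '\n'.
print(re.findall(pattern, wrapped, re.DOTALL))
# ['<really>Really\nnot<cool/><at>all</at>', 'Absolutely\nnot']
```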

"Ron Bullman" <ro********@mail.com> wrote in message
news:O5**************@TK2MSFTNGP11.phx.gbl...

Jul 19 '05 #3