Regex question

Hi all, I was trying to parse some HTML and ran into trouble with
some regex processing (which I have never done before).

What I am trying to do is get the content between two tags, including any
HTML code. I can do something like this:
"<a>([\w\s]*)</a>" on "<a>Not cool</a><a>Absolutely not</a>", but that obviously only
gets regular text content and no HTML tags. I wonder if someone could
enlighten me on which regex to use to get the results "<really>Really
not<cool/><at>all</at>" and "Absolutely not" from the string
"<tag><tag2><a><really>Really
not<cool/><at>all</at></a></tag2>...<tag3><a>Absolutely
not</a></tag3></tag>"? (Note that I can't use XPath, since I'm not sure whether
the site is XHTML compliant or not (the example itself is not XML).)

Should I process the content twice, or give up the regex approach for
regular 'string index' parsing?
Thanks in advance
Jul 19 '05 #1
remy,

How about <a>(?<1>.+?)</a>
Ron
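
For anyone trying this outside .NET, here is a minimal Python sketch of the same idea (not from the thread). It assumes the line breaks in the sample string are just mail wrapping and stand for a single space, and it uses a plain numbered group because Python's `re` module does not accept the .NET-style `(?<1>...)` spelling. The names `A_CONTENT` and `sample` are made up for illustration.

```python
import re

# Lazy quantifier: `.+?` stops at the first closing </a> it can reach.
# re.DOTALL lets `.` cross line breaks, in case the markup between the tags wraps.
A_CONTENT = re.compile(r"<a>(.+?)</a>", re.DOTALL)

# Sample string from the question (assumption: wrapped lines joined with a space).
sample = (
    "<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>"
    "...<tag3><a>Absolutely not</a></tag3></tag>"
)

# With a single capture group, findall returns the captured text of each match.
matches = A_CONTENT.findall(sample)
print(matches)
# ['<really>Really not<cool/><at>all</at>', 'Absolutely not']
```

This only works as shown while the <a> elements are not nested inside each other and carry no attributes, since the pattern matches the literal text "<a>".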
"remy rakic" <li****@spamhole.com> wrote in message
news:ea**************@TK2MSFTNGP11.phx.gbl...
Hi all, i was trying to parse some HTML and found myself in trouble with
some regex processing (which i have never done before).

What i am trying to do is to get content between two tags, including any
html code. I can do stuff like this:
"<a>([\w\s]*)</a>" on "<a>Not cool</a><a>Absolutely not</a>" obviously only gets regular text content but no html tags, i wonder if someone could
enlighten me on which regex to use in order to get results "<really>Really
not<cool/><at>all</at>" and "Absolutely not" on the string
"<tag><tag2><a><really>Really
not<cool/><at>all</at></a></tag2>...<tag3><a>Absolutely
not</a></tag3></tag>" ? (Notice i can't use Xpath since i'm not sure whether the site is XHTML compliant or not (as the example is no xml))

Should i process the content twice, or give up the regex approach for a
regular 'string index' parsing?
Thanks in advance

Jul 19 '05 #2
Aaah, the non-greedy option, now I know what it is used for. Thanks Ron, it
works like a charm!
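
The difference the non-greedy quantifier makes can be seen on the short string from the original question. The small Python sketch below (not from the thread, variable names are hypothetical) runs the greedy `.+` and the non-greedy `.+?` side by side.

```python
import re

text = "<a>Not cool</a><a>Absolutely not</a>"

# Greedy: `.+` grabs as much as possible, so the match runs to the LAST </a>.
greedy = re.findall(r"<a>(.+)</a>", text)

# Non-greedy: `.+?` grabs as little as possible, so each match ends at the next </a>.
lazy = re.findall(r"<a>(.+?)</a>", text)

print(greedy)  # ['Not cool</a><a>Absolutely not']
print(lazy)    # ['Not cool', 'Absolutely not']
```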

"Ron Bullman" <ro********@mail.com> wrote in message
news:O5**************@TK2MSFTNGP11.phx.gbl...
remy,

How bout <a>(?<1>.+?)</a>
Ron
"remy rakic" <li****@spamhole.com> wrote in message
news:ea**************@TK2MSFTNGP11.phx.gbl...
Hi all, i was trying to parse some HTML and found myself in trouble with
some regex processing (which i have never done before).

What i am trying to do is to get content between two tags, including any
html code. I can do stuff like this:
"<a>([\w\s]*)</a>" on "<a>Not cool</a><a>Absolutely not</a>" obviously

only
gets regular text content but no html tags, i wonder if someone could
enlighten me on which regex to use in order to get results "<really>Really not<cool/><at>all</at>" and "Absolutely not" on the string
"<tag><tag2><a><really>Really
not<cool/><at>all</at></a></tag2>...<tag3><a>Absolutely
not</a></tag3></tag>" ? (Notice i can't use Xpath since i'm not sure

whether
the site is XHTML compliant or not (as the example is no xml))

Should i process the content twice, or give up the regex approach for a
regular 'string index' parsing?
Thanks in advance


Jul 19 '05 #3
