Repost: Can anyone help with this Regex problem?

I'm trying to figure out a regular expression that will match the
innermost tag and the contents in between. Specifically, the string
that I am attempting to match looks as follows:

....<table>...<table>...>Final<...</table>...</table>...

I want to match: <table>...>Final<...</table> from this example.

The string could also, of course, look like the following:

....<table>...<table>...</table>...<table>...>Final<...</table>...<table>...</table>...</table>...

I am looking for the innermost <table> </table> tags that have a
specific string in that table - in this case >Final<.

Any help would be greatly appreciated. If there are other newsgroups
dedicated to regular expressions I would be happy to redirect my post
there.

Thanks in advance,
Greg
Nov 20 '05 #1
Hi Greg,

Why are you not using mshtml and processing the text directly?

Cor
Nov 20 '05 #2

Try using or modifying the following expression:

<table>(?><table>(?<level>)|(?<contents-level>)</table>|.)*(?(level)(?!))</table>

This will give you the contents of the innermost table tags in the Captures
collection of the named group "contents". You could then just iterate
through them and find the ones that contain the string you are looking for.
You could probably modify this expression to match exactly what you want
without this step.
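
For readers without .NET at hand, here is a rough sketch of that last filtering step in Python. Note that this is a swapped-in technique, not the expression above: Python's re module has no balancing groups, so the sketch instead matches any <table>...</table> pair that has no other table tag between its tags, then keeps the matches that contain the wanted string. The names INNERMOST_TABLE and innermost_tables_containing are made up for this example, and it assumes bare <table> tags without attributes, as in the question.

```python
import re

# Assumption: tags are written exactly as <table> and </table> (no attributes).
# A table counts as "innermost" when no <table> or </table> appears inside it.
INNERMOST_TABLE = re.compile(
    r"<table>(?:(?!</?table>).)*</table>",
    re.IGNORECASE | re.DOTALL,
)


def innermost_tables_containing(html, needle):
    """Return every innermost <table>...</table> block that contains needle."""
    return [
        match.group(0)
        for match in INNERMOST_TABLE.finditer(html)
        if needle in match.group(0)
    ]


if __name__ == "__main__":
    sample = (
        "..<table>a<table>b</table>c<table>d>Final<e</table>"
        "f<table>g</table>h</table>.."
    )
    for block in innermost_tables_containing(sample, ">Final<"):
        print(block)
```

The negative lookahead is checked before every character, so a match can never cross another table tag; that is what restricts it to innermost tables.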
Hope this helps,

Brian Davis
http://www.knowdotnet.com

"Greg Vereschagin" <gr****@optonline.net> wrote in message
news:ko********************************@4ax.com...
I'm trying to figure out a regular expression that will match the
innermost tag and the contents in between. Specifically, the string
that I am attempting to match looks as follows:

...<table>...<table>...>Final<...</table>...</table>...

I want to match: <table>...>Final<...</table> from this example.

The string could also, of course, look like the following:

....<table>...<table>...</table>...<table>...>Final<...</table>...<table>...</table>...</table>...

I am looking for the innermost <table> </table> tags that have a
specific string in that table - in this case >Final<.

Any help would be greatly appreciated. If there are other newsgroups
dedicated to regular expressions I would be happy to redirect my post
there.

Thanks in advance,
Greg

Nov 20 '05 #3
Cor,

1) I want to learn about regular expressions. I wrote a lot of code
to extract data from HTML before I got to that chapter in Balena's book,
using the VB string processing commands, and now find that a few lines
of regex do the job of dozens of lines of my current code.
2) A few months ago, I asked a more general question along the same
lines as the one you have responded to, and it was suggested that
regexes were the way to go.
3) Please give me a suggestion as to how to use mshtml. I'm learning
VB.NET partly as a hobby (although I have some things I would like to
use it for in my day job). I once was a professional programmer, and
here I'm really going to date myself: I spent 6 years at IBM writing
tons of Fortran. So... in some aspects of programming I can hang in
there with anyone, but in other aspects (anything that's become
mainstream in the last 20 years, say) I'm a newbie.

I am very appreciative of any help and guidance.

Greg

On Thu, 13 May 2004 15:24:44 +0200, "Cor Ligthert"
<no**********@planet.nl> wrote:
Hi Greg,

Why are you not using mshtml and process the text directly?

Cor


Nov 20 '05 #4
Greg,
The following sites provide a wealth of information on regular expressions.

A tutorial & reference on using regular expressions:
http://www.regular-expressions.info/

The MSDN's documentation on regular expressions:
http://msdn.microsoft.com/library/de...geElements.asp

Instead of writing your own parser or using RegEx, have you considered using
mshtml as Cor suggested or a SgmlReader (HTML reader)?

http://www.gotdotnet.com/Community/U...4-C3BD760564BC
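
To give an idea of what the parser route looks like, here is a minimal sketch in Python using the standard html.parser module. It is not mshtml or SgmlReader, whose APIs are not shown in this thread; it only illustrates the same idea of letting a parser track the nesting. The class name InnermostTableFinder and the helper innermost_tables_with_text are hypothetical names chosen for this example, and it treats ">Final<" as "a text node that is exactly Final".

```python
from html.parser import HTMLParser


class InnermostTableFinder(HTMLParser):
    """Collects the text nodes of every <table> that has no nested <table>."""

    def __init__(self):
        super().__init__()
        self._stack = []     # one [text_nodes, has_nested_table] entry per open table
        self.innermost = []  # list of text-node lists, one per innermost table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._stack:
                self._stack[-1][1] = True  # the enclosing table is not innermost
            self._stack.append([[], False])

    def handle_endtag(self, tag):
        if tag == "table" and self._stack:
            text_nodes, has_nested = self._stack.pop()
            if not has_nested:
                self.innermost.append(text_nodes)

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self._stack[-1][0].append(text)  # text belongs to the nearest open table


def innermost_tables_with_text(html, wanted):
    """Return the text nodes of each innermost table holding a node equal to wanted."""
    finder = InnermostTableFinder()
    finder.feed(html)
    finder.close()
    return [nodes for nodes in finder.innermost if wanted in nodes]


if __name__ == "__main__":
    page = (
        "<table><tr><td>outer</td></tr><tr><td>"
        "<table><tr><td>Score</td><td>Final</td></tr></table>"
        "</td></tr></table>"
    )
    print(innermost_tables_with_text(page, "Final"))
```

Unlike a fixed pattern, this also copes with attributes on the table tag, because the parser reports the tag name separately from its attributes.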

Hope this helps
Jay

"Greg Vereschagin" <gr****@optonline.net> wrote in message
news:ko********************************@4ax.com...
I'm trying to figure out a regular expression that will match the
innermost tag and the contents in between. Specifically, the string
that I am attempting to match looks as follows:

...<table>...<table>...>Final<...</table>...</table>...

I want to match: <table>...>Final<...</table> from this example.

The string could also, of course, look like the following:

....<table>...<table>...</table>...<table>...>Final<...</table>...<table>...</table>...</table>...

I am looking for the innermost <table> </table> tags that have a
specific string in that table - in this case >Final<.

Any help would be greatly appreciated. If there are other newsgroups
dedicated to regular expressions I would be happy to redirect my post
there.

Thanks in advance,
Greg

Nov 20 '05 #5
Hi Greg,

I am one of those in this newsgroup who knows something more about the
Document Object Model (DOM).

When you are working with HTML, or better to say DHTML, you have to know
more about DHTML.

Using the DOM you can program in an OOP way, while with regex it is more
the classic procedural way. (Regex is more something you find in
scripting languages.)

I have no problem guiding you a little bit; however, before you look at the
tools I think it is better to have a look at the Document Object Model.

The Document Object Model is described by the W3C; however, searching that
site is in my opinion an endless road, and you never find anything because
everything is described each time by someone in his own way.

On MSDN it is also hard to find, but better. You can always search using
the keyword "Object".

This is the document object itself
http://msdn.microsoft.com/library/de...j_document.asp

The head object
http://msdn.microsoft.com/library/de...jects/head.asp
This is the body object
http://msdn.microsoft.com/library/de...jects/body.asp

Mshtml contains the classes to access those objects in an OOP way. However,
there are endless classes, which when referenced in your program have
endless members.

You should never import it in your IDE; instead, always write the reference
directly in front of what you need, for example mshtml.document2 and so on.

When you are working with these classes in VS.NET, you have to set the
search path in the help to all.

Have a look at those links.

I hope this helps.

Cor
Nov 20 '05 #6
