Repost: Can anyone help with this Regex problem?

I'm trying to figure out a regular expression that will match the
innermost tag and the contents in between. Specifically, the string
that I am attempting to match looks as follows:

...<table>...<table>...>Final<...</table>...</table>...

I want to match: <table>...>Final<...</table> from this example.

The string could also, of course, look like the following:

...<table>...<table>...</table>...<table>...>Final<...</table>...<table>...</table>...</table>...

I am looking for the innermost <table> </table> tags that have a
specific string in that table - in this case >Final<.

Any help would be greatly appreciated. If there are other newsgroups
dedicated to regular expressions I would be happy to redirect my post
there.

Thanks in advance,
Greg
Nov 20 '05 #1
Hi Greg,

Why are you not using mshtml to process the text directly?

Cor
Nov 20 '05 #2

Try using or modifying the following expression:

<table>(?><table>(?<level>)|(?<contents-level>)</table>|.)*(?(level)(?!))</table>

This will give you the contents of the innermost table tags in the Captures
collection of the named group "contents". You could then just iterate
through them and find the ones that contain the string you are looking for.
You could probably modify this expression to match exactly what you want
without this step.
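
As a rough illustration of folding that filtering step into the pattern, here is a minimal sketch. It is written in Python rather than .NET, and Python's re module has no balancing groups, so it swaps in a different technique: a negative lookahead that forbids any <table> or </table> tag inside the match. It therefore assumes the table holding the marker contains no nested tables of its own. The function name innermost_table_with is hypothetical.

```python
import re


def innermost_table_with(marker, html):
    """Return the first <table>...</table> that contains `marker` and no
    other table tag (a leaf table), or None if there is no such table."""
    # Lazily consume characters, refusing to step over <table or </table.
    no_table_tag = r"(?:(?!</?table\b).)*?"
    pattern = re.compile(
        r"<table\b[^>]*>"          # opening tag, attributes allowed
        + no_table_tag
        + re.escape(marker)        # the literal text being searched for
        + no_table_tag
        + r"</table\s*>",          # matching closing tag
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.search(html)
    return match.group(0) if match else None


sample = "x<table>a<table>b>Final<c</table>d</table>y"
print(innermost_table_with(">Final<", sample))
```

Because the lookahead rejects every table tag, an outer table can never match across an inner one, so the search falls through to the leaf table. The .NET expression above is still needed if the table holding the marker may itself contain nested tables.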
Hope this helps,

Brian Davis
http://www.knowdotnet.com

"Greg Vereschagin" <gr****@optonline.net> wrote in message
news:ko********************************@4ax.com...
I'm trying to figure out a regular expression that will match the
innermost tag and the contents in between. Specifically, the string
that I am attempting to match looks as follows:

...<table>...<table>...>Final<...</table>...</table>...

I want to match: <table>...>Final<...</table> from this example.

The string could also, of course, look like the following:

...<table>...<table>...</table>...<table>...>Final<...</table>...<table>...</table>...</table>...

I am looking for the innermost <table> </table> tags that have a
specific string in that table - in this case >Final<.

Any help would be greatly appreciated. If there are other newsgroups
dedicated to regular expressions I would be happy to redirect my post
there.

Thanks in advance,
Greg

Nov 20 '05 #3
Cor,

1) I want to learn about regular expressions. I wrote a lot of code
to extract data from HTML before I got to that chapter in Balena's book,
using the VB string processing commands, and now find that a few lines
of regex do the job of dozens of lines of my current code.
2) A few months ago, I asked a more general question along the same
lines as the one you have responded to, and it was suggested that
regexes were the way to go.
3) Please give me a suggestion as to how to use mshtml. I'm learning
VB.NET partly as a hobby (although I have some things I would like to
use it for in my day job). I once was a professional programmer, and
here I'm really going to date myself: I spent 6 years at IBM writing
tons of Fortran. So in some aspects of programming I can hang in
there with anyone, but in others (anything that's become
mainstream in the last 20 years, say) I'm a newbie.

I am very appreciative of any help and guidance.

Greg

On Thu, 13 May 2004 15:24:44 +0200, "Cor Ligthert"
<no**********@planet.nl> wrote:
Hi Greg,

Why are you not using mshtml to process the text directly?

Cor

Nov 20 '05 #4
Greg,
The following sites provide a wealth of information on regular expressions.

A tutorial & reference on using regular expressions:
http://www.regular-expressions.info/

The MSDN documentation on regular expressions:
http://msdn.microsoft.com/library/de...geElements.asp

Instead of writing your own parser or using RegEx, have you considered using
mshtml as Cor suggested or a SgmlReader (HTML reader)?

http://www.gotdotnet.com/Community/U...4-C3BD760564BC
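
A minimal sketch of the parser-based route, assuming Python's standard html.parser module as a stand-in for mshtml or SgmlReader (neither is used here). The class name InnermostTableFinder and its attributes are hypothetical. It keeps a stack of open tables and credits each piece of text only to the table that directly encloses it, so a table is reported only when the marker sits in its own text rather than in a nested table.

```python
from html.parser import HTMLParser


class InnermostTableFinder(HTMLParser):
    """Record (depth, text) for every table whose own text holds the marker."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.stack = []    # one list of text fragments per open <table>
        self.matches = []  # (nesting depth, text) of the tables that matched

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.stack.append([])

    def handle_data(self, data):
        # Text belongs to the innermost table that is currently open.
        if self.stack:
            self.stack[-1].append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.stack:
            text = "".join(self.stack.pop())
            if self.marker in text:
                # Depth counts the table just closed plus those still open.
                self.matches.append((len(self.stack) + 1, text))


finder = InnermostTableFinder("Final")
finder.feed(
    "<table><tr><td>outer"
    "<table><tr><td>Final</td></tr></table>"
    "</td></tr></table>"
)
finder.close()
print(finder.matches)
```

Unlike a single regular expression, this tolerates attributes, mixed case in tag names, and a matching table that has further tables nested inside it, at the cost of returning the table's text rather than its original markup.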

Hope this helps
Jay

Nov 20 '05 #5
Hi Greg,

I am one of those in this newsgroup who knows something more about the
Document Object Model (DOM).

When you are working with HTML, or better said DHTML, you have to know
more about DHTML.

Using the DOM you can do OOP programming, while with regex it is more of
a classic procedural approach. (Regex is more something you find in
scripting languages.)

I have no problem guiding you a little, but before you look at the
tools I think it is better to have a look at the Document Object Model.

The Document Object Model is described by the W3C; however, looking at that
site is in my opinion an endless road where you never find anything, because
everything is described each time by someone in his own way.

On MSDN it is also hard to find, though better. You can search by always
including the keyword "Object".

This is the document object itself:
http://msdn.microsoft.com/library/de...j_document.asp

The head object:
http://msdn.microsoft.com/library/de...jects/head.asp

The body object:
http://msdn.microsoft.com/library/de...jects/body.asp

Mshtml contains the classes to access those objects in an OOP way. However,
there are endless classes, which when referenced in your program have
endless members.

You should never import it in your IDE; always write the reference directly
in front of what you need, for example mshtml.document2 and so on.

When you are working with these classes in VS.NET, you have to set the
search scope in the help to "all".

Have a look at those links.

I hope this helps.

Cor
Nov 20 '05 #6
