return match using regex

Gang,

I have been working on this for a few hours and am thoroughly
frustrated. I have tried researching this on the web as well,
with no success. I am trying to match certain contents within a
wrapper div. So for example if the inside of the wrapper div was the
following:

<div id="wrapper">
<a href="#">a great link that contain text and symbols</a>
<div... </div>
<div... </div>
</div>

I would like to strip out all the internal divs. But because there
can be a lot of internal divs, I figured it would be less processor
intensive to just match the first 'a' tag and repopulate the wrapper
div with the match. I am trying to use something like the following
regex:
re = /^<a(.+)</a>/;

with the following statement:

$temp = document.getElementById('wrapper').innerHTML.match (re);

but this is returning the entire contents of the wrapper div. I have
tried variations of the regex and either continue to get the entire
contents or get null back. Any help would be greatly appreciated.
BTW, I can't match up to the first \n because the contents may be
touching (i.e. ...</a><div>...).

Thanks,
Dave

May 18 '07 #1
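
A possible explanation for the two symptoms described above, offered as a sketch rather than a diagnosis: `.+` is greedy, so if any inner div contains its own link and the markup sits on one line, the match runs to the last `</a>` instead of the first. The `^` anchor can also produce null when innerHTML begins with whitespace or a newline. Note too that the slash in `</a>` has to be escaped inside a regex literal. The sample markup and the names `greedy`, `lazy` and `firstAnchor` below are made up for illustration.

```javascript
// Hypothetical wrapper contents; the link inside the inner div is an assumption.
const html = '<a href="#">first link</a><div><a href="/x">inner</a></div><div>more</div>';

// Greedy: .+ consumes as much as it can, so the match ends at the LAST </a>.
const greedy = /^<a(.+)<\/a>/;

// Lazy: [\s\S]*? stops at the FIRST </a> and may span line breaks.
// \s* tolerates leading whitespace; \b keeps <abbr> and similar tags out.
const lazy = /^\s*(<a\b[\s\S]*?<\/a>)/;

function firstAnchor(markup) {
  const m = markup.match(lazy);
  // match() returns an array (or null), not a string; group 1 is the anchor.
  return m ? m[1] : null;
}

console.log(html.match(greedy)[0]); // runs through the inner link
console.log(firstAnchor(html));     // only the first anchor
```

This still treats HTML as flat text, so it would break on nested anchors or on a `</a>` inside an attribute value.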
On May 18, 11:24 am, hende...@gmail.com wrote:
[original post snipped]

I'm not a big fan of using regexps for parsing HTML.
Getting a bulletproof expression is a major pain.

For example, here's one from 'Mastering Regular
Expressions 2nd edition' by J. Friedl (publisher
O'Reilly) for matching HTML tags:

/<("[^"]*"|'[^']*'|[^'">])*>/
How about...

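var container = document.getElementById('wrapper');
var list = [];
// Detach every child, last to first, remembering the ones that are not divs.
while (container.hasChildNodes())
{
    var child = container.lastChild;
    // Text and comment nodes have no tagName; keep them as well.
    if (!('tagName' in child) || !/^div$/i.test(child.tagName))
    {
        list.push(child);
    }
    container.removeChild(child);
}
// Re-attach the kept nodes; pop() restores their original order.
while (list.length > 0)
{
    container.appendChild(list.pop());
}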

It does a little extra work to avoid
the nastiness of dealing with indexing
into a list that's being resized.

--
Geoff

May 18 '07 #2
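
To illustrate why the tag pattern quoted from Friedl above is more involved than it first looks, here is a small sketch comparing it with a naive `<[^>]*>` pattern on a tag whose attribute value contains `>`. The sample string and the names `tagPattern`, `naive` and `firstTag` are assumptions made up for this example.

```javascript
// Tag-matching pattern as quoted in the post above.
const tagPattern = /<("[^"]*"|'[^']*'|[^'">])*>/;

// A simpler pattern that stops at the first '>' it sees, quoted or not.
const naive = /<[^>]*>/;

// Made-up sample: the title attribute contains a '>' character.
const sample = '<a title="a > b" href="#">link</a>';

function firstTag(html) {
  const m = html.match(tagPattern);
  return m ? m[0] : null;
}

console.log(sample.match(naive)[0]); // <a title="a >
console.log(firstTag(sample));       // <a title="a > b" href="#">
```

The quoted alternatives let the pattern step over whole attribute values, which is what keeps a `>` inside quotes from ending the tag early.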
On May 18, 1:19 pm, Geoffrey Summerhayes <sumr...@gmail.com> wrote:
[quoted posts snipped]

Geoff,

Thanks for the reply. I was looking for something less processor
intensive. The inner divs can number in the hundreds. That's why I was
looking at just isolating the first 'a' tag and replacing the entire
contents of the wrapper div. Any other thoughts would be appreciated.

Thanks,
Dave

May 18 '07 #3
On May 18, 1:36 pm, hende...@gmail.com wrote:
[quoted posts snipped]

Gang,

I was getting nowhere trying to use regexes, so I just decided to
use a combination of substring and indexOf calls. Everything seems to
be working beautifully. Thanks anyway.

Dave

May 18 '07 #4
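
The final post does not show the substring/indexOf code, so the following is only a guess at what such an approach might look like, not Dave's actual solution. The function name `firstAnchorBySubstring` is hypothetical, and the sketch assumes anchors are not nested and that tag names are serialised in lowercase.

```javascript
// Return the first <a ...>...</a> element found in a markup string, or null.
// Assumptions: anchors are not nested and tag names are lowercase.
function firstAnchorBySubstring(markup) {
  const closeTag = '</a>';
  let start = markup.indexOf('<a');
  // Skip tags that merely begin with "a", such as <abbr> or <article>.
  while (start !== -1 && !/[\s>]/.test(markup.charAt(start + 2))) {
    start = markup.indexOf('<a', start + 2);
  }
  if (start === -1) return null;
  const end = markup.indexOf(closeTag, start);
  if (end === -1) return null;
  return markup.substring(start, end + closeTag.length);
}

// Made-up wrapper contents with the anchor touching the first div.
const wrapperHtml = '\n<a href="#">a great link</a><div><a href="#">x</a></div>';
console.log(firstAnchorBySubstring(wrapperHtml));
```

In a page, the result could then be assigned back to the wrapper's innerHTML after checking that it is not null.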