Regex help please

Hi

Can anyone help me match this div below - my regex does not work - if
you could tell me why I would appreciate it.

var aStr = "<div class='feedflare'>dfgdg dg</div>";
var reg = new RegExp("<div class='feedflare'.*?</div>'","gim");
thanks
Tim
Aug 25 '08 #1
pr
Tim Nash (aka TMN) wrote:
Can anyone help me match this div below - my regex does not work - if
you could tell me why I would appreciate it.

var aStr = "<div class='feedflare'>dfgdg dg</div>";
var reg = new RegExp("<div class='feedflare'.*?</div>'","gim");
-------------------------------------------------------^
That apostrophe shouldn't be there.

The 'm' flag is unnecessary.
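
For anyone following along, here is a minimal sketch of that fix, using the sample string from the original post. The names `fixed`, `broken`, `fixedMatch` and `brokenMatch` are illustrative and not from the thread. The 'm' flag is dropped because the pattern contains neither ^ nor $, which are the only things that flag affects.

```javascript
// The sample string and the pattern from the original post.
var aStr = "<div class='feedflare'>dfgdg dg</div>";

// Original pattern: requires a literal apostrophe after </div>.
var broken = new RegExp("<div class='feedflare'.*?</div>'", "gi");

// Same pattern without the stray apostrophe and without the 'm' flag.
var fixed = new RegExp("<div class='feedflare'.*?</div>", "gi");

var brokenMatch = broken.exec(aStr); // null: nothing follows </div> in aStr
var fixedMatch = fixed.exec(aStr);   // array whose element 0 is the whole div
```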
Aug 25 '08 #2
After a fresh start this morning I got this to work taking into
account white spaces around 'class' and '=' etc and also
allow ' or " to be used

var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');

Tim
Aug 26 '08 #3
Tim Nash (aka TMN) wrote:
After a fresh start this morning I got this to work taking into
account white spaces around 'class' and '=' etc and also
allow ' or " to be used

var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');
Single-escaping the apostrophe within a double-quoted string literal is
useless ("\'" == "'"), and attr=['"]...['"]* is pointless (the star repeats
the previous expression zero or more times; here: ['"]). It would also be a
lot easier to maintain if you used a RegExp literal instead.

var reg = /<div[^>]class\s*=\s*['"]feedflare['"]>(.*?)<\/div>/gi;

That still does not exclude the possibility of e.g.

<divaclass="feedflare'>...</div>

which is not Valid. As for the element type identifier followed by optional
attributes, you should use

<ident(|\s+attr...)>

because whitespace after the identifier is required if there are attributes.
As for the matching quotes, you should use

('foo'|"foo")

However, RegExp literals and non-greedy matching (`.*?') are not universally
supported, with the latter being the more important fact here. See also:

<http://pointedears.de/scripts/es-matrix/>

Also note that a single regular expression cannot be used to parse an
*arbitrary* fragment of an SGML-based markup language; either it is too
greedy or not greedy enough. For example, in

<div class="foo"><div>bar</div></div>

this non-greedy expression would match `<div class="foo"><div>bar</div>',
with the outer `div' element not being closed.

So, for reliable parsing, you will need to implement a push-down automaton;
however, its parsing algorithm can be made more efficient with regular
expressions.
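
A rough sketch of that idea, under stated assumptions: a regular expression finds the individual `div' tags, and a depth counter (standing in for the automaton's stack, which suffices when only one element type is tracked) pairs them up. The helper name `extractBalancedDiv` is hypothetical. The sketch ignores comments, CDATA sections and attribute values that contain `>', so it is not a general parser.

```javascript
// Hypothetical helper: returns the substring from the first <div ...> tag
// up to and including its matching </div>, or null if there is none.
function extractBalancedDiv(html) {
  // Tokeniser: matches opening and closing div tags only.
  // Group 1 is "/" for a closing tag and "" for an opening tag.
  var tag = /<(\/?)div\b[^>]*>/gi;
  var depth = 0;   // number of currently open div elements
  var start = -1;  // index of the outermost opening tag
  var m;

  while ((m = tag.exec(html)) !== null) {
    if (m[1] === "") {
      if (depth === 0) start = m.index;
      depth++;
    } else if (depth > 0) {
      depth--;
      if (depth === 0) return html.slice(start, tag.lastIndex);
    }
  }
  return null; // no div, or the outermost div was never closed
}

var nested = '<div class="foo"><div>bar</div></div>';

// The non-greedy expression stops at the first </div>.
var lazy = /<div class="foo">.*?<\/div>/i.exec(nested)[0];

// The counting scanner returns the complete outer element.
var balanced = extractBalancedDiv(nested);
```

Here `lazy' holds the truncated fragment described above, while `balanced' holds the whole input string.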

Unsurprisingly, all this has been discussed here before. Please search
before you post.

<http://jibbering.com/faq/>
PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm>
Aug 26 '08 #4
pr
Tim Nash (aka TMN) wrote:
After a fresh start this morning I got this to work taking into
account white spaces around 'class' and '=' etc and also
allow ' or " to be used

var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');
To match a string starting with any of the following common permutations:

<div class="feedflare">
<div style="color: red;" class="feedflare">
<div class="feedflare" id="div1">
<div class="class1 feedflare class3">

you will instead need something like:

/<div\b[^>]+\bclass\s*=\s*(['"])[\w\s]*\bfeedflare\b[\w\s]*\1[^>]*>(.*?)<\/div\s*>/gi

I have simplified it by presuming you won't use the characters '.-:' in
class names. But as PointedEars points out, '.*?' is a problem in old
browsers and you're in trouble if there's a nested div in your string.

Possibly you would be better served by reading the string into the DOM
(using a DOMParser or innerHTML, for example) and extracting information
from it there.
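
As a quick check, the expression can be run against the four permutations listed above. This is only a sketch: the element contents (`one' to `four'), the non-matching `sidebar' sample and the variable names are made up for illustration, and the `g' flag is dropped here so that repeated exec() calls on different strings do not carry lastIndex over from one call to the next.

```javascript
// The pattern from the post, without the 'g' flag (see the note above).
var feedflareDiv = /<div\b[^>]+\bclass\s*=\s*(['"])[\w\s]*\bfeedflare\b[\w\s]*\1[^>]*>(.*?)<\/div\s*>/i;

// The four start tags from the post, with made-up contents.
var samples = [
  '<div class="feedflare">one</div>',
  '<div style="color: red;" class="feedflare">two</div>',
  '<div class="feedflare" id="div1">three</div>',
  '<div class="class1 feedflare class3">four</div>'
];

// Group 1 is the quote character, group 2 is the element content.
var contents = [];
for (var i = 0; i < samples.length; i++) {
  var m = feedflareDiv.exec(samples[i]);
  contents.push(m ? m[2] : null);
}

// A div with a different class should not match at all.
var other = feedflareDiv.exec('<div class="sidebar">five</div>');
```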
Aug 26 '08 #5
Thank you PointedEars and pr for your input.

Tim

Aug 26 '08 #6
pr
Thomas 'PointedEars' Lahn wrote:
As for the matching quotes, you should use

('foo'|"foo")
Or

(['"])foo\1
> However, RegExp literals and non-greedy matching (`.*?') are not universally
> supported, with the latter being the more important fact here.
Does this seem a reasonable feature test to you?

var ngq = /.+?/.exec("ab");
var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;

I can only lay hands on one browser old enough to fail. I assume the
presence of literal notation, obviously.
Aug 26 '08 #7
pr wrote:
Thomas 'PointedEars' Lahn wrote:
> As for the matching quotes, you should use

('foo'|"foo")

Or

(['"])foo\1
Correct. To my surprise, this feature, standardized only with ECMAScript
Ed. 3 (like regular expressions in general), appears to be widely supported:

The bookmarklet

javascript:window.alert(/^(["'])a\1b$/.test("'a'b"));

shows `true' in all my test environments, which currently are:

- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.1)
Gecko/2008070208 Firefox/3.0.1
- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14)
Gecko/20080404 Firefox/2.0.0.14
- Mozilla/4.78 [de] (Windows NT 5.0; U)

- Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE)
AppleWebKit/525.19 (KHTML, like Gecko) Version/3.1.2 Safari/525.21
- Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; de-de)
AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.22

- Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727) (IE 8 beta 1)
- Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0;
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 4.01; Windows NT 5.0; {...})

- Opera/9.52 (Windows NT 5.1; U; de)
- Opera/9.51 (Windows NT 5.1; U; de)
- Opera/9.27 (Windows NT 5.1; U; en)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0
- Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1)
Opera 7.02 [en]
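
For comparison, a small sketch of the two ways of matching quotes discussed above, applied to the `feedflare' value from earlier in the thread. The names `byBackreference', `byAlternation' and `results' are illustrative only. Both forms accept a value in matching single or double quotes and reject mixed quotes.

```javascript
// Back-reference form: \1 must repeat whichever quote group 1 matched.
var byBackreference = /^(["'])feedflare\1$/;

// Alternation form: each quoting style is spelled out in full.
var byAlternation = /^('feedflare'|"feedflare")$/;

var inputs = ["'feedflare'", '"feedflare"', "\"feedflare'"];
var results = [];
for (var i = 0; i < inputs.length; i++) {
  results.push([
    byBackreference.test(inputs[i]),
    byAlternation.test(inputs[i])
  ]);
}
// results[0] and results[1] are [true, true]; results[2] is [false, false]
```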
> However, RegExp literals and non-greedy matching (`.*?') are not universally
> supported, with the latter being the more important fact here.

Does this seem a reasonable feature test to you?

var ngq = /.+?/.exec("ab");
var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;
No, it could already throw a (non-catchable) SyntaxError when /.+?/ is
parsed, before execution (you can test that with IE 5.0, for example). And
I have yet to devise a bullet-proof test for possibly unsupported syntax (a
more sophisticated application of eval() comes to mind), one that does not
break the ECMAScript program then.

However,

var ngq = null;

try
{
  // Built from a string, so an unsupported quantifier fails at run time
  ngq = new RegExp(".+?");
}
catch (e)
{
}

if (ngq)
{
  // ...
}

would work for script engines that support basic exception handling but not
non-greedy quantifiers (such as JScript 5.1 in IE 5.01; tested positive).
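
Putting the two suggestions together, a possible sketch of a combined test: build the expression from a string inside try/catch, then apply the length check proposed earlier. The function name `supportsNonGreedyQuantifiers' is hypothetical. As noted above, this only helps in engines that can parse try/catch in the first place.

```javascript
// Hypothetical feature test combining both ideas from the thread.
function supportsNonGreedyQuantifiers() {
  var re = null;

  try {
    // A string pattern defers the syntax check to run time, so a failure
    // here is a catchable exception rather than a parse-time error.
    re = new RegExp(".+?");
  } catch (e) {
    return false;
  }

  // A non-greedy .+? must stop after the first character of "ab".
  var m = re.exec("ab");
  return m !== null && m[0].length === 1;
}

var hasNonGreedyQuantifiers = supportsNonGreedyQuantifiers();
```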
PointedEars
Aug 26 '08 #8
pr
Thomas 'PointedEars' Lahn wrote:
pr wrote:
> Does this seem a reasonable feature test to you?

var ngq = /.+?/.exec("ab");
var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;

No, it could already throw a (non-catchable) SyntaxError when /.+?/ is
parsed, before execution (you can test that with IE 5.0, for example).
You're right. IE 5 reports "Unexpected quantifier".
Aug 27 '08 #9
