
Regex help please

 
  #1  
Old August 25th, 2008, 02:55 PM
Tim Nash (aka TMN)
Guest
 
Posts: n/a
Default Regex help please

Hi

Can anyone help me match this div below - my regex does not work - if
you could tell me why I would appreciate it.

var aStr = "<div class='feedflare'>dfgdg dg</div>";
var reg = new RegExp("<div class='feedflare'.*?</div>'","gim");


thanks
Tim

  #2  
Old August 25th, 2008, 03:15 PM
pr
Guest
 
Posts: n/a
Default Re: Regex help please

Tim Nash (aka TMN) wrote:
Quote:
Can anyone help me match this div below - my regex does not work - if
you could tell me why I would appreciate it.
>
var aStr = "<div class='feedflare'>dfgdg dg</div>";
var reg = new RegExp("<div class='feedflare'.*?</div>'","gim");
-------------------------------------------------------^
That apostrophe shouldn't be there.

The 'm' flag is unnecessary.
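
For anyone wanting to check this, here is a minimal sketch comparing the pattern as posted with a corrected one. The variable names other than aStr are made up for illustration, and the 'm' flag is dropped only because the pattern contains no ^ or $ for it to affect.

```javascript
// The string from the original post.
const aStr = "<div class='feedflare'>dfgdg dg</div>";

// Pattern as posted: the apostrophe after </div> has to match literally,
// and the subject string contains no such character.
const broken = new RegExp("<div class='feedflare'.*?</div>'", "gim");

// Corrected pattern: stray apostrophe removed, 'm' flag dropped.
const fixed = new RegExp("<div class='feedflare'.*?</div>", "gi");

const brokenResult = broken.exec(aStr); // no match
const fixedResult = fixed.exec(aStr);   // match array; [0] is the whole div
```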
  #3  
Old August 26th, 2008, 07:25 AM
Tim Nash (aka TMN)
Guest
 
Posts: n/a
Default Re: Regex help please

After a fresh start this morning I got this to work taking into
account white spaces around 'class' and '=' etc and also
allow ' or " to be used

var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');

Tim
  #4  
Old August 26th, 2008, 10:17 AM
Thomas 'PointedEars' Lahn
Guest
 
Posts: n/a
Default Re: Regex help please

Tim Nash (aka TMN) wrote:
Quote:
After a fresh start this morning I got this to work taking into
account white spaces around 'class' and '=' etc and also
allow ' or " to be used
>
var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');
Single-escaping the apostrophe within a double-quoted string literal is
useless ("\'" == "'"), and attr=['"]...['"]* is pointless (the star repeats
the previous expression zero or more times; here: ['"]). It would also be a
lot easier to maintain if you used a RegExp literal instead.

var reg = /<div[^>]class\s*=\s*['"]feedflare['"]>(.*?)<\/div>/gi;
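
A small sketch of the escaping points above, using a shortened pattern (only the class attribute part) so the comparison stays readable. The variable names and sample strings are illustrative assumptions, not from the original posts.

```javascript
// In a double-quoted string literal, \' is just an apostrophe.
const sameQuote = ("\'" === "'");

// A backslash meant for the regex engine must be doubled inside a string
// literal, but is written once in a RegExp literal.
const fromString = new RegExp("class\\s*=\\s*['\"]feedflare['\"]>", "i");
const fromLiteral = /class\s*=\s*['"]feedflare['"]>/i;

// The starred closing quote from the earlier attempt also accepts an
// attribute value that has no closing quote at all.
const starred = /class\s*=\s*['"]feedflare['"]*>/i;
const wellFormed = '<div class="feedflare">';
const unclosed = "<div class='feedflare>";
```

Both fromString and fromLiteral describe the same pattern; the literal simply avoids the second layer of escaping.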

That still does not exclude the possibility of e.g.

<divaclass="feedflare'>...</div>

which is not Valid. As for the element type identifier followed by optional
attributes, you should use

<ident(|\s+attr...)>

because whitespace after the identifier is required if there are attributes.
As for the matching quotes, you should use

('foo'|"foo")

However, RegExp literals and non-greedy matching (`.*?') are not universally
supported, with the latter being the more important fact here. See also:

<http://pointedears.de/scripts/es-matrix/>

Also note that a single regular expression cannot be used to parse an
*arbitrary* fragment of an SGML-based markup language; either it is too
greedy or not greedy enough. For example, in

<div class="foo"><div>bar</div></div>

this non-greedy expression would match `<div class="foo"><div>bar</div>',
with the outer `div' element not being closed.
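
To make the "too greedy or not greedy enough" point concrete, here is a short sketch. The siblings string is an extra sample invented for illustration; the nested string is the one from the post.

```javascript
// Nested case from the post: the lazy quantifier stops at the first </div>.
const nested = '<div class="foo"><div>bar</div></div>';
const lazyNested = /<div class="foo">(.*?)<\/div>/.exec(nested);
const greedyNested = /<div class="foo">(.*)<\/div>/.exec(nested);

// Sibling case (hypothetical sample): the greedy quantifier runs on to the
// last </div> and swallows the second element.
const siblings = '<div class="foo">a</div><div>b</div>';
const lazySiblings = /<div class="foo">(.*?)<\/div>/.exec(siblings);
const greedySiblings = /<div class="foo">(.*)<\/div>/.exec(siblings);

// lazyNested[0]     is '<div class="foo"><div>bar</div>'  (outer div unclosed)
// greedySiblings[1] is 'a</div><div>b'                     (overshoots)
```

Neither quantifier is right for both inputs, which is the reason a single expression cannot handle arbitrary nesting.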

So, for reliable parsing, you will need to implement a push-down automaton;
however, its parsing algorithm can be made more efficient with regular
expressions.
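
As a rough sketch of that idea, the function below uses a regular expression only to find `div' tags and keeps a depth counter, which is the simplest form of a stack when only one element type is tracked. The name extractDiv and its parameters are hypothetical. It deliberately ignores comments, CDATA sections, scripts, and attribute values containing `>', so it is an illustration rather than a parser.

```javascript
// Hypothetical helper: returns the markup of the first <div> whose start tag
// matches `openTag` (a non-global RegExp), including nested divs, or null if
// no such tag exists or the element is never closed.
function extractDiv(html, openTag) {
  const first = openTag.exec(html);
  if (!first) return null;

  // Tokenizer: matches any <div ...> start tag or </div> end tag.
  const tag = /<(\/?)div\b[^>]*>/gi;
  tag.lastIndex = first.index;

  let depth = 0;
  let m;
  while ((m = tag.exec(html)) !== null) {
    depth += m[1] ? -1 : 1;   // end tag pops, start tag pushes
    if (depth === 0) {
      return html.slice(first.index, tag.lastIndex);
    }
  }
  return null;                // ran out of input with the div still open
}

const sample = '<p>x</p><div class="foo"><div>bar</div></div><div>y</div>';
const outer = extractDiv(sample, /<div\s+class\s*=\s*(['"])foo\1\s*>/i);
```

Here outer holds the complete outer element, including the nested one, where the non-greedy expression alone would have stopped too early.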

Unsurprisingly, all this has been discussed here before. Please search
before you post.

<http://jibbering.com/faq/>


PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm>
  #5  
Old August 26th, 2008, 12:15 PM
pr
Guest
 
Posts: n/a
Default Re: Regex help please

Tim Nash (aka TMN) wrote:
Quote:
After a fresh start this morning I got this to work taking into
account white spaces around 'class' and '=' etc and also
allow ' or " to be used
>
var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');
To match a string starting with any of the following common permutations:

<div class="feedflare">
<div style="color: red;" class="feedflare">
<div class="feedflare" id="div1">
<div class="class1 feedflare class3">

you will instead need something like:

/<div\b[^>]+\bclass\s*=\s*(['"])[\w\s]*\bfeedflare\b[\w\s]*\1[^>]*>(.*?)<\/div\s*>/gi

I have simplified it by presuming you won't use the characters '.-:' in
class names. But as PointedEars points out, '.*?' is a problem in old
browsers and you're in trouble if there's a nested div in your string.
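
A quick sketch for checking the expression above against the four permutations. The 'g' flag is left off here only so that repeated test() calls do not carry lastIndex state between strings; the div content and the two negative samples are invented for illustration.

```javascript
// Same expression as above, minus the 'g' flag (see note).
const feedflareDiv =
  /<div\b[^>]+\bclass\s*=\s*(['"])[\w\s]*\bfeedflare\b[\w\s]*\1[^>]*>(.*?)<\/div\s*>/i;

// The four permutations listed above, each given some content.
const permutations = [
  '<div class="feedflare">x</div>',
  '<div style="color: red;" class="feedflare">x</div>',
  '<div class="feedflare" id="div1">x</div>',
  '<div class="class1 feedflare class3">x</div>'
];
const allMatch = permutations.every(function (s) {
  return feedflareDiv.test(s);
});

// A different class name that merely contains the text is rejected.
const rejectsSubstring = !feedflareDiv.test('<div class="notfeedflare">x</div>');

// The stated simplification: a hyphen anywhere in the class list defeats it.
const missesHyphenated = !feedflareDiv.test('<div class="my-class feedflare">x</div>');
```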

Possibly you would be better served by reading the string into the DOM
(using a DOMParser or innerHTML, for example) and extracting information
from it there.
  #6  
Old August 26th, 2008, 12:25 PM
Tim Nash (aka TMN)
Guest
 
Posts: n/a
Default Re: Regex help please

Thank you PointedEars and pr for your input.

Tim

pr wrote:
Quote:
Tim Nash (aka TMN) wrote:
Quote:
After a fresh start this morning I got this to work taking into
account white spaces around 'class' and '=' etc and also
allow ' or " to be used

var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');
>
To match a string starting with any of the following common permutations:
>
<div class="feedflare">
<div style="color: red;" class="feedflare">
<div class="feedflare" id="div1">
<div class="class1 feedflare class3">
>
you will instead need something like:
>
/<div\b[^>]+\bclass\s*=\s*(['"])[\w\s]*\bfeedflare\b[\w\s]*\1[^>]*>(.*?)<\/div\s*>/gi
>
I have simplified it by presuming you won't use the characters '.-:' in
class names. But as PointedEars points out, '.*?' is a problem in old
browsers and you're in trouble if there's a nested div in your string.
>
Possibly you would be better served by reading the string into the DOM
(using a DOMParser or innerHTML, for example) and extracting information
from it there.
  #7  
Old August 26th, 2008, 12:25 PM
pr
Guest
 
Posts: n/a
Default Re: Regex help please

Thomas 'PointedEars' Lahn wrote:
Quote:
As for the matching quotes, you should use
>
('foo'|"foo")
Or

(['"])foo\1
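
A short sketch showing that the two forms accept and reject the same inputs for a fixed value. The pattern is reduced to the class attribute and the sample strings are invented for illustration.

```javascript
// Alternation: spell out both quoted forms.
const byAlternation = /class\s*=\s*('feedflare'|"feedflare")/i;

// Backreference: \1 must repeat whichever quote the group captured.
const byBackreference = /class\s*=\s*(['"])feedflare\1/i;

const doubleQuoted = '<div class="feedflare">';
const singleQuoted = "<div class='feedflare'>";
const mismatched = "<div class=\"feedflare'>";

const quoteResults = [doubleQuoted, singleQuoted, mismatched].map(function (s) {
  return [byAlternation.test(s), byBackreference.test(s)];
});
// quoteResults: both forms accept the first two strings and reject the third.
```

The backreference form scales better when the value itself is a longer sub-expression, since it does not have to be written twice.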
Quote:
>
However, RegExp literals and non-greedy matching (`.*?') are not universally
supported, with the latter being the more important fact here.
Does this seem a reasonable feature test to you?

var ngq = /.+?/.exec("ab");
var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;

I can only lay hands on one browser old enough to fail. I assume the
presence of literal notation, obviously.
  #8  
Old August 26th, 2008, 04:45 PM
Thomas 'PointedEars' Lahn
Guest
 
Posts: n/a
Default Re: Regex help please

pr wrote:
Quote:
Thomas 'PointedEars' Lahn wrote:
Quote:
> As for the matching quotes, you should use
>>
> ('foo'|"foo")
>
Or
>
(['"])foo\1
Correct. To my surprise, this feature, standardized only with ECMAScript
Ed. 3 (like regular expressions in general), appears to be widely supported:

The bookmarklet

javascript:window.alert(/^(["'])a\1b$/.test("'a'b"));

shows `true' in all my test environments, which currently are:

- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.1)
Gecko/2008070208 Firefox/3.0.1
- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14)
Gecko/20080404 Firefox/2.0.0.14
- Mozilla/4.78 [de] (Windows NT 5.0; U)

- Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE)
AppleWebKit/525.19 (KHTML, like Gecko) Version/3.1.2 Safari/525.21
- Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; de-de)
AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.22

- Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727) (IE 8 beta 1)
- Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0;
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 4.01; Windows NT 5.0; {...})

- Opera/9.52 (Windows NT 5.1; U; de)
- Opera/9.51 (Windows NT 5.1; U; de)
- Opera/9.27 (Windows NT 5.1; U; en)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0
- Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1)
Opera 7.02 [en]
Quote:
Quote:
>However, RegExp literals and non-greedy matching (`.*?') are not universally
>supported, with the latter being the more important fact here.
>
Does this seem a reasonable feature test to you?
>
var ngq = /.+?/.exec("ab");
var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;
No, it could already throw a (non-catchable) SyntaxError when /.+?/ is
parsed, before execution (you can test that with IE 5.0, for example). And
I have yet to devise a bullet-proof test for possibly unsupported syntax (a
more sophisticated application of eval() comes to mind), one that does not
break the ECMAScript program then.

However,

var ngq = null;

try
{
  ngq = new RegExp(".+?");
}
catch (e)
{
}

if (ngq)
{
  // ...
}

would work for script engines that support basic exception handling but not
non-greedy quantifiers (such as JScript 5.1 in IE 5.01; tested positive).
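
The same approach can be wrapped in a reusable check and combined with the behavioural test proposed earlier in the thread. This is only a sketch: the function name supportsPattern is hypothetical, and it assumes an engine that supports try...catch and reports unsupported syntax by throwing from the RegExp constructor.

```javascript
// Hypothetical helper: true if the engine can compile the pattern source.
// Building the RegExp from a string defers parsing to run time, so a
// failure is a catchable exception instead of a script-level SyntaxError.
function supportsPattern(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false;
  }
}

const hasLazySyntax = supportsPattern(".+?");

// Behavioural check: a non-greedy .+? should consume exactly one character.
const lazyProbe = hasLazySyntax ? new RegExp(".+?").exec("ab") : null;
const hasNonGreedyQuantifiers = lazyProbe !== null && lazyProbe[0].length === 1;
```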


PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm>
  #9  
Old August 27th, 2008, 11:45 AM
pr
Guest
 
Posts: n/a
Default Re: Regex help please

Thomas 'PointedEars' Lahn wrote:
Quote:
pr wrote:
Quote:
>Does this seem a reasonable feature test to you?
>>
> var ngq = /.+?/.exec("ab");
> var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;
>
No, it could already throw a (non-catchable) SyntaxError when /.+?/ is
parsed, before execution (you can test that with IE 5.0, for example).
You're right. IE 5 reports "Unexpected quantifier".
 
