Bytes IT Community

regex failing

I'm running an XMLHttpRequest to get the site's source code and then
applying the regex

xhr.responseText.split(/<body[^>]*>((?:.|\n)*)<\/body>/i)[1]

It works for google.com but fails on yahoo.com and imdb.com pages (e.g.
http://imdb.com/title/tt0482606/).

Can someone help me tweak this, or give insight as to why it's
failing? I can't spot it.
Jun 27 '08 #1
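A likely cause, offered as an assumption rather than a confirmed diagnosis: in JavaScript, `.` does not match the line terminators `\r`, `\u2028` and `\u2029`, so `(?:.|\n)*` stops at the first `\r`. If a page is served with Windows-style CRLF line endings, the pattern never reaches `</body>`, nothing matches, and `split()` returns a one-element array whose `[1]` is `undefined`. The two test strings below are made up for illustration:

```javascript
// The regex from the question: (?:.|\n) cannot step over "\r", because `.` never matches it.
const original = /<body[^>]*>((?:.|\n)*)<\/body>/i;

// Hypothetical documents, identical except for their line endings.
const lf = "<html><head></head>\n<body>\nHello\n</body></html>";
const crlf = lf.replace(/\n/g, "\r\n");

const fromLf = lf.split(original)[1];     // "\nHello\n"
const fromCrlf = crlf.split(original)[1]; // undefined: no match, so split() returns [crlf]

console.log(JSON.stringify(fromLf), fromCrlf);
```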


noon wrote:
> I'm running an XMLHttpRequest to get the site's source code and then
> applying the regex
>
> xhr.responseText.split(/<body[^>]*>((?:.|\n)*)<\/body>/i)[1]
>
> It works for google.com but fails on yahoo.com and imdb.com pages (e.g.
> http://imdb.com/title/tt0482606/).
>
> Can someone help me tweak this, or give insight as to why it's
> failing? I can't spot it.
Maybe...
You didn't mention what it is you WANT your regex to do.
And you didn't say what 'failing' is. An error? An unexpected result?

Regards,
Erwin Moller
Jun 27 '08 #2

That information might help, huh. I want it to strip everything
in between the body tags. The problem was that I was receiving either
nothing or the entire HTML, including the head tags etc. I seem to have
since got it working with this code:

xhr.responseText.split(/<body[^>]*>((.|\n|\r|\u2028|\u2029)*)<\/body>/gi)[1];

Improvement suggestions are welcome, though.
Jun 27 '08 #3
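The fix works because `\n`, `\r`, `\u2028` and `\u2029` are exactly the four line terminators that `.` refuses to match. A sketch of two shorter spellings of the same idea, run against a made-up CRLF page: the class `[\s\S]` matches any character at all, and engines implementing ES2018 accept the `s` (dotAll) flag, which lets `.` match line terminators too. The sketch uses `match()` without the `g` flag, because with `g` it would return whole matches instead of capture groups.

```javascript
// Hypothetical page with CRLF line endings and a U+2028 inside the body.
const html =
  "<html>\r\n<head><title>t</title></head>\r\n" +
  "<body id=main>\r\nHi\u2028there\r\n</body>\r\n</html>";
const expected = "\r\nHi\u2028there\r\n";

// noon's version: list every line terminator next to `.`.
const alternation = /<body[^>]*>((.|\n|\r|\u2028|\u2029)*)<\/body>/i;
// [\s\S] = "whitespace or non-whitespace", i.e. any character.
const charClass = /<body[^>]*>([\s\S]*)<\/body>/i;
// ES2018 dotAll flag: `.` also matches line terminators.
const dotAll = /<body[^>]*>(.*)<\/body>/is;

// Without the g flag, match() returns [wholeMatch, group1, ...].
const bodies = [alternation, charClass, dotAll].map((re) => html.match(re)[1]);
console.log(bodies.every((b) => b === expected)); // true
```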

noon wrote:
> That information might help, huh. I want it to strip everything
> in between the body tags. The problem was that I was receiving either
> nothing or the entire HTML, including the head tags etc. I seem to have
> since got it working with this code:
>
> xhr.responseText.split(/<body[^>]*>((.|\n|\r|\u2028|\u2029)*)<\/body>/gi)[1];
With

foo<body>...</body>bar

this would give you

...

But you wanted to *strip* everything *in between*, _not_ split.
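To make the split/match difference concrete, a small sketch on the same made-up input: `split()` cuts the string at each match and splices every capture group into the result, so `[1]` only works because the first group happens to hold the body content. `match()` without the `g` flag returns the whole match followed by its groups.

```javascript
const input = "foo<body>...</body>bar";
const re = /<body[^>]*>((.|\n|\r|\u2028|\u2029)*)<\/body>/i;

// split(): [text before, group 1, group 2 (last character it matched), text after]
const parts = input.split(re);
console.log(parts); // [ 'foo', '...', '.', 'bar' ]

// match(): [whole match, group 1, group 2]
const m = input.match(re);
console.log(m[0], m[1]); // <body>...</body> ...
```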
> Improvement suggestions are welcome, though.
... = xhr.responseText.match(/<body(|\s+[^>]*)>((.|\s)*)<\/body>/i)[2];

is largely equivalent to your code in this case and more efficient.
However, IMHO that is still _not_ stripping everything in between but
*matching* everything in between, which is probably what you meant to say.
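One detail worth checking in the `match()` version: groups are numbered by their opening parenthesis, so group 1 is `(|\s+[^>]*)` (the attribute list of the body tag, possibly empty) and the body content is group 2. A sketch on a made-up page, plus an alternative that uses non-capturing groups `(?:...)` so the content lands in group 1:

```javascript
const html = "<html><body id=main>\r\n<p>Hi</p>\r\n</body></html>";

// The pattern from the reply: three capturing groups.
const capturing = /<body(|\s+[^>]*)>((.|\s)*)<\/body>/i;
const m = html.match(capturing);
console.log(m[1] === " id=main");          // true: group 1 is the attribute list
console.log(m[2] === "\r\n<p>Hi</p>\r\n"); // true: group 2 is the body content

// Non-capturing groups leave only the body content in group 1.
// (?:\s[^>]*)? still rejects tags such as <bodyx>.
const nonCapturing = /<body(?:\s[^>]*)?>([\s\S]*)<\/body>/i;
console.log(html.match(nonCapturing)[1] === m[2]); // true
```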

Note that (X)HTML is a context-sensitive language which cannot be parsed
with one regular expression (defining a regular language) alone. In your
case it should work because a Valid (X)HTML document MUST NOT have more
than one `body' element.
PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16
Jun 27 '08 #4
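To illustrate the point about a single regular expression not being enough, a sketch with a made-up page whose `<head>` contains a comment mentioning a `body` element: the regex knows nothing about comments, so the match starts at the first `<body` it sees and, being greedy, runs to the last `</body>`. In a browser, handing the text to the real HTML parser (for example `new DOMParser().parseFromString(text, "text/html").body.innerHTML`) sidesteps this; that API is browser-side, so the sketch sticks to the regex.

```javascript
// Hypothetical page: a comment in <head> mentions <body>...</body>.
const html =
  "<html><head><!-- old layout: <body>legacy</body> --></head>" +
  "<body><p>Real content</p></body></html>";

const re = /<body[^>]*>([\s\S]*)<\/body>/i;
const captured = html.match(re)[1];

// Greedy match from the first "<body" (inside the comment) to the last "</body>".
console.log(captured); // legacy</body> --></head><body><p>Real content</p>
```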
