Damo <cormacdebarra@gmail.com> wrote:
Quote:
Quote:
>First, do you really need the whitespace around (.+)?
Does the presence/absence of whitespace make a difference? As I said,
I'm new to regex.
Yes, it will match that whitespace unless the /x modifier is set.
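To illustrate the point about whitespace, here is a minimal sketch. The subject string and the surrounding 'a' and 'b' literals are made up for the demo, they are not from your pattern:

```php
<?php
// Hypothetical subject string, only for illustration.
$subject = 'a foo b';

// Without /x, the spaces in the pattern are literal and must be present
// in the subject, so they are kept out of the captured group.
preg_match('/a (.+) b/', $subject, $m1);

// Without the spaces in the pattern, the group swallows them.
preg_match('/a(.+)b/', $subject, $m2);

// With /x, whitespace in the pattern is ignored, so this behaves
// like the second pattern, not the first.
preg_match('/a (.+) b/x', $subject, $m3);

var_dump($m1[1]); // string(3) "foo"
var_dump($m2[1]); // string(5) " foo "
var_dump($m3[1]); // string(5) " foo "
```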
Quote:
Quote:
>Second, $document must be a string, not a handle on the file.
$document is a handle on a URL that I was reading in, so yes, it was
just a string
Quote:
Quote:
>Third, your regular expression as written is greedy; is this
>intentional?
There was no ? at the end of my regular expression; it was just (.+)
Yes, so it's greedy. It will match as much as possible, so it runs on to
the last place where the rest of the pattern can still match, not the first.
Consider:
'<a>foo</a>bar<a>baz</a>foz'
'|<a>.+</a>|' will match '<a>foo</a>bar<a>baz</a>'
'|<a>.+?</a>|' will match '<a>foo</a>'
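The same two patterns run through preg_match(), as a quick sketch you can paste into a script to see the difference yourself (the variable names are just mine):

```php
<?php
$subject = '<a>foo</a>bar<a>baz</a>foz';

// Greedy: .+ grabs as much as it can, then backs off to the LAST </a>.
preg_match('|<a>.+</a>|', $subject, $greedy);

// Lazy: .+? grows one character at a time and stops at the FIRST </a>.
preg_match('|<a>.+?</a>|', $subject, $lazy);

echo $greedy[0], "\n"; // <a>foo</a>bar<a>baz</a>
echo $lazy[0], "\n";   // <a>foo</a>
```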
For a lot of info about regular expressions:
<http://www.regularexpressions.info>
In your case, I'd possibly use:
$regexp = "%<table[^>]*>(.+?)<img%si";
(the /s modifier will make the dot match linebreaks, which is possibly the
breaking point for your regex; the /i modifier makes the match
case-insensitive, so <TABLE> and <IMG> are found as well).
Whether this will work depends heavily on the actual markup, though...
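A rough sketch of how that pattern behaves, assuming markup along these lines (the $document contents are invented here, since I haven't seen your page):

```php
<?php
// Hypothetical markup with uppercase tags and line breaks; the real page will differ.
$document = "<TABLE border=\"1\">\n<tr><td>some text</td></tr>\n<img src=\"a.png\">";

// s: the dot also matches newlines; i: case-insensitive matching.
$regexp = "%<table[^>]*>(.+?)<img%si";
$with_s = preg_match($regexp, $document, $match);

// Same pattern without s: the dot stops at the first line break,
// so nothing between <table ...> and <img can be captured here.
$without_s = preg_match("%<table[^>]*>(.+?)<img%i", $document);

var_dump($with_s);    // int(1)
var_dump($without_s); // int(0)
var_dump($match[1]);  // everything between the table tag and <img, line breaks included
```

If preg_match() returns 0 on your real document, dump a piece of $document and check what actually sits between the table tag and the image.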
--
Rik Wasmus