Get text between A and B?

Philipp Lenssen
#1: Jul 17 '05
I want to read out several strings from an HTML (text) file, like
everything between "<h2>" and "</h2>" to create a table-of-contents
(also, things other than tags).
I do have a function but it's slow, and sometimes doesn't finish for
larger files (600K, not much really!).
Now what would be a nice function to do this job? I suppose some regex
with preg_match_all?

It should have a parameter telling which occurrence of the string
should be used, e.g. the second, third and so on.

------------

Like:

function getTextBetween($allText, $textBefore, $textAfter, $offset = 0)
{
// ?
}

Then I could say:

$s = getTextBetween("<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
"<h2>", "</h2>", 1);
echo $s; // ... would be "bar"

------------

Any help greatly appreciated!


Jedi121
#2: Jul 17 '05

"Philipp Lenssen" wrote on 17/11/2003:
> Then I could say:
>
> $s = getTextBetween("<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
> "<h2>", "</h2>", 1);
> echo $s; // ... would be "bar"
>
> ------------
>
> Any help greatly appreciated!

I would use a combination of explode() calls:
explode("<h2>", "<h2>foo</h2><p>Hello World</p><h2>bar</h2>")
and then, for each resulting element, explode("</h2>", $element)
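
The two-step explode() idea above could be fleshed out roughly as follows. This is only a sketch: the function name getTextBetweenExplode is made up for illustration, and it assumes both delimiters are non-empty strings (explode() rejects an empty delimiter).

```php
<?php
// Hypothetical helper sketching the explode() approach; the name is not from the thread.
function getTextBetweenExplode($allText, $textBefore, $textAfter, $offset = 0)
{
    // Split on the opening marker; element 0 is whatever precedes the first marker.
    $chunks = explode($textBefore, $allText);
    array_shift($chunks);

    $results = array();
    foreach ($chunks as $chunk) {
        // Keep the chunk only if the closing marker actually occurs in it.
        $parts = explode($textAfter, $chunk, 2);
        if (count($parts) === 2) {
            $results[] = $parts[0];
        }
    }

    // Return null when the requested occurrence does not exist.
    return isset($results[$offset]) ? $results[$offset] : null;
}

$html = "<h2>foo</h2><p>Hello World</p><h2>bar</h2>";
echo getTextBetweenExplode($html, "<h2>", "</h2>", 1); // prints "bar"
```

Since this uses plain string splitting, characters that are special in a regex need no escaping in the delimiters.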


Justin Koivisto
#3: Jul 17 '05

Philipp Lenssen wrote:
> I want to read out several strings from an HTML (text) file, like
> everything between "<h2>" and "</h2>" to create a table-of-contents
> (also, things other than tags).
> I do have a function but it's slow, and sometimes doesn't finish for
> larger files (600K, not much really!).
> Now what would be a nice function to do this job? I suppose some regex
> with preg_match_all?
>
> It should have a parameter telling which occurrence of the string
> should be used, e.g. the second, third and so on.
>
> ------------
>
> Like:
>
> function getTextBetween($allText, $textBefore, $textAfter, $offset = 0)
> {
> // ?
> }
>
> Then I could say:
>
> $s = getTextBetween("<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
> "<h2>", "</h2>", 1);
> echo $s; // ... would be "bar"
>
> ------------
>
> Any help greatly appreciated!

Kinda like this then...

function getTextBetween($allText, $textBefore, $textAfter, $offset = 0) {
    $pattern = '#' . $textBefore . '(.*)' . $textAfter . '#iU';
    preg_match_all($pattern, $allText, $matches);
    return $matches[1][$offset];
}


--
Justin Koivisto - spam@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.

Matty
#4: Jul 17 '05

Philipp Lenssen wrote:
> It should have a parameter telling which occurrence of the string
> should be used, e.g. the second, third and so on.
>
> ------------
>
> Like:
>
> function getTextBetween($allText, $textBefore, $textAfter, $offset = 0)
> {
> // ?
> }
>
> Then I could say:
>
> $s = getTextBetween("<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
> "<h2>", "</h2>", 1);
> echo $s; // ... would be "bar"

function getTextDelims($alltext, $opener, $closer)
{
    // Pass '/' to preg_quote() so the slash in a closer such as "</h2>" is escaped too.
    $pattern = '/' . preg_quote($opener, '/') . '(.+)' . preg_quote($closer, '/') . '/mU';
    preg_match_all($pattern, $alltext, $allmatches);
    if ((count($allmatches) > 0) and (array_key_exists(1, $allmatches))) {
        return $allmatches[1];
    } else {
        return array();
    }
}

Then do
$myanswers = getTextDelims(.....

$myanswers[1] contains the match at offset 1, and so on.

If you want to make the matches case-insensitive, change '/mU' to '/mUi'

HTH

Matt
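
To make the modifier remark above concrete, here is a small sketch of how the "i" flag changes what comes back. The function name getAllTextBetween and the $ignoreCase parameter are hypothetical, and the "s" modifier (dot also matches newlines) is an assumption added here so that headings spanning several lines would still match.

```php
<?php
// Hypothetical variant with a case-insensitivity switch;
// the name and the $ignoreCase parameter are illustrative only.
function getAllTextBetween($allText, $opener, $closer, $ignoreCase = false)
{
    // U = ungreedy, s = dot matches newlines, i = case-insensitive (optional).
    $flags = 'sU' . ($ignoreCase ? 'i' : '');
    $pattern = '/' . preg_quote($opener, '/') . '(.+)' . preg_quote($closer, '/') . '/' . $flags;
    preg_match_all($pattern, $allText, $allMatches);
    // Index 1 holds the captured text of every match, in document order.
    return $allMatches[1];
}

$html = "<H2>Intro</H2><p>Hello World</p><h2>Details</h2>";
print_r(getAllTextBetween($html, "<h2>", "</h2>"));       // only "Details"
print_r(getAllTextBetween($html, "<h2>", "</h2>", true)); // "Intro" and "Details"
```

The returned array keeps the matches in order, so the index doubles as the occurrence number.
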
Philipp Lenssen
#5: Jul 17 '05

Justin Koivisto wrote:
> Kinda like this then...
>
> function getTextBetween($allText,$textBefore,$textAfter,$offset=0){
> $pattern='#'.$textBefore.'(.*)'.$textAfter.'#iU';
> preg_match_all($pattern, $allText,$matches);
> return $matches[1][$offset];
> }

Thanks for that solution, and for the other ones as well (I merged two
together). Works very nicely.
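
The merged version is not shown in the thread. One plausible way to combine the suggestions, offered purely as a guess, is to keep the $offset parameter and add preg_quote() escaping. The null return for a missing occurrence and the "s" modifier are assumptions added here, not something any poster specified.

```php
<?php
// One possible merge, not the poster's actual code (which the thread does not show).
function getTextBetween($allText, $textBefore, $textAfter, $offset = 0)
{
    // Escape both delimiters, including the '#' pattern delimiter itself.
    $pattern = '#' . preg_quote($textBefore, '#') . '(.*)' . preg_quote($textAfter, '#') . '#isU';

    // preg_match_all() returns 0 for no matches and false on error.
    if (!preg_match_all($pattern, $allText, $matches)) {
        return null;
    }

    // Assumption: return null rather than raise a notice when the occurrence is missing.
    return isset($matches[1][$offset]) ? $matches[1][$offset] : null;
}

$s = getTextBetween("<h2>foo</h2><p>Hello World</p><h2>bar</h2>", "<h2>", "</h2>", 1);
echo $s; // prints "bar"
```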