
Get text between A and B?

I want to read out several strings from an HTML (text) file, like
everything between "<h2>" and "</h2>" to create a table-of-contents
(also, things other than tags).
I do have a function but it's slow, and sometimes doesn't finish for
larger files (600K, not much really!).
Now what would be a nice function to do this job? I suppose some regex
with preg_match_all?

It should have a parameter telling which occurrence of the string
should be used, e.g. the second, third and so on.

------------

Like:

function getTextBetween($allText, $textBefore, $textAfter, $offset = 0)
{
// ?
}

Then I could say:

$s = getTextBetween("<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
"<h2>", "</h2>", 1);
echo $s; // ... would be "bar"

------------

Any help greatly appreciated!

Jul 17 '05 #1
4 Replies


"Philipp Lenssen" wrote on 17/11/2003:
Then I could say:

$s = getTextBetween("<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
"<h2>", "</h2>", 1);
echo $s; // ... would be "bar"

------------

Any help greatly appreciated!


I would use a combination of explode() calls: first
explode("<h2>", "<h2>foo</h2><p>Hello World</p><h2>bar</h2>")
and then, for each resulting element, explode("</h2>", $element).
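
The two-step explode() idea above can be sketched as follows. This is only an illustration, not the poster's own code: the function name getTextBetweenExplode is hypothetical, the type hints and the ?? operator assume PHP 7.1 or later, and both delimiters are assumed to be non-empty strings.

```php
<?php
// Hypothetical helper sketching the explode()-based approach.
function getTextBetweenExplode(string $allText, string $textBefore, string $textAfter, int $offset = 0): ?string
{
    // Split on the opening marker; element 0 is whatever precedes the first marker.
    $chunks = explode($textBefore, $allText);
    array_shift($chunks);

    $found = [];
    foreach ($chunks as $chunk) {
        // Keep only chunks that actually contain the closing marker.
        $parts = explode($textAfter, $chunk, 2);
        if (count($parts) === 2) {
            $found[] = $parts[0];
        }
    }

    // null signals that the requested occurrence does not exist.
    return $found[$offset] ?? null;
}

$html = "<h2>foo</h2><p>Hello World</p><h2>bar</h2>";
echo getTextBetweenExplode($html, "<h2>", "</h2>", 1), "\n"; // prints "bar"
```

Because explode() compares literal strings, no escaping of the delimiters is needed here, but the match is always case-sensitive.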
Jul 17 '05 #2

Philipp Lenssen wrote:
I want to read out several strings from an HTML (text) file, like
everything between "<h2>" and "</h2>" to create a table-of-contents
(also, things other than tags).
I do have a function but it's slow, and sometimes doesn't finish for
larger files (600K, not much really!).
Now what would be a nice function to do this job? I suppose some regex
with preg_match_all?

It should have a parameter telling which occurrence of the string
should be used, e.g. the second, third and so on.

------------

Like:

function getTextBetween($allText, $textBefore, $textAfter, $offset = 0)
{
// ?
}

Then I could say:

$s = getTextBetween("<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
"<h2>", "</h2>", 1);
echo $s; // ... would be "bar"

------------

Any help greatly appreciated!


Kinda like this then...

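function getTextBetween($allText,$textBefore,$textAfter,$offset=0){
    // Escape the markers (and the '#' pattern delimiter) so they match literally.
    $pattern='#'.preg_quote($textBefore,'#').'(.*)'.preg_quote($textAfter,'#').'#iU';
    preg_match_all($pattern, $allText,$matches);
    // Return null when there is no match at the requested offset.
    return isset($matches[1][$offset]) ? $matches[1][$offset] : null;
}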
--
Justin Koivisto - sp**@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.

Jul 17 '05 #3

Philipp Lenssen wrote:
It should have a parameter telling which occurrence of the string
should be used, e.g. the second, third and so on.

------------

Like:

function getTextBetween($allText, $textBefore, $textAfter, $offset = 0)
{
// ?
}

Then I could say:

$s = getTextBetween("<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
"<h2>", "</h2>", 1);
echo $s; // ... would be "bar"


function getTextDelims($alltext, $opener, $closer)
{
    // Pass '/' to preg_quote so the pattern delimiter inside "</h2>" is escaped too.
    preg_match_all('/'.preg_quote($opener, '/').'(.+)'.preg_quote($closer, '/').'/mU', $alltext, $allmatches);
    if ((count($allmatches) > 0) and (array_key_exists(1, $allmatches)))
    { return $allmatches[1]; }
    else { return array(); }
}

Then doing
$myanswers = getTextDelims(.....

$myanswers[1] contains offset 1, etc

If you want to make the matches case-insensitive, change '/mU' to '/mUi'
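
A short usage sketch with a variation of the sample string from the question may make the case-sensitivity point concrete. The $caseInsensitive parameter is a hypothetical addition for illustration and was not part of the posted function; the sketch also passes '/' as the second argument to preg_quote, since "</h2>" contains the pattern delimiter. It assumes PHP 7 or later. Without the 's' modifier, '.' does not match newlines, so headings that span several lines would not be found.

```php
<?php
// Sketch of getTextDelims; $caseInsensitive is a hypothetical extra parameter.
function getTextDelims(string $alltext, string $opener, string $closer, bool $caseInsensitive = false): array
{
    $flags = 'mU' . ($caseInsensitive ? 'i' : '');
    // preg_quote(..., '/') escapes regex metacharacters and the '/' delimiter.
    $pattern = '/' . preg_quote($opener, '/') . '(.+)' . preg_quote($closer, '/') . '/' . $flags;
    preg_match_all($pattern, $alltext, $allmatches);
    // Group 1 holds the text between the markers for every match.
    return $allmatches[1] ?? [];
}

$html = "<H2>foo</H2><p>Hello World</p><h2>bar</h2>";
print_r(getTextDelims($html, "<h2>", "</h2>"));       // case-sensitive: only "bar"
print_r(getTextDelims($html, "<h2>", "</h2>", true)); // case-insensitive: "foo" and "bar"
```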

HTH

Matt
Jul 17 '05 #4

Justin Koivisto wrote:
Kinda like this then...

function getTextBetween($allText,$textBefore,$textAfter,$offset=0){
$pattern='#'.$textBefore.'(.*)'.$textAfter.'#iU';
preg_match_all($pattern, $allText,$matches);
return $matches[1][$offset];
}


Thanks for that solution, and for the other ones as well (I merged two
of them together). Works very nicely.
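
For large inputs like the 600K file mentioned in the question, a plain strpos() scan is another option that avoids building an array of all matches. This is a different technique from the regex solutions in this thread, not the merged function referred to above. The name getTextBetweenStrpos is hypothetical, the sketch assumes PHP 7.1 or later and non-empty delimiters, and no timing claim is made.

```php
<?php
// Hypothetical strpos()-based variant: walks the string once, stopping at the requested occurrence.
function getTextBetweenStrpos(string $allText, string $textBefore, string $textAfter, int $offset = 0): ?string
{
    $pos = 0;
    for ($i = 0; ; $i++) {
        // Find the next opening marker at or after the current position.
        $start = strpos($allText, $textBefore, $pos);
        if ($start === false) {
            return null; // fewer occurrences than requested
        }
        $start += strlen($textBefore);

        // Find the closing marker that follows it.
        $end = strpos($allText, $textAfter, $start);
        if ($end === false) {
            return null; // opening marker without a closing one
        }

        if ($i === $offset) {
            return substr($allText, $start, $end - $start);
        }

        // Continue searching after this closing marker.
        $pos = $end + strlen($textAfter);
    }
}

$html = "<h2>foo</h2><p>Hello World</p><h2>bar</h2>";
echo getTextBetweenStrpos($html, "<h2>", "</h2>", 1), "\n"; // prints "bar"
```

Like explode(), strpos() is case-sensitive; stripos() could be swapped in if "<H2>" should match as well.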
Jul 17 '05 #5
