Pulling a synopsis from text

Greetings,
I am trying to automatically pull the beginning section from submitted
text and return it with a "more..." link. The submitted text is HTML
created by FCKeditor (http://www.fckeditor.net/).
The trouble I am running into is that the cutoff point often falls inside
an element, i.e. after an opening <div> but before its closing </div>.
The only idea I have come up with is to build an array of all possible
HTML tags and search for a closing tag for each, but I am hoping there is
a cleaner method. Has anyone attempted such a feat previously?

function getSynop($input="", $more_link="", $synop_size='750') {
    $tmp_str = substr($input, 0, $synop_size);
    $end_val = strrpos($tmp_str, ">") + 1;
    if ($end_val < ($synop_size)) {
        $end_val = strrpos($tmp_str, ".") + 1;
    }
    if ($end_val < ($synop_size)) {
        $end_val = strrpos($tmp_str, ">") + 1;
    }
    return substr($input, 0, $end_val) . " <a href='$more_link'>more...</a>";
}
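
Rather than listing every possible HTML tag, one option is to track only the tags that are actually still open at the cutoff point. The following is a minimal sketch of that idea, not code from the thread: `close_open_tags` is a hypothetical helper name, the list of void elements is an assumption and deliberately short, and the approach assumes the input markup is reasonably well formed.

```php
<?php
// Hypothetical helper: append closing tags for anything left open
// after a raw substr() cut. Assumes reasonably well-formed input.
function close_open_tags($html) {
    // Drop a partial tag left dangling at the very end, e.g. "<a hre".
    $html = preg_replace('/<[^>]*$/', '', $html);

    // Elements that never take a closing tag (assumed, incomplete list).
    $void = array('br', 'hr', 'img', 'input', 'meta', 'link');

    $open = array();
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>#', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as $t) {
        $name = strtolower($t[2]);
        if (in_array($name, $void, true) || $t[3] === '/') {
            continue; // void or self-closing element, nothing to track
        }
        if ($t[1] === '/') {
            // Closing tag: find the most recent matching opener and
            // discard it together with anything opened after it.
            $idx = array_search($name, array_reverse($open, true), true);
            if ($idx !== false) {
                array_splice($open, $idx);
            }
        } else {
            $open[] = $name; // opening tag, remember it
        }
    }

    // Close whatever is still open, innermost element first.
    foreach (array_reverse($open) as $name) {
        $html .= "</$name>";
    }
    return $html;
}
```

With this sketch, `close_open_tags(substr($input, 0, $synop_size))` could replace the `strrpos()` juggling above; the "more..." link would then be appended after the repaired markup. Note that the character limit still counts tag characters, so the visible text may be shorter than `$synop_size`.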

Mar 25 '06 #1

crucialmoment wrote:
> The trouble I am running into is the cutoff point is often inside of a
> tag - ie after an opening <div> but the closing div is cut.
> The only idea I have come up with is to build an array of all possible
> html tags and search for a close for each but I am hoping there is a
> cleaner method. Has anyone attempted such a feat previously?

The trick here is to ignore the tags and only operate on what's between
the tags. Say if we have the following:

This is <div>a test</div> and this is only <div>a test.</div>

and we want 10 characters, we would look at "This is " and grab 8
characters. Then we look at "a test" and retain only 2 characters. As
we now have what we need, we retain 0 characters from " and this is
only " and "a test.". The end result will be:

This is <div>a </div><div></div>

Once the empty tags are discarded we end up with

This is <div>a </div>

which is what we want.

Here's an implementation of the technique:

<?php

$s = 'This is some <strong>sample text</strong>. You are using <a href="http://www.fckeditor.net/">FCKeditor</a>.';

function synop_callback($m) {
    global $synop_char_to_fetch;
    // the tag group is absent when the match is trailing text with no tag
    $tag = isset($m[2]) ? $m[2] : '';

    // got enough characters already, return just the tag
    if ($synop_char_to_fetch <= 0) {
        return $tag;
    }

    // decode HTML entities so that e.g. &amp; counts as one character
    $inner_html = $m[1];
    $inner_text = html_entity_decode($inner_html);

    if (strlen($inner_text) > $synop_char_to_fetch) {
        // retain up to $synop_char_to_fetch characters, ending at a
        // word boundary; the /s flag lets . match newlines as well
        $r = preg_replace('/^(.{0,' . $synop_char_to_fetch . '}\b)?.*/s', '\1',
            $inner_text);
        $inner_html = htmlspecialchars(rtrim($r));
    }

    // subtract the length of this text run from the remaining budget
    $synop_char_to_fetch -= strlen($inner_text);
    return "$inner_html$tag";
}

function synop_chop($s, $num) {
    // chop off extra text beyond $num characters; each match is a run
    // of text followed by an optional tag
    global $synop_char_to_fetch;
    $synop_char_to_fetch = $num;
    $s = preg_replace_callback('/([^<]*)(<.*?>)?/s', 'synop_callback', $s);

    // collapse empty tags, repeating until nothing changes so that
    // nested empty elements are removed as well
    do {
        $r = $s;
        $s = preg_replace('/<(\S*?)[^>]*?>\s*<\/\1>/i', '', $r);
    } while ($r != $s);

    // add ellipsis
    $s = preg_replace('/\.?$/', '...', trim($s), 1);
    return $s;
}

echo synop_chop($s, 20);

?>

Mar 26 '06 #2
d
"crucialmoment" <cr***********@gmail.com> wrote in message
news:11**********************@g10g2000cwb.googlegroups.com...
> The only idea I have come up with is to build an array of all possible
> html tags and search for a close for each but I am hoping there is a
> cleaner method. Has anyone attempted such a feat previously?


1. Get text.
2. Remove tags.
3. Take first <n> characters.

$text = "This is some <div>text</div> isn't it interesting. <b>send money.</b> <i>and beer</i>";

function getSynop($input="", $more_link="", $synop_size=750) {
    // strip the markup first so the cut can never land inside a tag
    $syn = substr(strip_tags($input), 0, $synop_size);
    return $syn . " <a href='" . $more_link . "'>more...</a>";
}
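
The three steps can be taken a little further if the synopsis should end on a whole word and be safe to echo back into a page. The sketch below is one possible variant, not part of the original reply: `plain_synopsis` is a hypothetical name, UTF-8 input is assumed, and lengths are counted in bytes with `strlen()`/`substr()` (the `mb_*` functions could be swapped in where the mbstring extension is available).

```php
<?php
// Hypothetical helper (not from the thread): plain-text synopsis that
// ends on a word boundary and escapes everything it outputs.
function plain_synopsis($input, $more_link, $size = 750) {
    // Remove tags, then decode entities so "&amp;" counts as one character.
    $text = html_entity_decode(strip_tags($input), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace left behind by the removed markup.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if (strlen($text) > $size) {
        // Look one character past the limit so a word that ends exactly
        // at the limit is kept whole.
        $cut = substr($text, 0, $size + 1);
        $space = strrpos($cut, ' ');
        if ($space !== false) {
            $cut = substr($cut, 0, $space);
        } else {
            $cut = substr($text, 0, $size); // one long word: hard cut
        }
        $text = rtrim($cut) . '...';
    }

    // Re-encode the text and the link target before building the HTML.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8')
        . " <a href='" . htmlspecialchars($more_link, ENT_QUOTES, 'UTF-8') . "'>more...</a>";
}
```

For example, `plain_synopsis($text, 'more.php', 20)` with the sample `$text` above yields "This is some text..." followed by the link. The trade-off of this approach is the same as in the steps above: all formatting from the editor is lost in the synopsis.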

hope that helps!

dave
Mar 27 '06 #3
