
Safely cut off short preview version of long string

Hi all,

Here's a problem I have been working on for a while, but can't seem to
solve satisfactorily.

I have a database with blog entries. Because the entries vary in length
and can be quite long, I want to build an overview page. Each entry will
get a preview version of, say, 700 characters at most.

My problem has to do with HTML tags. If, for example, an entry contains a
<BLOCKQUOTE> with a large quote, my function would break off somewhere
halfway through the quote. The end result of course won't have the
</BLOCKQUOTE>, which breaks the rendering of the resulting page.

I would like to build a function that cuts a string off at a maximum of X
characters, but plays it safe when it encounters any HTML tag. It does
not matter if the end result is a string of, say, 670 characters; it only
matters that it approximates the maximum character setting and doesn't
mess up the HTML tags.

Can anyone point me in the right direction?

Hans
Jul 17 '05 #1


Following on from Hans Gruber's message...
> My problem has to do with HTML tags. If, for example, an entry contains a
> <BLOCKQUOTE> with a large quote, my function would break off somewhere
> halfway through the quote. The end result of course won't have the
> </BLOCKQUOTE>, which breaks the rendering of the resulting page.
>
> I would like to build a function that cuts a string off at a maximum of X
> characters, but plays it safe when it encounters any HTML tag. It does
> not matter if the end result is a string of, say, 670 characters; it only
> matters that it approximates the maximum character setting and doesn't
> mess up the HTML tags.


A simple way would be to decide roughly where your end point is going to
be (not inside <...>), then keep all the tags that follow it but remove
the text between them.

The reason for putting all the following tags in is that you can have
complex nested structures where you'd have to do lots of complicated
parsing - just not worth the effort. Also the entry could start with
say <center> and end with </center> many pages apart.
e.g.
1 - split the string to get the first X chars and work with the remainder
of the string
2 - explode the remainder by '<' so that the tags, _except possibly in
array[0]_, come first in each element, which therefore looks like
"ATAG>some text" (or "/ATAG>some text")
3 - if array[0] contains a '>', the cut fell inside a tag and array[0]
starts with the tail of that tag; if it doesn't, array[0] is plain text
(NB /sort of/ - this assumes '>' never appears in plain text, and there
are two edge cases: no more tags at all, in which case array[0] is the
whole remainder, and the cut tag followed immediately by another, in
which case '>' is the last character of array[0])
4 - now strip the bits after '>' from each array element, implode with
'<' and add the result to the end of the text.
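
The four steps above might be sketched roughly as follows. It is written in
Python purely for illustration (the thread names no language; explode and
implode correspond to str.split and str.join here). The function name
`preview` and its parameters are made up for this sketch. It assumes the
markup is well formed and that '<' and '>' only ever appear as tag
delimiters, so comments, scripts and attribute values containing '>' are
not handled. As a small deviation from step 3, it decides whether the cut
fell inside a tag by looking at the head rather than at array[0] alone.

```python
def preview(html: str, max_chars: int) -> str:
    """Cut html after max_chars, keeping every later tag but no later text."""
    if len(html) <= max_chars:
        return html

    # Step 1: take the first max_chars characters, work with the remainder.
    head, rest = html[:max_chars], html[max_chars:]

    # Step 2: split the remainder on '<'. Every piece after the first
    # looks like "ATAG>some text" or "/ATAG>some text".
    pieces = rest.split("<")

    # Step 3: the cut is inside a tag when the last '<' in the head comes
    # after its last '>'. In that case pieces[0] starts with the tag's tail.
    cut_inside_tag = head.rfind("<") > head.rfind(">")
    if cut_inside_tag and ">" in pieces[0]:
        first = pieces[0][: pieces[0].index(">") + 1]
    else:
        first = ""  # plain text after the cut: drop it

    # Step 4: strip everything after '>' from each remaining piece, then
    # rejoin with '<' and append to the head. Pieces without a '>' are
    # stray '<' characters in text (outside the assumptions) and are dropped.
    tags = [p[: p.index(">") + 1] for p in pieces[1:] if ">" in p]
    return head + "<".join([first] + tags)


if __name__ == "__main__":
    entry = "<blockquote>Some long quote</blockquote><p>More</p>"
    print(preview(entry, 5))
```

Note that the result can be longer than max_chars by the length of the
trailing tags, and that tags opened after the cut point survive as empty
pairs (e.g. `<p></p>`), which is the price of not parsing the nesting.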

--
PETER FOX Not the same since the pancake business flopped
pe******@eminent.demon.co.uk.not.this.bit.no.html
2 Tees Close, Witham, Essex.
Gravity beer in Essex <http://www.eminent.demon.co.uk>
Jul 17 '05 #2
