On Tue, 20 Jan 2004 16:22:14 GMT, in comp.lang.javascript Michael
Winter <M.******@blueyonder.co.invalid> wrote:
| On Tue, 20 Jan 2004 08:54:17 GMT, Jeff North
| <jn****@yourpantsbigpond.net.au> wrote:
|
| > On Tue, 20 Jan 2004 00:31:58 GMT, in comp.lang.javascript Michael
| > Winter <M.******@blueyonder.co.invalid> wrote:
| >
| >> | On Tue, 20 Jan 2004 00:12:49 GMT, Michael Winter
| >> | <M.******@blueyonder.co.invalid> wrote:
| >> |
| >> | > string.replace( /<\S+>/g, '' );
| >> |
| >> | Oops. That should be something more like:
| >> |
| >> | string.replace( /<.+>/g, '' );
| >
| > The first example didn't remove all of the tags. It mainly left the
| > font opening tag but successfully removed the closing tag.
|
| The first wouldn't remove tags that contained any whitespace, so tags with
| attributes, or XHTML-style empty tags (<br />, for example) would remain.
| That's what prompted the second suggestion.
|
| > The second example wiped the entire text.
|
| I tested it with strings that I thought would cause unwanted results, but
| they came out fine. I was surprised (with a little more thought after
| posting it) that the entire text wasn't wiped. I just found out why[1].
|
| The best safe result I can get is:
|
| .replace( /<[^<>]+>/g, '' )
|
| The only problem is that if angle brackets appear inside tags, the tag
| won't be removed properly. That isn't really likely to happen unless
| someone wants to explicitly exploit this hole.
|
| > tmp = tmp.replace( /<\S+>/gi, ' ' );
| > tmp = tmp.replace( /<.+>/gi, ' ' );
|
| I think I can explain why this works in your tests. The expression /<.+>/
| matches "<anything>", where "anything" is literally that: letters,
| numbers, punctuation, symbols, etc. If a tag is paired, like this:
|
| <em id="example">This is emphasised</em>
|
| the "em id=....</em" matches the '.+' part of the regular expression. The
| earlier expression, /<\S+>/ would remove the closing tag, leaving:
|
| <em id="example">This is emphasised
|
| which is then correctly handled by the greedy second expression. However,
| if you try this:
|
| The word, <em>this</em> is emphasised
|
| you'll only get:
|
| The word, is emphasised
|
| back. That is why you should try the third suggestion, /<[^<>]+>/g,
| despite its flaw.
|
| What a mess this is becoming. :)
|
| Mike
|
|
| [1] The reason is inconsequential, but it made the testing unfair.
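For anyone following along, the behaviour of the three expressions quoted above can be checked with a short script. This is only a sketch: the `strip` helper and the `<img>` test string are made up for illustration and are not from the thread.

```javascript
// The three expressions suggested in the thread.
const nonSpace = /<\S+>/g;     // first suggestion: no whitespace inside the tag
const greedy   = /<.+>/g;      // second suggestion: greedy "anything"
const negated  = /<[^<>]+>/g;  // third suggestion: anything except angle brackets

// Hypothetical helper: remove every match of `pattern` from `text`.
function strip(text, pattern) {
  return text.replace(pattern, '');
}

const paired = '<em id="example">This is emphasised</em>';
const inline = 'The word, <em>this</em> is emphasised';
// Made-up example of the known flaw: an angle bracket inside an attribute.
const flawed = '<img alt="a > b" src="x.gif">';

console.log(strip(paired, nonSpace)); // only the closing tag is removed
console.log(strip(paired, greedy));   // everything from first "<" to last ">" goes
console.log(strip(paired, negated));  // both tags removed, text kept
console.log(strip(inline, greedy));   // the emphasised word is lost
console.log(strip(inline, negated));  // the emphasised word is kept
console.log(strip(flawed, negated));  // part of the tag is left behind
```

The last call shows the flaw mentioned in the quoted post: the match ends at the first ">" it meets, so the rest of the tag stays in the output.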
Mike and Evertjan, thanks for all your time and effort; it is greatly
appreciated.
Mike, I tried your 3rd suggestion and it appears to work (so I won't
annoy you anymore LOL).
Here is what I've ended up with and some sample text. I know that
there is probably a more elegant way of doing this, but I think that
this is almost self-documenting and easily modifiable:
----------------------------------
//--- read data from database
//--- strip out html tags and convert symbols to characters.
//--- var msg is called in client-side script.
var msg = new String( rsDir.Fields.Item("contents").Value );
msg = msg.replace(/\n/g,"");
msg = msg.replace(/\r/g,"");
//--- any double quote -> single quote
msg = msg.replace(/"/gi,"\'");
msg = msg.replace(/–/g,"-");
//--- any left/right quotes to a single quote
msg = msg.replace(/’/g,"\'");
msg = msg.replace(/“/g,"\'");
msg = msg.replace(/”/g,"\'");
//--- remove non-breaking spaces
msg = msg.replace(/&nbsp;/gi," ");
//--- strip html tags from text (courtesy of Michael Winter at
//--- comp.lang.javascript newsgroup)
msg = msg.replace( /<[^<>]+>/g, '' );
..
..
..
..
<script>
function ShowMsg()
{
//--- display a message. Do not break/split a word.
var ct = 200; //--- max. characters
var msg = new String();
msg = "<%=msg%>";
//--- move back to first space character.
while( ct > 0 && msg.charAt(ct) != " ") ct--;
document.write( msg.substr(0,ct) + "..." );
}
</script>
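One thing worth noting about the loop above: charAt() returns an empty string past the end of the text, so a message shorter than 200 characters is walked back to its last space and loses its final word, and a message with no space at all comes out as just "...". A possible variant that guards against both cases is sketched below. The function name `truncateAtWord` and the hard-cut fallback are my own assumptions, not part of the posted code.

```javascript
// Hypothetical helper: trim `text` to at most `max` characters without
// splitting a word, appending "..." only when something was actually cut.
function truncateAtWord(text, max) {
  if (text.length <= max) {
    return text;                 // short enough: return it unchanged
  }
  let cut = max;
  // Walk back from position `max` to the nearest space.
  while (cut > 0 && text.charAt(cut) !== ' ') {
    cut--;
  }
  if (cut === 0) {
    cut = max;                   // no usable space: fall back to a hard cut
  }
  return text.slice(0, cut) + '...';
}

console.log(truncateAtWord('Dear All, 2003 will soon be nothing more than a memory.', 20));
console.log(truncateAtWord('Dear All', 200));
```

In ShowMsg() this would replace the while loop and the substr() call, with the result passed to document.write().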
------------ sample text ------------
<P><FONT face="arial, helvetica, sans-serif">Dear
All,</FONT></P>\r\n<P><FONT face="arial, helvetica,
sans-serif">2003 will soon be nothing more than a memory. But to
my mind, this last year will continue to live on as an "annus
mirabilis" - year of wonders. </FONT></P>\r\n<P><FONT
face="arial, helvetica, sans-serif">And it has been
wonderful - our staff and students really covered themselves in glory
during 2003, with awards and accolades coming from virtually every
quarter. But we all know that awards only tell part of the story. What
made this last year “truly wonderful” was the fact that
the Institute achieved so much, in spite of a host of challenges and
uncertainties. We were able to succeed because of one simple fact
– our fantastic staff. All staff regularly did more with less
and continued to provide the very best in vocational education and
training. Thank you for all your hard work.</FONT></P>\r\n<P><FONT
face="arial, helvetica, sans-serif">In many ways, the coming
year will mark the beginning of profound changes to the way in which
Sydney Institute operates. Staff numbers will increase. Reporting
lines and responsibilities will change. Our business and work culture
will have to adapt to new circumstances, personalities and
opportunities. It will be a challenge. However, I am confident we will
meet these challenges in the same way TAFE has coped with change for
over 110 years – with professionalism and dedication. Those
qualities made 2003 a year to remember and I know that 2004
won’t be any different.</FONT></P>\r\n<P><FONT face="arial,
helvetica, sans-serif">Thank you again for all your efforts
during this last year. I look forward to 2004 with anticipation. I
hope you have a safe and happy holiday
season.</FONT></P>\r\n<P><BR><FONT face="arial, helvetica,
sans-serif">
-------------------------------
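For comparison, the chain of replace() calls in the post could also be written as a table of pattern/substitute pairs applied in order. This is only a sketch under a few assumptions: the names `replacements` and `cleanMessage` are invented here, and the patterns assume the text holds the literal characters shown in the post (written as Unicode escapes). If the stored data still contains HTML entities, the patterns would need to name those instead.

```javascript
// Hypothetical table of replacements, applied top to bottom.
const replacements = [
  [/[\r\n]/g, ''],                  // drop line breaks
  [/"/g, "'"],                      // double quote -> single quote
  [/\u2013/g, '-'],                 // en dash -> hyphen
  [/[\u2019\u201C\u201D]/g, "'"],   // right single and curly double quotes -> single quote
  [/\u00A0/g, ' '],                 // non-breaking space -> ordinary space
  [/<[^<>]+>/g, '']                 // strip tags (third suggestion in the thread)
];

// Hypothetical helper: run every replacement over the raw text.
function cleanMessage(raw) {
  return replacements.reduce(
    (text, [pattern, substitute]) => text.replace(pattern, substitute),
    String(raw)
  );
}

const sample = '<P><FONT face="arial">won\u2019t \u201Ctruly\u201D \u2013 ok</FONT></P>\r\n';
console.log(cleanMessage(sample));
```

Adding or removing a rule then means editing one line of the table, which keeps the "easily modifiable" property the original aims for.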
---------------------------------------------------------------
jn****@yourpantsbigpond.net.au : Remove your pants to reply
---------------------------------------------------------------