
Stripping MS Word code from my forms once and for all.


Hi,

I have a form that allows users to comment, add entries and so on.
But what a lot of them do is copy and paste directly from MS Word to my
forms.

Almost all browsers will accept the post and give the impression that
everything is saved properly.

But that is not the case when it comes time to display the message
on my page.

So how can I strip/replace all the MS Word invalid code from my
$_POSTs?

Thanks

FFMG
--

'webmaster forum' (http://www.httppoint.com) | 'Free Blogs'
(http://www.journalhome.com/) | 'webmaster Directory'
(http://www.webhostshunter.com/)
'Recreation Vehicle insurance'
(http://www.insurance-owl.com/other/car_rec.php) | 'Free URL
redirection service' (http://urlkick.com/)
------------------------------------------------------------------------
FFMG's Profile: http://www.httppoint.com/member.php?userid=580
View this thread: http://www.httppoint.com/showthread.php?t=20318

Message Posted via the webmaster forum http://www.httppoint.com, (Ad revenue sharing).

Sep 15 '07 #1
7 Replies


I found this on php.net at http://uk2.php.net/strtr which may be of
some help:


After battling with strtr trying to strip out MS Word formatting from
things pasted into forms, I ended up coming up with this.

It strips ALL non-standard ASCII characters, preserving HTML codes and
such, but gets rid of all the characters that refuse to show in
Firefox.

If you look at this page in Firefox you will see a ton of "question
mark" characters, so it is not possible to copy and paste those to
remove them from strings. (This fixes that issue nicely, though I
admit it could be done a bit better.)

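<?php
// Keep only tab, newline, carriage return and printable ASCII (32-126).
function fixoutput($str){
    $good = array(9, 10, 13); // tab, nl, cr
    for($a = 32; $a < 127; $a++){
        $good[] = $a;
    }
    $newstr = ''; // initialise so an empty result is '' rather than null
    $len = strlen($str);
    for($b = 0; $b < $len; $b++){ // stop at $len - 1; $str[$len] does not exist
        if(in_array(ord($str[$b]), $good)){
            $newstr .= $str[$b];
        }
    }
    return $newstr;
}
?>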
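
The same byte filter can probably be written as a single regular expression. The following is only a sketch: the name strip_non_ascii is made up for illustration, and it assumes the input should be treated as a plain byte string, so any multi-byte UTF-8 character is removed entirely.

```php
<?php
// Hypothetical helper name; intended to keep the same bytes as fixoutput().
function strip_non_ascii($str) {
    // Remove every byte that is not tab, LF, CR, or printable ASCII (0x20-0x7E).
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $str);
}

// "\x93" and "\x94" are the Windows-1252 curly double quotes Word tends to insert.
echo strip_non_ascii("He said \x93hello\x94.\n"); // prints: He said hello.
```

Note that the curly quotes vanish completely rather than turning into straight quotes, which may or may not be what you want.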

Sep 16 '07 #2


Sanders Kaufman;92056 Wrote:
> FFMG wrote:
>> So how can I strip/replace all the MS Word invalid code from my
>> $_POSTs?
>
> I presume you're referring to all the MS Office XML markup.
> That's actually good stuff, sometimes.

No, sorry, I was actually talking about some non-standard characters
that MS Word inserts.

Some browsers will (maybe wrongly) not display any invalid characters
in the textarea itself, giving the user the impression that everything
is fine.

But when I then try to display the comment/entry, I get a bunch of
question marks for the characters that were invalid.

FFMG

Sep 17 '07 #3

FFMG wrote:
> Sanders Kaufman;92056 Wrote:
>> FFMG wrote:
>>> So how can I strip/replace all the MS Word invalid code from my
>>> $_POSTs?
>> I presume you're referring to all the MS Office XML markup.
>> That's actually good stuff, sometimes.
>
> No, sorry, I was actually talking about some non-standard characters
> that MS Word inserts.
>
> Some browsers will (maybe wrongly) not display any invalid characters
> in the textarea itself, giving the user the impression that everything
> is fine.
>
> But when I then try to display the comment/entry, I get a bunch of
> question marks for the characters that were invalid.

Ah, so. You're having a character set problem.
Rather than have a big old off-topic thread about it here, you should
probably take the question to an Office or HTML group.
PHP won't help you much.
Sep 17 '07 #4


Sanders Kaufman;92237 Wrote:
>> No, sorry, I was actually talking about some non-standard characters
>> that MS Word inserts.
>>
>> Some browsers will (maybe wrongly) not display any invalid characters
>> in the textarea itself, giving the user the impression that everything
>> is fine.
>>
>> But when I then try to display the comment/entry, I get a bunch of
>> question marks for the characters that were invalid.
>
> Ah, so. You're having a character set problem.
> Rather than have a big old off-topic thread about it here, you should
> probably take the question to an Office or HTML group.
> PHP won't help you much.

No, I am not. Read the question again, carefully this time.
Textareas of most browsers will (wrongly) accept MS Word pasted
code.

By the time it gets to my server I have to clean it up.
My PHP code must handle it.

Is that on topic enough for you?

FFMG

Sep 17 '07 #5

FFMG wrote:
> Sanders Kaufman;92237 Wrote:
>>> No, sorry, I was actually talking about some non-standard characters
>>> that MS Word inserts.
>>>
>>> Some browsers will (maybe wrongly) not display any invalid characters
>>> in the textarea itself, giving the user the impression that everything
>>> is fine.
>>>
>>> But when I then try to display the comment/entry, I get a bunch of
>>> question marks for the characters that were invalid.
>> Ah, so. You're having a character set problem.
>> Rather than have a big old off-topic thread about it here, you should
>> probably take the question to an Office or HTML group.
>> PHP won't help you much.
>
> No, I am not. Read the question again, carefully this time.
> Textareas of most browsers will (wrongly) accept MS Word pasted
> code.
>
> By the time it gets to my server I have to clean it up.
> My PHP code must handle it.
>
> Is that on topic enough for you?
>
> FFMG

Yes, this has been asked before - but I don't remember what the answer was.

The easiest way would be to check for non-alphanumeric chars using a
regex. If you find any, tell the user to use a plain text editor.

You could use a regex to strip non-alphanumeric characters, but this
might have some problems. For instance, what happens if you have a
control sequence which happens to contain a character - e.g. 0x010231?
The 0x31 would be taken as the character '1', even though it's part of a
control sequence. But you could clean it up fairly well this way.
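
Roughly, the two options could look like the sketch below. Both function names are hypothetical, and the "plain text" test is an assumption (tab, LF, CR and printable ASCII only), so adjust the character classes to whatever your form should really accept. The last line shows the control-sequence problem with the bytes 0x01 0x02 0x31.

```php
<?php
// Hypothetical helper: true if $str holds any byte outside tab, LF, CR,
// or printable ASCII (0x20-0x7E). Use it to reject the post and warn the user.
function has_non_plain_text($str) {
    return preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $str) === 1;
}

// Hypothetical helper: strips everything except ASCII letters and digits.
function strip_non_alnum($str) {
    return preg_replace('/[^A-Za-z0-9]/', '', $str);
}

$pasted = "abc\x01\x02\x31def"; // contains the three-byte sequence 0x01 0x02 0x31

var_dump(has_non_plain_text($pasted));       // bool(true)
var_dump(has_non_plain_text("plain text"));  // bool(false)
echo strip_non_alnum($pasted), "\n";         // abc1def: the stray '1' survives
```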

Try googling this newsgroup for something like "MS WORD". It's been a
few months.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Sep 18 '07 #6

FFMG wrote:
> Sanders Kaufman;92371 Wrote:
>> That means that the help you need is with Office and HTML, not PHP.
>
> Well, I tend to disagree.
> Because I am trying to process data in PHP I think that asking fellow
> programmers on the PHP group for input is not as off-topic as you
> think.

How's that workin' out for ya, champ?
Have you noticed the roar of silence in response to your original request?

Seriously - you'll get a better response in an HTML or MS Office group.

> Is your suggestion to convert to an MS Office charset (even if the
> user did not use MS Word), and then convert it back as needed?
> Would stripping the MS chars not be faster/better?

There are no such things as "MS characters" or an MS Office Character Set.
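
For readers weighing "convert" against "strip": the punctuation Word usually produces (curly quotes, dashes, ellipsis) are ordinary characters that sit at bytes 0x85 and 0x91-0x97 in Windows-1252. A minimal sketch of replacing them with ASCII look-alikes follows. The function name is hypothetical, and it assumes the form data really arrives as Windows-1252; on UTF-8 input the same byte values occur inside multi-byte characters and this would corrupt them.

```php
<?php
// Hypothetical helper; assumes $str is Windows-1252 encoded, NOT UTF-8.
function replace_cp1252_punctuation($str) {
    $map = array(
        "\x85" => '...', // horizontal ellipsis
        "\x91" => "'",   // left single quotation mark
        "\x92" => "'",   // right single quotation mark
        "\x93" => '"',   // left double quotation mark
        "\x94" => '"',   // right double quotation mark
        "\x96" => '-',   // en dash
        "\x97" => '-',   // em dash
    );
    // strtr() with an array replaces each key with its value in one pass.
    return strtr($str, $map);
}

echo replace_cp1252_punctuation("\x93It\x92s done\x94 \x96 wait\x85\n");
// prints: "It's done" - wait...
```

Unlike stripping, this keeps the punctuation readable, at the cost of only handling the bytes listed in the map.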
Sep 18 '07 #7


Sanders Kaufman;92428 Wrote:
> FFMG wrote:
>> Sanders Kaufman;92371 Wrote:
>>> That means that the help you need is with Office and HTML, not PHP.
>> Well, I tend to disagree.
>> Because I am trying to process data in PHP I think that asking fellow
>> programmers on the PHP group for input is not as off-topic as you
>> think.
>
> How's that workin' out for ya, champ?
> ...

Read the thread, the answer was given.

I see you could not answer the question so you have to start using
abusive language.

Shame.

FFMG

Sep 22 '07 #8
