
Stripping MS Word code from my forms once and for all.

FFMG
Guest
 
Posts: n/a
#1: Sep 15 '07

Hi,

I have a form that allows users to comment, add entries and so on.
But what a lot of them do is copy and paste directly from MS Word to my
forms.

Almost all browsers will accept the post and give the impression that
everything is saved properly.

But that is not the case when it comes time to display the message
on my page.

So how can I strip/replace all the MS Word invalid code from my
$_POSTs?

Thanks

FFMG


--

'webmaster forum' (http://www.httppoint.com) | 'Free Blogs'
(http://www.journalhome.com/) | 'webmaster Directory'
(http://www.webhostshunter.com/)
'Recreation Vehicle insurance'
(http://www.insurance-owl.com/other/car_rec.php) | 'Free URL
redirection service' (http://urlkick.com/)
------------------------------------------------------------------------
FFMG's Profile: http://www.httppoint.com/member.php?userid=580
View this thread: http://www.httppoint.com/showthread.php?t=20318

Message Posted via the webmaster forum http://www.httppoint.com, (Ad revenue sharing).

macca
Guest
 
Posts: n/a
#2: Sep 16 '07

re: Stripping MS Word code from my forms once and for all.


I found this on php.net at http://uk2.php.net/strtr which may be of
some help:




After battling with strtr trying to strip out MS word formatting from
things pasted into forms I ended up coming up with this..

It strips ALL non-standard ASCII characters, preserving HTML codes and
such, but gets rid of all the characters that refuse to show in
Firefox.

If you look at this page in Firefox you will see a ton of "question
mark" characters, so it is not possible to copy and paste those to
remove them from strings. (This fixes that issue nicely, though I
admit it could be done a bit better.)

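<?php
// Keep only tab, newline, carriage return and printable ASCII (32-126).
function fixoutput($str){
    $good = array(9, 10, 13); // tab, nl, cr
    for ($a = 32; $a < 127; $a++) {
        $good[] = $a;
    }
    $newstr = ''; // initialise so an empty input returns '' instead of null
    $len = strlen($str);
    // The original looped to $len+1, which read one byte past the end.
    for ($b = 0; $b < $len; $b++) {
        if (in_array(ord($str[$b]), $good)) {
            $newstr .= $str[$b];
        }
    }
    return $newstr;
}
?>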
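
The same whitelist can probably be written more compactly as a single regular expression. This is a minimal sketch, assuming the goal is still to keep only tab, newline, carriage return and printable ASCII. The function name strip_non_ascii is hypothetical and is not part of the php.net comment. Like the loop above, it drops every non-ASCII byte, so accented letters disappear along with Word's curly quotes.

```php
<?php
// Hypothetical helper: keep tab, LF, CR and printable ASCII (0x20-0x7E),
// and drop every other byte. Without the /u modifier the pattern is
// matched byte by byte.
function strip_non_ascii($str)
{
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $str);
}

// "\x93" and "\x94" are the Windows-1252 bytes for curly double quotes.
echo strip_non_ascii("He said \x93hello\x94.\n");
```

Because the pattern is a negated character class, adding another allowed byte only means extending the class.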

FFMG
Guest
 
Posts: n/a
#3: Sep 17 '07

re: Stripping MS Word code from my forms once and for all.



Sanders Kaufman;92056 Wrote:
> FFMG wrote:
> > So how can I strip/replace all the MS Word invalid code from my
> > $_POSTs?
>
> I presume you're referring to all the MS Office XML markup.
> That's actually good stuff, sometimes.

No, sorry, I was actually talking about some non-standard characters
that MS Word inserts.

Some browsers will, (maybe wrongly), not display any invalid characters
in the textarea itself, giving the user the impression that everything
is fine.

But when I then try to display the comment/entry I get a bunch of
question marks for the characters that were invalid.

FFMG



Sanders Kaufman
Guest
 
Posts: n/a
#4: Sep 17 '07

re: Stripping MS Word code from my forms once and for all.


FFMG wrote:
> Sanders Kaufman;92056 Wrote:
> > FFMG wrote:
> > > So how can I strip/replace all the MS Word invalid code from my
> > > $_POSTs?
> >
> > I presume you're referring to all the MS Office XML markup.
> > That's actually good stuff, sometimes.
>
> No, sorry, I was actually talking about some non-standard characters
> that MS Word inserts.
>
> Some browsers will, (maybe wrongly), not display any invalid characters
> in the textarea itself, giving the user the impression that everything
> is fine.
>
> But when I then try to display the comment/entry I get a bunch of
> question marks for the characters that were invalid.

Ah, so. You're having a character set problem.
Rather than have a big old off-topic thread about it here, you should
probably take the question to an Office or HTML group.
PHP won't help you much.

FFMG
Guest
 
Posts: n/a
#5: Sep 17 '07

re: Stripping MS Word code from my forms once and for all.



Sanders Kaufman;92237 Wrote:
> > No, sorry, I was actually talking about some non-standard characters
> > that MS Word inserts.
> >
> > Some browsers will, (maybe wrongly), not display any invalid
> > characters in the textarea itself, giving the user the impression
> > that everything is fine.
> >
> > But when I then try to display the comment/entry I get a bunch of
> > question marks for the characters that were invalid.
>
> Ah, so. You're having a character set problem.
> Rather than have a big old off-topic thread about it here, you should
> probably take the question to an Office or HTML group.
> PHP won't help you much.

No I am not, read the question again, carefully this time.
Textareas of most browsers will, (wrongly), accept MS Word pasted
code.

By the time it gets to my server I have to clean it up.
My PHP code must handle it.

Is that on topic enough for you?

FFMG



Jerry Stuckle
Guest
 
Posts: n/a
#6: Sep 18 '07

re: Stripping MS Word code from my forms once and for all.


FFMG wrote:
> Sanders Kaufman;92237 Wrote:
> > > No, sorry, I was actually talking about some non-standard
> > > characters that MS Word inserts.
> > >
> > > Some browsers will, (maybe wrongly), not display any invalid
> > > characters in the textarea itself, giving the user the impression
> > > that everything is fine.
> > >
> > > But when I then try to display the comment/entry I get a bunch of
> > > question marks for the characters that were invalid.
> >
> > Ah, so. You're having a character set problem.
> > Rather than have a big old off-topic thread about it here, you should
> > probably take the question to an Office or HTML group.
> > PHP won't help you much.
>
> No I am not, read the question again, carefully this time.
> Textareas of most browsers will, (wrongly), accept MS Word pasted
> code.
>
> By the time it gets to my server I have to clean it up.
> My PHP code must handle it.
>
> Is that on topic enough for you?
>
> FFMG

Yes, this has been asked before - but I don't remember what the answer was.

The easiest way would be to check for non-alphanumeric chars using a
regex. If you find any, tell the user to use a plain text editor.

You could use a regex to strip non-alphanumeric characters, but this
might have some problems. For instance, what happens if you have a
control sequence which happens to contain a byte that is also a valid
character, e.g. 0x010231? The 0x31 would be taken as the character '1',
even though it's part of a control sequence. But you could clean it up
fairly well this way.
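
The detect-and-reject approach suggested above might look roughly like the following. It is only a sketch: the function name has_non_ascii and the exact whitelist are assumptions. Taken literally, "non-alphanumeric" would also reject spaces and punctuation, so this version flags anything outside tab, newline, carriage return and printable ASCII instead.

```php
<?php
// Hypothetical helper: true when $str contains any byte outside
// tab, LF, CR and printable ASCII (0x20-0x7E).
function has_non_ascii($str)
{
    return preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $str) === 1;
}

// Stand-in for a submitted form field; \x92 is the Windows-1252
// byte for a curly apostrophe.
$comment = "It\x92s fine";
if (has_non_ascii($comment)) {
    echo "Please paste your text as plain text and try again.\n";
}
```

Rejecting the input avoids silently changing what the user wrote, at the cost of making them fix it themselves.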

Try googling this newsgroup for something like "MS WORD". It's been a
few months.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Sanders Kaufman
Guest
 
Posts: n/a
#7: Sep 18 '07

re: Stripping MS Word code from my forms once and for all.


FFMG wrote:
> Sanders Kaufman;92371 Wrote:
> > That means that the help you need is with Office and HTML, not PHP.
>
> Well, I tend to disagree.
> Because I am trying to process data in PHP I think that asking fellow
> programmers on the PHP group for input is not as off-topic as you
> think.

How's that workin' out for ya, champ?
Have you noticed the roar of silence in response to your original request?

Seriously - you'll get a better response in an HTML or MS Office group.

> Is your suggestion to convert to an MS Office charset, (even if the
> user did not use MS Word), and then convert it back as needed?
> Would stripping the MS chars not be faster/better?

There are no such things as "MS characters" or an MS Office Character Set.
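
What Word typically pastes are ordinary typographic characters: curly quotes, dashes and the ellipsis. They tend to show up as question marks when the page is served in a charset that does not match the bytes that were submitted. One option, sketched here, is to translate them to ASCII with strtr rather than strip them. This assumes the form is submitted as UTF-8. The function name plain_punctuation and the choice of replacements are illustrative assumptions, not something from the thread.

```php
<?php
// Hypothetical helper: replace common typographic punctuation with
// plain ASCII. Assumes $str is UTF-8 encoded.
function plain_punctuation($str)
{
    $map = array(
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote
        "\xE2\x80\x9C" => '"',   // U+201C left double quote
        "\xE2\x80\x9D" => '"',   // U+201D right double quote
        "\xE2\x80\x93" => '-',   // U+2013 en dash
        "\xE2\x80\x94" => '-',   // U+2014 em dash
        "\xE2\x80\xA6" => '...', // U+2026 horizontal ellipsis
    );
    return strtr($str, $map);
}

echo plain_punctuation("\xE2\x80\x9CIt\xE2\x80\x99s done\xE2\x80\xA6\xE2\x80\x9D\n");
```

Text that contains none of the mapped sequences passes through unchanged, so this can run before any stricter stripping step.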
FFMG
Guest
 
Posts: n/a
#8: Sep 22 '07

re: Stripping MS Word code from my forms once and for all.



Sanders Kaufman;92428 Wrote:
> FFMG wrote:
> > Sanders Kaufman;92371 Wrote:
> > > That means that the help you need is with Office and HTML, not PHP.
> >
> > Well, I tend to disagree.
> > Because I am trying to process data in PHP I think that asking fellow
> > programmers on the PHP group for input is not as off-topic as you
> > think.
>
> How's that workin' out for ya, champ?
> ...

Read the thread, the answer was given.

I see you could not answer the question so you have to start using
abusive language.

Shame.

FFMG



Closed Thread