Treating text copied from MS Word

+mrcakey
#1: Jul 9 '08
I've built a MySQL database for a client and a web interface to be able to
add/edit/delete records in it. When he's adding stuff to the database he's
copying text from MS Word. I've tried various substitutions that I've found
hanging around the internet, but nothing's working for the "long dash" that
it insists on converting normal hyphens to.

This morning I did a bin2hex to see exactly what was being sent from $_POST:

A – long dash -.

41 20 >>>e2 80 93<<< 20 6c 6f 6e 67 20 64 61 73 68 20 2d 2e 20 20

The offending character is the one I've highlighted. As far as I can tell,
it should be getting found by this -

"\\xe2\\x80\\x93", // long dash

but it isn't, which makes me think there's something wrong with the code
I've copied. How to find the hex string? I've tried "\xe2\x80\x93" and
"\xe2x80x93" in addition, but to no avail.

Is driving me scatty!!!

Any help much appreciated.

$search = array( chr(145),
chr(146),
chr(147),
chr(148),
chr(151),
chr(196),
'?o', // left side double smart quote
'?', // right side double smart quote
'?~', // left side single smart quote
'?T', // right side single smart quote
'?', // ellipsis
'?"', // em dash
'?"', // en dash
"\\xe2\\x80\\xa6", // ellipsis
"\\xe2\\x80\\x93", // long dash
"\\xe2\\x80\\x94", // long dash
"\\xe2\\x80\\x9c", // double quote opening
"\\xe2\\x80\\x9d", // double quote closing
"\\xe2\\x80\\xa2" // dot used for bullet points
);
$replace = array( "'",
"'",
'"',
'"',
'-',
'-',
'"',
'"',
"'",
"'",
"&hellip;",
"-",
"-",
'&hellip;',
'-',
'-',
'"',
'"',
'*'
);
ECHO '<p>'.BIN2HEX( $_POST['short_desc'] ).'</p>';
$short_desc = STR_REPLACE($search, $replace, $_POST['short_desc']);

+mrcakey



C. (http://symcbean.blogspot.com/)
#2: Jul 10 '08

re: Treating text copied from MS Word


On Jul 9, 12:03 pm, "+mrcakey" <webmas...@listyblue.com> wrote:
> I've built a MySQL database for a client and a web interface to be able to
> add/edit/delete records in it. When he's adding stuff to the database he's
> copying text from MS Word. I've tried various substitutions that I've found
> hanging around the internet, but nothing's working for the "long dash" that
> it insists on converting normal hyphens to.
>
> This morning I did a bin2hex to see exactly what was being sent from $_POST:
>
> A – long dash -.
>
> 41 20 >>>e2 80 93<<< 20 6c 6f 6e 67 20 64 61 73 68 20 2d 2e 20 20
>
> The offending character is the one I've highlighted. As far as I can tell,
> it should be getting found by this -
>
> "\\xe2\\x80\\x93", // long dash
>
> but it isn't, which makes me think there's something wrong with the code
> I've copied. How to find the hex string? I've tried "\xe2\x80\x93" and
> "\xe2x80x93" in addition, but to no avail.

<snip>

Not really a PHP question - configure your webserver to use a 7 bit
charset.

C.

I V
#3: Jul 11 '08

re: Treating text copied from MS Word


On Wed, 09 Jul 2008 12:03:57 +0100, +mrcakey wrote:
> The offending character is the one I've highlighted. As far as I can
> tell, it should be getting found by this -
>
> "\\xe2\\x80\\x93", // long dash

You want to use one backslash here, not two. In a double-quoted string,
"\\x" is a literal backslash followed by an x, whereas "\xe2" is the
single byte 0xE2. But rather than specifying the search-and-replace
yourself, it's probably easier to use htmlentities. You need to know what
encoding your data has been sent in (it looks, from your post, like
you're receiving UTF-8), and then do something like this:

$short_desc = htmlentities($_POST['short_desc'], ENT_COMPAT, 'UTF-8');
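
To make the backslash point concrete, here is a minimal sketch. It assumes the field really does arrive as UTF-8, as the hex dump suggests; the sample string and the variable names are made up for illustration and are not from the original code.

```php
<?php
// Hypothetical sample input: "A", space, EN DASH (UTF-8 bytes e2 80 93), " long dash -."
$input = "A \xe2\x80\x93 long dash -.";

// One backslash inside double quotes: PHP turns each \xNN escape into one raw byte.
$oneBackslash = "\xe2\x80\x93";

// Two backslashes: each pair is a literal backslash, so this is twelve
// ordinary characters of text and can never match the three posted bytes.
$twoBackslashes = "\\xe2\\x80\\x93";

echo bin2hex($oneBackslash), "\n";   // e28093
echo strlen($twoBackslashes), "\n";  // 12

$fixed   = str_replace($oneBackslash, '-', $input);
$unfixed = str_replace($twoBackslashes, '-', $input);

echo $fixed, "\n";              // A - long dash -.
var_dump($unfixed === $input);  // bool(true): nothing was replaced

// The htmlentities route, with the input encoding stated explicitly.
$entities = htmlentities($input, ENT_COMPAT, 'UTF-8');
echo $entities, "\n";           // A &ndash; long dash -.
```

The str_replace route leaves plain ASCII in the database, while the htmlentities route stores HTML entities, so which one fits depends on whether the text is only ever shown in a web page.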

C. (http://symcbean.blogspot.com/)
#4: Jul 13 '08

re: Treating text copied from MS Word


On Jul 10, 5:07 pm, "C. (http://symcbean.blogspot.com/)"
<colin.mckin...@gmail.com> wrote:
> On Jul 9, 12:03 pm, "+mrcakey" <webmas...@listyblue.com> wrote:
>
> > I've built a MySQL database for a client and a web interface to be able to
> > add/edit/delete records in it. When he's adding stuff to the database he's
> > copying text from MS Word. I've tried various substitutions that I've found
> > hanging around the internet, but nothing's working for the "long dash" that
> > it insists on converting normal hyphens to.
> >
> > This morning I did a bin2hex to see exactly what was being sent from $_POST:
> >
> > A – long dash -.
> >
> > 41 20 >>>e2 80 93<<< 20 6c 6f 6e 67 20 64 61 73 68 20 2d 2e 20 20
> >
> > The offending character is the one I've highlighted. As far as I can tell,
> > it should be getting found by this -
> >
> > "\\xe2\\x80\\x93", // long dash
> >
> > but it isn't, which makes me think there's something wrong with the code
> > I've copied. How to find the hex string? I've tried "\xe2\x80\x93" and
> > "\xe2x80x93" in addition, but to no avail.
>
> <snip>
>
> Not really a PHP question - configure your webserver to use a 7 bit
> charset.
>
> C.

Sorry - bum steer. Apparently MSIE is (once again) completely broken
in this regard. There is a hack though - see
http://www.crazysquirrel.com/computi...-encoding.jspx

C.