Cleaning MS Word input - last resort!!

Dear all,

I have a problem with a form. I have tried various permutations of
htmlentities() and html_entity_decode() to resolve it, but without success.

Here is the workflow.

1: User pastes MS Word formatted text into form field.
2: Server uses mail() to send input text to mail client.
3: Recipient pastes text into html file.

The problem is that MS Word text contains peculiar characters for things like
bullets. These come out as tabs, and then as different, but still
spurious, HTML characters in the HTML translation.

Does anyone know of a function (or functions) that can clean up MS Word input
into something that can simply be pasted as plain text without spurious
characters?

Turner
Feb 21 '06 #1

From a comment on the PHP documentation for the utf8_decode() function
http://us2.php.net/manual/en/function.utf8-decode.php
peter dot mescalchin at geemail dot com
27-Dec-2005 06:43

Adding to below, I have a few more MS Word characters that need
replacing. I found this was required when "fixing" some phpMyAdmin export
scripts from a live server where MS Word characters were all through the
content, before importing them back into my local MySQL database.

The code I wrote for this process also does a strpos() for any extra
"\xe2\x80" strings, which are the tell-tale sign of any funny
characters I want removed.

Here are my updated arrays:

<?php
// UTF-8 byte sequences for the typographic characters MS Word inserts.
$badchr = array(
    "\xe2\x80\xa6", // ellipsis
    "\xe2\x80\x93", // en dash
    "\xe2\x80\x94", // em dash
    "\xe2\x80\x98", // single quote opening
    "\xe2\x80\x99", // single quote closing
    "\xe2\x80\x9c", // double quote opening
    "\xe2\x80\x9d", // double quote closing
    "\xe2\x80\xa2"  // dot used for bullet points
);

// Plain ASCII replacements, in the same order as $badchr.
$goodchr = array(
    '...',
    '-',
    '-',
    '\'',
    '\'',
    '"',
    '"',
    '*'
);
?>
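
The comment only lists the arrays, so here is a minimal sketch of how they might be applied with str_replace(), together with the strpos() check for leftover "\xe2\x80" sequences it mentions. The function names clean_word_chars() and has_leftover_word_chars() are made up for illustration, the input is assumed to be UTF-8, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not from the original comment.
// Assumes $text is UTF-8 encoded.
function clean_word_chars(string $text): string
{
    // UTF-8 byte sequence of each Word character => plain ASCII stand-in.
    $map = array(
        "\xe2\x80\xa6" => '...', // ellipsis
        "\xe2\x80\x93" => '-',   // en dash
        "\xe2\x80\x94" => '-',   // em dash
        "\xe2\x80\x98" => '\'',  // single quote opening
        "\xe2\x80\x99" => '\'',  // single quote closing
        "\xe2\x80\x9c" => '"',   // double quote opening
        "\xe2\x80\x9d" => '"',   // double quote closing
        "\xe2\x80\xa2" => '*',   // bullet
    );

    // str_replace() accepts parallel arrays of search and replacement strings.
    return str_replace(array_keys($map), array_values($map), $text);
}

// Hypothetical helper: reports whether any "\xe2\x80" prefix is still present.
// That prefix covers U+2000 to U+203F, so a hit means some character in that
// range was not in the replacement map.
function has_leftover_word_chars(string $text): bool
{
    return strpos($text, "\xe2\x80") !== false;
}

$input = "\xe2\x80\x9cHello\xe2\x80\x9d \xe2\x80\x93 it\xe2\x80\x99s done\xe2\x80\xa6";
$clean = clean_word_chars($input);

echo $clean, "\n"; // prints: "Hello" - it's done...

if (has_leftover_word_chars($clean)) {
    echo "Warning: unhandled characters remain\n";
}
```

If the form data arrives in Windows-1252 rather than UTF-8, these byte sequences will not match and the text would need converting first.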
--
Julien CROUZET - DSI Theoconcept
julien.crouzet@/enlever ca\theoconcept.com
http://www.theoconcept.com
Feb 21 '06 #2

Tidy, perhaps?

http://us3.php.net/manual/en/ref.tidy.php

http://www.zend.com/php5/articles/php5-tidy.php

http://www.w3.org/People/Raggett/tidy/

--
Justin Koivisto, ZCE - ju****@koivi.com
http://koivi.com
Feb 21 '06 #3

Thanks!!
Feb 25 '06 #4
