Bytes IT Community

character conversion from MS Word to HTML

Here's a brief description of the problem. My organization has a
client who cuts and pastes information from Microsoft Word documents
into web-based forms, whose contents are then displayed on a website. I
want to convert the special characters, such as ellipses and trademark
symbols (and whatever else Word might throw at us), into a proper HTML
entity (&trade;) or a numeric character reference (&#174;) if the
entity does not exist.
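The rule described here (named entity if HTML defines one, numeric reference otherwise) can be sketched for a single character as below. This is a minimal sketch under stated assumptions, not a tested production routine: it assumes UTF-8 input, PHP 7.2 or later for `mb_ord()`, and the HTML 4.01 entity set; `html_reference()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: return the HTML 4.01 named entity for one UTF-8
 * character if one exists, otherwise a decimal numeric character reference.
 * Plain ASCII characters without an entity are returned unchanged.
 */
function html_reference(string $ch): string
{
    // UTF-8 character => named entity, e.g. "\u{2122}" => "&trade;".
    static $table = null;
    if ($table === null) {
        $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    }

    if (isset($table[$ch])) {
        return $table[$ch]; // a named entity exists
    }

    $codePoint = mb_ord($ch, 'UTF-8'); // e.g. 0x0100 for "\u{0100}"
    if ($codePoint < 0x80) {
        return $ch; // ASCII needs no reference
    }
    return '&#' . $codePoint . ';'; // fall back to &#NNNN;
}

echo html_reference("\u{2122}"), "\n"; // &trade;
echo html_reference("\u{00AE}"), "\n"; // &reg;
echo html_reference("\u{0100}"), "\n"; // &#256;
```

U+0100 (Latin capital A with macron) has no HTML 4.01 entity, so it takes the numeric path.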

Before you make any suggestions, let me share a brief overview of my
previous attempts at a solution so that neither of us wastes time.
Right now, I'm using a combination of the character map returned by
get_html_translation_table(HTML_ENTITIES) and some kludgy code which
manually maps the UTF-8 byte sequence of an MS Word special character
to its HTML equivalent. For example,

// Bytes E2 80 98 are the UTF-8 encoding of U+2018 LEFT SINGLE QUOTATION MARK
$replace_array[chr(226).chr(128).chr(152)] = '&lsquo;';
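The three `chr()` calls in that line spell out the UTF-8 encoding of U+2018 LEFT SINGLE QUOTATION MARK. A quick check, assuming PHP 7.2 or later (for `mb_ord()` and the `\u{...}` string escape), shows this, and shows that `htmlentities()`, once told the input is UTF-8, already produces `&lsquo;` for it, so that entry need not be written by hand.

```php
<?php
// The array key from the manual map: the bytes 0xE2 0x80 0x98.
$key = chr(226) . chr(128) . chr(152);

// Those three bytes are the UTF-8 encoding of U+2018.
var_dump($key === "\u{2018}");               // bool(true)
var_dump(bin2hex($key));                     // string(6) "e28098"
var_dump(mb_ord($key, 'UTF-8') === 0x2018);  // bool(true)

// Given the right charset, htmlentities() emits the named entity itself.
echo htmlentities($key, ENT_QUOTES | ENT_HTML401, 'UTF-8'), "\n"; // &lsquo;
```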

I'd like to be able to do the above operation automatically, across
the board, for all the wacky Word characters. I suspect I may need to
use the mbstring functions. If you have any advice, I'm happy to send
helpful folks some chocolate for their trouble.
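One way to do this across the board with mbstring, sketched under the assumption that the form data is UTF-8 (as the `chr(226).chr(128).chr(152)` key suggests) and PHP 7 or later: let `htmlentities()` emit every named entity HTML 4.01 defines, then let `mb_encode_numericentity()` turn whatever non-ASCII characters remain into decimal references. The function name `word_to_html()` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: convert UTF-8 text pasted from Word into ASCII-only
 * HTML. Named entities are used where HTML 4.01 defines one; every other
 * non-ASCII character becomes a decimal numeric character reference.
 */
function word_to_html(string $text): string
{
    // Step 1: named entities (&trade;, &hellip;, &lsquo;, ...) plus
    // &amp; &lt; &gt; &quot; &#039;. ENT_SUBSTITUTE turns invalid UTF-8
    // into U+FFFD instead of making htmlentities() return "".
    $html = htmlentities($text, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');

    // Step 2: anything still non-ASCII has no HTML 4.01 entity, so encode it
    // as &#NNNN;. The convmap is [start, end, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($html, $convmap, 'UTF-8');
}

echo word_to_html("Brand\u{2122} \u{2026} \u{201C}quoted\u{201D} \u{0100}"), "\n";
// Brand&trade; &hellip; &ldquo;quoted&rdquo; &#256;
```

Because step 1 also escapes `<`, `>`, `&` and quotes, this should run once on the raw form input, not on text that already contains HTML; otherwise an existing `&trade;` comes out double-encoded as `&amp;trade;`.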

Feb 19 '07 #1