character conversion from MS Word to HTML


Here's a brief description of the problem. My organization has a
client who cuts and pastes information from Microsoft Word documents
into web-based forms, whose contents are then displayed on a website. I
wish to convert the special characters, such as ellipses and trademark
symbols (and whatever else Word might throw at us), into a proper HTML
entity (&trade;) or a numeric character reference (&#174;) if the
entity does not exist.

Before you make any suggestions, let me share a brief overview of my
previous attempts at a solution so neither of us wastes his time.
Right now, I'm using a combination of the character map returned by
get_html_translation_table(HTML_ENTITIES) and some kludgy code which
manually maps the UTF-8 byte sequence of an MS Word special character to its
HTML equivalent. For example,

$replace_array[chr(226).chr(128).chr(152)] = "&lsquo;";

I'd like to be able to do the above operation automatically / across
the board for wacky Word characters. I suspect I may need to use the
mbstring functions. If you have any advice, I'm happy to send helpful
folks some chocolate for their troubles.
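
For what it's worth, here is a minimal sketch of how the "across the board" conversion might look, not a definitive solution. It assumes the form input arrives as valid UTF-8 (which the chr(226).chr(128).chr(152) bytes above suggest), PHP 7 or later, and the mbstring extension. The function name word_chars_to_html() is made up for illustration. The idea is to let htmlentities() produce named entities wherever its HTML 4.01 table has one, then use mb_encode_numericentity() to turn whatever non-ASCII characters remain into decimal character references.

```php
<?php
// Sketch only. Assumptions: input is UTF-8, PHP 7+, mbstring is loaded.
// word_chars_to_html() is a hypothetical helper name, not a built-in.

function word_chars_to_html(string $text): string
{
    // Step 1: named entities where the HTML 4.01 table has one
    // (&trade;, &hellip;, &lsquo;, ...). This also escapes &, <, > and
    // double quotes, so apply it to plain text, not to existing markup.
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of
    // returning an empty string.
    $text = htmlentities($text, ENT_COMPAT | ENT_HTML401 | ENT_SUBSTITUTE, 'UTF-8');

    // Step 2: anything still outside ASCII becomes a decimal reference.
    // Conversion map: [first code point, last code point, offset, bitmask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($text, $map, 'UTF-8');
}

// Usage: a trademark sign, an ellipsis, and a non-breaking hyphen
// (U+2011, which has no named entity in HTML 4.01).
echo word_chars_to_html("Acme\u{2122} widgets\u{2026} non\u{2011}stop"), "\n";
```

With this approach the hand-built $replace_array should no longer be needed for characters that have either a named entity or a code point, since both steps work from the decoded characters rather than from hard-coded byte sequences.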

Feb 19 '07 #1