Bytes | Software Development & Data Engineering Community

Dealing with Word special characters


I have a database that has been populated with content pasted out of MS
Word, and is full of special characters -- em dashes, curly quotes, curly
apostrophes, etc. Now I'm generating plain text email summaries out of the
database and of course those special chars appear as garbage chars in the
emails.

How can I filter the extracted text and transform these characters into
plain text equivalents? Is there a builtin function for this, external class
available, or do I need to try and hack it out from scratch?

Thanks,
Scot

Jul 17 '05 #1
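One way to do the hack-it-out-yourself option is a lookup table passed to PHP's built-in strtr(). The sketch below assumes the Word text was stored as Windows-1252 bytes, where Word's smart punctuation sits in the 0x80 to 0x9F range. That encoding is an assumption about the database, not something the thread confirms, and the function name word_to_ascii_cp1252() is hypothetical.

```php
<?php
// Hypothetical helper: map Word's Windows-1252 "smart" punctuation bytes to
// plain ASCII. Assumes the input is Windows-1252, NOT UTF-8.
function word_to_ascii_cp1252(string $text): string
{
    return strtr($text, [
        "\x91" => "'",   // left single quotation mark
        "\x92" => "'",   // right single quotation mark / apostrophe
        "\x93" => '"',   // left double quotation mark
        "\x94" => '"',   // right double quotation mark
        "\x96" => '-',   // en dash
        "\x97" => '--',  // em dash
        "\x85" => '...', // horizontal ellipsis
    ]);
}

echo word_to_ascii_cp1252("It\x92s \x93done\x94 \x97 finally\x85"), "\n";
// prints: It's "done" -- finally...
```

Only the most common Word characters are mapped; anything else passes through unchanged, so the table can be extended as new "garbage" characters turn up.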
On Wed, 24 Nov 2004 19:00:33 GMT, Scot Hacker <sh*****@birdhouse.org> wrote:

You might want to check out the wordDocumentHandler class.
http://psbweb.mirrors.phpclasses.org...kage/1352.html

Jul 17 '05 #2
On 11/24/04 11:22 AM, in article 8n********************************@4ax.com,
"us****@isotopeREEMOOVEmedia.com" <us****@isotopeREEMOOVEmedia.com> wrote:
> You might want to check out the wordDocumentHandler class.
> http://psbweb.mirrors.phpclasses.org...kage/1352.html


Hmm... That sounded promising, but then I found this comment in the header:

// Of course, you need MsWord installed on the server, so Windows OS.

Not an option here. Also, that class seems to want an actual Word doc as
input, and it outputs text or HTML files. All I want to do is examine the
contents of a variable for typical Word special characters and transform
them (in other words, I don't need document I/O, just a quick filter that
scans text for the funky chars).

Thanks,
Scot

Jul 17 '05 #3
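If the column holds UTF-8 instead, the same strtr() approach works with the UTF-8 encodings of the Unicode characters as keys, because strtr() matches plain byte strings. This sketch (PHP 7 or later, for the \u{...} string escapes) also includes a small check for whatever non-ASCII remains, which covers the "scan text for the funky chars" part. Both function names are hypothetical, and the mapping lists only common Word punctuation, not every character Word can produce.

```php
<?php
// Hypothetical helper: replace Word's "smart" punctuation in UTF-8 text with
// plain ASCII. Only the common characters are mapped; extend as needed.
function word_to_ascii_utf8(string $text): string
{
    static $map = [
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark / apostrophe
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '--',  // em dash
        "\u{2026}" => '...', // horizontal ellipsis
        "\u{00A0}" => ' ',   // no-break space
    ];
    // strtr() with an array tries the longest keys first and never rescans
    // replaced text, so the multibyte keys are matched as whole sequences.
    return strtr($text, $map);
}

// Hypothetical helper: true if any byte outside 7-bit ASCII is still present.
function has_non_ascii(string $text): bool
{
    return preg_match('/[^\x00-\x7F]/', $text) === 1;
}

$clean = word_to_ascii_utf8("Scot\u{2019}s \u{201C}funky\u{201D} chars \u{2014} gone");
echo $clean, "\n";               // prints: Scot's "funky" chars -- gone
var_dump(has_non_ascii($clean)); // prints: bool(false)
```

Running has_non_ascii() after the mapping is a cheap way to spot characters the table does not cover yet.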
Scot Hacker wrote:
> How can I filter the extracted text and transform these characters into
> plain text equivalents? Is there a builtin function for this, external class
> available, or do I need to try and hack it out from scratch?


Have a look at
http://www.ph.net/htmlentities
--
Mail sent to my "From:" address is publicly readable at http://www.dodgeit.com/
== ** ## !! !! ## ** ==
TEXT-ONLY mail to the complete "Reply-To:" address ("My Name" <my@address>) may
bypass the spam filter. I will answer all pertinent mails from a valid address.
Jul 17 '05 #4
Pedro Graca wrote:
> Scot Hacker wrote:
>> How can I filter the extracted text and transform these characters into
>> plain text equivalents? Is there a builtin function for this, external class
>> available, or do I need to try and hack it out from scratch?
>
> Have a look at
> http://www.ph.net/htmlentities


Of course I meant
http://www.php.net/htmlentities

An example, run from the command line:

php$ php -r 'echo htmlentities("João Graça"), "\n";'
Jo&atilde;o Gra&ccedil;a

php$

Jul 17 '05 #5
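For the characters in the original question, htmlentities() turns Word's punctuation into named HTML entities. That is fine for an HTML mail, but a plain-text mail would show the entity names literally. A minimal check, assuming UTF-8 input and PHP 7 or later (for the \u{...} string escapes):

```php
<?php
// Show what htmlentities() does with Word's smart punctuation (UTF-8 input).
$word = "\u{201C}Quoted\u{201D} \u{2014} and it\u{2019}s done";
echo htmlentities($word, ENT_QUOTES, 'UTF-8'), "\n";
// prints: &ldquo;Quoted&rdquo; &mdash; and it&rsquo;s done
```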
.oO(Pedro Graca)
> Scot Hacker wrote:
>> How can I filter the extracted text and transform these characters into
>> plain text equivalents? Is there a builtin function for this, external class
>> available, or do I need to try and hack it out from scratch?
>
> Have a look at
> http://www.ph.net/htmlentities


The OP wants to print plain text, no HTML.

Micha
Jul 17 '05 #6
Scot Hacker wrote:

I've managed to do just that using a call to antiword.
http://www.winfield.demon.nl/

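// Quote both paths so spaces or shell metacharacters in them are harmless
$cmd = "/usr/local/bin/antiword -t " . escapeshellarg($filename)
     . " > " . escapeshellarg($txt_file);
system($cmd);
// Escape the text first, then add <pre>, so the tags themselves stay intact
$document = mysql_real_escape_string(
    "<pre>" . htmlentities(file_get_contents($txt_file)) . "</pre>");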
Then store $document in the database. It gets a bit ugly at times, but it
works :-)

Sacs

Jul 17 '05 #7
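Where the <pre> tags go relative to htmlentities() matters: the function escapes every < and > it is given, so tags concatenated before the call come out as &lt;pre&gt; in the stored HTML. A quick comparison of the two orderings, using an arbitrary sample string:

```php
<?php
// htmlentities() escapes every '<', '>' and '&' it receives, so markup that
// must survive (such as <pre>) has to be added after escaping.
$text = "a < b & c";

$tagsEscaped = htmlentities("<pre>" . $text . "</pre>"); // wrapper escaped too
$tagsKept    = "<pre>" . htmlentities($text) . "</pre>"; // only the text escaped

echo $tagsEscaped, "\n"; // prints: &lt;pre&gt;a &lt; b &amp; c&lt;/pre&gt;
echo $tagsKept, "\n";    // prints: <pre>a &lt; b &amp; c</pre>
```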
