Dealing with Word special characters


I have a database that has been populated with content pasted out of MS
Word, and is full of special characters -- em dashes, curly quotes, curly
apostrophes, etc. Now I'm generating plain text email summaries out of the
database and of course those special chars appear as garbage chars in the
emails.

How can I filter the extracted text and transform these characters into
plain text equivalents? Is there a builtin function for this, external class
available, or do I need to try and hack it out from scratch?

Thanks,
Scot

Jul 17 '05 #1
On Wed, 24 Nov 2004 19:00:33 GMT, Scot Hacker <sh*****@birdhouse.org> wrote:

I have a database that has been populated with content pasted out of MS
Word, and is full of special characters -- em dashes, curly quotes, curly
apostrophes, etc. Now I'm generating plain text email summaries out of the
database and of course those special chars appear as garbage chars in the
emails.

How can I filter the extracted text and transform these characters into
plain text equivalents? Is there a builtin function for this, external class
available, or do I need to try and hack it out from scratch?


You might want to check out the wordDocumentHandler class.
http://psbweb.mirrors.phpclasses.org...kage/1352.html

Jul 17 '05 #2
On 11/24/04 11:22 AM, in article 8n********************************@4ax.com,
"us****@isotopeREEMOOVEmedia.com" <us****@isotopeREEMOOVEmedia.com> wrote:
You might want to check out the wordDocumentHandler class.
http://psbweb.mirrors.phpclasses.org...kage/1352.html


Hmm... That sounded promising, but then I found this comment in the header:

// Of course, you need MsWord installed on the server, so Windows OS.

Not an option here. Also, that class seems to want an actual Word doc as
input, and outputs text or html files. All I want to do is examine the
contents of a variable for typical Word special characters and transform
them (in other words, I don't need document I/O, just a quick filter to scan text for
the funky chars).
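
A quick filter of this kind could be sketched with strtr() and a lookup table. This is only a minimal sketch: the function name word_chars_to_ascii() is hypothetical, and it assumes the database holds single-byte Windows-1252 text. If the column is actually UTF-8, these byte values occur inside multi-byte sequences and this map would corrupt the text.

```php
<?php
// Hypothetical helper: replaces common Windows-1252 "smart" punctuation
// bytes with plain ASCII. Assumes $text is single-byte Windows-1252 data.
function word_chars_to_ascii(string $text): string
{
    $map = [
        "\x85" => '...', // horizontal ellipsis
        "\x91" => "'",   // left single quote
        "\x92" => "'",   // right single quote / apostrophe
        "\x93" => '"',   // left double quote
        "\x94" => '"',   // right double quote
        "\x95" => '*',   // bullet
        "\x96" => '-',   // en dash
        "\x97" => '--',  // em dash
    ];

    // strtr() with an array replaces every key with its value in one pass.
    return strtr($text, $map);
}

echo word_chars_to_ascii("\x93It\x92s done\x94 \x97 almost\x85"), "\n";
// prints: "It's done" -- almost...
```

The table is easy to extend if other characters turn up in the data; anything not listed passes through unchanged.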

Thanks,
Scot

Jul 17 '05 #3
Scot Hacker wrote:
How can I filter the extracted text and transform these characters into
plain text equivalents? Is there a builtin function for this, external class
available, or do I need to try and hack it out from scratch?


Have a look at
http://www.ph.net/htmlentities
--
Mail sent to my "From:" address is publicly readable at http://www.dodgeit.com/
== ** ## !! !! ## ** ==
TEXT-ONLY mail to the complete "Reply-To:" address ("My Name" <my@address>) may
bypass the spam filter. I will answer all pertinent mails from a valid address.
Jul 17 '05 #4
Pedro Graca wrote:
Scot Hacker wrote:
How can I filter the extracted text and transform these characters into
plain text equivalents? Is there a builtin function for this, external class
available, or do I need to try and hack it out from scratch?


Have a look at
http://www.ph.net/htmlentities


Of course I meant
http://www.php.net/htmlentities
example, written on the command-line

php$ php -r 'echo htmlentities("João Graça"), "\n";'
Jo&atilde;o Gra&ccedil;a

php$

--
Mail to my "From:" address is readable by all at http://www.dodgeit.com/
== ** ## !! ------------------------------------------------ !! ## ** ==
TEXT-ONLY mail to the whole "Reply-To:" address ("My Name" <my@address>)
may bypass my spam filter. If it does, I may reply from another address!
Jul 17 '05 #5
.oO(Pedro Graca)
Scot Hacker wrote:
How can I filter the extracted text and transform these characters into
plain text equivalents? Is there a builtin function for this, external class
available, or do I need to try and hack it out from scratch?


Have a look at
http://www.ph.net/htmlentities


The OP wants to print plain text, no HTML.
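
To illustrate the difference, here is a rough sketch that assumes the text is UTF-8 and PHP 7 or later (for the "\u{...}" escapes). htmlentities() yields entity names, which a plain-text mail would display literally, while a small replacement table yields readable ASCII. The function name utf8_punctuation_to_ascii() is made up for this example.

```php
<?php
// Sample text with a curly apostrophe, curly double quotes and an em dash,
// written with Unicode escapes so the source file encoding does not matter.
$text = "It\u{2019}s \u{201C}done\u{201D} \u{2014} almost";

// htmlentities() is meant for HTML output; in plain text the entities show up as-is.
echo htmlentities($text, ENT_QUOTES, 'UTF-8'), "\n";
// prints: It&rsquo;s &ldquo;done&rdquo; &mdash; almost

// Hypothetical plain-text alternative: map UTF-8 punctuation to ASCII.
function utf8_punctuation_to_ascii(string $text): string
{
    return strtr($text, [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '--',  // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ]);
}

echo utf8_punctuation_to_ascii($text), "\n";
// prints: It's "done" -- almost
```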

Micha
Jul 17 '05 #6
Scot Hacker wrote:
I have a database that has been populated with content pasted out of MS
Word, and is full of special characters -- em dashes, curly quotes, curly
apostrophes, etc. Now I'm generating plain text email summaries out of the
database and of course those special chars appear as garbage chars in the
emails.

How can I filter the extracted text and transform these characters into
plain text equivalents? Is there a builtin function for this, external class
available, or do I need to try and hack it out from scratch?

Thanks,
Scot

I've managed to do just that using a call to antiword.
http://www.winfield.demon.nl/

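// Quote both paths so unusual file names cannot break the shell command.
$cmd = "/usr/local/bin/antiword -t " . escapeshellarg($filename) . " > " . escapeshellarg($txt_file);
system($cmd); // antiword writes the plain-text version of the Word file to $txt_file
// Read the converted text, wrap it in <pre>, then entity-encode and escape it for MySQL.
$document = mysql_real_escape_string(htmlentities("<pre>" .
fread(fopen($txt_file, "r"), filesize($txt_file)) . "</pre>"));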
then store $document in the database. Gets a bit ugly at times, but
works :-)

Sacs

Jul 17 '05 #7
