character-filtering and Word (& company)

I'm working on text-handling programs that want plain-text files as
input. It's fine to tell users to feed the programs with plain-text
only, but not all users know what this means, even after you explain
it, or they forget. So it would be nice to be able to handle gracefully
the stuff that MS Word (or any word-processor) puts into a file.
Inserting a 0-127 filter is easy but not very friendly. Typically, the
w.p. file loads OK (into a wx.StyledTextCtrl a.k.a Scintilla editing
pane), and is mostly readable. Just a few characters will be wrong:
"smart" quotation marks and the like.

Is there some well-known way to filter or translate this w.p. garbage?
I don't know whether encodings are relevant; I don't know what encoding
an MSW file uses. I don't see how to use s.translate() because I don't
know how to predict what the incoming format will be.

Any hints welcome.

Charles Hartman

Jul 18 '05 #1
3 Replies


Charles Hartman <ch*************@conncoll.edu> writes:
> I'm working on text-handling programs that want plain-text files as
> input. It's fine to tell users to feed the programs with plain-text
> only, but not all users know what this means, even after you explain
> it, or they forget. So it would be nice to be able to handle
> gracefully the stuff that MS Word (or any word-processor) puts into a
> file. Inserting a 0-127 filter is easy but not very
> friendly. Typically, the w.p. file loads OK (into a wx.StyledTextCtrl
> a.k.a Scintilla editing pane), and is mostly readable. Just a few
> characters will be wrong: "smart" quotation marks and the like.
>
> Is there some well-known way to filter or translate this w.p. garbage?
> I don't know whether encodings are relevant;

Bingo. You need to figure out the encoding before you can do
intelligent translation of the non-ASCII characters in the text.

> I don't know what encoding an MSW file uses.

Different WPs will use different encodings, especially when you start
working in a cross-platform environment.

I don't know that there is a good solution to this problem. It
certainly hasn't been solved on the web.
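
For what it's worth, here is a rough Python 3 sketch of the "figure out the encoding first" idea: try UTF-8, fall back to Windows-1252, then map the usual "smart" punctuation to ASCII with str.translate(). The fallback order is an assumption (Windows-1252 is common for text saved from Word on Windows, but far from the only possibility), and the names SMART_TO_ASCII, decode_guess and plain_text are made up for illustration.

```python
# Hypothetical helper names; the encoding fallback order is an assumption,
# not a reliable detector.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
})


def decode_guess(raw: bytes) -> str:
    """Decode bytes as UTF-8 if valid, otherwise assume Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" because cp1252 leaves a few byte values undefined
        return raw.decode("cp1252", errors="replace")


def plain_text(raw: bytes) -> str:
    """Decode, then replace common word-processor punctuation with ASCII."""
    return decode_guess(raw).translate(SMART_TO_ASCII)
```

Characters that are not in the table (accented letters, say) pass through untouched instead of being dropped, which is friendlier than a 0-127 filter.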

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jul 18 '05 #2

Charles Hartman wrote:
> I'm working on text-handling programs that want plain-text files as
> input. It's fine to tell users to feed the programs with plain-text
> only, but not all users know what this means, even after you explain
> it, or they forget. So it would be nice to be able to handle
> gracefully the stuff that MS Word (or any word-processor) puts into a
> file. Inserting a 0-127 filter is easy but not very friendly.
> Typically, the w.p. file loads OK (into a wx.StyledTextCtrl a.k.a
> Scintilla editing pane), and is mostly readable. Just a few
> characters will be wrong: "smart" quotation marks and the like.
>
> Is there some well-known way to filter or translate this w.p.
> garbage? I don't know whether encodings are relevant; I don't know
> what encoding an MSW file uses. I don't see how to use s.translate()
> because I don't know how to predict what the incoming format will be.
>
> Any hints welcome.


This may help: http://wvware.sourceforge.net/

[not a recommendation, I've never used it]
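
Before handing a file to a converter such as wvWare, it may help to check whether the upload is a word-processor file at all, or just text with a few odd characters in it. A minimal Python sketch, assuming the leading bytes are enough to tell: legacy Word .doc files are OLE2 compound files and start with a fixed 8-byte signature, RTF starts with "{\rtf", and .docx is a ZIP archive. The function name sniff_format and its return labels are made up for illustration.

```python
# Hypothetical helper; it only inspects the leading bytes it is given.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy .doc container
ZIP_MAGIC = b"PK\x03\x04"                       # .docx is a ZIP archive
RTF_MAGIC = b"{\\rtf"                           # Rich Text Format


def sniff_format(head: bytes) -> str:
    """Classify a file from its leading bytes."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(RTF_MAGIC):
        return "rtf"
    if b"\x00" in head:
        # NUL bytes rarely occur in 8-bit text; note that UTF-16 text
        # would also land here, so this is only a heuristic.
        return "binary"
    return "text"
```

Reading the first 512 bytes or so in binary mode and passing them in should be enough; anything reported as "text" can go on to ordinary encoding handling, and the rest can be routed to a converter or rejected with a helpful message.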

Jul 18 '05 #3

In article <11**********************@z14g2000cwz.googlegroups.com>,
John Machin <sj******@lexicon.net> wrote:

> Charles Hartman wrote:
>> I'm working on text-handling programs that want plain-text files as
>> input. It's fine to tell users to feed the programs with plain-text
>> only, but not all users know what this means, even after you explain
>> it, or they forget. So it would be nice to be able to handle
>> gracefully the stuff that MS Word (or any word-processor) puts into a
>> file. Inserting a 0-127 filter is easy but not very friendly.
>> Typically, the w.p. file loads OK (into a wx.StyledTextCtrl a.k.a
>> Scintilla editing pane), and is mostly readable. Just a few
>> characters will be wrong: "smart" quotation marks and the like.
>>
>> Is there some well-known way to filter or translate this w.p.
>> garbage? I don't know whether encodings are relevant; I don't know
>> what encoding an MSW file uses. I don't see how to use s.translate()
>> because I don't know how to predict what the incoming format will be.
>>
>> Any hints welcome.
>
> This may help: http://wvware.sourceforge.net/
>
> [not a recommendation, I've never used it]


As Mike Meyer wrote, there is *no* standardization. wvWare is
indeed useful. Before you go further, though, I want to emphasize
to you what a challenge this is. While collecting users' writings
through a Web interface sounds simple to them, it turns out to
present difficulties that go on and on. Anything you can do to
structure the problem helps.

One minor variation that can help is to expose TEXTAREAs or
equivalent, and ask users to cut-and-paste their content into
them. In some situations, that's surprisingly effective.
Jul 18 '05 #4
