On Tue, 16 Dec 2003, Stan Brown wrote:
> You need to tell Lynx which character set you're using,
Yes - which *display* character set you're using, to be exact (this is
separate from the character encoding of the incoming HTML document:
Lynx does its best to map between the two).
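The mapping step can be illustrated with a minimal Python sketch. This is not Lynx's code. It decodes the document's bytes using the document's declared charset, then re-encodes them for the display charset, with "?" standing in for anything the display cannot show. The function name `render_for_display` is hypothetical. Lynx's own translation tables are considerably more elaborate and can fall back to 7-bit approximations.

```python
import unicodedata


def render_for_display(doc_bytes: bytes, doc_charset: str, display_charset: str) -> bytes:
    """Hypothetical sketch of a browser's document-to-display charset mapping."""
    # Step 1: interpret the incoming bytes using the document's own encoding.
    text = doc_bytes.decode(doc_charset)
    # Step 2: re-encode for the terminal; characters it lacks become "?".
    return text.encode(display_charset, errors="replace")


if __name__ == "__main__":
    page = "café Ã".encode("iso-8859-1")
    print(render_for_display(page, "iso-8859-1", "cp850"))  # cp850 has both é and Ã
    print(render_for_display(page, "iso-8859-1", "cp437"))  # cp437 lacks Ã, so "?"
    # Sending the Latin-1 byte for é (0xE9) unconverted to a cp850 screen
    # shows a different character altogether:
    print(unicodedata.name(page[3:4].decode("cp850")))
```

This shows why telling Lynx the wrong display charset garbles accented letters: the same byte value means different characters in different code pages.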
> You didn't tell us anything about your system,
Indeed. The symptoms suggested to me that Lynx was running in a
DOS-type window under Windows, but other possibilities fit the
facts too...
> so I have no idea whether my setting
> CHARACTER_SET:cp437
> will work for you.
Background:
CP437 is the original US-specific MS-DOS code page. It doesn't
contain the complete Latin-1 character repertoire.
As I understand it, Windows (ever since the production release of
Win95) has used CP850, the DOS Latin-1 code page, for its DOS
windows. At least that's true in the European context, and I don't
_think_ they've crippled US users by sticking to CP437 there (437
had been used in the Win95 pre-production beta, and had drawn quite
a lot of adverse comment).
Practical:
If one is using a CP850 display character set, then telling Lynx so
will obviously give better results. On the other hand, if one is
using a CP437 display, one should still tell Lynx the truth, and it
will do its best, but some characters of the Latin-1 repertoire will
not display correctly. Sorry, I forget exactly which ones, but they
can be looked up in the mapping tables at the Unicode site:
http://www.unicode.org/Public/MAPPIN...ORS/MICSFT/PC/
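As a rough substitute for reading those tables by hand, Python's built-in `cp437` and `cp850` codecs (generated from the VENDORS/MICSFT/PC mapping files) can list which upper-half Latin-1 characters, U+00A0 to U+00FF, each code page lacks. The function name `missing_latin1` is hypothetical. One caveat: the codecs treat bytes 0x00 to 0x7F as plain ASCII and control codes, so glyphs that the original IBM PC font drew in the control range (§ and ¶ among them) are reported as missing for cp437.

```python
import unicodedata


def missing_latin1(codepage: str) -> list[str]:
    """Return the Latin-1 characters U+00A0..U+00FF that `codepage` cannot encode."""
    missing = []
    for cp in range(0xA0, 0x100):
        ch = chr(cp)
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    for name in ("cp437", "cp850"):
        gaps = missing_latin1(name)
        print(f"{name}: {len(gaps)} Latin-1 characters missing")
        for ch in gaps:
            # Print Unicode names so the output stays ASCII-safe on any console.
            print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Run on a typical Python install, the cp850 list should come out empty, which is the point of CP850 being the DOS Latin-1 code page.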
The easiest approach might be to visit some kind of bona fide* test
page for the repertoire: for example, I have
http://ppewww.ph.gla.ac.uk/~flavell/...unidata00.html as part of
my Unicode test pages, or there is the old iso-8859-1-specific table
at http://ppewww.ph.gla.ac.uk/~flavell/.../isotable.html
Then play with the Lynx "display character set" setting until you
get the best match: concentrate on cp850 or cp437 if the problematic
setting had been iso-8859-1, or try iso-8859-1 and/or windows-1252
if the problematic setting had been cp437 or cp850.
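If no suitable test page is at hand, one could be generated. The sketch below writes an HTML table of U+00A0 to U+00FF using numeric character references (`&#160;` and so on), so the file itself is pure ASCII and does not depend on any charset guess. Only Lynx's display-charset translation is then being exercised. The function name and the output file name are hypothetical. The table deliberately omits positions 128 to 159, for the reason given in the footnote.

```python
def latin1_test_page() -> str:
    """Build an ASCII-only HTML table of U+00A0..U+00FF, 16 characters per row."""
    rows = []
    for base in range(0xA0, 0x100, 16):
        # &#NNN; references keep the file pure ASCII regardless of charset.
        cells = "".join(f"<td>&#{cp};</td>" for cp in range(base, base + 16))
        rows.append(f"<tr><th>{base:02X}</th>{cells}</tr>")
    return (
        "<html><head><title>Latin-1 repertoire test</title></head><body>\n"
        '<table border="1">\n' + "\n".join(rows) + "\n</table>\n</body></html>\n"
    )


if __name__ == "__main__":
    # Redirect to a file, e.g. "python make_test.py > latin1-test.html",
    # then open it with "lynx latin1-test.html" under each display setting.
    print(latin1_test_page(), end="")
```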
I'd be interested to hear whether you see differences between 437 and
850 as your display charset setting, and which gives more accurate
results.
*) If you come across a page which purports to display visible
characters for code positions 128 to 159 decimal (80 to 9F hex) as if
they were a normal and proper part of iso-8859-1, then it's bogus: in
iso-8859-1 those positions hold only C1 control characters.
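The footnote's point can be checked with Python's codecs. Decoded as iso-8859-1, bytes 0x80 to 0x9F yield only control characters (Unicode category Cc). windows-1252, by contrast, assigns printable characters to most of those positions (for instance the euro sign at 0x80) and leaves five undefined. A page showing visible glyphs there is really showing windows-1252 or something similar. The function name `compare_c1_range` is hypothetical.

```python
import unicodedata
from typing import List, Optional, Tuple


def compare_c1_range() -> List[Tuple[int, str, Optional[str]]]:
    """For bytes 0x80..0x9F, return (byte, Latin-1 char, windows-1252 char or None)."""
    result = []
    for b in range(0x80, 0xA0):
        latin1 = bytes([b]).decode("iso-8859-1")
        try:
            cp1252: Optional[str] = bytes([b]).decode("windows-1252")
        except UnicodeDecodeError:
            cp1252 = None  # position left undefined in windows-1252
        result.append((b, latin1, cp1252))
    return result


if __name__ == "__main__":
    for b, latin1, cp1252 in compare_c1_range():
        # Unicode names keep the output ASCII-safe on any console.
        shown = unicodedata.name(cp1252) if cp1252 else "undefined"
        print(f"0x{b:02X}  iso-8859-1: {unicodedata.category(latin1)}  "
              f"windows-1252: {shown}")
```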
Also, I have an ancient web page on the topic of running Lynx in
DOS-type environments, which may be partially useful on this topic
(the stuff in there about DOS-compatible packet drivers is irrelevant
to this).
[note crossposted and f'ups set]
cheers