isprint() equivalent for ISO-8859-15?

Hi,

I need to be able to tell if a character in the ISO-8859-15 codeset
(i.e. 8-bit ASCII, incorporating things like accented 'a' or euro currency
symbol) is printable. Ordinarily I'd use isprint(), but obviously this is
only going to work for 7-bit "true ASCII" characters.

I've thought of running the program in a different locale, but I want to
catch *all* printable characters, not only those used by a specific
locale. It'd be nice for the code to be as portable as possible, too.

I could just write a macro to range-check the character (it's not like the
standard's going to change), but I'd prefer a cleaner way if possible!

All help gratefully appreciated.

Cheers
- Ian

--
Ian Chard <ia*@tanagra.demon.co.uk>
"Resting" Unix systems administrator and RHCE
near Linlithgow, central Scotland

Nov 13 '05 #1
9 Replies


"Ian Chard" <ia*@tanagra.demon.co.uk> writes:

> I need to be able to tell if a character in the ISO-8859-15 codeset
> (i.e. 8-bit ASCII, incorporating things like accented 'a' or euro currency
> symbol) is printable.

#define IS_ISO_8859_15(c) \
  (((unsigned char)(c) >= 0x20 && (unsigned char)(c) < 0x7F) \
   || ((unsigned char)(c) >= 0xA0 && (unsigned char)(c) <= 0xFF))

or the function equivalent if a macro is not acceptable.

> Ordinarily I'd use isprint(), but obviously this is only going to work
> for 7-bit "true ASCII" characters.

Actually, it is locale dependent how isprint behaves.

> I've thought of running the program in a different locale, but I want to
> catch *all* printable characters, not only those used by a specific
> locale. It'd be nice for the code to be as portable as possible, too.

If you restrict yourself to a specific character encoding (ISO-8859-15),
the code will obviously not run correctly on systems with incompatible
character encodings. Therefore, it is by definition not portable.

If you want the most portable solution, use `isprint'.

> I could just write a macro to range-check the character (it's not like the
> standard's going to change), but I'd prefer a cleaner way if possible!


If you want to check for a printable character in a portable way, use
`isprint'. If you want to check for a character printable in ISO-8859-15
encoding, use something like the macro above.

Martin
Nov 13 '05 #2
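
For readers who want the function equivalent mentioned in the reply above, here is a minimal sketch. The name is_print_iso_8859_15 is hypothetical (it does not appear in the thread), and the ranges are simply those of the macro: 0x20 to 0x7E and 0xA0 to 0xFF. Taking an unsigned char parameter is one possible design choice; it moves the conversion to the call site instead of hiding it in a cast.

```c
/* Hypothetical helper, not from the thread: nonzero if c is in one of
 * the two ranges the macro above treats as printable in ISO-8859-15. */
int is_print_iso_8859_15(unsigned char c)
{
    /* 0x20..0x7E is the printable ASCII range; 0xA0..0xFF is the upper
     * half of ISO-8859-15.  0x00..0x1F, 0x7F and 0x80..0x9F are
     * control codes.  The "<= 0xFF" check only matters on
     * implementations where unsigned char is wider than 8 bits. */
    return (c >= 0x20 && c < 0x7F) || (c >= 0xA0 && c <= 0xFF);
}
```

With this sketch, is_print_iso_8859_15(0xA4) (the euro sign in ISO-8859-15) is nonzero, while 0x7F and everything from 0x80 to 0x9F are rejected.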

"Ian Chard" <ia*@tanagra.demon.co.uk> wrote in message
news:pa****************************@tanagra.demon.co.uk...

> I need to be able to tell if a character in the ISO-8859-15 codeset
> [snip]
> I've thought of running the program in a different locale, but I want to
> catch *all* printable characters, not only those used by a specific
> locale.


A printable character in one locale may not be in another. In other words,
the concept of a printable character is inherently locale-specific.

You seem to be asking for the logical "or" of isprint() return values for
all possible locales - this does not make sense at all. Perhaps I
misunderstand what you are asking.

Alex
Nov 13 '05 #3

In <bp*************@news.t-online.com> Martin Dickopp <ex****************@zero-based.org> writes:

>"Ian Chard" <ia*@tanagra.demon.co.uk> writes:
>
>> I need to be able to tell if a character in the ISO-8859-15 codeset
>> (i.e. 8-bit ASCII, incorporating things like accented 'a' or euro currency
>> symbol) is printable.
>
>#define IS_ISO_8859_15(c) \
>  (((unsigned char)(c) >= 0x20 && (unsigned char)(c) < 0x7F) \
>   || ((unsigned char)(c) >= 0xA0 && (unsigned char)(c) <= 0xFF))

The casts to unsigned char are wrong, drop them. They make
IS_ISO_8859_15(288) return true on any implementation with
UCHAR_MAX == 255.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #4
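
The effect described above can be sketched with two function versions of the macro, one with the cast and one without. Both names are hypothetical, and the comments assume an 8-bit unsigned char (UCHAR_MAX == 255), where converting 288 to unsigned char yields 288 - 256 = 32, the space character.

```c
/* Hypothetical names, not from the thread.  Comments assume
 * UCHAR_MAX == 255. */

/* Mirrors the macro as posted: the argument is converted to
 * unsigned char first, so out-of-range values wrap around. */
int is_print_8859_15_cast(int c)
{
    unsigned char u = (unsigned char)c;
    return (u >= 0x20 && u < 0x7F) || (u >= 0xA0 && u <= 0xFF);
}

/* Same test with the casts dropped: values outside 0..0xFF,
 * including negative ones, are rejected instead of wrapping. */
int is_print_8859_15_nocast(int c)
{
    return (c >= 0x20 && c < 0x7F) || (c >= 0xA0 && c <= 0xFF);
}
```

The trade-off is that the version without casts expects the caller to pass a value in the range of unsigned char, as isprint() does. A plain char holding an accented letter may be negative where char is signed, and would then be rejected unless it is converted to unsigned char before the call.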

Da*****@cern.ch (Dan Pop) writes:

> In <bp*************@news.t-online.com> Martin Dickopp <ex****************@zero-based.org> writes:
>
> >"Ian Chard" <ia*@tanagra.demon.co.uk> writes:
> >
> >> I need to be able to tell if a character in the ISO-8859-15 codeset
> >> (i.e. 8-bit ASCII, incorporating things like accented 'a' or euro currency
> >> symbol) is printable.
> >
> >#define IS_ISO_8859_15(c) \
> >  (((unsigned char)(c) >= 0x20 && (unsigned char)(c) < 0x7F) \
> >   || ((unsigned char)(c) >= 0xA0 && (unsigned char)(c) <= 0xFF))
>
> The casts to unsigned char are wrong, drop them. They make
> IS_ISO_8859_15(288) return true on any implementation with
> UCHAR_MAX == 255.


Why does that make them wrong? I would expect the above macro to
be documented similar to "expects a char [or unsigned char] as
argument", in which case a caller giving an argument of 288
should /expect/ undefined results in the situation you've
described.

--
Micah J. Cowan
mi***@cowan.name
Nov 13 '05 #5

In article <m3************@localhost.localdomain> Micah Cowan <mi***@cowan.name> writes:
 > Da*****@cern.ch (Dan Pop) writes:
...
 > > >#define IS_ISO_8859_15(c) \
 > > >  (((unsigned char)(c) >= 0x20 && (unsigned char)(c) < 0x7F) \
 > > >   || ((unsigned char)(c) >= 0xA0 && (unsigned char)(c) <= 0xFF))
 > >
 > > The casts to unsigned char are wrong, drop them. They make
 > > IS_ISO_8859_15(288) return true on any implementation with
 > > UCHAR_MAX == 255.
 >
 > Why does that make them wrong? I would expect the above macro to
 > be documented similar to "expects a char [or unsigned char] as
 > argument",

That is *not* the wording for "isprint" and friends. There the argument is
an int with a value that is representable as unsigned char, or is EOF.
IS_ISO_8859_15(EOF) most likely will yield true.

 > in which case a caller giving an argument of 288
 > should /expect/ undefined results in the situation you've
 > described.


Note that "isascii" is a common extension which expects an arbitrary
integer. Your macro more resembles isprint. And wouldn't it be easier
to just use "setlocale"?
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Nov 13 '05 #6
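
The setlocale() approach suggested above might look like the following sketch. The function name isprint_in_locale is hypothetical, and which locale names exist is system dependent: something like "en_GB.ISO-8859-15" is only an assumed example and may not be installed. The sketch therefore reports failure instead of guessing, and restores the previous LC_CTYPE setting before returning.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from the thread.  Returns 1 if c is
 * printable in the named locale, 0 if it is not, and -1 if the locale
 * could not be selected or memory ran out. */
int isprint_in_locale(const char *locale_name, unsigned char c)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved;
    int result;

    if (current == NULL)
        return -1;

    /* The string returned by setlocale() may be overwritten by a later
     * call, so keep a private copy of the current setting. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        free(saved);
        return -1;  /* locale not available on this system */
    }

    /* c is already an unsigned char, so the value passed to isprint()
     * is representable as unsigned char, as the standard requires. */
    result = isprint(c) != 0;

    setlocale(LC_CTYPE, saved);  /* restore the previous locale */
    free(saved);
    return result;
}
```

A program that classifies many characters would more likely call setlocale() once at startup and then use isprint() directly; switching the locale per call is done here only to keep the example self-contained.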

In <m3************@localhost.localdomain> Micah Cowan <mi***@cowan.name> writes:

>Da*****@cern.ch (Dan Pop) writes:
>
>> In <bp*************@news.t-online.com> Martin Dickopp <ex****************@zero-based.org> writes:
>>
>> >"Ian Chard" <ia*@tanagra.demon.co.uk> writes:
>> >
>> >> I need to be able to tell if a character in the ISO-8859-15 codeset
>> >> (i.e. 8-bit ASCII, incorporating things like accented 'a' or euro currency
>> >> symbol) is printable.
>> >
>> >#define IS_ISO_8859_15(c) \
>> >  (((unsigned char)(c) >= 0x20 && (unsigned char)(c) < 0x7F) \
>> >   || ((unsigned char)(c) >= 0xA0 && (unsigned char)(c) <= 0xFF))
>>
>> The casts to unsigned char are wrong, drop them. They make
>> IS_ISO_8859_15(288) return true on any implementation with
>> UCHAR_MAX == 255.
>
>Why does that make them wrong? I would expect the above macro to
>be documented similar to "expects a char [or unsigned char] as
>argument", in which case a caller giving an argument of 288
>should /expect/ undefined results in the situation you've
>described.


Since I haven't seen the documentation of IS_ISO_8859_15, I would expect
it to do the job advertised by its name.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #7

"Dik T. Winter" <Di********@cwi.nl> writes:

> In article <m3************@localhost.localdomain> Micah Cowan <mi***@cowan.name> writes:
>  > Da*****@cern.ch (Dan Pop) writes: ...
>  > > >#define IS_ISO_8859_15(c) \
>  > > >  (((unsigned char)(c) >= 0x20 && (unsigned char)(c) < 0x7F) \
>  > > >   || ((unsigned char)(c) >= 0xA0 && (unsigned char)(c) <= 0xFF))
>  > >
>  > > The casts to unsigned char are wrong, drop them. They make
>  > > IS_ISO_8859_15(288) return true on any implementation with
>  > > UCHAR_MAX == 255.
>  >
>  > Why does that make them wrong? I would expect the above macro to
>  > be documented similar to "expects a char [or unsigned char] as
>  > argument",
>
> That is *not* the wording for "isprint" and friends. There the argument is
> an int with a value that is representable as unsigned char, or is EOF.
> IS_ISO_8859_15(EOF) most likely will yield true.

I didn't say it was. The above is not isprint(). I do agree that
it should probably act the same way as isprint()--but it wouldn't
have to. Handling 288 correctly, though (assuming it's out of
range for an unsigned char) is not something one should expect of
isprint(), either.

>  > in which case a caller giving an argument of 288
>  > should /expect/ undefined results in the situation you've
>  > described.
>
> Note that "isascii" is a common extension which expects an arbitrary
> integer. Your macro more resembles isprint. And wouldn't it be easier
> to just use "setlocale"?


Not my macro. And I agree about setlocale().

--
Micah J. Cowan
mi***@cowan.name
Nov 13 '05 #8

Da*****@cern.ch (Dan Pop) writes:

> In <m3************@localhost.localdomain> Micah Cowan <mi***@cowan.name> writes:
>
> >Why does that make them wrong? I would expect the above macro to
> >be documented similar to "expects a char [or unsigned char] as
> >argument", in which case a caller giving an argument of 288
> >should /expect/ undefined results in the situation you've
> >described.
>
> Since I haven't seen the documentation of IS_ISO_8859_15, I would expect
> it to do the job advertised by its name.

Fair enough: perhaps this should serve as a reminder to all that,
no matter how small their example, they should always specify
exactly *how* it is meant to be used, even if they think it should be
obvious from reading.

--
Micah J. Cowan
mi***@cowan.name
Nov 13 '05 #9

in comp.lang.c i read:

> I need to be able to tell if a character in the ISO-8859-15 codeset
> (i.e. 8-bit ASCII, incorporating things like accented 'a' or euro currency
> symbol) is printable. Ordinarily I'd use isprint(), but obviously this is
> only going to work for 7-bit "true ASCII" characters.

isprint will work fine if the locale is set appropriately.

> I've thought of running the program in a different locale, but I want to
> catch *all* printable characters, not only those used by a specific
> locale.


on most systems a uni/single-byte character sequence cannot compose all
possible printable characters, char having a range well below that
necessary for even the small number of unique characters in the iso-8859-x
repertoires. for this to be possible you need multi-byte character
sequences.

yet if your stream contains mbcs then isprint is totally inappropriate, you
need to convert to a wchar_t or wint_t and use iswprint, and the conversion
demands that the locale be set appropriately.

--
a signature
Nov 13 '05 #10
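
The multibyte route described above (convert to wchar_t, then use iswprint) could be sketched as follows. The function name count_printable_mb is hypothetical. The sketch assumes the caller has already selected a suitable locale, for example with setlocale(LC_CTYPE, ""); in the default "C" locale only ASCII input is certain to convert.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not from the thread.  Counts the printable
 * characters in the multibyte string s according to the current
 * LC_CTYPE locale.  Returns -1 if s contains an invalid or truncated
 * multibyte sequence. */
long count_printable_mb(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    long count = 0;

    memset(&state, 0, sizeof state);  /* initial conversion state */

    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;  /* invalid or incomplete sequence */
        if (n == 0)
            break;      /* null character; not expected before the end */

        if (iswprint((wint_t)wc))
            count++;

        s += n;
        remaining -= n;
    }
    return count;
}
```

The conversion step is what makes the locale requirement unavoidable: mbrtowc() interprets the bytes according to LC_CTYPE, so the same byte sequence can decode to different characters, or fail to decode, under different locales.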
