arabic_characters

devdris

Please help me print text in an Arabic font using the lcc-win32 C
compiler.

Thanks.

Driss

Jul 14 '06 #1

Clever Monkey

devdris wrote:

>Please help me print text in an Arabic font using the lcc-win32 C
>compiler.

There is not really enough information here, but for this sort of
processing you need to think about wide or multibyte characters. I
assume that this compiler toolchain supports wide/multibyte characters
and documents its implementation of them.

Jul 14 '06 #2

Simon Biber

Clever Monkey wrote:

>devdris wrote:
>>Please help me print text in an Arabic font using the lcc-win32 C
>>compiler.
>
>There is not really enough information here, but for this sort of
>processing you need to think about wide or multibyte characters. I
>assume that this compiler toolchain supports wide/multibyte characters
>and documents its implementation of them.

Not necessarily. Legacy character sets like Windows-1256, ISO 8859-6,
IBM-864 and MacArabic are all single-byte character sets. I would
recommend going with UTF-8 in most cases though, which is a multibyte
encoding. Or you could use UTF-16 or UTF-32, which work with wide
characters.
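
To make the single-byte versus multibyte difference concrete: the Arabic letter BEH (U+0628) is the single byte 0xC8 in Windows-1256 and ISO 8859-6, but the two bytes 0xD8 0xA8 in UTF-8. The sketch below shows how a code point maps to UTF-8 bytes. The function name utf8_encode is made up for this illustration and is not part of any standard library.

```c
#include <stddef.h>

/* Hypothetical helper: encodes one Unicode code point as UTF-8.
   Writes 1 to 4 bytes into out and returns the byte count,
   or 0 if cp is not a valid Unicode scalar value. */
int utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* includes the Arabic block */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)  /* surrogates are not encodable */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond U+10FFFF */
}
```

Calling utf8_encode(0x0628, buf) fills buf with 0xD8 0xA8 and returns 2, while an ASCII letter such as 'A' still takes one byte.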

--
Simon.

Jul 14 '06 #3

Clever Monkey

Simon Biber wrote:

>Clever Monkey wrote:
>>devdris wrote:
>>>Please help me print text in an Arabic font using the lcc-win32 C
>>>compiler.
>>
>>There is not really enough information here, but for this sort of
>>processing you need to think about wide or multibyte characters. I
>>assume that this compiler toolchain supports wide/multibyte characters
>>and documents its implementation of them.
>
>Not necessarily. Legacy character sets like Windows-1256, ISO 8859-6,
>IBM-864 and MacArabic are all single-byte character sets. I would
>recommend going with UTF-8 in most cases though, which is a multibyte
>encoding. Or you could use UTF-16 or UTF-32, which work with wide
>characters.

I was suggesting UTF-8 was the way to go. This means wide chars, correct?

Jul 14 '06 #4

Simon Biber

Clever Monkey wrote:

>I was suggesting UTF-8 was the way to go. This means wide chars, correct?

No, it doesn't mean wide chars necessarily.

UTF-8 data is generally stored as strings in C (arrays of char
terminated by a null character).

A UTF-8 data stream may or may not have multi-byte characters. The size
of each character can vary. However, ASCII characters from 0 to 127
always occupy a single byte. Any byte in a UTF-8 data stream that has a
value from 0 to 127 must be a single character, not part of a multi-byte
character. Thus the null character ('\0') can still be used in the
normal way to terminate a string. The strlen() function is useful for
determining the number of bytes that a UTF-8 string takes, but not the
number of characters.
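
As a small illustration of these properties, here is a sketch using the Arabic word "salam" (four letters, two bytes each in UTF-8) followed by " C". The array name greeting and the helper ascii_offset are invented for the example. Because no byte of a multibyte character is below 128, searching for an ASCII byte with strchr() cannot land in the middle of a character.

```c
#include <stddef.h>
#include <string.h>

/* Four Arabic letters (U+0633 U+0644 U+0627 U+0645) as UTF-8 bytes,
   followed by an ASCII space and the ASCII letter C. */
static const char greeting[] = "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85 C";

/* Hypothetical helper: byte offset of the first occurrence of an
   ASCII character in a UTF-8 string, or (size_t)-1 if absent.
   Only meaningful when ascii is in the range 1 to 127. */
size_t ascii_offset(const char *s, char ascii)
{
    const char *p = strchr(s, ascii);
    return p != NULL ? (size_t)(p - s) : (size_t)-1;
}
```

Here strlen(greeting) is 10 bytes although the string holds only 6 characters, and ascii_offset(greeting, ' ') is 8, the byte position right after the four two-byte letters.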

Functions like isalpha() or tolower() are no longer useful for UTF-8
because they operate on a single byte at a time, whereas a UTF-8
character may span several bytes. Converting a character from upper to
lower case or vice versa may even change the number of bytes that a
particular character takes up.

Here's how I would go about converting the UTF-8 character "A" to
lowercase, on a system where a locale is available whose multibyte
encoding is UTF-8.

/* Headers needed: <limits.h>, <locale.h>, <stdlib.h>,
<string.h>, <wchar.h> and <wctype.h> */

/* The locale name in the line below must correspond
to a valid UTF-8 locale on your implementation */
setlocale(LC_ALL, "en_US.UTF-8");

/* utf8 array contains the string "A" with enough space to
store any multibyte character plus the null character.
MB_LEN_MAX is used rather than MB_CUR_MAX because
MB_CUR_MAX is not a constant expression, and a variable
length array cannot have an initializer */
char utf8[MB_LEN_MAX + 1] = "A";

/* the tmp variable will contain the wide character */
wchar_t tmp;

/* The first multibyte character found in utf8 is
converted to a wide character and stored in tmp */
mbtowc(&tmp, utf8, strlen(utf8));

/* tmp is replaced by a lowercase version of itself */
tmp = towlower(tmp);

/* tmp is converted to a multibyte character sequence
and stored in utf8, followed by a null character.
Real code should check mbtowc and wctomb for a
return value of -1 before using the result */
utf8[wctomb(utf8, tmp)] = 0;

I believe there is no issue with utf8 being written to twice in the
statement above, as there is a sequence point immediately before any
library function returns.
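
The same steps can be wrapped in functions that check every return value. This is only a sketch under assumptions: the names lower_first_char and select_utf8_locale are invented here, and the locale names tried are guesses that may not exist on a given system (lcc-win32 on Windows in particular may offer none of them).

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: tries a few common UTF-8 locale names.
   Returns 1 if one was accepted, 0 otherwise. */
int select_utf8_locale(void)
{
    static const char *const names[] = {
        "en_US.UTF-8", "C.UTF-8", "en_US.utf8"   /* assumed names */
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: lowercases the first multibyte character of
   src (in the current locale's encoding) and stores it, followed by
   a null character, in dst. dst needs MB_LEN_MAX + 1 bytes.
   Returns the number of bytes stored (not counting the null),
   or -1 if src is empty or the conversion fails. */
int lower_first_char(const char *src, char *dst)
{
    wchar_t wc;
    int n;

    n = mbtowc(&wc, src, strlen(src));
    if (n <= 0)
        return -1;

    wc = (wchar_t)towlower((wint_t)wc);

    n = wctomb(dst, wc);
    if (n < 0)
        return -1;

    dst[n] = '\0';
    return n;
}
```

A caller would first call select_utf8_locale(), then pass a buffer declared as char out[MB_LEN_MAX + 1] to lower_first_char("A", out), which leaves "a" in out. Storing the result of wctomb in n before indexing avoids writing to dst[-1] when the conversion fails.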

--
Simon.

Jul 14 '06 #5

Dik T. Winter

In article <44**********@news.peopletelecom.com.au> Simon Biber <ne**@ralmin.cc> writes:

>A UTF-8 data stream may or may not have multi-byte characters. The size
>of each character can vary. However, ASCII characters from 0 to 127
>always occupy a single byte. Any byte in a UTF-8 data stream that has a
>value from 0 to 127 must be a single character, not part of a multi-byte
>character. Thus the null character ('\0') can still be used in the
>normal way to terminate a string. The strlen() function is useful for
>determining the number of bytes that a UTF-8 string takes, but not the
>number of characters.

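#include <stddef.h>

/* Counts characters by counting every byte that is not a
   continuation byte (bit pattern 10xxxxxx) */
size_t utf8strlen(const char *s) {
    size_t l = 0;
    while (*s != 0) {
        if ((*s & 0xC0) != 0x80) l++;
        s++;
    }
    return l;
}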

But this code does no validation, so it also accepts byte sequences
that are not formally allowed in UTF-8.
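
For comparison, a stricter counter could reject such input. The following is a sketch only, with the invented name utf8_strlen_strict. It refuses stray continuation bytes, truncated sequences, overlong encodings, surrogates, and values above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical strict variant: returns the number of code points in
   the well-formed UTF-8 string str, or (size_t)-1 if it is malformed. */
size_t utf8_strlen_strict(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t count = 0;

    while (*s != 0) {
        unsigned long cp, min;
        int extra, i;

        if (*s < 0x80) {
            cp = *s;        extra = 0; min = 0;
        } else if ((*s & 0xE0) == 0xC0) {
            cp = *s & 0x1F; extra = 1; min = 0x80;
        } else if ((*s & 0xF0) == 0xE0) {
            cp = *s & 0x0F; extra = 2; min = 0x800;
        } else if ((*s & 0xF8) == 0xF0) {
            cp = *s & 0x07; extra = 3; min = 0x10000;
        } else {
            return (size_t)-1;   /* stray continuation or invalid lead byte */
        }
        s++;

        for (i = 0; i < extra; i++, s++) {
            /* A null terminator fails this test too, so a truncated
               sequence never reads past the end of the string. */
            if ((*s & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (unsigned long)(*s & 0x3F);
        }

        if (cp < min)                          /* overlong encoding */
            return (size_t)-1;
        if (cp > 0x10FFFF)                     /* outside Unicode */
            return (size_t)-1;
        if (cp >= 0xD800 && cp <= 0xDFFF)      /* UTF-16 surrogate */
            return (size_t)-1;

        count++;
    }
    return count;
}
```

For the byte pair 0xC0 0x80 (an overlong encoding of the null character) utf8strlen reports one character, whereas this version reports an error.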
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Jul 14 '06 #6