arabic_caracters

Please, help me to print text with arabic font using lccwin32 C
compiler.

Thanks.

Driss

Jul 14 '06 #1
devdris wrote:
> Please, help me to print text with arabic font using lccwin32 C
> compiler.

Not really enough information here, but if you need this sort of
processing you need to think about wide or multi-byte characters. I
assume that this compiler toolchain supports and documents its
implementation of wide/mb chars.
Jul 14 '06 #2
Clever Monkey wrote:
> devdris wrote:
>> Please, help me to print text with arabic font using lccwin32 C
>> compiler.
> Not really enough information here, but if you need this sort of
> processing you need to think about wide or multi-byte characters. I
> assume that this compiler toolchain supports and documents its
> implementation of wide/mb chars.

Not necessarily. Legacy character sets like Windows 1256, ISO 8859-6,
IBM-864 and MacArabic are all single byte character sets. I would
recommend going with UTF-8 in most cases though, which is a multibyte
character set. Or you could use UTF-16 or UTF-32 which work with wide
characters.
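
To make the difference concrete, here is a minimal sketch of what "multibyte" means for Arabic text. The Arabic letter ALEF (U+0627) is the single byte 0xC7 in Windows 1256 and ISO 8859-6, but takes two bytes (0xD8 0xA7) in UTF-8. The helper name utf8_encode is made up for this illustration; it is not a standard library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes 1 to 4 bytes to out (which must have room for 4) and
   returns the number of bytes written, or 0 if cp is not a
   valid Unicode scalar value. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* includes the Arabic block */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates cannot be encoded */
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond U+10FFFF */
}
```

Calling utf8_encode(0x0627, buf) fills buf with the two bytes 0xD8 0xA7, while an ASCII letter such as 'A' still comes out as one byte.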

--
Simon.
Jul 14 '06 #3
Simon Biber wrote:
> Clever Monkey wrote:
>> devdris wrote:
>>> Please, help me to print text with arabic font using lccwin32 C
>>> compiler.
>> Not really enough information here, but if you need this sort of
>> processing you need to think about wide or multi-byte characters. I
>> assume that this compiler toolchain supports and documents its
>> implementation of wide/mb chars.
>
> Not necessarily. Legacy character sets like Windows 1256, ISO 8859-6,
> IBM-864 and MacArabic are all single byte character sets. I would
> recommend going with UTF-8 in most cases though, which is a multibyte
> character set. Or you could use UTF-16 or UTF-32 which work with wide
> characters.

I was suggesting UTF-8 was the way to go. This means wide chars, correct?
Jul 14 '06 #4
Clever Monkey wrote:
> I was suggesting UTF-8 was the way to go. This means wide chars, correct?

No, it doesn't mean wide chars necessarily.

UTF-8 data is generally stored as strings in C (arrays of char
terminated by a null character).

A UTF-8 data stream may or may not have multi-byte characters. The size
of each character can vary. However, ASCII characters from 0 to 127
always occupy a single byte. Any byte in a UTF-8 data stream that has a
value from 0 to 127 must be a single character, not part of a multi-byte
character. Thus the null character ('\0') can still be used in the
normal way to terminate a string. The strlen() function is useful for
determining the number of bytes that a UTF-8 string takes, but not the
number of characters.
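
A small sketch of those two points, assuming the string literal below is stored as UTF-8 bytes (the bytes are spelled out with hex escapes so the source file encoding does not matter). The names sample_utf8 and count_non_ascii_bytes are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four Arabic letters (seen, lam, alef, meem; two bytes each in
   UTF-8), followed by a space and the ASCII letter 'C'. */
const char sample_utf8[] = "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85 C";

/* Count the bytes that have the high bit set. In UTF-8 every byte
   of a multi-byte character is in this range, so such a byte can
   never be mistaken for '\0' or for any other ASCII character. */
size_t count_non_ascii_bytes(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if ((unsigned char)*s >= 0x80)
            n++;
    return n;
}
```

Here strlen(sample_utf8) reports 10 bytes although the text has only 6 characters, count_non_ascii_bytes(sample_utf8) reports 8, and strchr(sample_utf8, ' ') safely finds the space at byte offset 8 because no byte of an Arabic letter can equal an ASCII value.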

Functions like isalpha() or tolower() are no longer useful for UTF-8
because they operate on a single byte at a time, whereas a UTF-8
character may span several bytes. Converting a character from upper to
lower case or vice versa may even change the number of bytes that a
particular character takes up.

Here's how I would go about converting the UTF-8 character "A" to
lowercase, on a system where there is a locale available such that
multibyte encoding is UTF-8.

/* Needs <limits.h>, <locale.h>, <stdlib.h>, <string.h>,
   <wchar.h> and <wctype.h> */

/* The locale name in the line below must correspond
   to a valid UTF-8 locale on your implementation */
setlocale(LC_ALL, "en_US.UTF-8");

/* utf8 array contains the string "A" with enough space to
   store any multibyte character plus the null character.
   MB_LEN_MAX is used instead of MB_CUR_MAX because MB_CUR_MAX
   is not a constant expression, and a variable length array
   cannot have an initializer */
char utf8[MB_LEN_MAX + 1] = "A";

/* the tmp variable will contain the wide character */
wchar_t tmp;

/* The first multibyte character found in utf8 is
   converted to a wide character and stored in tmp */
mbtowc(&tmp, utf8, strlen(utf8));

/* tmp is replaced by a lowercase version of itself */
tmp = towlower(tmp);

/* tmp is converted to a multibyte character sequence
   and stored in utf8, followed by a null character.
   wctomb returns -1 if tmp cannot be represented; that
   cannot happen for "a", so the check is omitted here */
utf8[wctomb(utf8, tmp)] = 0;

I believe there is no issue with utf8 being written to twice in the
statement above, as there is a sequence point immediately before any
library function returns.
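
The same idea can be extended from one character to a whole string. What follows is only a sketch under stated assumptions: the function names use_utf8_locale and utf8_tolower are made up, the locale names tried are guesses that may not exist on your implementation, and the restartable functions mbrtowc() and wcrtomb() are used in place of mbtowc() and wctomb(). The output goes to a separate buffer because the lowercase form may need a different number of bytes.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Try a few common UTF-8 locale names (assumptions, adjust for
   your system). Returns 1 if one of them was accepted. */
int use_utf8_locale(void)
{
    static const char *const names[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    size_t i;
    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Lowercase the multibyte string src into dst (dstsize bytes).
   Returns 0 on success, -1 on invalid input or lack of space. */
int utf8_tolower(const char *src, char *dst, size_t dstsize)
{
    mbstate_t in, out;
    size_t left, used = 0;

    assert(src != NULL && dst != NULL);
    left = strlen(src);
    memset(&in, 0, sizeof in);
    memset(&out, 0, sizeof out);

    while (left > 0) {
        wchar_t wc;
        char buf[MB_LEN_MAX];
        size_t m;
        /* decode the next multibyte character */
        size_t n = mbrtowc(&wc, src, left, &in);
        if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
            return -1;
        /* lowercase it and encode it again */
        m = wcrtomb(buf, (wchar_t)towlower((wint_t)wc), &out);
        if (m == (size_t)-1 || used + m + 1 > dstsize)
            return -1;
        memcpy(dst + used, buf, m);
        used += m;
        src += n;
        left -= n;
    }
    if (used + 1 > dstsize)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

After a successful use_utf8_locale(), calling utf8_tolower("ABC \xD8\xB3", out, sizeof out) should leave "abc" followed by the unchanged Arabic letter in out, since Arabic letters have no case.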

--
Simon.
Jul 14 '06 #5
In article <44**********@news.peopletelecom.com.au> Simon Biber <ne**@ralmin.cc> writes:
> A UTF-8 data stream may or may not have multi-byte characters. The size
> of each character can vary. However, ASCII characters from 0 to 127
> always occupy a single byte. Any byte in a UTF-8 data stream that has a
> value from 0 to 127 must be a single character, not part of a multi-byte
> character. Thus the null character ('\0') can still be used in the
> normal way to terminate a string. The strlen() function is useful for
> determining the number of bytes that a UTF-8 string takes, but not the
> number of characters.

size_t utf8strlen(const char *s) {
    size_t l = 0;
    while(*s != 0) {
        /* count every byte that is not a continuation byte (10xxxxxx) */
        if((*s & 0x0c0) != 0x080) l++;
        s++;
    }
    return l;
}

But this code does not validate its input, so it also accepts
representations that are not formally allowed in UTF-8 (overlong
encodings, for example).
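
For comparison, here is a sketch of a stricter counter that rejects such input. For example, the simple counter reports 1 for the overlong two-byte sequence 0xC0 0x80, whereas the version below reports an error. The name utf8_strlen_strict and the choice of (size_t)-1 as the error value are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Count the code points in a null-terminated UTF-8 string.
   Returns (size_t)-1 if the string is not well-formed UTF-8:
   stray or missing continuation bytes, overlong encodings,
   surrogates, or values above U+10FFFF. */
size_t utf8_strlen_strict(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t count = 0;

    assert(str != NULL);
    while (*s != 0) {
        unsigned long cp, min;
        int extra, i;

        /* classify the lead byte: number of continuation bytes
           and the smallest code point allowed at that length */
        if (*s < 0x80) {
            cp = *s;        extra = 0; min = 0;
        } else if ((*s & 0xE0) == 0xC0) {
            cp = *s & 0x1F; extra = 1; min = 0x80;
        } else if ((*s & 0xF0) == 0xE0) {
            cp = *s & 0x0F; extra = 2; min = 0x800;
        } else if ((*s & 0xF8) == 0xF0) {
            cp = *s & 0x07; extra = 3; min = 0x10000;
        } else {
            return (size_t)-1;   /* stray continuation or bad lead byte */
        }
        s++;

        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)
                return (size_t)-1;   /* also catches an early '\0' */
            cp = (cp << 6) | (*s & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;       /* overlong, too large, or surrogate */
        count++;
    }
    return count;
}
```

The price of the extra checks is a little more work per character; for input that is already known to be valid, the shorter loop gives the same count.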
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Jul 14 '06 #6
