sprintf for UTF-8 formatted characters

Does anyone know whether the standard sprintf supports UTF-8 characters that
extend beyond the normal ASCII characters?

Thanks!
Jun 27 '08 #1
Ma*********@gmail.com writes:
Anyone know if the standard sprintf supports utf8 characters that
extend beyond the normal ascii characters?
That depends on what you mean by "support". If you do things like:

sprintf(buf, "%s", some_string);

(but never do that unless you are sure buf has enough space) or
something along the lines of:

sprintf(buf, format, arg1, arg2 /* ... */);

(of course be sure format is valid and buf has enough space) and all
strings, including the format, are UTF-8 encoded, you'll get a UTF-8
encoded string in the end. This is guaranteed because %s copies bytes
verbatim up to the terminating NUL, and UTF-8 is designed in such a way
that a NUL byte never occurs inside the sequence encoding any other
character.
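A minimal sketch of the `%s` case, assuming a C99 compiler so that the bounded `snprintf` can stand in for `sprintf` and take care of the "enough space" caveat. `format_greeting` and `utf8_trim_partial` are hypothetical helpers, and the non-ASCII name is written with hex escapes (ł is U+0142, bytes C5 82) so the source file's own encoding does not matter.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: embeds a UTF-8 encoded name in a fixed message.
 * "%s" copies the argument byte by byte up to its terminating NUL, and
 * no byte inside a multi-byte UTF-8 sequence is 0x00, so the name is
 * copied unchanged. snprintf (C99) writes at most `size` bytes including
 * the final NUL and returns the length the full result would need. */
int format_greeting(char *buf, size_t size, const char *utf8_name)
{
    return snprintf(buf, size, "Hello, %s!", utf8_name);
}

/* If snprintf had to truncate, the cut may fall inside a multi-byte
 * sequence. Drop an incomplete trailing sequence so that buf is valid
 * UTF-8 again (assuming it was valid up to the cut). */
void utf8_trim_partial(char *buf)
{
    size_t len = strlen(buf);
    size_t lead = len;
    size_t need;
    unsigned char b;

    /* Step back over continuation bytes (10xxxxxx). */
    while (lead > 0 && ((unsigned char)buf[lead - 1] & 0xC0) == 0x80)
        lead--;
    if (lead == 0)
        return;                 /* empty string: nothing to trim */
    lead--;                     /* index of the last lead byte */

    b = (unsigned char)buf[lead];
    if (b < 0x80)
        need = 1;               /* 0xxxxxxx: ASCII */
    else if (b >= 0xF0)
        need = 4;               /* 11110xxx */
    else if (b >= 0xE0)
        need = 3;               /* 1110xxxx */
    else
        need = 2;               /* 110xxxxx */

    if (len - lead < need)
        buf[lead] = '\0';       /* sequence was cut short: remove it */
}
```

With a large enough buffer, `format_greeting(buf, sizeof buf, "Micha\xc5\x82")` returns 15 and leaves the 15 bytes of `Hello, Michał!` in `buf`. `snprintf` truncates by bytes, not characters, so a too-small buffer can end in the middle of a multi-byte sequence; that is the only reason for the second helper.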

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86*tlen.pl>--<jid:mina86*jabber.org>--ooO--(_)--Ooo--
Jun 27 '08 #2
On Apr 14, 2:39 pm, Michal Nazarewicz <min...@tlen.pl> wrote:
[...]
Thanks for the reply.

What is still left unanswered is whether I can put UTF-8 strings (i.e.
they have characters that take up to 4 bytes of space) and sprintf
that into a string without screwing up the bytes of data. So something
like this:

unsigned int myVar= 0xDB0;

convertMyVarToUTF8(myVar);

char buff[512];

sprintf( buff, "Long string with %u", myVar);

Is there a legitimate UTF-8 string in buff at this point?

Thanks!

Mandragon

Jun 27 '08 #3
On 15 Apr, 23:02, Mandrago...@gmail.com wrote:
On Apr 14, 2:39 pm, Michal Nazarewicz <min...@tlen.pl> wrote:
[...]
If the native encoding of narrow character strings is ASCII, or
an encoding which uses ASCII for its lower 128 code points, yes,
because "%u" only generates the decimal digits [0-9], and all of
those characters have the same encoding in ASCII and in UTF-8.
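The point about `%u` can be checked mechanically. This sketch assumes an ASCII-compatible execution character set and C99 `snprintf`; `format_number` and `is_ascii_only` are illustrative names, not anything from the thread.

```c
#include <stdio.h>

/* Formats the value as in the question. "%u" prints the value in decimal,
 * so it only ever emits the digits 0-9. */
int format_number(char *buf, size_t size, unsigned int value)
{
    return snprintf(buf, size, "Long string with %u", value);
}

/* Returns 1 if every byte of s is below 0x80. Such a string is ASCII,
 * and every ASCII string is also a valid UTF-8 string. */
int is_ascii_only(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char)*s >= 0x80)
            return 0;
    return 1;
}
```

With 0xDB0 (3504 in decimal) the result is the 21 bytes `Long string with 3504`, all below 0x80, so the buffer is plain ASCII and therefore also valid UTF-8. Note that the number is printed as digits; `%u` never turns 0xDB0 into the character U+0DB0.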

However, I suspect that the function convertMyVarToUTF8 is
supposed to do something, but I don't see what, nor what it
could do that would affect the result here.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Object-oriented computer consulting
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #4
Ma*********@gmail.com writes:
[...]
Are you sure you meant that code? I suspect you meant something like:

#v+
char *utf8char(unsigned long code);
char buf[512];
sprintf(buf, "Long string with %s", utf8char(0xDB0));
#v-

Here utf8char converts the given code point to its UTF-8 representation,
terminates it with a NUL byte and returns a pointer to the first byte of
the sequence. If CHAR_BIT == 8 and string literals use ASCII codes for
all the characters in the format, then in the end buf will contain a
valid UTF-8 encoded string.
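A minimal sketch of what a function like `utf8char` might do. `utf8_encode` is hypothetical: it follows the standard UTF-8 bit layout (1 to 4 bytes per code point) and assumes `CHAR_BIT == 8`, as the post does. It writes into a caller-supplied buffer rather than returning a pointer, because a pointer into a single static buffer would be overwritten by a second call within the same `sprintf`.

```c
#include <stdio.h>   /* size_t; snprintf for using the result */

/* Hypothetical stand-in for utf8char(): writes the UTF-8 encoding of
 * `code` plus a terminating NUL into `out` (at least 5 bytes) and returns
 * the number of bytes in the sequence, or 0 (with out[0] == '\0') for a
 * surrogate or a value above U+10FFFF. */
size_t utf8_encode(unsigned long code, char out[5])
{
    size_t n;

    if (code < 0x80) {                      /* 0xxxxxxx */
        out[0] = (char)code;
        n = 1;
    } else if (code < 0x800) {              /* 110xxxxx 10xxxxxx */
        out[0] = (char)(0xC0 | (code >> 6));
        out[1] = (char)(0x80 | (code & 0x3F));
        n = 2;
    } else if (code < 0x10000) {            /* 1110xxxx + 2 continuation bytes */
        if (code >= 0xD800 && code <= 0xDFFF) {
            out[0] = '\0';                  /* surrogates are not characters */
            return 0;
        }
        out[0] = (char)(0xE0 | (code >> 12));
        out[1] = (char)(0x80 | ((code >> 6) & 0x3F));
        out[2] = (char)(0x80 | (code & 0x3F));
        n = 3;
    } else if (code <= 0x10FFFF) {          /* 11110xxx + 3 continuation bytes */
        out[0] = (char)(0xF0 | (code >> 18));
        out[1] = (char)(0x80 | ((code >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((code >> 6) & 0x3F));
        out[3] = (char)(0x80 | (code & 0x3F));
        n = 4;
    } else {
        out[0] = '\0';                      /* beyond the Unicode range */
        return 0;
    }
    out[n] = '\0';
    return n;
}
```

For U+0DB0 this yields the three bytes E0 B6 B0, and `snprintf(buf, sizeof buf, "Long string with %s", seq)` then produces a 20-byte UTF-8 string. Surrogates (U+D800 to U+DFFF) and values above U+10FFFF are rejected, since UTF-8 must not encode them.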

Basically, if your implementation doesn't do anything funky with string
literals, you can use UTF-8 encoded strings almost like any other
strings. The thing you'll have to remember is that some characters take
up more than one byte, so e.g. strlen() returns the length in bytes
rather than in characters, and foo[10] won't necessarily get you the
11th character.
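To make the strlen() and foo[10] caveat concrete, here is a sketch that counts and locates code points by skipping continuation bytes (those of the form 10xxxxxx). It assumes the input is already valid UTF-8; `utf8_length` and `utf8_at` are hypothetical names.

```c
#include <stddef.h>

/* Number of code points in a valid UTF-8 string: count every byte that
 * is not a continuation byte (10xxxxxx). strlen() counts bytes instead. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}

/* Pointer to the first byte of the code point with zero-based index
 * `index`, or NULL if the string is shorter. Unlike s[index], this has to
 * walk the string, so it takes time proportional to the offset. */
const char *utf8_at(const char *s, size_t index)
{
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            if (index == 0)
                return s;
            index--;
        }
    }
    return NULL;
}
```

For "Micha\xc5\x82" (Michał), strlen() reports 7 bytes while `utf8_length()` reports 6 code points, and `utf8_at(s, 5)` points at the byte 0xC5 that starts ł. Code points can still differ from what a reader sees as one character when combining marks are involved.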

Jun 27 '08 #5


Similar topics

17
by: Pikkel | last post by:
i'm looking for a way to replace special characters with characters without accents, cedilles, etc.
6
by: Spamtrap | last post by:
I only work in Perl occasionaly, and have been searching for a solution for a conversion, and everything I found seems much too complex. All I need to do is take a simple text file and copy...
1
by: ryang | last post by:
I am trying to understand how to work with Unicode in Perl. I have read the relevant man pages (perluniintro, perlunicode, etc.) and have written severl scripts to test/verifiy my understanding. ...
21
by: pramod | last post by:
Two different platforms communicate over protocols which consist of functions and arguments in ascii form. System might be little endian/big endian. It is possible to format string using sprintf...
0
by: Sean | last post by:
I have a MySQL 4.1.11 database, table and table columns all configured as utf8 as I need to accept data in a number of languages. The MySQL database is hosted so I use SET NAMES utf8 in the...
4
by: chris_fieldhouse | last post by:
Hi, I'm almost done with a php driven email filter and automated forwarder, I've tested it out with various emails and ironed out plain text and html. But this final item has me stumped. ...
2
by: Jason | last post by:
Hi, I was wondering if anyone could advise me on this. Right now I am setting up a DB2 UDB V8.2.3 database with UTF8 character set, which will work with a J2EE application running on...
15
by: krister | last post by:
Hello, I'm working in a quite large system that has some limitations. One of those is that I can't use printf() to get an output on a screen. I'm forced to use a special function, let's call it...
173
by: Ron Ford | last post by:
I'm looking for a freeware c99 compiler for windows. I had intended to use MS's Visual C++ Express and use its C capability. In the past with my MS products, I've simply needed to make .c the...
