sprintf for UTF-8 formatted characters

Anyone know if the standard sprintf supports utf8 characters that
extend beyond the normal ascii characters?

Thanks!
Jun 27 '08 #1
Ma*********@gmail.com writes:
> Anyone know if the standard sprintf supports utf8 characters that
> extend beyond the normal ascii characters?

That depends on what you mean by "support". If you do something like:

sprintf(buf, "%s", some_string);

(but never do that unless you are sure buf has enough space) or
something along the lines of:

sprintf(buf, format, arg1, arg2 /* ... */);

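(of course be sure format is valid and buf has enough space) and all
strings are UTF-8 encoded, you'll get a UTF-8 encoded string in the end.
This is guaranteed because UTF-8 is designed in such a way that NUL
bytes never occur in sequences encoding other characters, and every
byte of a multibyte sequence has its high bit set, so none of them can
be mistaken for '%' or any other ASCII character.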
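
A minimal sketch of that point, assuming 8-bit char and an ASCII-compatible execution character set. The helper name format_greeting is hypothetical, and snprintf is swapped in for sprintf so that the buffer size is actually checked:

```c
#include <stdio.h>

/* Hypothetical helper: formats "Hello, <name>!" into buf.
 * The bytes of name are copied through unchanged, so if name is valid
 * UTF-8 the result is valid UTF-8 as well.
 * Returns 0 on success, -1 on error or if the output did not fit. */
int format_greeting(char *buf, size_t size, const char *name)
{
    int n = snprintf(buf, size, "Hello, %s!", name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Calling format_greeting(buf, sizeof buf, "Zo\xC3\xAB") leaves the two bytes 0xC3 0xAB intact inside buf. Note that when the output does not fit, snprintf truncates at a byte boundary, which can cut a multibyte sequence in half; that is why the sketch reports truncation as a failure rather than using the partial result.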

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86*tlen.pl>--<jid:mina86*jabber.org>--ooO--(_)--Ooo--
Jun 27 '08 #2
On Apr 14, 2:39 pm, Michal Nazarewicz <min...@tlen.pl> wrote:
[...]

Thanks for the reply.

What is still left unanswered is whether I can put utf-8 strings (i.e.
they have characters that take up to 4 bytes of space) and sprintf
that into a string without screwing up the bytes of data. So something
like this:

unsigned int myVar = 0xDB0;

convertMyVarToUTF8(myVar);

char buff[512];

sprintf(buff, "Long string with %u", myVar);

Is there a legitimate UTF-8 string in buff at this point?

Thanks!

Mandragon

Jun 27 '08 #3
On 15 Apr, 23:02, Mandrago...@gmail.com wrote:
> On Apr 14, 2:39 pm, Michal Nazarewicz <min...@tlen.pl> wrote:
> [...]
> What is still left unanswered is whether I can put utf-8 strings (i.e.
> they have characters that take up to 4 bytes of space) and sprintf
> that into a string without screwing up the bytes of data. So something
> like this:
> unsigned int myVar = 0xDB0;
> convertMyVarToUTF8(myVar);
> char buff[512];
> sprintf(buff, "Long string with %u", myVar);
> Is there a legitimate UTF-8 string in buff at this point?

If the native encoding of narrow character strings is ASCII, or
an encoding which uses ASCII for its lower 128 code points, yes:
the literal text of the format is plain ASCII, "%u" will only
generate characters in the range [0-9] (decimal digits), and all
of those characters have the same encoding in
ASCII and in UTF-8.
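
As a rough check of that claim, here is a sketch that formats the value from the question and tests that every resulting byte is in the ASCII range. The names is_all_ascii and format_value are hypothetical, snprintf is used instead of sprintf for bounds checking, and an ASCII-compatible execution character set is assumed:

```c
#include <stdio.h>

/* Returns 1 if every byte of s lies in the ASCII range 0x00-0x7F,
 * 0 otherwise. Pure ASCII text is always valid UTF-8. */
int is_all_ascii(const char *s)
{
    for (; *s != '\0'; ++s) {
        if ((unsigned char)*s > 0x7F)
            return 0;
    }
    return 1;
}

/* Formats the example from the question. Returns the number of bytes
 * the full output needs (excluding the NUL), or a negative value on
 * error, exactly as snprintf does. */
int format_value(char *buf, size_t size, unsigned int value)
{
    return snprintf(buf, size, "Long string with %u", value);
}
```

With value 0xDB0 the buffer holds "Long string with 3504": valid UTF-8, but only because it is plain ASCII. The code point U+0DB0 itself never appears in the output.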

However, I suspect that the function convertMyVarToUTF8 is
supposed to do something. But I don't see what, and I don't see
what it could do which would affect the results here.

--
James Kanze (GABI Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #4
Ma*********@gmail.com writes:
> What is still left unanswered is whether I can put utf-8 strings (i.e.
> they have characters that take up to 4 bytes of space) and sprintf
> that into a string without screwing up the bytes of data. So something
> like this:
>
> unsigned int myVar = 0xDB0;
> convertMyVarToUTF8(myVar);
> char buff[512];
> sprintf(buff, "Long string with %u", myVar);
>
> Is there a legitimate UTF-8 string in buff at this point?

Are you sure you meant that code? I suspect you meant something like:

#v+
char *utf8char(unsigned long code);
char buf[512];
sprintf(buf, "Long string with %s", utf8char(0xDB0));
#v-

Here "utf8char" converts the given code point to its UTF-8 representation,
appends a NUL byte, and returns a pointer to the first byte of the sequence.
If CHAR_BIT==8 and string literals use ASCII codes for all alphanumeric
characters, then buf will end up containing a valid UTF-8 encoded
string.
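
One possible sketch of such a function, keeping the prototype above. The static buffer and the choice to return an empty string for invalid code points are assumptions made here for illustration, not something the prototype dictates:

```c
#include <stddef.h>

/* Sketch of the hypothetical utf8char(): encodes one Unicode code
 * point as UTF-8 into a static buffer and returns a pointer to it.
 * Surrogates (U+D800-U+DFFF) and values above U+10FFFF are not valid
 * code points and yield an empty string. Not thread-safe; each call
 * overwrites the previous result. */
char *utf8char(unsigned long code)
{
    static unsigned char out[5];
    size_t n = 0;

    if (code < 0x80) {
        /* 1 byte: 0xxxxxxx */
        out[n++] = (unsigned char)code;
    } else if (code < 0x800) {
        /* 2 bytes: 110xxxxx 10xxxxxx */
        out[n++] = (unsigned char)(0xC0 | (code >> 6));
        out[n++] = (unsigned char)(0x80 | (code & 0x3F));
    } else if (code < 0x10000) {
        if (code < 0xD800 || code > 0xDFFF) {
            /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            out[n++] = (unsigned char)(0xE0 | (code >> 12));
            out[n++] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (code & 0x3F));
        }
    } else if (code <= 0x10FFFF) {
        /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[n++] = (unsigned char)(0xF0 | (code >> 18));
        out[n++] = (unsigned char)(0x80 | ((code >> 12) & 0x3F));
        out[n++] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
        out[n++] = (unsigned char)(0x80 | (code & 0x3F));
    }
    out[n] = '\0';
    return (char *)out;
}
```

For 0xDB0 this produces the three bytes 0xE0 0xB6 0xB0, so the sprintf call above would leave buf holding the ASCII text followed by those three bytes and a terminating NUL. Because the buffer is static, using two utf8char() results in the same sprintf call would not work; a variant that writes into a caller-supplied buffer avoids that.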

Basically, if your implementation doesn't do anything funky with string
literals, you can use UTF-8 encoded strings almost like any other
strings. The thing to remember is that some characters take
up more than one byte, so e.g. strlen() returns the length in bytes, not
in characters, and foo[10] won't necessarily get you the 11th character.
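
To illustrate the strlen() caveat, a small sketch that counts code points instead of bytes. The name utf8_length is hypothetical, and the input is assumed to be valid UTF-8 already (nothing is validated here):

```c
#include <stddef.h>

/* Counts the code points in a NUL-terminated string that is assumed
 * to be valid UTF-8. Every code point has exactly one byte that is
 * not a continuation byte (10xxxxxx), so counting those bytes gives
 * the number of code points. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For "na\xC3\xAFve" strlen() reports 6 while utf8_length() reports 5. Even this counts code points rather than what a user would see as characters, since combining marks are separate code points.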

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86*tlen.pl>--<jid:mina86*jabber.org>--ooO--(_)--Ooo--
Jun 27 '08 #5
