8 bit character string to 16 bit character string

Brand Bogard

Does the C standard include a library function to convert an 8 bit character
string to a 16 bit character string?

May 25 '06 #1


Walter Roberson

In article <e5**********@newshost.mot.com>,
Brand Bogard <br**********@motorola.com> wrote:

Does the C standard include a library function to convert an 8 bit character
string to a 16 bit character string?

No. All that the C standard knows about char is that it is a -minimum-
of 8 bits long.

What might interest you, however, is:

wchar_t is a value superset of char, so if you have an array of wchar_t
and copy each member of a char array into the corresponding position in
it, the result will be a valid wchar_t string representing the same text.

Once you have a wchar_t string, you can use wcstombs() to convert
it into a locale-dependent multibyte string (cf. LC_CTYPE). If
your locale has been set up properly, this should do the transformation
you want.

By itself, "16 bit character string" is not specific enough: you
need to know which encoding you are using, such as UTF-16.
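As an illustration of why the encoding matters, here is a minimal sketch of how one Unicode code point becomes UTF-16 code units. The helper name utf16_encode is hypothetical (it is not a standard C function), and it assumes the input is already a Unicode code point held in a C99 uint32_t:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out[] and returns how many were written,
   or 0 if cp is not a Unicode scalar value (a surrogate or > 0x10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                   /* surrogates are not characters */
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;                      /* BMP: a single 16-bit unit */
        return 1;
    }
    if (cp <= 0x10FFFF) {
        cp -= 0x10000;                              /* 20 significant bits remain */
        out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate: top 10 bits */
        out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate: bottom 10 bits */
        return 2;
    }
    return 0;                                       /* beyond the Unicode range */
}
```

Characters outside the Basic Multilingual Plane, such as U+1F600, come out as two 16-bit units (0xD83D 0xDE00), so even a UTF-16 "16 bit character string" is not always one unit per character.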

--
Prototypes are supertypes of their clones. -- maplesoft

May 25 '06 #2

those who know me have no need of my name

in comp.lang.c i read:

In article <e5**********@newshost.mot.com>,
Brand Bogard <br**********@motorola.com> wrote:
Does the C standard include a library function to convert an 8 bit
character string to a 16 bit character string?

No. All that the C standard knows about char is that it is a -minimum-
of 8 bits long.

i.e., if you must work with 8 and 16 bit character strings you will need
custom routines if you want much portability.

What might interest you, however, is:

wchar_t is a value superset of char, so if you have an array of wchar_t
and copy each member of a char array into the corresponding position in
it, the result will be a valid wchar_t string representing the same text.

also, if setlocale() has been appropriately used then mbstowcs or mbsrtowcs
will convert a string (a sequence of char terminated by a null byte '\0'),
each of which may be part of a multi-byte sequence, into a wide-character
string (a sequence of wchar_t terminated by a null wide character L'\0').
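A minimal sketch of the mbstowcs() route, assuming the program has already called setlocale(LC_CTYPE, "") so that the environment's encoding applies. The helper name to_wide is hypothetical:

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string in the current LC_CTYPE
   encoding into a newly allocated wide string. Returns NULL on an invalid
   multibyte sequence or when out of memory; the caller frees the result. */
wchar_t *to_wide(const char *mb)
{
    /* Every multibyte character occupies at least one byte, so
       strlen(mb) + 1 wide characters always leaves room for the terminator. */
    size_t cap = strlen(mb) + 1;
    wchar_t *ws = malloc(cap * sizeof *ws);
    size_t n;

    if (ws == NULL)
        return NULL;
    n = mbstowcs(ws, mb, cap);
    if (n == (size_t)-1) {      /* invalid sequence for this locale */
        free(ws);
        return NULL;
    }
    return ws;                  /* n < cap, so L'\0' was stored */
}
```

Sizing the buffer from strlen() avoids relying on mbstowcs(NULL, ...), which POSIX defines but the C standard itself does not describe.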

--
a signature

May 26 '06 #3

Haider

Try mbstowcs; it will work.

May 26 '06 #4

Brand Bogard

"Haider" <hm*****@yahoo.com> wrote in message
news:11**********************@i40g2000cwc.googlegroups.com...

Try mbstowcs; it will work.

mbstowcs isn't in our environment, but mbtowc is. Thanks.

May 26 '06 #5

Walter Roberson

In article <e5**********@newshost.mot.com>,
Brand Bogard <br**********@motorola.com> wrote:

mbstowcs isn't in our environment, but mbtowc is.

mbstowcs() is part of the C89 standard, and so should be available
in any hosted environment. I suggest you check <stdlib.h> to see if
it is declared there.
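If mbstowcs() really were missing, a replacement can be built from mbtowc(), which the original poster reports is available. A minimal sketch using only C89 <stdlib.h>; my_mbstowcs is a hypothetical name, and it mirrors the mbstowcs() contract of storing at most n wide characters:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fallback for an environment lacking mbstowcs(): convert the
   multibyte string src into at most n wide characters in dst, one character
   at a time with C89 mbtowc(). Returns the number of wide characters stored
   (not counting the terminator), or (size_t)-1 on an invalid sequence. */
size_t my_mbstowcs(wchar_t *dst, const char *src, size_t n)
{
    size_t count = 0;

    mbtowc(NULL, NULL, 0);                  /* reset any shift state */
    while (count < n) {
        int len = mbtowc(&dst[count], src, MB_CUR_MAX);
        if (len < 0)
            return (size_t)-1;              /* invalid or incomplete character */
        if (len == 0)
            return count;                   /* stored the null wide character */
        src += len;                         /* skip the bytes just consumed */
        count++;
    }
    return count;                           /* dst is full; not terminated */
}
```

As with the library function, a return value equal to n means the result is not null-terminated.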

mbstowcs() is for converting multibyte character strings into
wide character strings. Multibyte character strings are not
necessarily "16 bit characters"; for example, the encoding used might
normally represent ISO 8859-1 characters as single bytes, only
shifting into 16+ bit representations when necessary to encode
characters from other character sets. In some cases, a multibyte
character that requires multiple bytes to represent might
convert into a value that fits within a standard (narrow) char.
The detailed representations of characters in multibyte strings
are outside the purview of the C standard (other than a constraint
put upon the nul character).

If you have a (narrow) char string, you cannot convert it to
a wchar_t string by setting your locale to "C" and then passing
the string through mbstowcs(). That's because the "C" locale specifies
a -particular- character encoding, and that encoding might not match
the encoding of the execution character set, so mbstowcs() might
map the characters to something unexpected, or could even fail
(if the execution character set happened to use encodings that
were incompatible with the encoding structure for the C locale
character set.)

Thus, in order to convert a char string into a wider string, you
have to copy the chars one by one into an array of wchar_t.
If you need to work with Unicode or UTF-16 or whatever after that,
then wcstombs() is what you should look at.
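Where the library supports Amendment 1 (or C99), btowc() from <wchar.h> is the locale-aware form of this one-by-one copy: it maps each byte through the current LC_CTYPE locale instead of assuming the byte value equals the wide character value. A minimal sketch; widen_bytes is a hypothetical name:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: widen a narrow string one byte at a time using
   btowc(). Returns 0 on success, or -1 if some byte is not a complete
   single-byte character in the current locale (for example, any byte of
   a multi-byte UTF-8 sequence). dst must have room for strlen(src) + 1
   elements. */
int widen_bytes(wchar_t *dst, const char *src)
{
    for (; *src != '\0'; src++, dst++) {
        wint_t wc = btowc((unsigned char)*src);
        if (wc == WEOF)
            return -1;          /* byte is only part of a character */
        *dst = (wchar_t)wc;
    }
    *dst = L'\0';
    return 0;
}
```

A WEOF result is the cue that the string contains multi-byte characters and should go through mbstowcs() as a whole instead.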
--
Programming is what happens while you're busy making other plans.

May 26 '06 #6

Simon Biber

Walter Roberson wrote:

In article <e5**********@newshost.mot.com>,
Brand Bogard <br**********@motorola.com> wrote:
Does the C standard include a library function to convert an 8 bit character
string to a 16 bit character string?

No. All that the C standard knows about char is that it is a -minimum-
of 8 bits long.

What might interest you, however, is:

wchar_t is a value superset of char, so if you have an array of wchar_t
and copy each member of a char array into the corresponding position in
it, the result will be a valid wchar_t string representing the same text.

No, in the general case it is not!

On most of the Linux systems that I admin, wchar_t is UTF-32 and char is
UTF-8. In that case, if you simply copy each member of a char array in
the corresponding position to a wchar_t array, it will not be a valid
wchar_t string representing the same text!

The same is true for any encoding of the char array apart from ISO-8859-1.

The standard only guarantees the "value superset" semantics for the
_basic character set_. (Ref: C99 7.17 paragraph 2)

Assuming that wchar_t is either UTF-16 or UTF-32, there is only one
case where char arrays containing characters outside the basic character
set can be copied wholesale into wchar_t arrays: when the
encoding of the char array is ISO-8859-1.
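For that one case, the original question has a direct answer. A minimal sketch, assuming the 8-bit string is ISO-8859-1 and the 16-bit target is UTF-16 stored in C99 uint16_t; latin1_to_utf16 is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert an ISO-8859-1 string to UTF-16 code units.
   A plain zero-extension is correct only because ISO-8859-1 occupies
   exactly U+0000..U+00FF. dst must hold strlen(src) + 1 units; returns
   the number of units stored, not counting the 0 terminator. */
size_t latin1_to_utf16(uint16_t *dst, const char *src)
{
    const unsigned char *p = (const unsigned char *)src;  /* avoid sign extension */
    size_t n = 0;

    while (p[n] != 0) {
        dst[n] = p[n];          /* byte value == Unicode code point */
        n++;
    }
    dst[n] = 0;
    return n;
}
```

Fed UTF-8 instead, the same loop would turn "é" (bytes 0xC3 0xA9) into two code units rather than the single unit 0x00E9, which is exactly the failure described above.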

--
Simon.

May 27 '06 #7

Stephen Sprunk

"Walter Roberson" <ro******@ibd.nrc-cnrc.gc.ca> wrote in message
news:e5**********@canopus.cc.umanitoba.ca...

If you have a (narrow) char string, you cannot convert it to
a wchar_t string by setting your locale to "C" and then passing
the string through mbstowcs(). That's because the "C" locale specifies
a -particular- character encoding, and that encoding might not match
the encoding of the execution character set, so mbstowcs() might
map the characters to something unexpected, or could even fail
(if the execution character set happened to use encodings that
were incompatible with the encoding structure for the C locale
character set.)

Thus, in order to convert a char string into a wider string, you
have to copy the chars one by one into an array of wchar_t.
If you need to work with Unicode or UTF-16 or whatever after that,
then wcstombs() is what you should look at.

Please pardon the tangent...

Does anyone have a reference to _how to actually use_ the multi-byte / wide
functions in a real program? I've studied the documentation available, and
I can't make heads or tails of them or figure out how to do what I want.

Specifically, I'm looking for a way to read from a text file that is in one
multibyte encoding, manipulate the contents as wide chars, then write to a
text file that is in a _different_ multibyte encoding. I'm sure it's
simple, but I can't find any examples of code using the standard C
functions, just stuff like <OT>libiconv</OT>.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin

May 27 '06 #8

those who know me have no need of my name

in comp.lang.c i read:

Does anyone have a reference to _how to actually use_ the multi-byte /
wide functions in a real program?

the main issue is that it is something of a portability nightmare, at least
without resorting to facilities beyond those in the c standard.

Specifically, I'm looking for a way to read from a text file that is
in one multibyte encoding, manipulate the contents as wide chars, then
write to a text file that is in a _different_ multibyte encoding.

the first issue is setting the locales properly. since there are few
standards for the meaning of the names, and what few exist don't tend to be
strict, this means much guessing and potential failures. sometimes this is
a non-issue, as a single known (and working) locale is involved for input
and output.

secondary is library conformance; specifically, whether it supports amd1
(the 1995 amendment to c90) or c99, vs. plain old c89. without amd1 or
later you need to read a string then use mbstowcs to convert it to a wide
string, at which point you can manipulate the various wchar_t. reading a
stream character by character, as getwc does, is not possible using just
c89 facilities (unless you want to go into the business of decoding
character encodings yourself).
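A minimal sketch of that c89 string approach, assuming setlocale(LC_CTYPE, "") has been called and that the encoding is stateless (as UTF-8 is); reverse_chars is a hypothetical name. Reversing is a manipulation that a byte-wise loop would get wrong for multi-byte characters:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C89-only helper: reverse the characters (not the bytes) of
   a multibyte string by round-tripping it through wchar_t. Assumes a
   stateless encoding, where the reversed text needs exactly as many bytes
   as the original. Returns 0 on success, -1 on failure. */
int reverse_chars(char *out, size_t outsize, const char *in)
{
    size_t cap = strlen(in) + 1;    /* one wchar_t per byte is always enough */
    wchar_t *ws;
    size_t n, i, r;

    if (outsize < cap)
        return -1;                  /* no room for the result plus '\0' */
    ws = malloc(cap * sizeof *ws);
    if (ws == NULL)
        return -1;

    n = mbstowcs(ws, in, cap);      /* bytes -> wide characters */
    if (n == (size_t)-1) {          /* invalid sequence for this locale */
        free(ws);
        return -1;
    }
    for (i = 0; i < n / 2; i++) {   /* manipulate the text as wchar_t */
        wchar_t t = ws[i];
        ws[i] = ws[n - 1 - i];
        ws[n - 1 - i] = t;
    }
    r = wcstombs(out, ws, outsize); /* wide characters -> bytes */
    free(ws);
    return (r == (size_t)-1 || r >= outsize) ? -1 : 0;
}
```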

a program that counts upper-case characters looks nearly the same when
insensitive to locale:

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    unsigned long upper = 0;
    int c;

    while (EOF != (c = getc(stdin)))
        if (isupper(c))
            upper++;

    printf("There were %lu upper-case characters.\n", upper);

    return 0;
}

as when sensitive (w/amd1 or c99 conformance):

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

int main(void)
{
    unsigned long upper = 0;
    wint_t c;

    if (0 ==
        setlocale(LC_CTYPE, "")) /* environment specified locale */
    {
        fputs("your locale is invalid, the world ends\n", stderr);
        abort();
    }

    while (WEOF != (c = getwc(stdin)))
        if (iswupper(c))
            upper++;

    wprintf(L"There were %lu upper-case characters.\n", upper);

    return 0;
}

but your desire for a different locale on output makes it tricky. worse,
switching between locales can have issues, so best to get everything done
with one locale before moving to the next. you might let the user specify
each, and pray they supply valid names:

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

int main(int argc, char *argv[]) /* argv[1], argv[2]: locale names */
{
    unsigned long upper = 0;
    wint_t c;

    if (3 != argc)
    {
        fputs("incorrect number of arguments\n", stderr);
        fputs("supply input and output locale names\n", stderr);
        abort();
    }

    if (0 ==
        setlocale(LC_CTYPE, argv[1])) /* user specified input locale */
    {
        fputs("input locale is invalid, the world ends\n", stderr);
        abort();
    }
    while (WEOF != (c = getwc(stdin)))
        if (iswupper(c))
            upper++;

    if (0 ==
        setlocale(LC_ALL, argv[2])) /* user specified output locale */
    {
        fputs("output locale is invalid, the world ends\n", stderr);
        abort();
    }
    wprintf(L"There were %lu upper-case characters.\n", upper);

    return 0;
}
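Stephen's actual goal, re-encoding text from one multibyte encoding to another, can also be sketched with the c89 string functions and a locale switch between decoding and encoding. This assumes wchar_t values mean the same character in both locales, which the standard only promises when __STDC_ISO_10646__ is defined; transcode is a hypothetical name, and the locale names are whatever the system happens to accept:

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert src from the encoding of locale `from` to
   that of locale `to`, going through wchar_t. On success LC_CTYPE is left
   set to `to` and a newly allocated string is returned (caller frees);
   on failure NULL is returned and LC_CTYPE may be `from`. */
char *transcode(const char *src, const char *from, const char *to)
{
    size_t cap = strlen(src) + 1, n, outcap, r;
    wchar_t *ws;
    char *out;

    if (setlocale(LC_CTYPE, from) == NULL)
        return NULL;
    ws = malloc(cap * sizeof *ws);
    if (ws == NULL)
        return NULL;
    n = mbstowcs(ws, src, cap);             /* decode using `from` */
    if (n == (size_t)-1 || setlocale(LC_CTYPE, to) == NULL) {
        free(ws);
        return NULL;
    }
    outcap = (n + 1) * MB_CUR_MAX;          /* MB_CUR_MAX now reflects `to` */
    out = malloc(outcap);
    if (out == NULL) {
        free(ws);
        return NULL;
    }
    r = wcstombs(out, ws, outcap);          /* encode using `to` */
    free(ws);
    if (r == (size_t)-1 || r >= outcap) {   /* unrepresentable or unterminated */
        free(out);
        return NULL;
    }
    return out;
}
```

On a glibc system something like transcode(text, "de_DE.ISO-8859-1", "de_DE.UTF-8") should work if both locales are installed, but as noted above, locale names are not standardized.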

though i've used wide string literals, and associated output functions, i
haven't actually shown anything that would make them useful, because the
form is implementation defined so anything outside the basic character set
may not be portable. wonderful, huh? now that isn't to say there is no
way to handle it, most people would use a localization (l10n) mechanism
like catgets or gettext so that the strings would be fetched from an
external resource which is aligned with the implementation requirements.
c99 provides a (somewhat clumsy) way to use iso-10646 characters in wide
string literals, which increases source portability -- i could have used
them here, though that would just make the "c99 isn't real" people come out
of the woodwork.
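A minimal sketch of those c99 universal character names: \u00E9 designates U+00E9 regardless of how the source file itself is encoded. The helper wchar_is_iso10646 is hypothetical; it only reports whether __STDC_ISO_10646__ is defined, in which case wchar_t values are ISO 10646 code points:

```c
#include <wchar.h>

/* C99 universal character name: \u00E9 is U+00E9 (e with acute accent),
   independent of the source file's own encoding. */
static const wchar_t cafe[] = L"caf\u00E9";

/* Hypothetical helper: nonzero when the implementation promises that
   wchar_t values are ISO 10646 code points (__STDC_ISO_10646__). */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

Only when that macro is defined can a program portably assume cafe[3] == 0xE9; otherwise the value is implementation-defined.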

--
a signature

May 28 '06 #9
