Bytes | Software Development & Data Engineering Community

How to use Unicode in C under Linux?

Hi all,
As you know, Unicode is very important. Under Linux I always use
UTF-8, but now I need to save a file in Unicode. My Linux is CentOS,
and I know this system supports Unicode. The wchar_t *p is a Unicode
string; I print its length and it is 7, which is right. I save the file,
and the file length is 7. I checked it: it is 'hello??' where ? = 0x3F.
So what's wrong with this code? Thank you.
#define __STDC_ISO_10646__ 200104L
#include <wchar.h>
#include <stdio.h>
#include <stdlib.h>
#define _TEXT(x) L ## x
int main() {

    FILE *fp = NULL;
    wchar_t *filename = _TEXT("oov.txt");
    wchar_t *p = _TEXT("hello你好");

    wprintf(_TEXT(" %S\n"), p);

    wprintf(_TEXT(" %d \n"), wcslen(p));

    fp = fopen("oov.txt", "w");
    fwprintf(fp, _TEXT("%S"), p);
    fclose(fp);
    return 0;
}


My locale:

LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=
Sep 13 '08 #1

"flywav" <pp*******@gmail.com> wrote in message
> fp = fopen("oov.txt", "w");
> fwprintf(fp, _TEXT("%S"), p);
> fclose(fp);

(fwprintf() prints in ASCII)

Make sure your wchar_t type is actually multi-byte. If it is, then fwprintf()
must be doing the wrong thing. Try opening the file in binary. If that
fails, you'll just have to accept that the function doesn't do what you
want, and call putc to write out the Unicode byte by byte.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm
Sep 13 '08 #2
Thanks. I checked my code by running
gcc -E 1.c

and found this in the output:
typedef long int wchar_t;
so I think wchar_t is Unicode.
#define __STDC_ISO_10646__ 200104L
#include <wchar.h>
#include <stdio.h>
#include <stdlib.h>
#define _TEXT(x) L ## x

int main() {

    FILE *fp = NULL;
    wchar_t *filename = _TEXT("oov.txt");
    wchar_t *p = _TEXT("hello");

    wprintf(_TEXT(" %S\n"), p);

    wprintf(_TEXT(" %d \n"), wcslen(p));

    fp = fopen("oov.txt", "wb");
    fwprintf(fp, _TEXT("%S"), p);
    fclose(fp);
    return 0;

}

the file length is still 5. :(
Sep 13 '08 #3
flywav <pp*******@gmail.com> writes:
> hi all
> you know unicdoe is very important, under linux, i always use
> utf-8, but now i need save one file in unicode. my linux is centos.
> and i know this system support unicode. the wchar_t *p is a unicode
> string, i print the len it is 7. it is right,i save the file, the
> file length is 7. i had checked it, it is'hello??' ? = 0x3F, so
> what's wrong with these code? thank you.

There are a few things wrong. Let's have a look...

> #define __STDC_ISO_10646__ 200104L

This is set by the implementation. You don't get to say!

> #include <wchar.h>
> #include <stdio.h>
> #include <stdlib.h>
> #define _TEXT(x) L ## x
> int main() {
>
> FILE *fp = NULL;
> wchar_t *filename = _TEXT("oov.txt");
> wchar_t *p = _TEXT("hello你好");
>
> wprintf(_TEXT(" %S\n"), p);

You can't print "wide" to stdout by default. Also, %S is
non-standard. Is it a typo?

You need to call setlocale first or none of the conversions will work.
After that, you need to decide if you want byte or wide output.
Byte output is easier, but if you must use wide output, then you must
set that first with a call to fwide.

> wprintf(_TEXT(" %d \n"), wcslen(p));
>
> fp = fopen( "oov.txt", "w");
> fwprintf(fp, _TEXT("%S"), p);
> fclose(fp);
> return 0;
> }
Try this:

#include <wchar.h>
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void)
{
    /* Adopt the user's locale (e.g. zh_CN.UTF-8) so the conversions work. */
    setlocale(LC_ALL, "");
    FILE *fp = fopen("oov.txt", "w");
    if (fp == NULL) {
        fprintf(stderr, "Open failed.\n");
        return EXIT_FAILURE;
    }
    /* Make both streams wide-oriented before any output is done on them. */
    if (fwide(stdout, 1) < 0 || fwide(fp, 1) < 0) {
        fprintf(stderr, "Failed to set wide output.\n");
        return EXIT_FAILURE;
    }

    const wchar_t *p = L"hello你好";
    wprintf(L"%ls\n", p);
    fwprintf(fp, L"%ls\n", p);
    fclose(fp);
    return 0;
}

You can avoid all the fwide stuff if fprintf is acceptable.
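
A minimal sketch of that byte-oriented route, for illustration only. It assumes the program has already called setlocale(LC_ALL, "") and that the locale's encoding can represent the string (as zh_CN.UTF-8 can). The helper name write_wide_line is made up here; only fopen, fprintf with %ls, and fclose are standard.

```c
#include <stdio.h>
#include <wchar.h>

/* Write a wide string followed by a newline using the ordinary
   byte-oriented fprintf. "%ls" converts each wide character to
   multibyte form according to the current LC_CTYPE locale.
   Returns 0 on success, -1 if the file cannot be opened or the
   conversion or write fails. */
int write_wide_line(const char *path, const wchar_t *ws)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = fprintf(fp, "%ls\n", ws) >= 0;

    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Because the stream never becomes wide-oriented, fprintf and fwprintf are not mixed on the same FILE, which is what the fwide calls were guarding against.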

--
Ben.
Sep 13 '08 #4
Ben Bacarisse wrote:
> flywav <pp*******@gmail.com> writes:
>
>> you know unicdoe is very important, under linux, i always use
>> utf-8, but now i need save one file in unicode. my linux is
>> centos. and i know this system support unicode. the wchar_t *p
>> is a unicode string, i print the len it is 7. it is right, i
>> save the file, the file length is 7. i had checked it, it is
>> 'hello??' ? = 0x3F, so what's wrong with these code?
>
> There are a few things wrong. Lets have a look...
>
>> #define __STDC_ISO_10646__ 200104L
>
> This is set by the implementation. You don't get to say!

Adequately covered by the reservation of such names to the
implementation.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Sep 14 '08 #5
CBFalconer <cb********@yahoo.com> writes:
> Ben Bacarisse wrote:
> [...]
>>> #define __STDC_ISO_10646__ 200104L
>>
>> This is set by the implementation. You don't get to say!
>
> Adequately covered by the reservation of such names to the
> implementation.

Not really. The standard *could* have defined a mechanism allowing
programs to define a value for __STDC_ISO_10646__; such a mechanism
would not have violated the reservation of names starting with "__" to
the implementation, any more than "#ifdef __STDC_ISO_10646__" would
violate that reservation.

The fact that the standard *didn't* define such a mechanism is specified
in C99 6.10.8p4:

None of these macro names, nor the identifier defined, shall be
the subject of a #define or a #undef preprocessing directive.

(Since this is a "shall" requirement outside a constraint, the
behavior is undefined.)
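
For illustration, a small sketch of the permitted use: testing the macro rather than defining it. The function names are invented for this example. On glibc the macro is normally predefined, but that is a property of the implementation, not something the program can request.

```c
#include <stdio.h>

/* Returns 1 if the implementation promises that wchar_t values are
   ISO/IEC 10646 (Unicode) code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Print what the implementation says about the macro.
   The program only reads it and never #defines it. */
void report_wchar_encoding(FILE *out)
{
#ifdef __STDC_ISO_10646__
    fprintf(out, "__STDC_ISO_10646__ = %ldL\n", (long)__STDC_ISO_10646__);
#else
    fputs("__STDC_ISO_10646__ is not defined\n", out);
#endif
}
```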

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 14 '08 #6
Ben, I keyed in your code.

It ran well. I got the message: "hello" in English followed by "hello"
in Chinese.

But then I ran hexdump on the file oov.txt:
hexdump oov.txt
0000000 6568 6c6c e46f a0bd a5e5 0abd
000000c

Is that right? I think that if it were a Unicode file, it should be
65 00 68 00, etc.? (Intel CPU)

Sep 16 '08 #7
In article <aa**********************************@v39g2000pro.googlegroups.com>,
flywav <pp*******@gmail.com> wrote:

> 0000000 6568 6c6c e46f a0bd a5e5 0abd
> 000000c

> is it right? i think if it is unicode file,

Unicode itself is not a character encoding; it's a list of characters
with corresponding numbers (known as "code points"). There are
several different ways of encoding those numbers as a sequence of
bytes.

> it should be 65 00 68 00 ,etc? (intel cpu)

You would get that if it were using the UTF-16 (little-endian) encoding
of Unicode. What you are actually getting is the UTF-8 encoding,
in which ASCII characters (i.e. those < 128) appear normally, and
other characters are encoded as a sequence of 2 or more bytes. You
have two sequences of 3 bytes corresponding to two Chinese characters.
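
To make the byte layout concrete, here is a hand-rolled sketch of the UTF-8 encoding of a single code point. The function name utf8_encode is invented for this example; in a real program the C library does this conversion based on the locale.

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8.
   Writes 1 to 4 bytes into out and returns how many were written.
   Returns 0 if cp is a surrogate (U+D800..U+DFFF) or above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

U+4F60 comes out as e4 bd a0 and U+597D as e5 a5 bd, which are the two 3-byte sequences in the dump. The whole file is 5 + 3 + 3 + 1 = 12 bytes, matching the final offset 000000c. Plain hexdump prints 16-bit little-endian words, so "6568" is the bytes 68 65 ('h', 'e'); hexdump -C shows the bytes in file order.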

-- Richard
--
Please remember to mention me / in tapes you leave behind.
Sep 16 '08 #8
Nice!
But I still have some questions. In code:
wchar_t *p = L"aaaa";

Is p a Unicode string or an ANSI string?
I think it is Unicode, but in what encoding, UTF-16 or UTF-8? How
can I be sure?
(On Windows, with wchar_t *p = L"aaa", I think it is always a Unicode
string in UTF-16 encoding; is that right?)

> You would get that if it were using the UTF-16 (little-endian)
> encoding of Unicode.

How do I do this (with the iconv library?)? I want to use Unicode strings
(UTF-16 encoding) throughout my code.

How do I write the string to a file using UTF-16 encoding? I also want to
read the string back from the saved file.
Thanks, all!

Sep 16 '08 #9
In article <ba**********************************@v16g2000prc.googlegroups.com>,
flywav <pp*******@gmail.com> wrote:

> but i still had some question
> in code
> wchar_t *p = L"aaaa";
>
> p is an unicode string or ansi string?

That's an internal matter for the system. It's probably UTF-16 or
UTF-32. The question of little- or big-endian doesn't normally arise,
any more than it does for ints.
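
One way to see what a given system does is to look at the width of wchar_t and at the numeric value of each element. This is only a sketch; wchar_bits and dump_wide are names invented here, and the "U+" prefix in the output is meaningful only if the implementation stores Unicode code points in wchar_t (which is what __STDC_ISO_10646__ signals).

```c
#include <limits.h>
#include <stdio.h>
#include <wchar.h>

/* Width of wchar_t in bits on this implementation
   (typically 32 with glibc, 16 on Windows). */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Print the numeric value of each element of a wide string.
   These are plain integers in memory; byte order only matters
   once they are written out as bytes. */
void dump_wide(FILE *out, const wchar_t *ws)
{
    for (; *ws != L'\0'; ++ws)
        fprintf(out, "U+%04lX ", (unsigned long)*ws);
    fputc('\n', out);
}
```

Calling dump_wide(stdout, L"aaaa") on a typical Linux system should print the value 0061 four times, whatever the locale.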

> how to write the string to the file using UTF-16 encoding? I also want
> read this string from saved file.

You may be able to control this by setting the locale appropriately.
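
If the locale route is not available, the bytes can also be produced by hand. The following is a sketch under a stated assumption: each wchar_t holds one Unicode code point, as on glibc. On a platform with a 16-bit wchar_t it would reject surrogate pairs. The names wcs_to_utf16le and write_utf16le_file are invented for this example, and reading the file back is not covered.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Convert a wide string to UTF-16LE bytes (no BOM, no terminator).
   ASSUMPTION: every wchar_t is a single Unicode code point.
   Returns the number of bytes stored in out, or (size_t)-1 if out is
   too small or a value is not a valid code point. */
size_t wcs_to_utf16le(const wchar_t *ws, unsigned char *out, size_t cap)
{
    size_t n = 0;
    for (; *ws != L'\0'; ++ws) {
        unsigned long cp = (unsigned long)*ws;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;
        if (cp < 0x10000) {               /* one 16-bit unit, low byte first */
            if (cap - n < 2)
                return (size_t)-1;
            out[n++] = (unsigned char)(cp & 0xFF);
            out[n++] = (unsigned char)(cp >> 8);
        } else {                          /* surrogate pair */
            if (cap - n < 4)
                return (size_t)-1;
            cp -= 0x10000;
            unsigned long hi = 0xD800 | (cp >> 10);
            unsigned long lo = 0xDC00 | (cp & 0x3FF);
            out[n++] = (unsigned char)(hi & 0xFF);
            out[n++] = (unsigned char)(hi >> 8);
            out[n++] = (unsigned char)(lo & 0xFF);
            out[n++] = (unsigned char)(lo >> 8);
        }
    }
    return n;
}

/* Write ws to path as UTF-16LE, one character at a time.
   The file is opened in binary mode so no byte is translated.
   Returns 0 on success, -1 on any failure. */
int write_utf16le_file(const char *path, const wchar_t *ws)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    int ok = 1;
    for (; ok && *ws != L'\0'; ++ws) {
        wchar_t one[2] = { *ws, L'\0' };
        unsigned char buf[4];
        size_t n = wcs_to_utf16le(one, buf, sizeof buf);
        ok = n != (size_t)-1 && fwrite(buf, 1, n, fp) == n;
    }
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For L"hello" this gives 68 00 65 00 6c 00 6c 00 6f 00. Some readers expect the file to begin with the byte order mark ff fe; this sketch does not write one.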

-- Richard
--
Please remember to mention me / in tapes you leave behind.
Sep 16 '08 #10
