wcsftime output encoding

Roger Leigh

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The program listed below demonstrates the use of wcsftime() and
std::time_put<wchar_t>, which is a C++ wrapper around it. (I know this
isn't C, but the "problem" lies in the C library implementation of
wcsftime().) I'm not sure if this is a platform-dependent feature or
part of the C standard.

I've compiled with GCC 3.4.3 on GNU/Linux, and run in an en_GB UTF-8
locale. The output looks like this:

$ ./date3
asctime: Fri Nov 26 13:26:48 2004
strftime: Fri 26 Nov 2004 13:26:48 GMT
wcsftime: Fri 26 Nov 2004 13:26:48 GMT
std::time_put<char>: Fri 26 Nov 2004 13:26:48 GMT
std::time_put<wchar_t>: Fri 26 Nov 2004 13:26:48 GMT

Everything worked. It also works if I run in a different locale (all
locales use UTF-8 as their codeset):

$ LANG=de_DE LC_ALL=de_DE ./date3
asctime: Fri Nov 26 13:28:03 2004
strftime: Fr 26 Nov 2004 13:28:03 GMT
wcsftime: Fr 26 Nov 2004 13:28:03 GMT
std::time_put<char>: Fr 26 Nov 2004 13:28:03 GMT
std::time_put<wchar_t>: Fr 26 Nov 2004 13:28:03 GMT

$ LANG=pt_BR LC_ALL=pt_BR ./date3
asctime: Fri Nov 26 13:29:18 2004
strftime: Sex 26 Nov 2004 13:29:18 GMT
wcsftime: Sex 26 Nov 2004 13:29:18 GMT
std::time_put<char>: Sex 26 Nov 2004 13:29:18 GMT
std::time_put<wchar_t>: Sex 26 Nov 2004 13:29:18 GMT

However, if I use a locale where the output includes non-ASCII
characters, I get this:

asctime: Fri Nov 26 13:30:08 2004
strftime: Птн 26 Ноя 2004 13:30:08
wcsftime: ^_B= 26 ^]>O 2004 13:30:08
std::time_put<char>: Птн 26 Ноя 2004 13:30:08
std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08

In this case the "narrow" and "wide" outputs differ. The "narrow"
output is valid UTF-8, whereas the "wide" output is something
different entirely. What encoding does wcsftime() use when outputting
characters outside the ASCII range? UCS-4? Something
implementation-defined? I expected that both would result in readable
output; is this assumption incorrect?

My question is basically this: what is wcsftime() actually doing, and
how should I get printable output from the wide string it fills for
me?
Many thanks,
Roger

#include <iostream>
#include <locale>
#include <ctime>
#include <cwchar>

int main()
{
  // Set up locale stuff...
  std::locale::global(std::locale(""));
  std::cout.imbue(std::locale());
  std::wcout.imbue(std::locale());

  // Get current time
  time_t simpletime = time(0);

  // Break down time.
  std::tm brokentime;
  localtime_r(&simpletime, &brokentime);

  // Normalise.
  mktime(&brokentime);

  std::cout << "asctime: " << asctime(&brokentime);

  // Print with strftime(3)
  char buffer[40];
  std::strftime(&buffer[0], 40, "%c", &brokentime);

  std::cout << "strftime: " << &buffer[0] << '\n';

  wchar_t wbuffer[40];
  std::wcsftime(&wbuffer[0], 40, L"%c", &brokentime);
  std::wcout << L"wcsftime: " << &wbuffer[0] << L'\n';

  // Try again, but use proper locale facets...
  const std::time_put<char>& tp =
    std::use_facet<std::time_put<char> >(std::cout.getloc());

  std::string pattern("std::time_put<char>: %c\n");
  tp.put(std::cout, std::cout, std::cout.fill(),
         &brokentime, &*pattern.begin(), &*pattern.end());

  // And again, but using wchar_t...
  const std::time_put<wchar_t>& wtp =
    std::use_facet<std::time_put<wchar_t> >(std::wcout.getloc());

  std::wstring wpattern(L"std::time_put<wchar_t>: %c\n");
  wtp.put(std::wcout, std::wcout, std::wcout.fill(),
          &brokentime, &*wpattern.begin(), &*wpattern.end());

  return 0;
}
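
A side note on the garbled wide lines, offered as an observation under
assumptions rather than a diagnosis. If wchar_t holds Unicode code points
(the C standard does not require this) and the abbreviated day name in this
locale is the Cyrillic "Птн", its letters are U+041F, U+0442 and U+043D.
Their low-order bytes are 0x1F (displayed as ^_), 0x42 ('B') and 0x3D ('='),
which matches the "^_B=" shown in the wide output. The sketch below
reproduces that truncation in C; low_bytes is a hypothetical helper written
for this note, not a library function.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: keep only the low 8 bits of each wide character,
 * as would happen if wide characters were written out without being
 * converted to the locale's multibyte encoding.
 * Writes at most outsize - 1 bytes plus a terminating 0 and returns the
 * number of bytes written (excluding the terminator). */
size_t low_bytes(const wchar_t *in, unsigned char *out, size_t outsize)
{
    size_t i = 0;

    if (outsize == 0)
        return 0;

    while (in[i] != L'\0' && i + 1 < outsize) {
        out[i] = (unsigned char)((unsigned long)in[i] & 0xFFu);
        i++;
    }
    out[i] = 0;
    return i;
}
```

Feeding it the three code points 0x041F, 0x0442, 0x043D gives the bytes
0x1F, 0x42, 0x3D. That would be consistent with the wide characters
reaching the terminal without being converted to UTF-8, although nothing
in the post confirms the exact mechanism.
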
- --
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBpz0qVcFcaSW/uEgRAjGMAKCusoGdSOupZEllYLA5eCh65pL6awCfcnpu
sdoS5qoYLjBiULIarVOD5bE=
=BHQO
-----END PGP SIGNATURE-----

Nov 14 '05 #1

Roger Leigh

Roger Leigh <${******@invalid.whinlatter.uklinux.net.invalid> writes:

> However, if I use a locale where the output includes non-ASCII
> characters, I get this:
>
> asctime: Fri Nov 26 13:30:08 2004
> strftime: Птн 26 Ноя 2004 13:30:08
> wcsftime: ^_B= 26 ^]>O 2004 13:30:08
> std::time_put<char>: Птн 26 Ноя 2004 13:30:08
> std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08

This occurs because I've mixed calls to std::cout and std::wcout. If
I only use one or the other, things work perfectly (I get valid UTF-8
in both cases).

I wrote a plain C testcase (below) that uses fprintf/fwprintf, and
this also works fine, but not if I mix them for the same FILE stream.
What is the reason for not allowing narrow and wide I/O to the same
stream?

Regards,
Roger

#define _GNU_SOURCE
#include <stdio.h>
#include <locale.h>
#include <time.h>
#include <wchar.h>

int main(void)
{
  // Set up locale stuff...
  setlocale(LC_ALL, "");

  // Get current time
  time_t simpletime = time(0);

  // Break down time.
  struct tm brokentime;
  localtime_r(&simpletime, &brokentime);

  // Normalise.
  mktime(&brokentime);

  fprintf(stdout, "asctime: %s", asctime(&brokentime));

  // Print with strftime(3)
  char buffer[40];
  strftime(&buffer[0], 40, "%c", &brokentime);

  fprintf(stdout, "strftime: %s\n", &buffer[0]);

  wchar_t wbuffer[40];
  wcsftime(&wbuffer[0], 40, L"%c", &brokentime);

  fwide(stderr, 1);
  fwprintf(stderr, L"wcsftime: %ls\n", &wbuffer[0]);

  return 0;
}
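
The orientation rule behind this can be probed directly. The following is
a minimal sketch in C, assuming tmpfile() is usable on the system;
probe_orientation is a hypothetical helper name. It records what fwide()
reports for a fresh stream, after one narrow write, and after a later
request for wide orientation.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: reports a stream's orientation at three points.
 * Each value follows fwide()'s convention: 0 = no orientation yet,
 * negative = byte-oriented, positive = wide-oriented.
 * Returns 0 on success, -1 if no temporary file could be created. */
int probe_orientation(int *fresh, int *after_narrow, int *after_request)
{
    FILE *f = tmpfile();

    if (f == NULL)
        return -1;

    *fresh = fwide(f, 0);          /* mode 0 only queries the orientation */
    fputs("narrow\n", f);          /* the first narrow write makes it byte-oriented */
    *after_narrow = fwide(f, 0);
    *after_request = fwide(f, 1);  /* asks for wide, but cannot override */

    fclose(f);
    return 0;
}
```

On a conforming implementation one would expect 0, then a negative value,
then a negative value again, because fwide() does not change an
orientation that has already been set. Checking its return value is how a
program can notice that the request had no effect.
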

Nov 14 '05 #2

Jack Klein

On Fri, 26 Nov 2004 22:47:34 +0000, Roger Leigh
<${******@invalid.whinlatter.uklinux.net.invalid> wrote in
comp.lang.c:

> Roger Leigh <${******@invalid.whinlatter.uklinux.net.invalid> writes:
>
> > However, if I use a locale where the output includes non-ASCII
> > characters, I get this:
> >
> > asctime: Fri Nov 26 13:30:08 2004
> > strftime: ??? 26 ??? 2004 13:30:08
> > wcsftime: ^_B= 26 ^]>O 2004 13:30:08
> > std::time_put<char>: ??? 26 ??? 2004 13:30:08
> > std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08
>
> This occurs because I've mixed calls to std::cout and std::wcout. If

Please stop posting C++ details to comp.lang.c. The fact that C++
claims to include some of the C standard library is a C++ issue. The
C standard and this newsgroup disclaim all responsibility for how C++
library functions that happen to have the same name as C library
functions behave in a C++ program. Or how anything at all behaves in
a C++ program.

As for your assertion that your problem only occurs when you output
'non-ASCII' characters, or whether your output is UTF-8 or not, be
aware that neither language specifies the encoding of wide characters.
This is completely compiler and operating system specific, and not a
language issue at all.

> I only use one or the other, things work perfectly (I get valid UTF-8
> in both cases).
>
> I wrote a plain C testcase (below) that uses fprintf/fwprintf, and
> this also works fine, but not if I mix them for the same FILE stream.
> What is the reason for not allowing narrow and wide I/O to the same
> stream?
>
> Regards,
> Roger
>
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <locale.h>
> #include <time.h>
> #include <wchar.h>
>
> int main(void)
> {
>   // Set up locale stuff...
>   setlocale(LC_ALL, "");
>
>   // Get current time
>   time_t simpletime = time(0);
>
>   // Break down time.
>   struct tm brokentime;
>   localtime_r(&simpletime, &brokentime);
    ^^^^^^^^^^^

This is not a function in either the C or C++ standard library;
neither language states anything at all about what it might or might
not do.

>   // Normalise.
>   mktime(&brokentime);
>
>   fprintf(stdout, "asctime: %s", asctime(&brokentime));

Here stdout becomes a byte-oriented stream by the act of calling a
character input/output function.

>   // Print with strftime(3)
>   char buffer[40];
>   strftime(&buffer[0], 40, "%c", &brokentime);
>
>   fprintf(stdout, "strftime: %s\n", &buffer[0]);
>
>   wchar_t wbuffer[40];
>   wcsftime(&wbuffer[0], 40, L"%c", &brokentime);
>
>   fwide(stderr, 1);

The fwide() call attempts to set the orientation of a stream. There is no
guarantee in the C standard library that it will succeed. Like most C
standard library functions, it returns a value indicating its result,
in this case the orientation, if any, of the stream after the call.

You are neglecting the returned value, yet it might have some bearing
on your issue.

>   fwprintf(stderr, L"wcsftime: %ls\n", &wbuffer[0]);
>
>   return 0;
> }

Above you said "this code works fine, but not if you mix them" for the
same stream. This code performs byte and wide output to the same
stream. Does it work or not?

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Nov 14 '05 #3

Roger Leigh

Jack Klein <ja*******@spamcop.net> writes:

> On Fri, 26 Nov 2004 22:47:34 +0000, Roger Leigh
> <${******@invalid.whinlatter.uklinux.net.invalid> wrote in
> comp.lang.c:
>
> Please stop posting C++ details to comp.lang.c. The fact that C++
> claims to include some of the C standard library is a C++ issue. The
> C standard and this newsgroup disclaim all responsibility for how C++
> library functions that happen to have the same name as C library
> functions behave in a C++ program. Or how anything at all behaves in
> a C++ program.

My question was never about C++; it was solely about wcsftime(). C++
std::time_put<> wraps strftime() and wcsftime() in the C library
directly, and so it's not strictly a C++ issue either. Where would be
the correct place to ask, or does everyone absolve themselves of
responsibility for interoperability?

> As for your assertion that your problem only occurs when you output
> 'non-ASCII' characters, or whether your output is UTF-8 or not, be
> aware that neither language specifies the encoding of wide characters.
> This is completely compiler and operating system specific, and not a
> language issue at all.

I'm aware of that, but I had hoped for a more constructive response,
for example what the standard says wcsftime() should output, and whether
there is some portable method for determining this (if I'm writing
portable code, I won't know what this will be). Since wchar_t may be
used to store characters of any encoding of the programmer's choice, I
did expect it to be documented somewhere. It actually appears to be
UCS-4 in this case, but I obviously can't rely on that if I need to do
any character manipulation.

> > I wrote a plain C testcase (below) that uses fprintf/fwprintf, and
> > this also works fine, but not if I mix them for the same FILE stream.
> > What is the reason for not allowing narrow and wide I/O to the same
> > stream?

This non-mixing is apparently specified in the C standard, but I don't
have access to a copy to verify this. The C++ restrictions come about
because they apparently defer to the C standard.

> > #define _GNU_SOURCE
> > #include <stdio.h>
> > #include <locale.h>
> > #include <time.h>
> > #include <wchar.h>
> >
> > int main(void)
> > {
> >   // Set up locale stuff...
> >   setlocale(LC_ALL, "");
> >
> >   // Get current time
> >   time_t simpletime = time(0);
> >
> >   // Break down time.
> >   struct tm brokentime;
> >   localtime_r(&simpletime, &brokentime);
>     ^^^^^^^^^^^
>
> This is not a function in either the C or C++ standard library,
> neither language states anything at all about what it might or might
> not do.

It's a thread-safe localtime() equivalent, which has a nicer
interface. Replace with

struct tm *brokentime = localtime(&simpletime);

if you prefer.

> The fwide() attempts to set the orientation of a stream. There is no
> guarantee in the C standard library that it will succeed. Like most C
> standard library functions, it returns a value indicating its result,
> in this case the orientation, if any, of the stream after the call.
>
> You are neglecting the returned value, yet it might have some bearing
> on your issue.

That's very true, but in this case it's guaranteed to succeed, since
*stderr* has no orientation at this point.

> >   fwprintf(stderr, L"wcsftime: %ls\n", &wbuffer[0]);
> >
> >   return 0;
> > }
>
> Above you said "this code works fine, but not if you mix them" for the
> same stream. This code performs byte and wide output to the same
> stream. Does it work or not?

I use stdout as a narrow stream, and stderr as a wide stream (i.e. no
mixing at all). It works perfectly (the wide UCS-4 is transcoded to
UTF-8 for output). If I use stdout for both, I fail to get output
(because fwide() fails, as you would expect, and nothing wide is
printed).

Thanks,
Roger

Nov 14 '05 #4

Roger Leigh

Jack Klein <ja*******@spamcop.net> writes:

> On Fri, 26 Nov 2004 22:47:34 +0000, Roger Leigh
> <${******@invalid.whinlatter.uklinux.net.invalid> wrote in
> comp.lang.c:
>
> > > However, if I use a locale where the output includes non-ASCII
> > > characters, I get this:
> > >
> > > asctime: Fri Nov 26 13:30:08 2004
> > > strftime: ??? 26 ??? 2004 13:30:08
> > > wcsftime: ^_B= 26 ^]>O 2004 13:30:08
> > > std::time_put<char>: ??? 26 ??? 2004 13:30:08
> > > std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08
> >
> > This occurs because I've mixed calls to std::cout and std::wcout. If
>
> Please stop posting C++ details to comp.lang.c. The fact that C++
> claims to include some of the C standard library is a C++ issue.

OK, here's a C++-free C example, that should be 100% Standard C:

#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
  setlocale(LC_ALL, "");

  const char *narrow = "Test Unicode (narrow): ïíý Ноя!\n";
  fprintf(stdout, "%s\n", narrow);

  fprintf(stdout, "Narrow bytes:\n");
  for (int i = 0; i < strlen(narrow); ++i)
    fprintf(stdout, "%3d: %02X\n", i, (unsigned int) *((unsigned char *)narrow+i));

  if (fwide(stderr, 1) <= 0)
    fprintf(stdout, "Failed to set stderr to wide orientation\n");

  const wchar_t *wide = L"Test Unicode (wide): ïíý Ноя!\n";
  fwprintf(stderr, L"\n%ls\n", wide);

  fprintf(stdout, "Wide bytes:\n");
  for (int i = 0; i < (wcslen(wide) * sizeof(wchar_t)); ++i)
    fprintf(stdout, "%3d: %02X\n", i, (unsigned int) *((unsigned char *)wide+i));

  return 0;
}

On my system, this exists on disc as UTF-8 encoded text:

$ file unicode.c
unicode.c: UTF-8 Unicode C program text

When I compile this on a GNU/Linux system with GCC 3.4 in C99 mode,
the narrow string exists in the compiled binary as UTF-8-encoded
bytes, while the wide string exists as UCS-4-encoded bytes. These
both appear to be output as UTF-8 when using a locale with a UTF-8
codeset.

My question is: is this use of non-ASCII source code either standard or
portable? How portable would this code be to non-GNU systems and/or
compilers?

If a system uses other encodings for narrow and wide characters, are
there any macros/constants defined to determine these at compile time
or runtime?
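
One way to reduce the dependence on the source file's encoding, assuming
a C99 compiler, is to spell non-ASCII characters as universal character
names (\uXXXX) so that the source itself stays pure ASCII. The sketch
below does this for three accented Latin letters (U+00EF, U+00ED,
U+00FD). accented_wide is a hypothetical name, and the expectation that
the elements equal those code point values assumes wchar_t holds ISO 10646
code points, which the standard does not require.

```c
#include <wchar.h>

/* Hypothetical helper: a wide string written with universal character
 * names, so the compiler does not need to guess the source encoding. */
const wchar_t *accented_wide(void)
{
    return L"\u00EF\u00ED\u00FD";   /* U+00EF, U+00ED, U+00FD */
}
```

This does not answer what encoding the compiler uses for narrow string
literals; it only removes one variable (the on-disc encoding of the
source file) from the question.
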
Many thanks,
Roger

Nov 14 '05 #5

Mark McIntyre

On Sat, 27 Nov 2004 12:53:58 +0000, in comp.lang.c, Roger Leigh
<${******@invalid.whinlatter.uklinux.net.invalid> wrote:

(snip sample using wide and narrow chars)

> My question is: is this use of non-ASCII source code either standard or
> portable?

The standard defines a Translation Environment, in which the source code
lives as "units". The units must use the Source Character Set. This
consists of characters, and characters are defined in 3.7.1 as a bit
representation that fits in a single byte. There is, however, apparently
nothing that mandates the Source Character Set to be restricted to only
single-byte chars. Indeed, 3.7.2 says that a multibyte character can be part
of the source set too. Wide chars are not mentioned.

> How portable would this code be to non-GNU systems and/or
> compilers?

The standard doesn't say. I believe that it would be the responsibility of
any process which moved it from one system to another to ensure it was
adequately translated for the new platform. Compare this to moving text
files from unix to windows/dos to mac: different ways of storing the
"unit" typically require it to be converted before it can be used on a
different platform.

> If a system uses other encodings for narrow and wide characters, are
> there any macros/constants defined to determine these at compile time
> or runtime?

If there are, they're off-topic here as ISO C doesn't require them.
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>

Nov 14 '05 #6

Roger Leigh

Mark McIntyre <ma**********@spamcop.net> writes:

> On Sat, 27 Nov 2004 12:53:58 +0000, in comp.lang.c, Roger Leigh
> <${******@invalid.whinlatter.uklinux.net.invalid> wrote:
>
> > How portable would this code be to non-GNU systems and/or
> > compilers?
>
> The standard doesn't say. I believe that it would be the responsibility of
> any process which moved it from one system to another, to ensure it was
> adequately translated for the new platform. Compare this to moving text
> files from unix to windows/dos to mac - different ways of storing the
> "unit" typically require it to be converted before it can be used on a
> different platform.

OK, I can cope with that. At worst, it will need recoding with iconv
or similar.

> > If a system uses other encodings for narrow and wide characters, are
> > there any macros/constants defined to determine these at compile time
> > or runtime?
>
> If there are, they're off-topic here as ISO C doesn't require them.

What I meant by this is this:

const char *narrow = "foo";
const wchar_t *wide = L"bar";

printf("%ls\n", wide);
wprintf("%s\n", narrow);

In this example, I've printed a wide string to a narrow stream and
vice versa. The strings are transparently recoded to the other form,
so the C implementation must know at some level what encoding
represents each form. What I want to know is: what are the wide and
narrow forms for a given implementation?
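
The same cross-conversion can be done into buffers rather than onto
streams, which sidesteps stream orientation altogether. This is a minimal
sketch using only C99 library calls (snprintf with %ls and swprintf with
%s); the helper names are hypothetical, and the conversion follows
whatever multibyte encoding the current locale selects.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: wide -> narrow through the %ls conversion.
 * Returns snprintf's result: the length the output needs, or a negative
 * value if a wide character cannot be converted. */
int wide_to_narrow(char *dst, size_t dstsize, const wchar_t *src)
{
    return snprintf(dst, dstsize, "%ls", src);
}

/* Hypothetical helper: narrow -> wide through %s in a wide format.
 * Returns swprintf's result: the number of wide characters written, or
 * a negative value on failure (including a buffer that is too small). */
int narrow_to_wide(wchar_t *dst, size_t dstlen, const char *src)
{
    return swprintf(dst, dstlen, L"%s", src);
}
```

For plain ASCII input such as L"bar" and "foo" both calls return 3 in any
locale. What the bytes look like for non-ASCII input still depends on the
locale set with setlocale(), which is exactly the open question here.
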

I found one constant:

/* wchar_t uses ISO 10646-1 (2nd ed., published 2000-09-15) / Unicode 3.1. */
#define __STDC_ISO_10646__ 200009L

Are there any others that might be defined?
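
A hedged sketch of what can be queried in practice: __STDC_ISO_10646__ is
the C99 macro mentioned above and can be tested at compile time, while
nl_langinfo(CODESET) is a POSIX call (not ISO C, so it is an assumption
that the platform provides it) that names the narrow encoding of the
current locale at run time. The function names are hypothetical.

```c
#include <langinfo.h>   /* POSIX, not part of ISO C */
#include <locale.h>

/* Hypothetical helper: 1 if the implementation promises that wchar_t
 * values are ISO 10646 (Unicode) code points, otherwise 0. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: name of the narrow (multibyte) codeset of the
 * current locale, for example "UTF-8" after setlocale(LC_ALL, "") in a
 * UTF-8 environment. The exact spelling of the name is system-specific. */
const char *narrow_codeset(void)
{
    return nl_langinfo(CODESET);
}
```

When the macro is absent, the wide encoding is simply unspecified from
the program's point of view, so code that manipulates individual wide
characters would need another source of information.
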
Thanks!
Roger

Nov 14 '05 #7

CBFalconer

Roger Leigh wrote:

> Jack Klein <ja*******@spamcop.net> writes:
>
> > Please stop posting C++ details to comp.lang.c. The fact that C++
> > claims to include some of the C standard library is a C++
> > issue. The C standard and this newsgroup disclaim all responsibility
> > for how C++ library functions that happen to have the same name as
> > C library functions behave in a C++ program. Or how anything at
> > all behaves in a C++ program.
>
> My question was never about C++, it was solely about wcsftime().
> C++ std::time_put<> wraps strftime() and wcsftime() in the C
> library directly, and so it's not strictly a C++ issue either.
> Where would be the correct place to ask, or does everyone absolve
> responsibility for interoperability?

The C standard (N869) says the following:

7.24.5.1 The wcsftime function

Synopsis

[#1]
        #include <time.h>
        #include <wchar.h>
        size_t wcsftime(wchar_t * restrict s,
                size_t maxsize,
                const wchar_t * restrict format,
                const struct tm * restrict timeptr);

Description

[#2] The wcsftime function is equivalent to the strftime
function, except that:

-- The argument s points to the initial element of an
array of wide characters into which the generated
output is to be placed.

-- The argument maxsize indicates the limiting number of
wide characters.

-- The argument format is a wide string and the conversion
specifiers are replaced by corresponding sequences of
wide characters.

-- The return value indicates the number of wide
characters.

Returns

[#3] If the total number of resulting wide characters
including the terminating null wide character is not more
than maxsize, the wcsftime function returns the number of
wide characters placed into the array pointed to by s not
including the terminating null wide character. Otherwise,
zero is returned and the contents of the array are
indeterminate.

Similarly, you can look up the description of strftime referenced
above. All of this has nothing whatsoever to do with C++, and
cross-posting to C.L.C++ is completely off topic there. Follow-ups
set accordingly.
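
The return-value rule in the quoted text can be illustrated with a short
sketch. format_date is a hypothetical helper; it uses only the %Y, %m and
%d conversions so that the result does not depend on the locale.

```c
#include <time.h>
#include <wchar.h>

/* Hypothetical helper: format the fixed date 26 Nov 2004 as YYYY-MM-DD
 * into buf, which has room for maxsize wide characters.
 * Returns wcsftime's result: the number of wide characters written
 * (excluding the terminator), or 0 if the result does not fit. */
size_t format_date(wchar_t *buf, size_t maxsize)
{
    struct tm t = {0};

    t.tm_year = 2004 - 1900;  /* years count from 1900 */
    t.tm_mon  = 10;           /* November: months count from 0 */
    t.tm_mday = 26;

    return wcsftime(buf, maxsize, L"%Y-%m-%d", &t);
}
```

With a 32-element buffer the call returns 10 and the buffer holds
L"2004-11-26". With a 4-element buffer the result does not fit, so the
call returns 0 and the buffer contents should not be used.
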

.... snip ...

> This non-mixing is apparently specified in the C standard, but I
> don't have access to a copy to verify this. The C++ restrictions
> come about because they apparently defer to the C standard.

Nonsense. Everybody has free access to the final draft N869. Just
google for it. You can also try the links in my sig block below.

Please also get rid of the following nonsense, which is totally
useless and annoying in newsgroups.

> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.5 (GNU/Linux)
> Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>
>
> iD8DBQFBqGs1VcFcaSW/uEgRAsaMAJwOh+YTiTRnnoAMAilmZGrygW0WewCfZQvT
> 6M0DO/6tCg+PsNRpI6r+SAo=
> =qEhw
> -----END PGP SIGNATURE-----

--
Some useful references:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://www.eskimo.com/~scs/C-faq/top.html>
<http://benpfaff.org/writings/clc/off-topic.html>
<http://anubis.dkuug.dk/jtc1/sc22/wg14/www/docs/n869/> (C99)
<http://www.dinkumware.com/refxc.html> C-library

Nov 14 '05 #8

Charlie Gordon

"Roger Leigh" <${******@invalid.whinlatter.uklinux.net.invalid> wrote in message
news:87************@whinlatter.whinlatter.ukfsn.org...

> #include <iostream>
> #include <locale>
> #include <ctime>
> #include <cwchar>
>
> int main()
> {
>   // Set up locale stuff...
>   std::locale::global(std::locale(""));
>   std::cout.imbue(std::locale());
>   std::wcout.imbue(std::locale());
>
>   // Get current time
>   time_t simpletime = time(0);
>
>   // Break down time.
>   std::tm brokentime;
>   localtime_r(&simpletime, &brokentime);
>
>   // Normalise.
>   mktime(&brokentime);
>
>   std::cout << "asctime: " << asctime(&brokentime);
>
>   // Print with strftime(3)
>   char buffer[40];
>   std::strftime(&buffer[0], 40, "%c", &brokentime);
>
>   std::cout << "strftime: " << &buffer[0] << '\n';
>
>   wchar_t wbuffer[40];
>   std::wcsftime(&wbuffer[0], 40, L"%c", &brokentime);
>   std::wcout << L"wcsftime: " << &wbuffer[0] << L'\n';
>
>   // Try again, but use proper locale facets...
>   const std::time_put<char>& tp =
>     std::use_facet<std::time_put<char> >(std::cout.getloc());
>
>   std::string pattern("std::time_put<char>: %c\n");
>   tp.put(std::cout, std::cout, std::cout.fill(),
>          &brokentime, &*pattern.begin(), &*pattern.end());
>
>   // And again, but using wchar_t...
>   const std::time_put<wchar_t>& wtp =
>     std::use_facet<std::time_put<wchar_t> >(std::wcout.getloc());
>
>   std::wstring wpattern(L"std::time_put<wchar_t>: %c\n");
>   wtp.put(std::wcout, std::wcout, std::wcout.fill(),
>           &brokentime, &*wpattern.begin(), &*wpattern.end());
>
>   return 0;
> }

For those who still thought C++ was close to C, look above.
Such nonsense makes me puke.
I have seen Perl scripts more readable than this.
Please keep comp.lang.c free of such pollution !

Thank you for re-posting using C.

--
Chqrlie.

Nov 14 '05 #9

Kevin Bracey

In message <87************@whinlatter.whinlatter.ukfsn.org>
Roger Leigh <${******@invalid.whinlatter.uklinux.net.invalid> wrote:

> What I meant by this is this:
>
> const char *narrow = "foo";
> const wchar_t *wide = L"bar";
>
> printf("%ls\n", wide);
> wprintf("%s\n", narrow);
>
> In this example, I've printed a wide string to a narrow stream and
> vice versa. The strings are transparently recoded to the other form,
> so the C implementation must know at some level what encoding
> represents each form.

Indeed.

> What I want to know is: what are the wide and narrow forms for a given
> implementation?

You'll have to check with your implementation's documentation. The C standard
unfortunately (from a programmer's point of view) specifies very little in
this area. It just puts in a framework on which an implementation can build
its facilities.

Personally, I find it all of rather dubious utility - the same "standard" C
functions might exist on all sorts of platforms, but exactly what encodings
any of them are going to use/support is unknown, making any practical code
using the functions effectively non-portable.

> I found one constant:
> /* wchar_t uses ISO 10646-1 (2nd ed., published 2000-09-15) / Unicode 3.1. */
> #define __STDC_ISO_10646__ 200009L

Well, if that's defined then wchar_t contains Unicode/ISO 10646 code points.
That's a starting block. Then on a "reasonable" implementation, wprintf would
be translating from wide (hopefully 32-bit) Unicode to your system encoding.

On other platforms the wchar_t encoding may vary with locale - it may just be
a "wide" form of the current multibyte encoding, thus the recoding you
mention above would be very simple. If wchar_t is always Unicode, on the
other hand, then printf must contain iconv-like functionality.

I believe that setlocale() should do some of the configuration work,
depending on how your implementation handles it. If I recall the standard
correctly, then the locale in force at the time of fopen() is remembered
inside the FILE object. So if the locale determines encodings, then you're
set.
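
For code that wants the converted bytes itself rather than relying on
printf, the standard wcstombs() function performs the wide-to-multibyte
step using the current locale. The following is a minimal sketch, with
to_multibyte as a hypothetical wrapper that always terminates the result.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wrapper around wcstombs(): converts src to the multibyte
 * encoding of the current locale, writing at most dstsize - 1 bytes plus
 * a terminating '\0'.
 * Returns the number of bytes written (excluding the terminator), or
 * (size_t)-1 if dstsize is 0 or a wide character cannot be represented. */
size_t to_multibyte(char *dst, size_t dstsize, const wchar_t *src)
{
    size_t n;

    if (dstsize == 0)
        return (size_t)-1;

    n = wcstombs(dst, src, dstsize - 1);
    if (n == (size_t)-1)
        return n;

    dst[n] = '\0';   /* wcstombs does not terminate when it fills the buffer */
    return n;
}
```

After setlocale(LC_ALL, "") in a UTF-8 locale this would be expected to
produce UTF-8 bytes. In the default "C" locale only characters that the
locale can represent are converted, and the call reports failure for the
rest, which is one way to observe which encoding is actually in force.
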

--
Kevin Bracey, Principal Software Engineer
Tematic Ltd Tel: +44 (0) 1223 503464
182-190 Newmarket Road Fax: +44 (0) 1728 727430
Cambridge, CB5 8HE, United Kingdom WWW: http://www.tematic.com/

Nov 14 '05 #10
