Hello,
I need to confirm this with the product team; however, the process may take
a long time.
Thank you. Fortunately, the bug is easy to avoid in code, provided that the
programmer knows about it. Usually nobody assumes that the compiler calculates
the values of literals incorrectly. When pi == 3 and exp(1) == 2, results can be
strange and nobody knows why.
I'd like to point out that there are other strange behaviours (if not bugs) of the
compiler concerning character sets. I will describe CL 14.0 (from VS 2005 SP1). I'm
using the English version of VS, not an Asian one. To illustrate the problems I've
attached two files. In case they are removed from my post, here is their
description:
=== BEGIN OF test-ansi.c ===
#pragma setlocale(".932")
#define MSG_SPACE "Some Japanese DBCS text\n" \
"containing backslash\n" \
"as the trailing character"
int main(void)
{
char *s = MSG_SPACE;
}
=== END OF test-ansi.c ===
=== BEGIN OF test-uni.c ===
The same as test-ansi.c above, but encoded as UTF-16
=== END OF test-uni.c ===
My system codepage is 1250.
I will refer to ISO C (ISO/IEC 9899:1999), which is the same as ISO C++ in the
following cases.
Compiling ,,test-ansi.c'' gives the following warning:
test-ansi.c(9) : warning C4129: '' : unrecognized character escape sequence
According to ISO C ,,5.2.1.2 Multibyte characters'': <<While in the initial shift
state, all single-byte characters retain their usual interpretation and do not alter
the shift state. The interpretation for subsequent bytes in the sequence is a
function of the current shift state.>>
ISO C ,,5.1.1.2 Translation phases'' states that escape sequences are converted (phase 5)
after preprocessing directives, including #pragma setlocale, are executed (phase 4). So
the preprocessor should know that the string is in DBCS code page 932 and that the
backslash byte is the trailing byte of a double-byte character, not the start of an
escape sequence.
Adding a second backslash produces the desired result. Fortunately, CL 14.0 handles
UTF-16 source files. However, compiling ,,test-uni.c'' gives many warnings like the following:
test-uni.c(9) : warning C4566: character represented by universal-character-name
'\u7A7A' cannot be represented in the current code page (1250)
The result is the string "??????????" instead of a code page 932 DBCS string.
As above, according to ,,5.1.1.2 Translation phases'': <<5. Each source character
set member and escape sequence in character constants and string literals is
converted to the corresponding member of the execution character set; if there is no
corresponding member, it is converted to an implementation-defined member other than
the null (wide) character.>> This happens _after_ preprocessing directives are
executed (phase 4), so by then #pragma setlocale(".932") has already been processed
and the Unicode string should be converted to DBCS code page 932, not to SBCS code
page 1250 (my system code page).
-- best regards
Cezary Noweta