Unicode characters in identifiers

Where can I find which Unicode characters are valid for identifiers in
Visual C++ 2005?

Thanks
Richard
Aug 15 '06 #1
"R.Kaiser" <rk@invalid-rkaiser.de> wrote in message
news:eQ**************@TK2MSFTNGP02.phx.gbl...
> Where can I find which Unicode characters are valid for identifiers in
> Visual C++ 2005?
According to the VC++ documentation:

http://msdn2.microsoft.com/en-us/library/565w213d.aspx

The following characters are valid as the first character of a name:

_ a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

And the following characters, in addition to the above, are valid as the
second or subsequent character:

0 1 2 3 4 5 6 7 8 9

Plus, the $ character is also valid as an MS extension.

-cd
Aug 15 '06 #2
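The documented rule Carl quotes is purely ASCII. As a rough sketch of what that rule accepts, a checker could look like the following. The helper names are hypothetical, and treating `$` like a letter in every position is an assumption, since the documentation excerpt does not say where `$` may appear.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: may c start a name under the documented rule?
// Assumption: '$' (a Microsoft extension) is treated like a letter anywhere.
bool is_doc_start_char(char c, bool allow_dollar) {
    if (c == '_') return true;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return true;
    return allow_dollar && c == '$';
}

// Hypothetical helper: may c appear as the second or a later character?
bool is_doc_continue_char(char c, bool allow_dollar) {
    return is_doc_start_char(c, allow_dollar) || (c >= '0' && c <= '9');
}

// Checks a name against the documented, ASCII-only rule.
// Any non-ASCII byte (e.g. part of an umlaut) makes the name invalid here.
bool is_documented_identifier(const std::string& name, bool allow_dollar = true) {
    if (name.empty() || !is_doc_start_char(name[0], allow_dollar)) return false;
    for (std::string::size_type i = 1; i < name.size(); ++i) {
        if (!is_doc_continue_char(name[i], allow_dollar)) return false;
    }
    return true;
}
```

Under this rule a name such as `Zähler` is rejected, so the documentation alone cannot account for everything the compiler actually accepts.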
> > Where can I find which Unicode characters are valid for identifiers in
> > Visual C++ 2005?
>
> According to the VC++ documentation:
> ...
A more accurate definition, according to the standard, is:

2.10 Identifiers [lex.name]
identifier:
    nondigit
    identifier nondigit
    identifier digit

nondigit: one of
    universal-character-name
    _ a b c d e f g h i j k l m
    n o p q r s t u v w x y z
    A B C D E F G H I J K L M
    N O P Q R S T U V W X Y Z

digit: one of
    0 1 2 3 4 5 6 7 8 9

1 An identifier is an arbitrarily long sequence of letters and digits. Each
universal-character-name in an identifier shall designate a character whose
encoding in ISO 10646 falls into one of the ranges specified in Annex E.
Upper- and lower-case letters are different. All characters are significant.

2 In addition, some identifiers are reserved for use by C++ implementations
and standard libraries (17.4.3.1.2) and shall not be used otherwise; no
diagnostic is required.

This means that national characters can be part of a name. I've tested
Russian and Hebrew characters, and it worked.
--
Vladimir Nesterovsky
Aug 15 '06 #3
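The `universal-character-name` alternative in the grammar means an extended character can also be written as a `\u` escape inside the identifier itself. A minimal sketch, assuming a compiler that implements universal-character-names in identifiers; the function `count_to` is purely illustrative. `\u00E4` is U+00E4 (ä), so the local variable below is the identifier `Zähler` spelled without any non-ASCII byte in the file.

```cpp
#include <cassert>

// Hypothetical example: counts up to n using a variable whose name
// contains a universal-character-name (\u00E4 == U+00E4, 'ä').
int count_to(int n) {
    int Z\u00E4hler = 0;  // the identifier "Zähler"
    for (int i = 0; i < n; ++i) {
        ++Z\u00E4hler;
    }
    return Z\u00E4hler;
}
```

In a file the compiler correctly reads as UTF-8, writing `Zähler` directly should denote the same identifier, because (in the wording up to C++20) translation phase 1 replaces extended source characters with the universal-character-names that designate them.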
Thanks, Carl and Vladimir. What you list are the valid characters
for Standard C++ identifiers.

But in Windows Forms applications, Visual C++ 2005 also accepts
extensions, such as German umlauts in ordinary identifiers:

int Zähler;

or names like

àÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ

for the names of Windows controls: I could use this string as the value
of the Name property of, e.g., a Button by setting it in the Properties
window.

The rules seem to be quite complicated:

int Zähler; // valid
int Z$hler; // valid
int Zä$hler; // error
int xàÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ; // valid
int àÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ; // error

Despite an intensive search, I could find no reference listing the valid
identifier characters in VS2005. I would expect certain Arabic and Asian
characters to be valid as well.

Richard Kaiser


Aug 15 '06 #4
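For comparison, the standard grammar treats a universal-character-name as a nondigit, so an extended letter may also start an identifier. The sketch below spells shortened versions of Richard's last two examples with `\u` escapes (`\u00E0` = à, `\u00C0` = À, `\u00E1` = á); the function name is hypothetical. It shows what the standard permits, not what VC++ 2005 accepts.

```cpp
#include <cassert>

// Hypothetical example: both locals are identifiers made of Latin-1 letters.
int sum_accented() {
    int \u00E0\u00C0\u00E1 = 1;   // starts with an extended character ("àÀá")
    int x\u00E0\u00C0\u00E1 = 2;  // starts with an ASCII letter ("xàÀá")
    return \u00E0\u00C0\u00E1 + x\u00E0\u00C0\u00E1;
}
```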
"Vladimir Nesterovsky" wrote:
> [...]
> This means that national characters can be a part of a name. I've tested
> Russian and Hebrew characters. It's worked.
How do you save the file? As UTF-8 or UTF-16, with a Unicode byte-order mark?

--PA
Aug 16 '06 #5
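PA's question matters because, without a byte-order mark (BOM), a compiler has to guess the encoding of a source file. VC++ of this era is generally described as recognizing UTF-8 and UTF-16 files by their BOM and otherwise assuming the current code page; treat that as an assumption. A minimal sketch of BOM detection, with a hypothetical `detect_bom` function and `SourceEncoding` enum:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical labels for what the leading bytes of a source file suggest.
enum class SourceEncoding { Utf8Bom, Utf16LE, Utf16BE, NoBom };

// Sketch: classify a file's first bytes by byte-order mark only.
SourceEncoding detect_bom(const std::string& bytes) {
    // Read byte i as an unsigned value (plain char may be signed).
    auto at = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return SourceEncoding::Utf8Bom;
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return SourceEncoding::Utf16LE;
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return SourceEncoding::Utf16BE;
    // No BOM: a compiler would have to assume some code page, e.g. 1252.
    return SourceEncoding::NoBom;
}
```

A UTF-32 little-endian BOM (`FF FE 00 00`) would be reported as UTF-16 LE here; that case is left out to keep the sketch short.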
"Vladimir Nesterovsky" <vl******@nesterovsky-bros.com> wrote in message
news:Of**************@TK2MSFTNGP02.phx.gbl...
>
>>Where can I find which Unicode characters are valid for identifiers in
Visual C++ 2005?

According to the VC++ documentation:
...

More accurate definition according to the spec is:
[...]
Yes, I know what the C++ standard says; I was simply quoting what the VC++
2005 documentation says. I rather suspected that the documentation was
wrong, since I know a lot of work went into the 7.0 compiler to
support Unicode source files.
>
This means that national characters can be a part of a name. I've tested
Russian and Hebrew characters. It's worked.
Good to know, thanks.

-cd
Aug 16 '06 #6
"R.Kaiser" <rk@invalid-rkaiser.de> wrote in message
news:OO**************@TK2MSFTNGP06.phx.gbl...
Thanks Carl and Vladimir, what you are listing are the valid characters
for Standard C++ identifiers.

But in Windows Forms Applications, Visual C++ 2005 also accepts
extensions, like German Umlauts in ordinary identifiers
Those are also valid according to the C++ standard, and apparently according
to VC++ (but not according to the VC++ documentation; go figure).
>
int Zähler;

or names like

àÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ

for names of Windows controls. I could use this expression as a value for
the Name property of e.g. a Button by setting it in the Properties window.

The rules seem to be quite complicated:

int Zähler; // valid
int Z$hler; // valid
int Zä$hler; // error
int xàÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ; // valid
int àÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ; // error

Despite intensive search, I could find no reference for the valid
characters of identifiers in VS2005. I would expect, that certain Arabic
and Asian characters are also valid.
The cases you cite that don't work are interesting. What source file
encoding were you using? I can imagine that if the source file was, for
example, Latin-8 but the compiler concluded (incorrectly) that it was UTF-8,
certain pairs of characters would end up being illegal.

I would expect that only a Unicode encoding would be safe for files with
non-ASCII characters, but I haven't done any experimenting in that area
myself.

-cd
Aug 16 '06 #7
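Carl's hypothesis can be made concrete: a code page 1252 byte such as 0xE4 (`ä`) is a UTF-8 lead byte that must be followed by continuation bytes (0x80 to 0xBF), so reading cp1252 text as UTF-8 breaks at that point. The sketch below is a minimal UTF-8 well-formedness check; the name `is_valid_utf8` is hypothetical, and this is not the compiler's actual logic.

```cpp
#include <cassert>
#include <string>

// Sketch: returns true if s is well-formed UTF-8 (structure, no overlong
// 3/4-byte forms, no UTF-16 surrogates, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& s) {
    std::string::size_type i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        int len = 0;
        unsigned long cp = 0;
        if (c < 0x80) { ++i; continue; }  // plain ASCII byte
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (i + len > s.size()) return false;  // truncated sequence
        for (int k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}
```

Note that in code page 1252 both `Zähler` and `Zä$hler` fail this check, so a strict byte-level reading does not by itself explain why only one of them was rejected.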
Thanks, Vladimir.

I had overlooked the reference to Annex E in the standard.

Richard

Aug 16 '06 #8
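Annex E is essentially a table of code point ranges, so checking a character against it is a range lookup. The sketch below uses a deliberately partial, hypothetical table: a few ranges that fall inside the Annex E Latin, Cyrillic and Hebrew entries, chosen to match the characters tried in this thread. It is not the full annex, and the names are illustrative only.

```cpp
#include <cassert>

// One inclusive range of code points.
struct CodePointRange { char32_t lo, hi; };

// Partial sample table, NOT the complete Annex E list.
constexpr CodePointRange kSampleRanges[] = {
    {0x00C0, 0x00D6},  // Latin-1 letters from À to Ö (U+00D7 × excluded)
    {0x00D8, 0x00F6},  // Ø to ö (U+00F7 ÷ excluded)
    {0x0410, 0x044F},  // basic Cyrillic А to я
    {0x05D0, 0x05EA},  // Hebrew letters alef to tav
};

// True if cp falls into one of the sample ranges above.
bool in_sample_ranges(char32_t cp) {
    for (const CodePointRange& r : kSampleRanges) {
        if (cp >= r.lo && cp <= r.hi) return true;
    }
    return false;
}
```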
Carl Daniel [VC++ MVP] wrote:
"R.Kaiser" <rk@invalid-rkaiser.de> wrote in message
news:OO**************@TK2MSFTNGP06.phx.gbl...
> [...]
> The cases you cite that don't work are interesting. What source file
> encoding were you using?

I used

Western European (Windows) - Codepage 1252

Richard

Aug 16 '06 #9
> I would expect that only a Unicode encoding would be safe for files with
> non-ASCII characters, but I haven't done any experimenting in that area
> myself.

When in doubt, go Unicode :-)
If you want something else, you will (of course) be limited to the characters
of that code page.
And you can tell the compiler which code page that is by using #pragma setlocale
(http://msdn2.microsoft.com/en-us/library/3e22ty2t.aspx)

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
Aug 17 '06 #10
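Following Mihai's advice to go Unicode, a code page 1252 source file can be re-encoded as UTF-8 with a BOM. This is a minimal sketch under a stated simplification: bytes 0x00 to 0x7F and 0xA0 to 0xFF map directly to U+0000 to U+007F and U+00A0 to U+00FF, while 0x80 to 0x9F (for example the euro sign at 0x80) would need a lookup table and are simply rejected. The function name is hypothetical; `#pragma setlocale`, the other option Mihai mentions, is not shown.

```cpp
#include <cassert>
#include <string>

// Sketch: re-encode cp1252 text as UTF-8 with a leading BOM.
// Returns false on bytes 0x80-0x9F, which this simplified version skips.
bool cp1252_to_utf8_with_bom(const std::string& in, std::string& out) {
    out = "\xEF\xBB\xBF";  // UTF-8 byte-order mark
    for (char ch : in) {
        unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += ch;  // ASCII is unchanged
        } else if (b >= 0xA0) {
            // U+00A0..U+00FF become two bytes: 110000xx 10xxxxxx
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        } else {
            return false;  // 0x80-0x9F: needs a cp1252 lookup table
        }
    }
    return true;
}
```

For example, the cp1252 bytes of `Zähler` (with `ä` = 0xE4) become `EF BB BF 5A C3 A4 68 6C 65 72`.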
