character sets

1) Am I correct that C++ does not have a defined character set? In
particular, a platform might not use the ASCII character set?

2) C++ supports wchar_t types. But again, this has no defined
character set? For instance, it might not be a unicode character set?

Jun 24 '07 #1

"jraul" <jr*******@yahoo.com> wrote in message
news:11**********************@e16g2000pri.googlegroups.com...
> 1) Am I correct that C++ does not have a defined character set?

It imposes a requirement that certain characters
exist in the source and execution character sets,
but no, it does not mandate any particular set.

> In particular, a platform might not use the ASCII character set?

Correct. E.g. many/most IBM systems use EBCDIC.

> 2) C++ supports wchar_t types.

Correct; this allows a larger number of characters than
possible with the minimum eight-bit sized type 'char'.

> But again, this has no defined character set?

No.

> For instance, it might not be a unicode character set?

Correct.

-Mike
Jun 24 '07 #2
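
The point that no particular character set is mandated can be probed from inside a program. The following is only a rough sketch: the helper name looks_like_ascii is made up for this illustration, and it inspects just four characters, so a true result means "consistent with ASCII for these characters" and nothing stronger.

```cpp
// Hypothetical helper (not from the thread): reports whether a few
// basic characters have the values ASCII assigns to them.
// On an EBCDIC system 'A' is 0xC1, so this would return false there.
bool looks_like_ascii()
{
    return 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30 && ' ' == 0x20;
}
```

Because the comparison uses character literals, the answer reflects the execution character set chosen by the compiler, not whatever a console later does with the bytes.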

jraul <jr*******@yahoo.com> wrote in message...
> 1) Am I correct that C++ does not have a defined character set? In
> particular, a platform might not use the ASCII character set?

Yup and yup. (that's yes and yes in yupese. <G>)

> 2) C++ supports wchar_t types. But again, this has no defined
> character set? For instance, it might not be a unicode character set?

If you compile a C++ program and run it, the system runs your program in a
'console' (assumed). It's the 'console' that has the char set, AFAIK.

Think about it. What would a char set be used for? Output to a screen (CRT)?
C++ does not know anything about a screen, keyboard, mouse, filesystem,
etc. Those are supplied by libraries. Try writing anything[1] in C++
*without* any '#include' in it, and get IO. You can 'crunch' numbers, but
you won't be able to *see* the results (....unless it's a toaster. <G>).

[1] - ..except an IO module. <G>
--
Bob R
POVrookie
Jun 25 '07 #3
BobR wrote:
> jraul <jr*******@yahoo.com> wrote in message...
>> 1) Am I correct that C++ does not have a defined character set? In
>> particular, a platform might not use the ASCII character set?
>
> Yup and yup. (that's yes and yes in yupese. <G>)
>
>> 2) C++ supports wchar_t types. But again, this has no defined
>> character set? For instance, it might not be a unicode character set?
>
> If you compile a C++ program, run it, the system runs your program in a
> 'console' (assumed). It's the 'console' that has the char set, AFAIK.
>
> Think about it. What would a char set be used for?

Knowing the difference between printable and unprintable characters?
Letters? #include <cctype>? <locale>? C++ has standard libraries that can
tell you a lot about supported char sets on your platform.

> Output to a screen
> (CRT)? C++ does not know anything about a screen, keyboard, mouse,
> filesystem, etc. Those are supplied by libraries.

Correct, but the standard in, standard out and standard error file streams
must be open on a hosted implementation, nevertheless. And when using them,
one should know what the output will be.

> Try writing anything[1]
> in C++ *without* any '#include' in it, and get IO. You can 'crunch'
> numbers, but you won't be able to *see* the results (....unless it's a
> toaster. <G>).

extern "C" int printf(const char*,...);
int main()
{
    printf("hello, world\n");
}

It only requires a library function named printf. This is not portable, but
it works on my g++. But why would you? The standard headers _are_ part of
C++, and the standard library is there for a reason.

--
rbh
Jun 25 '07 #4

Robert Bauck Hamar <ro**********@ifi.uio.no> wrote in message...
> BobR wrote:
>> If you compile a C++ program, run it, the system runs your program in a
>> 'console' (assumed). It's the 'console' that has the char set, AFAIK.
>> Think about it. What would a char set be used for?
>
> Knowing the difference between printable and unprintable characters?
> Letters? #include <cctype>? <locale>? C++ has standard libraries that can
> tell you a lot about supported char sets on your platform.

Ahem, let's try that again <G>:
Think about it. What would a char set be used for, output to a screen
(CRT)? (ya' know, like a theoretical question.)

And, on this device with no CRT, no keyboard/pad, how is the built in
character set used?

>> Try writing anything[1]
>> in C++ *without* any '#include' in it, and get IO. You can 'crunch'
>> numbers, but you won't be able to *see* the results (....unless it's a
>> toaster. <G>).
>
> extern "C" int printf(const char*,...);
> int main(){
>     printf("hello, world\n");
> }
>
> It only requires a library function named printf. This is not portable,
> but it works on my g++. But why would you? The standard headers _are_ part of
> C++, and the standard library is there for a reason.

Oh. So, you're saying that the standard library provides a 'character set'.
I see. I get it now. Silly me, I've been including <iostream> for nothing! I
could just use 'printf' in my GUI apps. Right?
Then how do you tell 'printf' what char set to use?

But, that's a 'library written in the language', not 'the language'. Or is
it the other way around?

Wow, ya' learn something new every day.
--
Bob R
POVrookie
Jun 25 '07 #5
On Jun 25, 12:48 pm, Robert Bauck Hamar <roberth+n...@ifi.uio.no>
wrote:
> extern "C" int printf(const char*,...);
> int main()
> {
>     printf("hello, world\n");
> }
>
> It only requires a library function named printf. This is not portable,

What is not portable about it? The standard specifies
that printf has just that signature.

Jun 25 '07 #6
On Jun 25, 2:46 pm, "BobR" <removeBadB...@worldnet.att.net> wrote:
> Robert Bauck Hamar <roberth+n...@ifi.uio.no> wrote in message...
>> Knowing the difference between printable and unprintable characters?
>> Letters? #include <cctype>? <locale>? C++ has standard libraries that can
>> tell you a lot about supported char sets on your platform.
>
> Ahem, let's try that again <G>:
> Think about it. What would a char set be used for, output to a screen
> (CRT)? (ya' know, like a theoretical question.)
>
> And, on this device with no CRT, no keyboard/pad, how is the built in
> character set used?

Could be any number of uses. Storing character
strings read in from files or other storage,
for example.

BTW, many devices have non-CRT displays (e.g. LCD panels).

> Oh. So, you're saying that the standard library provides a 'character set'.
> I see. I get it now. Silly me,

The C++ language provides a character set. This is
formally known as 'the execution character set'.

> I've been including <iostream> for nothing! I
> could just use 'printf' in my GUI apps.

You certainly could, although this has nothing to
do with character sets.

> Right? Then how do you tell 'printf' what char set to use?

I guess you mean: how do you tell printf which
locale to use? If so, then the answer is: call
the setlocale() function.
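
For readers who have not used it, a minimal sketch of the setlocale() call mentioned above might look like this. The wrapper name use_locale is hypothetical; std::setlocale and LC_ALL come from <clocale>. Which locale names are accepted beyond "C" and "" is up to the platform.

```cpp
#include <clocale>
#include <string>

// Hypothetical wrapper: switch the C library's global locale and return
// the name that was applied, or an empty string if the request failed.
std::string use_locale(const char* name)
{
    const char* applied = std::setlocale(LC_ALL, name);
    return applied ? std::string(applied) : std::string();
}
```

Calling use_locale("") asks for the locale configured in the user's environment, and use_locale("C") goes back to the minimal default. Functions such as printf and isupper consult this global setting afterwards.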

> Wow, ya' learn something new every day.

Indeed ya' do.

Jun 25 '07 #7
On Jun 25, 2:25 am, "BobR" <removeBadB...@worldnet.att.net> wrote:
> jraul <jrauli...@yahoo.com> wrote in message...
>> 1) Am I correct that C++ does not have a defined character set? In
>> particular, a platform might not use the ASCII character set?
>
> Yup and yup. (that's yes and yes in yupese. <G>)

In fact, some platforms don't. Windows, for example, or Linux,
or most Unices. Some platforms don't even use an encoding which
is a superset of ASCII: IBM mainframes use EBCDIC, for example.

All you're guaranteed is that:

-- a certain number of characters (known as the basic character
set) are present,

-- the digits (but not necessarily the upper or lower case
letters) are successive, and in ascending order, and

-- no character in the basic character set will be negative
when stored in a char (but this doesn't hold for characters
in the extended character set).

In addition, the actual run-time character set can change
depending on the locale. Which can play havoc with e.g. string
literals (which may not display the way they appear in the source code).
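
The guarantees listed above have a practical consequence that a short sketch can show. The function names below are made up for this illustration. The digit conversion relies only on the guarantee that '0' through '9' are successive; the letter test goes through std::islower from <cctype> because no such guarantee exists for letters (in EBCDIC, 'a' through 'z' are not contiguous).

```cpp
#include <cctype>

// Portable: the digits are guaranteed to be successive and ascending,
// so subtracting '0' yields the numeric value in any character set.
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// A range check such as (c >= 'a' && c <= 'z') is NOT guaranteed to work,
// so ask the library instead. The cast avoids passing a negative value.
bool is_lower_letter(char c)
{
    return std::islower(static_cast<unsigned char>(c)) != 0;
}
```

Note that std::islower answers according to the current C locale, which is exactly the run-time dependence described above.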

>> 2) C++ supports wchar_t types. But again, this has no defined
>> character set? For instance, it might not be a unicode character set?

And often isn't, for historical reasons. Even when it is
Unicode, sometimes it's UTF-16, other times UTF-32.
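
One way to see what a given implementation does with wchar_t is sketched below. Both function names are hypothetical. The macro __STDC_ISO_10646__ is an assumption about the toolchain: it comes from C99 and later C++ standards, so compilers from the time of this thread, and some current ones, may not define it even when wchar_t does hold Unicode values.

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t in bits on this implementation (commonly 16 or 32).
std::size_t wchar_bits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// True only if the implementation advertises ISO 10646 (Unicode) values
// for wchar_t through the optional __STDC_ISO_10646__ macro.
bool wchar_is_iso10646()
{
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}
```

A 16-bit result usually goes with UTF-16 and a 32-bit result with UTF-32, but the width alone does not prove which encoding is in use.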

> If you compile a C++ program, run it, the system runs your
> program in a 'console' (assumed). It's the 'console' that has
> the char set, AFAIK.

It's significantly more complicated than that: as you correctly
observe, the "characters" are interpreted by many different
components, some of which are completely independent of your
code. In a string literal, the apparent character will probably
depend on the encoding of the font you use in the editor, when
you write the code. Unless the editor is compensating in some
way: most editors will allow using one encoding for display,
and another when writing to the file.

After that, the compiler might remap some of the characters,
according to its ideas as to what the "default" execution code
set is, compared to the code set it's reading. (At present, I
don't think many compilers actually do this. But it could make a
lot of sense for cross-compilers.) Until this point, of course,
we're only concerned with string literals and character constants.

At runtime, how the program interprets characters internally (i.e.
things like isupper) depends on the current locale; in C++, this
means that it can be different for different files.

Once you've output the character (say 0xE9, a "Latin small letter
e with acute accent" in ISO 8859-1), of course, you have no more
control over how it is interpreted; if you output to the
console, it will depend on the codeset the current console font
is using, which is pretty much out of the control of your
program. (I think Windows calls this a codepage.) If you
output it to a file, and copy the file into a console window
sometime later (the "cat" command under Unix and most advanced
Windows command interpreters, "type" in the default Windows
command interpreter), it will depend on the font being used in
the console window at the time you execute the command. (At
least under X, it's possible to have two different console
windows using different fonts, with different encodings, running
at the same time. So the "character" you see will depend on
which window you look at the file in.) Copy the file to the
printer, of course, and the character will depend on the font
used by the printer.
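
The run-time part of this chain (classification depends on the locale, and each file or stream can carry its own) can be sketched as follows. The function names are made up for this illustration; std::isupper taking a locale argument, std::locale::classic() and imbue() are standard facilities from <locale> and the stream classes. Only the classic "C" locale is used here because it is the one locale every implementation must provide.

```cpp
#include <locale>
#include <sstream>

// Classify a character under an explicitly chosen locale instead of
// whatever the global locale happens to be.
bool is_upper_in(char c, const std::locale& loc)
{
    return std::isupper(c, loc);
}

// Each stream carries its own locale, set with imbue(); two streams in
// the same program can therefore interpret characters differently.
bool stream_uses_classic_locale()
{
    std::ostringstream out;
    out.imbue(std::locale::classic());
    return out.getloc() == std::locale::classic();
}
```

None of this reaches the console or printer font, which remains outside the program's control as described above.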

> Think about it. What would a char set be used for? Output to a screen (CRT)?
> C++ does not know anything about a screen, keyboard, mouse, filesystem,
> etc. Those are supplied by libraries. Try writing anything[1] in C++
> *without* any '#include' in it, and get IO.

The language does define a certain number of includes, including
<iostream> and <fstream>. So there is support for IO in the
language. The semantics, on the other hand, are very, very
loosely defined; more a suggestion of an intent than an actual
definition. Probably because C++ can't affect much of this.

--
James Kanze (GABI Software, from CAI) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 25 '07 #8
On Jun 25, 4:46 am, "BobR" <removeBadB...@worldnet.att.net> wrote:
> Robert Bauck Hamar <roberth+n...@ifi.uio.no> wrote in message...
>> BobR wrote:
>>> If you compile a C++ program, run it, the system runs your
>>> program in a 'console' (assumed). It's the 'console' that
>>> has the char set, AFAIK. Think about it. What would a
>>> char set be used for?
>> Knowing the difference between printable and unprintable
>> characters? Letters? #include <cctype>? <locale>? C++ has
>> standard libraries that can tell you a lot about supported
>> char sets on your platform.

But it can't guarantee that the supported character set you're
asking about is the one which will be used for display.

> Ahem, let's try that again <G>:
> Think about it. What would a char set be used for, output to a screen
> (CRT)? (ya' know, like a theoretical question.)
> And, on this device with no CRT, no keyboard/pad, how is the
> built in character set used?

The standard does define the concept of an "interactive device".
It also distinguishes between hosted and free-standing
implementations; a free-standing implementation isn't required
to support standard IO, but a hosted one is.
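
As a small illustration of the hosted versus free-standing distinction, a program can ask the compiler which kind of implementation it is being built for. The function name is_hosted is made up for this sketch, and the __STDC_HOSTED__ macro it relies on is an assumption about the toolchain: it was only made part of C++ in C++11, after this thread, although many compilers define it in older modes too.

```cpp
// Reports whether this translation unit is being compiled for a hosted
// implementation, i.e. one that must provide standard IO.
bool is_hosted()
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 1
    return true;
#else
    return false;   // free-standing, or the macro is unavailable
#endif
}
```

On an ordinary desktop compiler this would be expected to report a hosted implementation; a free-standing embedded toolchain may not.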

>>> Try writing anything[1]
>>> in C++ *without* any '#include' in it, and get IO. You can 'crunch'
>>> numbers, but you won't be able to *see* the results (....unless it's a
>>> toaster. <G>).
>> extern "C" int printf(const char*,...);
>> int main(){
>>     printf("hello, world\n");
>> }
>> It only requires a library function named printf. This is not portable,

Only in that it isn't defined whether printf is `extern "C"' or
not. It would be a 100% portable C program.

>> but it works on my g++. But why would you? The standard
>> headers _are_ part of C++, and the standard library is there
>> for a reason.
> Oh. So, you're saying that the standard library provides a
> 'character set'.

The language requires a "character set".

> I see. I get it now. Silly me, I've been including <iostream>
> for nothing!

What's your point? The standard says you have to include
<iostream> in order to use the symbol std::cout, and a number of
other things.

> I could just use 'printf' in my GUI apps.

Who knows. C++ doesn't speak of GUIs. But that doesn't mean
that it doesn't consider the issue of character sets, both
compile time and run-time. Otherwise, things like string
literals and character constants wouldn't make sense.

> Right?
> Then how do you tell 'printf' what char set to use?

Are you trying to be intentionally stupid, or do you just not
know the language or understand these issues?

> But, that's a 'library written in the language', not 'the
> language'. Or is it the other way around?

The standard library is part of the language. As are string
literals and character constants. And concepts like the "basic
execution character set" and the "extended execution character
set".

And the issues surrounding character encoding are extremely
complex, because elements outside the language, over which C++
has no control, do come into play. (I can start a new console
Window using Zapf dingbats on my system. The C++
implementations I have access to don't have a locale which
supports it, probably because the character set doesn't even
support the basic characters required by the C++ standard.)

--
James Kanze (GABI Software, from CAI) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 25 '07 #9
Old Wolf wrote:
> On Jun 25, 12:48 pm, Robert Bauck Hamar <roberth+n...@ifi.uio.no>
> wrote:
>> extern "C" int printf(const char*,...);
>> int main()
>> {
>>     printf("hello, world\n");
>> }
>>
>> It only requires a library function named printf. This is not portable,
>
> What is not portable about it? The standard specifies
> that printf has just that signature.

AFAIK, the standard specifies printf to be part of namespace std, and that
it is implementation-defined whether the linkage of printf is C or C++
(§17.4.2.2). The standard recommends C++ linkage, but printf is reserved
to the implementation in the global namespace as an external symbol with C
linkage (§17.4.3.1.3).

I believe this is not portable, but it would often work, as C++ compilers
often link with the platform's C library. But there is no guarantee there
exists a library function named printf with C linkage in the global
namespace.
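
For comparison, a sketch of the variant that avoids the linkage question entirely: let the standard header declare the function and call it through namespace std. The wrapper name say_hello is made up for this illustration; it is written as a function rather than main so that the return value of printf can be inspected.

```cpp
#include <cstdio>

// Portable variant: <cstdio> declares printf, so no hand-written
// extern "C" declaration (and no guess about its linkage) is needed.
// Returns the number of characters written, or a negative value on error.
int say_hello()
{
    return std::printf("hello, world\n");
}
```

Calling say_hello() from main gives the same output as the hand-declared version, without depending on how the implementation links its C library.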

--
rbh
Jun 25 '07 #10
