
'hello world' OS

Hello all,

I would like to know how an OS makes a computer boot up.
For that, as a start I would like to see an e_ample (read the
underscore as the letter before y, as the keyboard here is
defective) C program (along with instructions on how to compile
and use it) that will take over when the computer boots and
print "hello world" and reboots the machine when 'enter' is
pressed. This will be something of a "hello world" OS (for me)
(I have tried to go through the boot.S (not sure about the name)
program in the Linu_ kernel source, but I could not understand
it since I am not familiar with assembly language).

I would be glad if someone could please point me in the right
direction.

Regards,
Santanu
Nov 14 '05 #1
Santanu Chatterjee wrote:
Hello all,

I would like to know how an OS makes a computer boot up.
For that, as a start I would like to see an e_ample (read the
underscore as the letter before y, as the keyboard here is
defective) C program (along with instructions on how to compile
and use it) that will take over when the computer boots and
print "hello world" and reboots the machine when 'enter' is
pressed. This will be something of a "hello world" OS (for me)
(I have tried to go through the boot.S (not sure about the name)
program in the Linu_ kernel source, but I could not understand
it since I am not familiar with assembly language).

I would be glad if someone could please point me in the right
direction.


Here's a program that waits for an 'x'. Using any other input
for a program like this is dangerous and not standard C.

#include <stdio.h>

int main(void)
{
    printf("hello world\n");
    while (getchar() != 'y' - 1)
    {
    }
}

HTH

Case

Nov 14 '05 #2
sa*****@softhome.net (Santanu Chatterjee) wrote:
I would like to know how an OS makes a computer boot up.


It doesn't. The ROM bootstrap loader does. _How_ it does this is
completely system-dependent, and therefore you should ask about it in a
newsgroup dedicated to the architecture you are interested in. There is
no C program which can demonstrate this for more than a single
architecture, and for most you will need low-level system calls, since
the higher level functions used by ISO C are dependent on the very OS
you want to replace.

Richard
Nov 14 '05 #3
Case <no@no.no> wrote:
Santanu Chatterjee wrote:
I would like to know how an OS makes a computer boot up.
Here's a program that waits for an 'x'. Using any other input
for a program like this is dangerous and not standard C.
#include <stdio.h>

int main(void)
{
printf("hello world\n");
while (getchar() != 'y' - 1)
{
}
}


Beautiful. Not only has it nothing whatsoever to do with the question,
it is even wrong. (Hint: what makes you think the ISO C Standard
mandates ASCII?)

Richard
Nov 14 '05 #4
Richard Bos <rl*@hoekstra-uitgeverij.nl> scribbled the following:
Case <no@no.no> wrote:
Santanu Chatterjee wrote:
> I would like to know how an OS makes a computer boot up.
Here's a program that waits for an 'x'. Using any other input
for a program like this is dangerous and not standard C.
#include <stdio.h>

int main(void)
{
printf("hello world\n");
while (getchar() != 'y' - 1)
{
}
}

Beautiful. Not only has it nothing whatsoever to do with the question,
it is even wrong. (Hint: what makes you think the ISO C Standard
mandates ASCII?)


I figure there is no portable way whatsoever to guarantee an integer
value corresponds to the character 'x' without actually using the
character constant
'x'
either by itself, or as part of an array or string, at some point in
the C source code.
For characters corresponding to digits from 0 to 9 it can be done, but
I don't think it can be done for any other characters.
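
The digit case rests on a guarantee in the C standard: the values of the decimal digits '0' through '9' are consecutive in both the source and the execution character set. A minimal sketch of what that allows (the function names are made up for illustration):

```c
/* Digits are the only characters the C standard guarantees to be
   consecutive, so '0' + d is the character for digit d (0 <= d <= 9). */
int digit_to_char(int d)
{
    return (d >= 0 && d <= 9) ? '0' + d : -1;   /* -1: not a digit value */
}

/* The inverse mapping; returns -1 when c is not a decimal digit. */
int char_to_digit(int c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

No comparable guarantee exists for letters, so the same arithmetic with 'a' or 'y' is not portable.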

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"I will never display my bum in public again."
- Homer Simpson
Nov 14 '05 #5
Joona I Palaste wrote:
I figure there is no portable way whatsoever to guarantee an integer
value corresponds to the character 'x' without actually using the
character constant
'x'
either by itself, or as part of an array or string, at some point in
the C source code.


#include <string.h>
#include <ctype.h>

int is_x(int c) /* C locale */
{
    return islower((unsigned char)c)
        && strchr("abcdefghijklmnopqrstuvwyz", c) == NULL;
}

Jeremy.
Nov 14 '05 #6
In <59*************************@posting.google.com> sa*****@softhome.net (Santanu Chatterjee) writes:
I would like to know how an OS makes a computer boot up.
This is beyond the capabilities of any OS. At least the first stages
of the booting procedure are handled by programs that are not OS-specific.
These programs are, however, heavily platform-specific.
For that, as a start I would like to see an e_ample (read the
underscore as the letter before y, as the keyboard here is
defective) C program (along with instructions on how to compile
and use it) that will take over when the computer boots and
print "hello world" and reboots the machine when 'enter' is
pressed.
There is no way to write such a program in portable C. Because there is
no OS, such a program would have to be written as a freestanding
application, i.e. all its output must be generated by its own means,
with no standard library support. And, with no standard library
support, there is no portable way of generating any output.

Furthermore, such a program may have to do things that cannot be done in
C at all, like setting various CPU registers to appropriate values.
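
To give a rough idea of what "generated by its own means" can look like, here is a sketch for one specific case only: a text-mode display whose memory holds one 16-bit cell per character (character code in the low byte, colour attribute in the high byte). On a PC in VGA text mode that memory conventionally starts at physical address 0xB8000, but that is an assumption about one platform and not something C knows about. The helper name put_text is made up, and the buffer is passed in as a parameter so the code can also be tried on an ordinary array under a hosted compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write a string into a text-mode screen buffer
   in which each cell is a 16-bit value, character code in the low
   byte and colour attribute in the high byte. No library calls. */
size_t put_text(volatile uint16_t *screen, size_t cells,
                const char *s, uint8_t attr)
{
    size_t i = 0;

    while (s[i] != '\0' && i < cells) {
        screen[i] = (uint16_t)((uint16_t)attr << 8 | (unsigned char)s[i]);
        i++;
    }
    return i;   /* number of cells written */
}
```

Getting the machine to run this code at boot, and setting up a stack for it first, is the part that still needs assembly and a platform-specific build procedure.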
This will be something of a "hello world" OS (for me)
(I have tried to go through the boot.S (not sure about the name)
program in the Linu_ kernel source, but I could not understand
it since I am not familiar with assembly language).

I would be glad if someone could please point me in the right
direction.


The right direction is to start learning assembly programming. It cannot
be bypassed when programming at this level, even if it's merely asm
statements embedded in C code.

And once you learn assembly, you'll discover that you have no need for C
at all for implementing the program you have in mind.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #7
Santanu Chatterjee wrote:
Hello all,

I would like to know how an OS makes a computer boot up.

A better place to discuss your issue is in news:comp.os.*.
Another place is news:comp.arch.embedded.

For that, as a start I would like to see an e_ample (read the
underscore as the letter before y, as the keyboard here is
defective) C program (along with instructions on how to compile
and use it) that will take over when the computer boots and
print "hello world" and reboots the machine when 'enter' is
pressed. This will be something of a "hello world" OS (for me)
(I have tried to go through the boot.S (not sure about the name)
program in the Linu_ kernel source, but I could not understand
it since I am not familiar with assembly language).

The boot sequence depends on both the platform and the
operating system.

A common sequence is:
1. Turn off all interrupts
2. Perform diagnostics (e.g. memory testing, device testing)
3. Initialize interrupt and other vectors.
4. Initialize memory structure (including stacks).
5. Initialize 'C' run-time library.
6. Jump to "main" function in C program.

Platforms with more complex operating systems would have different
sequences (and more complicated ones). Much of the boot code is
written in assembly language. The run-time environment for the
high level language must be initialized before a high-level
language can be executed.
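
As an illustration of steps 4 and 5, here is a sketch of the memory preparation that startup code on many small embedded systems performs before calling main: copy the initial values of initialised variables from read-only storage into RAM, then zero the variables that have no initialiser. All names are hypothetical. In a real startup file the addresses and lengths typically come from symbols defined by the linker script, and the routine is often written in assembly.

```c
#include <stddef.h>

/* Copy the initial values of initialised variables into RAM. */
static void copy_data(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n-- > 0)
        *dst++ = *src++;
}

/* Zero the region holding variables without an explicit initialiser. */
static void zero_bss(unsigned char *bss, size_t n)
{
    while (n-- > 0)
        *bss++ = 0;
}

/* Hypothetical memory setup, run before any C code that relies on
   initialised static data. Plain loops are used on purpose: the C
   library may not be usable yet at this point. */
void prepare_memory(unsigned char *data, const unsigned char *data_init,
                    size_t data_len, unsigned char *bss, size_t bss_len)
{
    copy_data(data, data_init, data_len);
    zero_bss(bss, bss_len);
}
```

Only after this has run can a C program count on its static variables holding the values the source code gave them.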

I would be glad if someone could please point me in the right
direction.

Regards,
Santanu


On many platforms, your executable program is loaded into memory
and executed by the operating system. Your program has no idea
how long the platform has been operational before your program
is executed.
--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Nov 14 '05 #8
Richard Bos wrote:
Case <no@no.no> wrote:

Santanu Chatterjee wrote:
I would like to know how an OS makes a computer boot up.


Here's a program that waits for an 'x'. Using any other input
for a program like this is dangerous and not standard C.
#include <stdio.h>

int main(void)
{
printf("hello world\n");
while (getchar() != 'y' - 1)
{
}
}

Beautiful. Not only has it nothing whatsoever to do with the question,
it is even wrong. (Hint: what makes you think the ISO C Standard
mandates ASCII?)


The OP is unable to type an 'x' that's why I used his
suggestion to get there. Thanks for pointing out that
it has nothing to do with the question, I did not know
that. And, on his OS, Linu_, ASCII is quite common, if
I'm right; but could you please confirm. BTW, do you
know a real world character set in which 'x' != 'y' - 1?

Case

Nov 14 '05 #9
Jeremy Yallop <je****@jdyallop.freeserve.co.uk> scribbled the following:
Joona I Palaste wrote:
I figure there is no portable way whatsoever to guarantee an integer
value corresponds to the character 'x' without actually using the
character constant
'x'
either by itself, or as part of an array or string, at some point in
the C source code.
#include <string.h>
#include <ctype.h>

int is_x(int c) /* C locale */
{
    return islower((unsigned char)c)
        && strchr("abcdefghijklmnopqrstuvwyz", c) == NULL;
}


Quite clever. AFAIK, however, this can only be done for one character in
one C program. If we were trying to use such functions to identify *two*
characters, the best we could achieve would be knowing whether the input
is *either* of them, but we couldn't know *which*. If we used those
alphabet strings missing one letter, both letters would show up in each
other's alphabet strings. But if we used only one, missing two letters,
there would be no way to tell which one of them the input was.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"I said 'play as you've never played before', not 'play as IF you've never
played before'!"
- Andy Capp
Nov 14 '05 #10
Case <no@no.no> wrote:
Richard Bos wrote:
Case <no@no.no> wrote:
Santanu Chatterjee wrote:

I would like to know how an OS makes a computer boot up.
Here's a program that waits for an 'x'. Using any other input
for a program like this is dangerous and not standard C.
Beautiful. Not only has it nothing whatsoever to do with the question,
it is even wrong. (Hint: what makes you think the ISO C Standard
mandates ASCII?)


The OP is unable to type an 'x' that's why I used his
suggestion to get there.


Then first of all, you should learn to snip, because you made it look as
though you replied to that whole post; and second, how would a program
that _waits_ for an 'x' help someone who cannot _type_ an 'x'?
And, on his OS, Linu_, ASCII is quite common, if I'm right;
So bloody what? This is comp.lang.c, not comp.lang.c.linux.
BTW, do you know a real world character set in which 'x' != 'y' - 1?


No, but I do know one in which not all letters are consecutive.
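
EBCDIC is the usual example: there 'i' is 0x89 and 'j' is 0x91, so code that computes a letter's position as c - 'a' gives wrong answers. A sketch of a lookup that does not assume consecutive letters (letter_index is a made-up name):

```c
#include <string.h>

/* Return the 0-based position of a lowercase letter in the alphabet,
   or -1 if c is not one of the 26 lowercase letters. Looking the
   character up in a string avoids the assumption that letters are
   consecutive, which c - 'a' would make. */
int letter_index(int c)
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    const char *p;

    if (c == '\0')
        return -1;          /* strchr would match the terminator */
    p = strchr(letters, c);
    return p != NULL ? (int)(p - letters) : -1;
}
```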

Richard
Nov 14 '05 #11
In <cb**********@oravannahka.helsinki.fi> Joona I Palaste <pa*****@cc.helsinki.fi> writes:
I figure there is no portable way whatsoever to guarantee an integer
value corresponds to the character 'x' without actually using the
character constant
'x'
either by itself, or as part of an array or string, at some point in
the C source code.


I was about to say that '\U0078' would do in C99, but it appears to be
a constraint violation: you can't use UCNs for the members of the basic
source character set. Only the ASCII characters that aren't part of the
basic source character set can be represented with UCNs: $, @ and `.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #12
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:
Santanu Chatterjee wrote:
Hello all,

I would like to know how an OS makes a computer boot up.
For that, as a start I would like to see an e_ample (read the
underscore as the letter before y, as the keyboard here is
defective) C program (along with instructions on how to compile
and use it) that will take over when the computer boots and
print "hello world" and reboots the machine when 'enter' is
pressed. This will be something of a "hello world" OS (for me)
(I have tried to go through the boot.S (not sure about the name)
program in the Linu_ kernel source, but I could not understand
it since I am not familiar with assembly language).

I would be glad if someone could please point me in the right
direction.


Here's a program that waits for an 'x'. Using any other input
for a program like this is dangerous and not standard C.

#include <stdio.h>

int main(void)
{
printf("hello world\n");
while (getchar() != 'y' - 1)
{
}
}


Now explain how it is supposed to work on a freestanding platform, where:

1. The name and interface of the startup function is
implementation-defined.

2. Neither printf nor getchar are available (they typically rely on the
existence of an OS, but there is none in our case).

3. #include <stdio.h> may stop the compilation process with a message like

test.c:1:20: stdio.h: No such file or directory
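
Whether points 2 and 3 apply can at least be detected at compile time. C99 predefines the macro __STDC_HOSTED__ as 1 for a hosted implementation and 0 for a freestanding one. The sketch below treats a compiler that does not define the macro (pre-C99) as hosted, which is an assumption; HAVE_STDIO and say_hello are made-up names.

```c
/* __STDC_HOSTED__ is predefined by C99 and later: 1 for a hosted
   implementation, 0 for a freestanding one. A compiler that does not
   define it is assumed to be hosted here. */
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 0
#define HAVE_STDIO 0
#else
#define HAVE_STDIO 1
#include <stdio.h>
#endif

/* Hypothetical output routine: uses the standard library when there is
   one. Returns 1 on success, 0 on failure or when nothing was printed. */
int say_hello(void)
{
#if HAVE_STDIO
    return puts("hello world") >= 0;
#else
    return 0;   /* platform-specific output code would have to go here */
#endif
}
```

This only selects between two code paths; it does nothing to supply the platform-specific output code that the freestanding branch needs.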

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #13
In <40****************@news.individual.net> rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:
Case <no@no.no> wrote:
Santanu Chatterjee wrote:
> I would like to know how an OS makes a computer boot up.

Here's a program that waits for an 'x'. Using any other input
for a program like this is dangerous and not standard C.
#include <stdio.h>

int main(void)
{
printf("hello world\n");
while (getchar() != 'y' - 1)
{
}
}


Beautiful. Not only has it nothing whatsoever to do with the question,
it is even wrong. (Hint: what makes you think the ISO C Standard
mandates ASCII?)


It doesn't have to. EBCDIC satisfies the poster's assumption, as well.
Good luck finding a conforming hosted implementation whose execution
character set is not based on (i.e. an extension of) either ASCII
or EBCDIC.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #14
Dan Pop <Da*****@cern.ch> scribbled the following:
In <cb**********@oravannahka.helsinki.fi> Joona I Palaste <pa*****@cc.helsinki.fi> writes:
I figure there is no portable way whatsoever to guarantee an integer
value corresponds to the character 'x' without actually using the
character constant
'x'
either by itself, or as part of an array or string, at some point in
the C source code.
I was about to say that '\U0078' would do in C99, but it appears to be
a constraint violation: you can't use UCNs for the members of the basic
source character set. Only the ASCII characters that aren't part of the
basic source character set can be represented with UCNs: $, @ and `.


Now this question is perhaps off-topic for comp.lang.c, but I don't
understand *why* you can't use UCNs for members of the basic character
set. What is the rationale behind this constraint?

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"The question of copying music from the Internet is like a two-barreled sword."
- Finnish rap artist Ezkimo
Nov 14 '05 #15
Joona I Palaste wrote:
Jeremy Yallop <je****@jdyallop.freeserve.co.uk> scribbled the following:
Joona I Palaste wrote:
I figure there is no portable way whatsoever to guarantee an
integer value corresponds to the character 'x' without actually
using the character constant 'x' either by itself, or as part of
an array or string, at some point in the C source code.

#include <string.h>
#include <ctype.h>

int is_x(int c) /* C locale */
{
return islower((unsigned char)c)
&& strchr("abcdefghijklmnopqrstuvwyz", c) == NULL;
}


Quite clever. AFAIK, however, this can be only done for one character in
one C program. If we were trying to use such functions to identify *two*
characters, the best we could achieve would be knowing whether the input
is *either* of them, but we couldn't know *which*.


Okay, here's another way, which doesn't have that restriction:

if (c == tolower('X'))
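
Wrapped in functions, the same idea distinguishes two different characters without either lowercase constant appearing in the source. This is only a sketch and assumes the "C" locale, where tolower maps each uppercase letter to its lowercase counterpart; the second function is added purely for illustration.

```c
#include <ctype.h>

/* True if c is the lowercase letter corresponding to 'X'. */
int is_x(int c)
{
    return c == tolower('X');
}

/* The same test for a second letter, showing that the technique is
   not limited to one character per program. */
int is_q(int c)
{
    return c == tolower('Q');
}
```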

Jeremy.
Nov 14 '05 #16
Dan Pop wrote:
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:

Santanu Chatterjee wrote:
Hello all,

I would like to know how an OS makes a computer boot up.
For that, as a start I would like to see an e_ample (read the
underscore as the letter before y, as the keyboard here is
defective) C program (along with instructions on how to compile
and use it) that will take over when the computer boots and
print "hello world" and reboots the machine when 'enter' is
pressed. This will be something of a "hello world" OS (for me)
(I have tried to go through the boot.S (not sure about the name)
program in the Linu_ kernel source, but I could not understand
it since I am not familiar with assembly language).

I would be glad if someone could please point me in the right
direction.


Here's a program that waits for an 'x'. Using any other input
for a program like this is dangerous and not standard C.
Huh, what can he mean here? And, the OP is not even able to type
an 'x'. The question is, at least a bit, off-topic too. Sorry
guys, and especially a sorry to the OP; I'll try not to do it again.

#include <stdio.h>

int main(void)
{
printf("hello world\n");
while (getchar() != 'y' - 1)
{
}
}

Now explain how is it supposed to work on a freestanding platform, where:

1. The name and interface of the startup function is
implementation-defined.

2. Neither printf nor getchar are available (they typically rely on the
existence of an OS, but there is none in our case).

3. #include <stdio.h> may stop the compilation process with a message like

test.c:1:20: stdio.h: No such file or directory


This point 3. has a causality problem. If you're able to
run a C compiler, a C compiler that is able to print, then
I guess stdio.h will be close enough.

Case

Nov 14 '05 #17
Jeremy Yallop wrote:

Joona I Palaste wrote:
I figure there is no portable way whatsoever to guarantee an integer
value corresponds to the character 'x' without actually using the
character constant
'x'
either by itself, or as part of an array or string, at some point in
the C source code.


#include <string.h>
#include <ctype.h>

int is_x(int c) /* C locale */
{
return islower((unsigned char)c)
&& strchr("abcdefghijklmnopqrstuvwyz", c) == NULL;
}


What happens if the user enters a lowercase accented letter, such as 'á'
(which may or may not show up on your system properly, but is an accented
'a' here)?

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody at spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Nov 14 '05 #18
In <40**********************@dreader2.news.tiscali.nl> Case - <no@no.no> writes:
Dan Pop wrote:
In <40***********************@news.xs4all.nl> Case <no@no.no> writes:

Santanu Chatterjee wrote:

Hello all,

I would like to know how an OS makes a computer boot up.
For that, as a start I would like to see an e_ample (read the
underscore as the letter before y, as the keyboard here is
defective) C program (along with instructions on how to compile
and use it) that will take over when the computer boots and
print "hello world" and reboots the machine when 'enter' is
pressed. This will be something of a "hello world" OS (for me)
(I have tried to go through the boot.S (not sure about the name)
program in the Linu_ kernel source, but I could not understand
it since I am not familiar with assembly language).

I would be glad if someone could please point me in the right
direction.

Here's a program that waits for an 'x'. Using any other input
for a program like this is dangerous and not standard C.
Huh, what can he mean here? And, the OP is not even able to type
an 'x'. The question is, at least a bit, off-topic too. Sorry
More than a bit. It's downright off-topic.
guys, and especially a sorry to the OP; I'll try not to do it again.

#include <stdio.h>

int main(void)
{
printf("hello world\n");
while (getchar() != 'y' - 1)
{
}
}

Now explain how is it supposed to work on a freestanding platform, where:

1. The name and interface of the startup function is
implementation-defined.

2. Neither printf nor getchar are available (they typically rely on the
existence of an OS, but there is none in our case).

3. #include <stdio.h> may stop the compilation process with a message like

test.c:1:20: stdio.h: No such file or directory


This point 3. has a causality problem. If you're able to
run a C compiler, a C compiler that is able to print, then
I guess stdio.h will be close enough.


I'm afraid I can't make any sense out of your incoherent statement.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #19
In <cb**********@oravannahka.helsinki.fi> Joona I Palaste <pa*****@cc.helsinki.fi> writes:
Now this question is perhaps off-topic for comp.lang.c, but I don't
understand *why* you can't use UCNs for members of the basic character
set. What is the rationale behind this constraint?


I have no clue. Try the C99 rationale or ask in comp.std.c. The relevant
chapter and verse is:

6.4.3 Universal character names
....
Constraints

2 A universal character name shall not specify a character whose
short identifier is less than 00A0 other than 0024 ($), 0040 (@),
or 0060 (`), nor one in the range D800 through DFFF inclusive. 61)

____________________

61) The disallowed characters are the characters in the basic
character set and the code positions reserved by ISO/IEC 10646
for control characters, the character DELETE, and the S-zone
(reserved for use by UTF-16).
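
Written out as code, the constraint quoted above looks roughly like this. It is a sketch that checks only this one constraint, not everything that makes a UCN valid in a given context, and ucn_permitted is a made-up name.

```c
/* Returns 1 if code point cp may be written as a universal character
   name under the C99 6.4.3 constraint quoted above, 0 otherwise. */
int ucn_permitted(unsigned long cp)
{
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return 0;                       /* reserved for UTF-16 */
    if (cp < 0x00A0UL)                  /* only $, @ and ` are allowed */
        return cp == 0x0024UL || cp == 0x0040UL || cp == 0x0060UL;
    return 1;
}
```

With this check, 0x0078 (the letter x) is rejected and 0x0024 (the dollar sign) is accepted, matching the observation about '\U0078'.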

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #20
Kenneth Brody wrote:
Jeremy Yallop wrote:

Joona I Palaste wrote:
> I figure there is no portable way whatsoever to guarantee an integer
> value corresponds to the character 'x' without actually using the
> character constant
> 'x'
> either by itself, or as part of an array or string, at some point in
> the C source code.


#include <string.h>
#include <ctype.h>

int is_x(int c) /* C locale */
{
return islower((unsigned char)c)
&& strchr("abcdefghijklmnopqrstuvwyz", c) == NULL;
}


What happens if the user enters a lowercase accented letter, such as 'á'
(which may or may not show up on your system properly, but is an accented
'a' here)?


In the "C" locale, islower() only returns true for the 26 lowercase
letters of the Latin alphabet, so is_x() will return false for 'á'.
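
A program starts in the "C" locale, but the dependence can be made explicit. The sketch below is the same is_x() with the locale category selected inside the function. That is a simplification for illustration: setlocale changes the setting for the whole program, so a real program would normally call it once at startup.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* is_x() as posted, with the "C" locale selected explicitly so that
   islower() accepts only the 26 basic lowercase letters. Note that
   setlocale affects the whole program, not just this function. */
int is_x(int c)
{
    setlocale(LC_CTYPE, "C");
    return islower((unsigned char)c)
        && strchr("abcdefghijklmnopqrstuvwyz", c) == NULL;
}
```

A value such as 0xE1 (the accented 'a' in ISO 8859-1) fails the islower() test, so strchr is never reached for it.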

Jeremy.
Nov 14 '05 #21

On Wed, 30 Jun 2004, Joona I Palaste wrote:

Dan Pop <Da*****@cern.ch> scribbled the following:
I was about to say that '\U0078' would do in C99, but it appears to be
a constraint violation: you can't use UCNs for the members of the basic
source character set. Only the ASCII characters that aren't part of the
basic source character set can be represented with UCNs: $, @ and `.


Now this question is perhaps off-topic for comp.lang.c, but I don't
understand *why* you can't use UCNs for members of the basic character
set. What is the rationale behind this constraint?


I'm not an authority, but I assume the reason is so that implementations
that don't support extended character sets don't have to implement
anything special to parse UCNs (which I think are new in C99?).

Alternatively, it could be a B&D approach to clarity and portability:
if the only way to write 'x' is to actually use the letter 'x', and not
to use arbitrarily complicated arithmetic, then the maintainer has one
less problem to worry about when porting to an EBCDIC system. ;-)

-Arthur

Nov 14 '05 #22
In <Pi**********************************@unix41.andrew.cmu.edu> "Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:

On Wed, 30 Jun 2004, Joona I Palaste wrote:

Dan Pop <Da*****@cern.ch> scribbled the following:
> I was about to say that '\U0078' would do in C99, but it appears to be
> a constraint violation: you can't use UCNs for the members of the basic
> source character set. Only the ASCII characters that aren't part of the
> basic source character set can be represented with UCNs: $, @ and `.
Now this question is perhaps off-topic for comp.lang.c, but I don't
understand *why* you can't use UCNs for members of the basic character
set. What is the rationale behind this constraint?


I'm not an authority, but I assume the reason is so that implementations
that don't support extended character sets don't have to implement
anything special to parse UCNs (which I think are new in C99?).


Wrong. UCN support is mandatory:

6.4.2 Identifiers

6.4.2.1 General

Syntax

1 identifier:
identifier-nondigit
identifier identifier-nondigit
identifier digit

identifier-nondigit:
nondigit
universal-character-name
other implementation-defined characters

nondigit: one of
_ a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

digit: one of
0 1 2 3 4 5 6 7 8 9

An interesting consequence is that f$ is not a valid identifier, but
it becomes a valid identifier if the $ sign is replaced by its UCN:
f\u0024 !
Alternatively, it could be a B&D approach to clarity and portability:
if the only way to write 'x' is to actually use the letter 'x', and not
to use arbitrarily complicated arithmetic, then the maintainer has one
less problem to worry about when porting to an EBCDIC system. ;-)


Wrong again: UCNs have nothing to do with ASCII vs EBCDIC issues.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #23
Da*****@cern.ch (Dan Pop) writes:
[...]
An interesting consequence is that f$ is not a valid identifier, but
it becomes a valid identifier if the $ sign is replaced by its UCN:
f\u0024 !


I don't think so.

C99 6.4.2.1p3 says:

Each universal character name in an identifier shall designate a
character whose encoding in ISO/IEC 10646 falls into one of the
ranges specified in annex D.

The encoding of '$', 0024, is not within one of the ranges specified
in annex D.

Interestingly, the "shall" in 6.4.2.1p3 is not in a constraint, so
using f\u0024 as an identifier invokes undefined behavior (it doesn't
violate a syntax rule either). I wonder if that was the intent. It
seems to me that it would make more sense for it to be a constraint
violation, requiring a diagnostic. If I'm not mistaken, a conforming
implementation could simply ignore annex D and allow any arbitrary
UCNs in identifiers. (That doesn't make f\u0024 a valid identifier,
it just means the implementation isn't required to diagnose it.)

Another possible oversight: the same paragraph also says

The initial character shall not be a universal character name
designating a digit.

but there's no specification in annex D of which UCNs specify digits.
Presumably ISO/IEC 10646 covers that, but it would be useful to spell
it out in the C standard, perhaps in a footnote.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #24

On Thu, 1 Jul 2004, Keith Thompson wrote:

Da*****@cern.ch (Dan Pop) writes:
[...]
An interesting consequence is that f$ is not a valid identifier, but
it becomes a valid identifier if the $ sign is replaced by its UCN:
f\u0024 !
I don't think so.

[...] The encoding of '$', 0024, is not within one of the ranges specified
in annex D.

Interestingly, the "shall" in 6.4.2.1p3 is not in a constraint, so
using f\u0024 as an identifier invokes undefined behavior (it doesn't
violate a syntax rule either). I wonder if that was the intent. It
seems to me that it would make more sense for it to be a constraint
violation, requiring a diagnostic. If I'm not mistaken, a conforming
implementation could simply ignore annex D and allow any arbitrary
UCNs in identifiers. (That doesn't make f\u0024 a valid identifier,
it just means the implementation isn't required to diagnose it.)
I was wrong about implementations' being allowed to not-support UCNs
(all conforming implementations must, I think). But the passage to
which you're referring does seem to support the general conclusion that
UCNs were added grudgingly: there are a lot of other places where
dubious use of UCNs leads to UB rather than a constraint violation
(a couple of places in the preprocessing stages, for example). I
think this is because maybe the Committee realized that nobody was
going to build in full "Unicode"[1] support just for the benefit of
anal-retentive users.
(Non-USAnians may have a better idea, but I'm under the impression that
\u4E00 looks like "backslash, letter u, 4, E, 0, 0" in all major IDEs, so
there's no good reason to use UCNs in C code except inside string literals
anyway. It doesn't let you "write code in your own language" or
anything.)
Another possible oversight: the same paragraph also says

The initial character shall not be a universal character name
designating a digit.

but there's no specification in annex D of which UCNs specify digits.
Presumably ISO/IEC 10646 covers that, but it would be useful to spell
it out in the C standard, perhaps in a footnote.


I thought one of the sections in Annex D was labeled "Extended Digits"
or something like that?

-Arthur

Nov 14 '05 #25
"Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
On Thu, 1 Jul 2004, Keith Thompson wrote:
[...]
I was wrong about implementations' being allowed to not-support UCNs
(all conforming implementations must, I think). But the passage to
which you're referring does seem to support the general conclusion that
UCNs were added grudgingly: there are a lot of other places where
dubious use of UCNs leads to UB rather than a constraint violation
(a couple of places in the preprocessing stages, for example). I
think this is because maybe the Committee realized that nobody was
going to build in full "Unicode"[1] support just for the benefit of
anal-retentive users.
(Non-USAnians may have a better idea, but I'm under the impression that
\u4E00 looks like "backslash, letter u, 4, E, 0, 0" in all major IDEs, so
there's no good reason to use UCNs in C code except inside string literals
anyway. It doesn't let you "write code in your own language" or
anything.)


Presumably the intent is to allow programmers to use native characters
in identifiers; nobody is expected to write "\u4E00".

In translation phase 1:

Physical source file multibyte characters are mapped, in an
implementation-defined manner, to the source character set ...

I think the sequence "\u4E00" is normally expected to occur only after
translation phase 1; in the actual source file, it should look like
the corresponding Asian ideograph. As the rationale says:

Given the current state of multibyte encodings, this mapping is
specified to be implementation-defined; but an implementation can
provide the users with utility programs that do the conversion
from UCNs to "native" multibytes or vice versa, thus providing a
way to exchange source files between implementations using the UCN
notation.

UCNs are similar to trigraphs, but they seem to work in the opposite
direction. Phase 1 maps trigraphs to their legible single-character
equivalents, but it (optionally?) maps legible native characters to
their illegible UCN equivalents. Trigraphs are intended to be used in
human-readable source code (believe it or not); UCNs are not.

Of course UCNs can be used in source code if the programmer is
sufficiently masochistic; in that case, phase 1 presumably will pass
them through unchanged.

It's quite possible that I've misunderstood this. None of the
characters that require UCNs to represent them appear on my keyboard,
so I don't have much experience with this kind of thing. Corrections
are welcome.

Another possible oversight: the same paragraph also says

The initial character shall not be a universal character name
designating a digit.

but there's no specification in annex D of which UCNs specify digits.
Presumably ISO/IEC 10646 covers that, but it would be useful to spell
it out in the C standard, perhaps in a footnote.


I thought one of the sections in Annex D was labeled "Extended Digits"
or something like that?


You're right. Annex D is two pages long; the last two sections at the
bottom of the second page are "Digits" and "Special characters".
(There's no other mention of "special characters", so I suppose they
can be used in identifiers as if they were letters.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #26
In <ln************@nuthaus.mib.org> Keith Thompson <ks***@mib.org> writes:
Da*****@cern.ch (Dan Pop) writes:
[...]
An interesting consequence is that f$ is not a valid identifier, but
it becomes a valid identifier if the $ sign is replaced by its UCN:
f\u0024 !


I don't think so.

C99 6.4.2.1p3 says:

Each universal character name in an identifier shall designate a
character whose encoding in ISO/IEC 10646 falls into one of the
ranges specified in annex D.

The encoding of '$', 0024, is not within one of the ranges specified
in annex D.


Good point! So, \u0024 can appear only in character constants and
string literals, as expected.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #27
