Interesting coding idea

Bruno R. Dias

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

And what about a fictional computer, such as one that works in an
entirely different way (such as a non-binary computer)?

It wouldn't be very useful, but it would be a very fun and very
interesting thing to hack on.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/E/L d-- s+:+ a--- C++ UL+ P--- L++>+++ E W++ N+ o+ K++ w---
!O M-- V--PS++ PE++ Y>+ PGP>+ t++(+++) 5? X R+ tv@ b+++@ DI++++ D--- G+
e- h! r-- y
------END GEEK CODE BLOCK------

Nov 14 '05 #1

Nils O. Selåsdal

On Sat, 23 Oct 2004 18:54:55 -0300, Bruno R. Dias wrote:

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

http://www.aracnet.com/~healyzh/decemu.html
(There are emulators/virtual machines for most ancient computers out there
as well ;)

And what about a fictional computer, such as one that works on an
entirely different way (such as a non-binary computer)?

Well, gcc supports mmix
http://www-cs-faculty.stanford.edu/~knuth/mmix.html , and something in the
same area; http://tph.tuwien.ac.at/~oemer/qcl.html

Nov 14 '05 #2

Jack Klein

On Sat, 23 Oct 2004 18:54:55 -0300, "Bruno R. Dias"
<br***@octantis.com.br> wrote in comp.lang.c:

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

And what about a fictional computer, such as one that works on an
entirely different way (such as a non-binary computer)?

It wouldn't be very useful, but it wouold be a very fun and very
interesting thing to hack on.

And what exactly does this have to do with the C language? I suspect
it is equally off-topic in comp.lang.python.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Nov 14 '05 #3

Andrew Dalke

Bruno R. Dias wrote:

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7).

Google "PDP-7 emulator".

http://www.aracnet.com/~healyzh/pdp7emu.html
And this is off-topic to comp.lang.python

Andrew
da***@dalkescientific.com

Nov 14 '05 #4

Bruno R. Dias

Jack Klein wrote:

On Sat, 23 Oct 2004 18:54:55 -0300, "Bruno R. Dias"
<br***@octantis.com.br> wrote in comp.lang.c:

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

And what about a fictional computer, such as one that works on an
entirely different way (such as a non-binary computer)?

It wouldn't be very useful, but it wouold be a very fun and very
interesting thing to hack on.

And what exactly does this have to do with the C language? I suspect
it is equally off-topic in comp.lang.python.

It would be *programmed* in a language, obviously. It's just that C is
rather appropriate for that kind of stuff, it's one of my favorite
languages, and it's a subject that should interest C programmers. The
same goes for Python.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/E/L d-- s+:+ a--- C++ UL+ P--- L++>+++ E W++ N+ o+ K++ w---
!O M-- V--PS++ PE++ Y>+ PGP>+ t++(+++) 5? X R+ tv@ b+++@ DI++++ D--- G+
e- h! r-- y
------END GEEK CODE BLOCK------

Nov 14 '05 #5

Bruno R. Dias

Nils O. Selåsdal wrote:

On Sat, 23 Oct 2004 18:54:55 -0300, Bruno R. Dias wrote:

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

http://www.aracnet.com/~healyzh/decemu.html
(There are emulators/virtual machines for most ancient computers out there
as well ;)

And what about a fictional computer, such as one that works on an
entirely different way (such as a non-binary computer)?

Well, gcc supports mmix
http://www-cs-faculty.stanford.edu/~knuth/mmix.html , and something in the
same area; http://tph.tuwien.ac.at/~oemer/qcl.html

Thanks a lot, but the two last links don't work. :-)

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/E/L d-- s+:+ a--- C++ UL+ P--- L++>+++ E W++ N+ o+ K++ w---
!O M-- V--PS++ PE++ Y>+ PGP>+ t++(+++) 5? X R+ tv@ b+++@ DI++++ D--- G+
e- h! r-- y
------END GEEK CODE BLOCK------

Nov 14 '05 #6

Paul Foley

On Sat, 23 Oct 2004 18:54:55 -0300, Bruno R Dias wrote:

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

http://simh.trailing-edge.com

--
Malum est consilium quod mutari non potest -- Publilius Syrus

(setq reply-to
(concatenate 'string "Paul Foley " "<mycroft" '(#\@) "actrix.gen.nz>"))

Nov 14 '05 #7

CBFalconer

Jack Klein wrote:

<br***@octantis.com.br> wrote in comp.lang.c:
Perhaps it would be interesting to program a virtual machine
simulating an ancient computer (such as the pdp-7). Then, it
would be rather interesting to code for it (porting gcc to it
maybe?). I think it would be fun to play with the long-forgotten
art of coding in machine language.

And what about a fictional computer, such as one that works on
an entirely different way (such as a non-binary computer)?

It wouldn't be very useful, but it wouold be a very fun and
very interesting thing to hack on.

And what exactly does this have to do with the C language? I
suspect it is equally off-topic in comp.lang.python.

If he moves to alt.folklore.computers, he will find plenty of
people who have programmed such beasts, and even be on-topic.
Follow-ups set.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #8

Malcolm

"Bruno R. Dias" <br***@octantis.com.br> wrote

And what about a fictional computer, such as one that works on an
entirely different way (such as a non-binary computer)?

It would be *programmed* in a language, obviously. It's just that C is
rather appropriate for that kind of stuff, It's one of my favorite
languages, and It's a subject that should interest C programmers. The
same goes for Python.

Just because a program could be implemented in C doesn't make it on-topic
for comp.lang.c. However, "is C the most appropriate language for this
program?" is probably topical.

There are plenty of emulators out there. An emulator is not an especially
difficult program to write, and it is often useful. For instance, if you want
to play '80s-vintage Spectrum games from the comfort of your PC, it is
possible using emulation software and program dumps (it is illegal to sell
such dumps unless you own the copyright, it is OK to make a copy of a game
you own for personal use, and taking a copy from a friend without payment is
a grey area).

An interesting project would be a Fibonacci computer. Instead of using a
fixed-base positional system (binary, decimal, hex etc.), you represent each
number as a sum of Fibonacci numbers, with one digit per Fibonacci number.
This has some interesting properties; for instance, there are never two
consecutive 1s in a valid number.
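
For anyone who wants to play with the Fibonacci idea before building a whole machine around it, here is a minimal sketch in C. It assumes the digit weights are 1, 2, 3, 5, 8, ... and uses the usual greedy method (always take the largest Fibonacci number that still fits), which is what keeps two 1s from ever ending up next to each other. The names zeck_encode, zeck_decode and ZECK_DIGITS are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 46 Fibonacci weights (1, 2, 3, 5, ...) are enough for any 32-bit value. */
#define ZECK_DIGITS 46

/* Write n as a string of '0'/'1' digits, most significant digit first.
   out must have room for ZECK_DIGITS + 1 characters. */
void zeck_encode(uint32_t n, char *out)
{
    uint64_t fib[ZECK_DIGITS];
    uint64_t rest = n;
    int i, top, len = 0;

    fib[0] = 1;
    fib[1] = 2;
    for (i = 2; i < ZECK_DIGITS; i++)
        fib[i] = fib[i - 1] + fib[i - 2];

    if (n == 0) {
        strcpy(out, "0");
        return;
    }

    /* Find the largest weight that fits, then work downwards greedily. */
    top = ZECK_DIGITS - 1;
    while (fib[top] > rest)
        top--;
    for (i = top; i >= 0; i--) {
        if (fib[i] <= rest) {
            out[len++] = '1';
            rest -= fib[i];
        } else {
            out[len++] = '0';
        }
    }
    out[len] = '\0';
}

/* Add up the weights of all '1' digits; the last digit has weight 1. */
uint32_t zeck_decode(const char *digits)
{
    uint64_t a = 1, b = 2, sum = 0;
    size_t i;

    for (i = strlen(digits); i-- > 0; ) {
        uint64_t next = a + b;
        if (digits[i] == '1')
            sum += a;
        a = b;
        b = next;
    }
    return (uint32_t)sum;
}
```

With these weights, 10 comes out as "10010" (8 + 2) and 4 as "101" (3 + 1). Arithmetic on such numbers is where the real fun would start; this sketch only converts back and forth.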

Nov 14 '05 #9

Bruno R. Dias

Malcolm wrote:
<snip>

An interesting project would be a Fibonnaci computer. Instead of using a
exponent-based system (binary, decimal, hex etc) you represent numbers as
Fibonnaci sequences. This has some interesting properties, for instance
there are never two consecutive 1s in a valid number.

It would be a bitch to code for such a machine, but it sure would be
interesting.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/E/L d-- s+:+ a--- C++ UL+ P--- L++>+++ E W++ N+ o+ K++ w---
!O M-- V--PS++ PE++ Y>+ PGP>+ t++(+++) 5? X R+ tv@ b+++@ DI++++ D--- G+
e- h! r-- y
------END GEEK CODE BLOCK------

Nov 14 '05 #10

Nils O. Selåsdal

>> Well, gcc supports mmix

http://www-cs-faculty.stanford.edu/~knuth/mmix.html , and something in the
same area; http://tph.tuwien.ac.at/~oemer/qcl.html

Thanks a lot, but the two last links don't work. :-)

Works for me, something broken on your side ;)

--
Nils O. Selåsdal
www.utelsystems.com

Nov 14 '05 #11

Dave Vandervies

[comp.lang.python trimmed from crosspost list]

In article <2t*************@uni-berlin.de>,
Bruno R. Dias <br***@octantis.com.br> wrote:

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

And what about a fictional computer, such as one that works on an
entirely different way (such as a non-binary computer)?

It wouldn't be very useful, but it wouold be a very fun and very
interesting thing to hack on.

I'm reading this in comp.programming, but for the comp.lang.c readers,
here's a not entirely off-topic idea:

Why not build a DeathStation simulator?

Create a VM that allows aggressive testing for bad (especially not-
well-defined) code, and a compiler targeting it that optimizes for
checkability rather than performance or size.

Obviously you'd need pointers to be more than just a memory address
(segment/offset/size would work, with pointer arithmetic results
checked to make sure the offset is inside the segment; this would add
checkability for no-longer-valid (free()d or old automatic) segments).
This would also let us trap on invalid int-to-pointer conversions (and
possibly on invalid pointer-to-pointer conversions if it's done right).
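
A rough sketch of what such a checked pointer might look like, written as ordinary C rather than as part of a real VM. Everything here (struct segment, struct fatptr, fat_add, fat_load, and the choice to return false instead of actually trapping) is an assumption made for illustration, not a description of any existing tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical segment descriptor: one per object. */
struct segment {
    unsigned char *base;  /* real storage behind the object */
    size_t size;          /* object size in bytes */
    bool live;            /* cleared when the object is freed or leaves scope */
};

/* Hypothetical checked pointer: which object, and where inside it. */
struct fatptr {
    struct segment *seg;
    size_t offset;        /* invariant: offset <= seg->size */
};

/* Pointer arithmetic. One past the end is allowed, anything further is not.
   Returns false (standing in for a trap) and leaves *p unchanged on error. */
bool fat_add(struct fatptr *p, ptrdiff_t delta)
{
    if (p->seg == NULL || !p->seg->live)
        return false;
    if (delta < 0) {
        /* Compute -delta without overflowing for the most negative value. */
        size_t back = (size_t)(-(delta + 1)) + 1;
        if (back > p->offset)
            return false;
        p->offset -= back;
    } else {
        if ((size_t)delta > p->seg->size - p->offset)
            return false;
        p->offset += (size_t)delta;
    }
    return true;
}

/* Read one byte. The object must still be live and the offset strictly
   inside it; a one-past-the-end pointer may exist but not be followed. */
bool fat_load(const struct fatptr *p, unsigned char *out)
{
    if (p->seg == NULL || !p->seg->live || p->offset >= p->seg->size)
        return false;
    *out = p->seg->base[p->offset];
    return true;
}
```

Clearing the live flag when free() is called or when a function returns is what would catch stale pointers; the cost is an extra check on every pointer operation.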

If we've got heavyweight segments anyways, we can add an "initialized"
flag and trap on access-to-uninitialized-memory. Possibly even
arbitrarily set uninitialized bytes read as unsigned char to random values
(or values that are invalid for whatever other data is there - can we get
away with having non-mallocd memory typed? unions might be a problem).
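
The "initialized" flag could be sketched as one shadow flag per byte. This is a simplification that assumes a fixed-size block and flags every read of an unwritten byte, including reads as unsigned char, which is stricter than what is described above. struct cell, CELL_BYTES and the cell_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CELL_BYTES 16

/* Hypothetical checked block: data plus one "has been written" flag per byte. */
struct cell {
    unsigned char data[CELL_BYTES];
    bool init[CELL_BYTES];
};

/* Fresh storage: fill with a junk pattern (not zero) and mark it unwritten. */
void cell_reset(struct cell *c)
{
    memset(c->data, 0xA5, sizeof c->data);
    memset(c->init, 0, sizeof c->init);
}

/* A store marks the byte as initialized. Returns false if out of range. */
bool cell_store(struct cell *c, size_t off, unsigned char v)
{
    if (off >= CELL_BYTES)
        return false;
    c->data[off] = v;
    c->init[off] = true;
    return true;
}

/* A load of a byte that was never stored to returns false (the "trap"). */
bool cell_load(const struct cell *c, size_t off, unsigned char *out)
{
    if (off >= CELL_BYTES || !c->init[off])
        return false;
    *out = c->data[off];
    return true;
}
```

Filling fresh storage with a non-zero pattern is a deliberate choice: code that silently relies on uninitialized memory being zero then misbehaves visibly even where no check fires.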

Having the VM recognize sequence points would also let it trap assorted
types of undefined behavior that typically go unrecognized until they
cause bugs.

Standard library calls could check (with implementation magic) their
arguments and warn at runtime if they're given bad arguments (f'rexample,
if they're given a buffer that's smaller than the buffer size argument,
so this:
    char buf[10];
    fgets(buf,20,stdin);
would produce a warning when it runs, in addition to trapping if the
input overflows the buffer).

Other thoughts?
Anybody with enough compiler/VM experience to comment intelligently on
just how much work this would be?

dave

--
Dave Vandervies dj******@csclub.uwaterloo.ca

Some of us even have a grip on reality. This is a university, remember?
--Giles Malet in uw.general

Nov 14 '05 #12

boa

Dave Vandervies wrote:

[comp.lang.python trimmed from crosspost list]

In article <2t*************@uni-berlin.de>,
Bruno R. Dias <br***@octantis.com.br> wrote:
Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

And what about a fictional computer, such as one that works on an
entirely different way (such as a non-binary computer)?

It wouldn't be very useful, but it wouold be a very fun and very
interesting thing to hack on.

I'm reading this in comp.programming, but for the comp.lang.c readers,
here's a not entirely off-topic idea:

Why not build a DeathStation simulator?

Create a VM that allows aggressive testing for bad (especially not-
well-defined) code, and a compiler targeting it that optimizes for
checkability rather than performance or size.

Obviously you'd need pointers to be more than just a memory address
(segment/offset/size would work, with pointer arithmetic results
checked to make sure the offset is inside the segment; this would add
checkability for no-longer-valid (free()d or old automatic) segments).
This would also let us trap on invalid int-to-pointer conversions (and
possibly on invalid pointer-to-pointer conversions if it's done right).

If we've got heavyweight segments anyways, we can add an "initialized"
flag and trap on access-to-uninitialized-memory. Possibly even
arbitrarily set uninitialized bytes read as unsigned char to random values
(or values that are invalid for whatever other data is there - can we get
away with having non-mallocd memory typed? unions might be a problem).

Having the VM recognize sequence points would also let it trap assorted
types of undefined behavior that typically go unrecognized until they
cause bugs.

Standard library calls could check (with implementation magic) their
arguments and warn at runtime if, they're given bad arguments (f'rexample,
if they're given a buffer that's smaller than the buffer size argument,
so this:
char buf[10];
fgets(buf,20,stdin);
would produce a warning when it runs, in addition to trapping if the
input overflows the buffer).
Other thoughts?
Anybody with enough compiler/VM experience to comment intelligently on
just how much work this would be?
dave

It exists already and is called valgrind ;-)

boa

Nov 14 '05 #13

Dave Vandervies

In article <b%*******************@juliett.dax.net>,
boa <ro**@localhost.com> wrote:

Dave Vandervies wrote:
Why not build a DeathStation simulator?

Create a VM that allows aggressive testing for bad (especially not-
well-defined) code, and a compiler targeting it that optimizes for
checkability rather than performance or size.
[Snip a few ideas]
Other thoughts?

It exists already and is called valgrind ;-)

That would be this valgrind (first hit on Google)?
"Valgrind, an open-source memory debugger for x86-linux"

How many of these will valgrind catch?
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;
--------
int *foo(int *dummy)
{
    int i;
    return &i;
}
int bar(int *p)
{
    int dummy=42;
    return *p;
}

/*In a function somewhere*/
/*Ideally, we want to warn here, when a no-longer-valid pointer is stored
in a variable (or, better, immediately on return from foo when the
storage the pointer points at goes away)
*/
p=foo(&i);
/*p is an invalid pointer; typical stack-using implementations will
have it pointing at the dummy int in bar()
*/
i=bar(p);
--------
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

Without knowing anything other than that it's a memory debugger, I'd give
part marks for the third one (no warning if the buffer doesn't overflow)
and a small chance at catching the second one.

Of course, it looks like it won't even get there if I try to run it on
a Mac.

So, quite obviously not what I was thinking of.
dave

--
Dave Vandervies dj******@csclub.uwaterloo.ca
I suspect all the things you mention are covered within the first 150 pages
of [K&R2]. At that point, most books on programming are still showing you
how to use the GUI editor. --Richard Heathfield in comp.lang.c

Nov 14 '05 #14

William Ahern

Dave Vandervies <dj******@csclub.uwaterloo.ca> wrote:
<snip>

That would be this valgrind (first hit on Google)?
"Valgrind, an open-source memory debugger for x86-linux"

How many of these will valgrind catch?
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;

Yes, if `i' has not been set yet. But, not quite what you were looking for.

--------
int *foo(int *dummy)
{
int i;
return &i;
}
int bar(int *p)
{
int dummy=42;
return *p;
}

/*In a function somewhere*/
/*Ideally, we want to warn here, when a no-longer-valid pointer is stored
in a variable (or, better, immediately on return from foo when the
storage the pointer points at goes away)
*/
p=foo(&i);
/*p is an invalid pointer; typical stack-using implementations will
have it pointing at the dummy int in bar()
*/
i=bar(p);
--------
I believe this will print an error because bar takes the value of an
uninitialized variable. Valgrind keeps track of which regions in memory have
been touched, and reading from an untouched memory region (whether from heap
or stack or where ever) is caught.

Valgrind's real weakness is with automatic variables. If you had initialized
i in foo() none of this may have been caught.

char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

Again, Valgrind would probably not catch this since buf is automatic.
However, OpenBSD has patches to GCC and their library definitions which
might catch this.

Without knowing anything other than that it's a memory debugger, I'd give
part marks for the third one (no warning if the buffer doesn't overflow)
and a small chance at catching the second one.

Of course, it looks like it won't even get there if I try to run it on
a Mac.

So, quite obviously not what I was thinking of.

Yep. Valgrind is a great tool but definitely has its limitations.

Nov 14 '05 #15

boa

Dave Vandervies wrote:

In article <b%*******************@juliett.dax.net>,
boa <ro**@localhost.com> wrote:
Dave Vandervies wrote:
Why not build a DeathStation simulator?

Create a VM that allows aggressive testing for bad (especially not-
well-defined) code, and a compiler targeting it that optimizes for
checkability rather than performance or size.

[Snip a few ideas]

Other thoughts?

It exists already and is called valgrind ;-)

That would be this valgrind (first hit on Google)?
"Valgrind, an open-source memory debugger for x86-linux"

How many of these will valgrind catch?
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;
--------
int *foo(int *dummy)
{
int i;
return &i;
}
int bar(int *p)
{
int dummy=42;
return *p;
}

/*In a function somewhere*/
/*Ideally, we want to warn here, when a no-longer-valid pointer is stored
in a variable (or, better, immediately on return from foo when the
storage the pointer points at goes away)
*/
p=foo(&i);
/*p is an invalid pointer; typical stack-using implementations will
have it pointing at the dummy int in bar()
*/
i=bar(p);
--------
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

Without knowing anything other than that it's a memory debugger, I'd give
part marks for the third one (no warning if the buffer doesn't overflow)
and a small chance at catching the second one.

Of course, it looks like it won't even get there if I try to run it on
a Mac.

So, quite obviously not what I was thinking of.

You're right. I was too quick recommending valgrind. It is a good tool,
though.

boa

dave

Nov 14 '05 #16

Dave Vandervies

In article <q7************@wilbur.25thandClement.com>,
William Ahern <wi*****@wilbur.25thandClement.com> wrote:

Dave Vandervies <dj******@csclub.uwaterloo.ca> wrote:
int *foo(int *dummy)
{
int i;
return &i;
}
int bar(int *p)
{
int dummy=42;
return *p;
}

/*In a function somewhere*/
/*Ideally, we want to warn here, when a no-longer-valid pointer is stored
in a variable (or, better, immediately on return from foo when the
storage the pointer points at goes away)
*/
p=foo(&i);
/*p is an invalid pointer; typical stack-using implementations will
have it pointing at the dummy int in bar()
*/
i=bar(p);
--------

I believe this will print an error because bar takes the value of an
uninitialized variable.

But it doesn't! It takes a pointer (that has been initialized) that
points to a region of automatic storage that was never initialized and
no longer exists.

Valgrind keeps track of which regions in memory have
been touched, and reading from an untouched memory region (whether from heap
or stack or where ever) is caught.

The pointer that bar() gets is (if we assume a few reasonable things
about the implementation) pointing at wherever i in foo() was; this is
likely to be the same place as dummy in bar() - which is initialized
before the pointer is dereferenced.

The problem is that that's no longer the i in foo() that we returned a
pointer to. Assigning a new (not recycled) segment descriptor for every
automatic variable (thus invalidating the aforementioned assumptions
about the implementation) would let this be caught as soon as we tried to
load the pointer after foo() returned (even before we try to follow it
in bar()). (Note that this would also be Bloody Slow if it was checked
every time a pointer value was handled.)
dave

--
Dave Vandervies dj******@csclub.uwaterloo.ca
[T]ry thinking someday. It might hurt a little at first, but you'll
be glad in retrospect.
--Joona I Palaste roasts a troll in comp.lang.c

Nov 14 '05 #17

William Ahern

Dave Vandervies <dj******@csclub.uwaterloo.ca> wrote:

In article <q7************@wilbur.25thandClement.com>,
William Ahern <wi*****@wilbur.25thandClement.com> wrote:
Dave Vandervies <dj******@csclub.uwaterloo.ca> wrote:
int *foo(int *dummy)
{
int i;
return &i;
}
int bar(int *p)
{
int dummy=42;
return *p;
}

/*In a function somewhere*/
/*Ideally, we want to warn here, when a no-longer-valid pointer is stored
in a variable (or, better, immediately on return from foo when the
storage the pointer points at goes away)
*/
p=foo(&i);
/*p is an invalid pointer; typical stack-using implementations will
have it pointing at the dummy int in bar()
*/
i=bar(p);
--------
I believe this will print an error because bar takes the value of an
uninitialized variable.

But it doesn't! It takes a pointer (that has been initialized) that
points to a region of automatic storage that was never initialized and
no longer exists.

Valgrind keeps track of which regions in memory have
been touched, and reading from an untouched memory region (whether from heap
or stack or where ever) is caught.

The pointer that bar() gets is (if we assume a few reasonable things
about the implementation) pointing at wherever i in foo() was; this is
likely to be the same place as dummy in bar() - which is initialized
before the pointer is dereferenced.

Ah. Damn, you were way ahead of me already. Valgrind is fooled by this. In
fact, Valgrind didn't even catch `i=i++' inside of main. Oh well.

Nov 14 '05 #18

Thad Smith

Dave Vandervies wrote:

In article <b%*******************@juliett.dax.net>,
boa <ro**@localhost.com> wrote:
Dave Vandervies wrote:
Why not build a DeathStation simulator?

Create a VM that allows aggressive testing for bad (especially not-
well-defined) code, and a compiler targeting it that optimizes for
checkability rather than performance or size.
[Snip a few ideas]
Other thoughts?

It exists already and is called valgrind ;-)

How many of these will valgrind catch?
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;
--------

C or C++ undefined behavior because of multiple updates between
sequence points is a *language specification conformance* issue, not
an execution one. This is detected by static analysis before or
during translation. Once converted to a sequence of instructions,
such as
    ld i
    inc i
    st i
or
    ld i
    st i
    inc i

the results are well defined for the virtual machine.
--------
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

There are different levels of warnings. In this example, if
strlen(str) is short enough, the behavior is well-defined, of course,
even though the construct isn't safe for arbitrarily long str
arguments. We can use static analysis in this case to determine that
sizeof(buf) < 20, indicating a questionable construct.

To protect against overflow, we really want
    strncat(buf, str, sizeof(buf)-strlen(buf)-1);
That can be detected with static analysis, in some cases, as well. To
check dynamically for potential errors, we would verify that
    len <= sizeof(buf)-strlen(buf)-1,
assuming that debug_strncat() has access to sizeof(buf).
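
As a sketch only, the dynamic check might look like the following. The post names debug_strncat() but not its signature, so the extra dst_size parameter (standing in for whatever the diagnostic implementation knows about the real buffer) and the helper strncat_bound_ok() are assumptions. Returning NULL stands in for issuing a diagnostic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when strncat(dst, src, n) cannot overflow dst whatever src holds,
   i.e. n <= dst_size - strlen(dst) - 1. Also false if dst holds no
   terminator within dst_size bytes. */
bool strncat_bound_ok(const char *dst, size_t dst_size, size_t n)
{
    const char *end = memchr(dst, '\0', dst_size);
    size_t used;

    if (end == NULL)
        return false;
    used = (size_t)(end - dst);      /* used < dst_size here */
    return n <= dst_size - used - 1;
}

/* Hypothetical diagnostic wrapper: refuses to append (returns NULL) when
   the claimed bound is too large, even if src would happen to fit. */
char *debug_strncat(char *dst, size_t dst_size, const char *src, size_t n)
{
    if (!strncat_bound_ok(dst, dst_size, n))
        return NULL;
    return strncat(dst, src, n);
}
```

With char buf[10] holding an empty string, a bound of 20 is rejected while sizeof(buf)-strlen(buf)-1 (here 9) is accepted, regardless of how long str actually is.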

Thad

Nov 14 '05 #19

Dave Vandervies

In article <41***************@acm.org>, Thad Smith <th*******@acm.org> wrote:

Dave Vandervies wrote:
>Dave Vandervies wrote:

>> Why not build a DeathStation simulator?
>>
>> Create a VM that allows aggressive testing for bad (especially not-
>> well-defined) code, and a compiler targeting it that optimizes for
>> checkability rather than performance or size.

[snippage]
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;
--------

C or C++ undefined behavior because of multiple updates between
sequence points is a *language specification conformance* issue, not
an execution one. This is detected by static analysis before or
during translation. Once converted to a sequence of instructions,
such as
ld i
inc i
st i
or
ld i
st i
inc i

the results are well defined for the virtual machine.

Static analysis can't catch all problems of this sort. Consider:
--------
/*Somewhere*/
void foo(int *a,int *b)
{
*a=(*b)++;
}

/*Somewhere else*/
void bar(int x)
{
/*Do some stuff, including:*/
foo(&x,&x);
}
--------
If your static analyzer is smart enough to recognize that you're calling
foo() with equal pointers, then wrap a few more levels of indirection
around it until you've got enough to confuse it. Being able to
(especially unintentionally) construct arbitrarily complex code that can
still lead to this case makes static checking Highly Impractical at best.

On the other hand, handling this dynamically with a sequence-point-aware
VM would trap when foo() gets two pointers to the same int, and at that
point the debugger can be invoked to work out what led to that:
--------
seq_pt ;beginning of foo()
ld a0,arg2
ld a1,arg1
ld i0,(a0)
st i0,(a1) ;VM notes that *a has been modified since last sequence point
inc i0
st i0,(a0) ;traps if a==b: object modified twice between sequence points
seq_pt ;end of *a=(*b)++. Clear modified-object list.
--------

Keep in mind that the VM's purpose is to check for poorly-defined (or
otherwise bad) C code, even if that C code can be compiled to a set of
instructions that are well-defined in the VM.

Since it's constructed as a dynamic code checker for a language that
prohibits multiple updates (and some cases of both access and update)
between sequence points, the VM knows that even though the sequence of
instructions it sees is well-defined, it could only have been generated
from C code that isn't well-defined, so it can trap on that.
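
To make the bookkeeping concrete, here is a small C sketch of the modified-object list. It is only a model of the idea: the function names, the fixed-size table, and returning false in place of a trap are all assumptions, and it tracks stores only (not the "access and update" cases).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 32

/* Addresses stored to since the last sequence point. */
static const void *modified[MAX_TRACKED];
static size_t n_modified = 0;

/* Plays the role of the seq_pt instruction: clear the modified-object list. */
void seq_point(void)
{
    n_modified = 0;
}

/* Record a store. Returns false (the "trap") if the same object has
   already been modified since the last sequence point. */
bool note_store(const void *addr)
{
    size_t i;

    for (i = 0; i < n_modified; i++)
        if (modified[i] == addr)
            return false;
    if (n_modified < MAX_TRACKED)
        modified[n_modified++] = addr;
    return true;
}

/* *a = (*b)++ spelled out as the loads and stores the VM would see. */
bool checked_assign_postinc(int *a, int *b)
{
    int old;
    bool ok;

    seq_point();
    old = *b;
    ok = note_store(a);
    *a = old;
    ok = note_store(b) && ok;   /* fails when a == b */
    *b = old + 1;
    seq_point();
    return ok;
}
```

Calling checked_assign_postinc(&y, &x) with distinct objects succeeds, while checked_assign_postinc(&x, &x) reports the double modification no matter how many layers of calls produced the two equal pointers.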

--------
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

There are different levels of warnings.

Keep in mind that I introduced this with:
}Create a VM that allows aggressive testing for bad (especially not-
}well-defined) code,

I'm assuming that you wouldn't be using such a thing if you didn't want
something approaching the "pathologically paranoid" level of warnings.

In this example, if
strlen(str) is short enough, the behavior is well-defined, of course,
even though the construct isn't safe for arbitrarily long str
arguments. We can use static analysis in this case to determine that
sizeof(buf) < 20, indicating a questionable construct.

But, once again, static analysis is only enough for the trivial examples
that illustrate the point without confusing the reader, and is unlikely
to be enough to catch the cases where a similar problem shows up in
real code.

If a function gets a buffer size argument larger than the real buffer
size, that's a bug, even if what ends up being written into that buffer
does fit; we want to catch that bug as soon as possible even if the
behavior is actually well-defined until a user's cat starts sleeping
on the keyboard. (If the programmer knows that what's getting written
into the buffer won't overflow it, that's what the non-counted variants
of the functions (strcpy in this case) are for.)

To protect against overflow, we really want
strncat(buf, str, sizeof(buf)-strlen(buf)-1);
That can be detected with static analysis, in some cases, as well. To
check dynamically for potential errors, we would verify that
len <= sizeof(buf)-strlen(buf)-1,
assuming that debug_strncat() has access to sizeof(buf).

If we're storing pointers as segment-offset-size, then a little bit of
implementation magic will give it the appropriate size.

Note that this isn't directly available to the code the programmer sees if
(as is likely) the buffer isn't a local or global array; buffers passed in
(as a pointer) from elsewhere or obtained from malloc are the ones most
likely to have mismatched sizes, and sizeof won't give the size of the
buffer in those cases.

Once you're doing aggressive dynamic checking in the implementation's
runtime environment anyways, it's much simpler for all concerned to let
the library function check the sizes; it knows how buffer size and size
arguments are related, and has access to implementation magic to get at
the information it needs to check them.

dave

--
Dave Vandervies dj******@csclub.uwaterloo.ca
The guys in comp.std.c would make excellent politicians, if only they
weren't so honest.
--Richard Heathfield in comp.lang.c

Nov 14 '05 #20

Thad Smith

Dave Vandervies wrote:

In article <41***************@acm.org>, Thad Smith <th*******@acm.org> wrote:
Dave Vandervies wrote:
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;
--------
C or C++ undefined behavior because of multiple updates between
sequence points is a *language specification conformance* issue, not
an execution one. This is detected by static analysis before or
during translation. Once converted to a sequence of instructions,
such as
ld i
inc i
st i
or
ld i
st i
inc i

the results are well defined for the virtual machine.

Static analysis can't catch all problems of this sort. Consider:
--------
/*Somewhere*/
void foo(int *a,int *b)
{
*a=(*b)++;
}

/*Somewhere else*/
void bar(int x)
{
/*Do some stuff, including:*/
foo(&x,&x);
}
--------

Good point, Dave. While that can be done by a diagnostic virtual
machine, it could also be done by a diagnostic compiler generating
native code, which would insert explicit tests in the generated code for
duplicate pointers which must be different for well-defined behavior.
The example code for the VM (snipped) required that the compiler
identify sequence points, a diagnostic-only feature. If the compiler
generates diagnostic code, it could generate tests in native code (or
function calls) as well.
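
For the foo() example, the code such a diagnostic compiler emits might behave roughly like the hand-written version below. This is a guess at the shape of the output, not the behavior of any real compiler; foo_instrumented and the alias_trap flag are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Set when an inserted check fires; a real tool would report or abort. */
bool alias_trap = false;

/* Roughly what might be emitted for:
       void foo(int *a, int *b) { *a = (*b)++; }                        */
void foo_instrumented(int *a, int *b)
{
    if (a == b) {           /* inserted test: both updates hit one object */
        alias_trap = true;
        return;
    }
    *a = (*b)++;            /* safe here: a and b are distinct objects */
}
```

The check costs one comparison in native code, but it only fires when the program actually executes the call with matching pointers.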

In your example of pointers passed to a function, detection of the error
depends on the run-time behavior of the program. If pointers are chosen
in an apparently random manner, the code must actually execute the case
in which the pointers match in order to determine an error. I suppose
it is possible to make a static/dynamic diagnostic system that takes the
approach that if it can't prove that the pointers are different, and
there are multiple updates through the pointer pairs between sequence
points, it would issue a warning. It would then be up to the programmer
to prove to the compiler/interpreter that the pointers can't match or to
recode such that it doesn't matter.

--------
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

If a function gets a buffer size argument larger than the real buffer
size, that's a bug, even if what ends up being written into that buffer
does fit; we want to catch that bug as soon as possible even if the
behavior is actually well-defined until a user's cat starts sleeping
on the keyboard.
To protect against overflow, we really want
strncat(buf, str, sizeof(buf)-strlen(buf)-1);
That can be detected with static analysis, in some cases, as well. To
check dynamically for potential errors, we would verify that
len <= sizeof(buf)-strlen(buf)-1,
assuming that debug_strncat() has access to sizeof(buf).

If we're storing pointers as segment-offset-size, then a little bit of
implementation magic will give it the appropriate size.
Note that this isn't directly available to the code the programmer sees if
(as is likely) the buffer isn't a local or global array; buffers passed in
(as a pointer) from elsewhere or obtained from malloc are the ones most
likely to have mismatched sizes, and sizeof won't give the size of the
buffer in those cases.

I lean more towards a diagnostic compiler and library for catching these
types of errors. So pointers _could_ be implemented as an
(address,length) pair, which would include values returned by malloc.
This would allow a diagnostic implementation of strncat() to do the
checking mentioned above.
Once you're doing aggressive dynamic checking in the implementation's
runtime environment anyways, it's much simpler for all concerned to let
the library function check the sizes; it knows how buffer size and size
arguments are related, and has access to implementation magic to get at
the information it needs to check them.

Agreed.
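
To make the (address,length) idea concrete, here is a minimal sketch in ordinary C. A real diagnostic implementation would hide the length inside its pointer representation; struct checked_buf, checked_malloc() and checked_strncat() are names invented for this illustration only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "fat pointer": an address plus the size of the object
   it points to. */
struct checked_buf {
    char  *addr;
    size_t len;
};

/* Diagnostic stand-in for malloc: remembers how much was requested. */
struct checked_buf checked_malloc(size_t n)
{
    struct checked_buf b;
    b.addr = malloc(n);
    b.len  = (b.addr != NULL) ? n : 0;
    return b;
}

/* Diagnostic strncat: returns 1 if the claimed limit n fits in the space
   remaining in dest, 0 (after reporting) if it does not. */
int checked_strncat(struct checked_buf dest, const char *src, size_t n)
{
    const char *end;
    size_t room;

    /* Look for the terminator without reading past the buffer. */
    end = (dest.len > 0) ? memchr(dest.addr, '\0', dest.len) : NULL;
    if (end == NULL) {
        fprintf(stderr, "strncat: destination is not a terminated string\n");
        return 0;
    }
    room = dest.len - (size_t)(end - dest.addr) - 1;
    if (n > room) {
        fprintf(stderr, "strncat: limit %lu exceeds remaining space %lu\n",
                (unsigned long)n, (unsigned long)room);
        strncat(dest.addr, src, room);   /* clamp so the sketch stays safe */
        return 0;
    }
    strncat(dest.addr, src, n);
    return 1;
}
```

With a 10-byte buffer holding an empty string, checked_strncat(b, "abc", 20) is reported even though "abc" fits, which is the early warning Dave asked for. Clamping the limit after the report is one possible policy; a stricter build might abort instead.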

Thad

Nov 14 '05 #21

Michael Mendelsohn

Thad Smith schrieb:

Dave Vandervies wrote:
Static analysis can't catch all problems of this sort. Consider:
--------
/*Somewhere*/
void foo(int *a,int *b)
{
*a=(*b)++;
}

/*Somewhere else*/
void bar(int x)
{
/*Do some stuff, including:*/
foo(&x,&x);
}
--------

Good point, Dave. While that can be done by a diagnostic virtual
machine, it could also be done by a diagnostic compiler generating
native code, which would insert explicit tests in the generated code for
duplicate pointers which must be different for well-defined behavior.

In this case, shouldn't the programmer actually be the one to insert the
diagnostic?
A simple assert(a!=b) would do, and it would document to even the most
inexperienced reader that there's a precondition to using this function.
(With Douglas Adams, it would also serve to remind the _programmer_ that
there might be a better way to do this.)
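
Something like the following is what I have in mind, as a sketch only (the comment wording is mine):

```c
#include <assert.h>

/* Precondition: a and b must point to different objects, because the
   statement below modifies both *a and *b without an intervening
   sequence point. */
void foo(int *a, int *b)
{
    assert(a != b);
    *a = (*b)++;
}
```

The check disappears when the code is built with NDEBUG defined, so it costs nothing in a release build but still documents the precondition.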

With this kind of approach, the compiler ought to issue a warning unless
the assert statement is also present.

I somehow doubt that there are C compilers out there that actually use
the assert statements meaningfully.

Cheers
Michael
--
Still an attentive ear he lent Her speech hath caused this pain
But could not fathom what she meant Easier I count it to explain
She was not deep, nor eloquent. The jargon of the howling main
-- from Lewis Carroll: The Three Usenet Trolls

Nov 14 '05 #22

Thad Smith

Michael Mendelsohn wrote:

Thad Smith schrieb:
Dave Vandervies wrote:
Static analysis can't catch all problems of this sort. Consider:
--------
/*Somewhere*/
void foo(int *a,int *b)
{
*a=(*b)++;
}

/*Somewhere else*/
void bar(int x)
{
/*Do some stuff, including:*/
foo(&x,&x);
}
--------
Good point, Dave. While that can be done by a diagnostic virtual
machine, it could also be done by a diagnostic compiler generating
native code, which would insert explicit tests in the generated code for
duplicate pointers which must be different for well-defined behavior.

In this case, shouldn't the programmer actually be the one to insert the
diagnostic?
A simple assert(a!=b) would do, and it would document to even the most
inexperienced reader that there's a precondition to using this function.

Such augmentation of code would help find the pesky conditions and MAY
help with overall clarity and reduction of errors.
With this kind of approach, the compiler ought to issue a warning unless
the assert statement is also present.

You run the risk of making the code overly cluttered with assertions
that might be obvious, for other reasons, to the programmer. For
example, if you wrote
int i;
...
i = diceroll();
i++;

your diagnostic compiler might complain if you didn't insert
assert (i < INT_MAX);
before incrementing i, but you know that diceroll() can't return
anything larger than 6.

The approach discussed earlier in the thread was on automating such
tests for existing code through a diagnostic virtual machine interpreter
and/or diagnostic compiler + library. Sometimes I think it makes sense
to have multiple levels of code: the code that specifies the work to be
done, optional code that can be enabled for different build versions,
the comments explaining all the higher-level considerations for the
human reader, and the tests / assertions that can be sprinkled in the
code to aid failure detection. Sometimes putting them all together
makes the code hard to read.

Thad

Nov 14 '05 #23

Michael Mendelsohn

Thad Smith schrieb:

Michael Mendelsohn wrote:
Dave Vandervies wrote:
Static analysis can't catch all problems of this sort. Consider:
--------
/*Somewhere*/
void foo(int *a,int *b)
{
*a=(*b)++;
}
In this case, shouldn't the programmer actually be the one to insert the
diagnostic?
A simple assert(a!=b) would do, and it would document to even the most
inexperienced reader that there's a precondition to using this function.
With this kind of approach, the compiler ought to issue a warning unless
the assert statement is also present.
You run the risk of making the code overly cluttered with assertions
that might be obvious, for other reasons, to the programmer. For
example, if you wrote
int i;
...
i = diceroll();
i++;

your diagnostic compiler might complain if you didn't insert
assert (i < INT_MAX);
before incrementing i, but you know that diceroll() can't return
anything larger than 6.

I suddenly understand that postconditions have a value that goes beyond
helping human readers and error detection in the postconditioned module
itself.
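
A small sketch of what I mean, assuming a diceroll() built on rand() (the thread only says it never returns more than 6, so the body and the helper next_after_roll() are my invention):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical diceroll(): returns a value from 1 to 6. */
int diceroll(void)
{
    int result = rand() % 6 + 1;

    /* Postcondition: states the range once, at the source, for human
       readers and for any tool able to make use of it. */
    assert(result >= 1 && result <= 6);
    return result;
}

int next_after_roll(void)
{
    int i = diceroll();
    i++;    /* cannot overflow: i <= 6 by diceroll's postcondition */
    return i;
}
```

If the diagnostic compiler could see the postcondition, it would have no reason to demand assert(i < INT_MAX) at every call site.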
The approach discussed earlier in the thread was on automating such
tests for existing code through a diagnostic virtual machine interpreter
and/or diagnostic compiler + library.
I understand that.
I think "Design by Contract" ought to help here.
Sometimes I think it makes sense
to have multiple levels of code: the code that specifies the work to be
done, optional code that can be enabled for different build versions,
the comments explaining all the higher-level considerations for the
human reader, and the tests / assertions that can be sprinkled in the
code to aid failure detection. Sometimes putting them all together
makes the code hard to read.

I believe that's why the assertions get their own section in a function
definition in some languages.

Mind you, I'm no evangelist; in fact, I've never used a language that
had contracts designed into it from the outset (Eiffel comes to mind),
but they seem to be naturally suited if you want "more aggressive
testing" as the program is running.
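
C has no contract syntax, but the flavour can be approximated with macros. This is only a sketch: REQUIRE, ENSURE, CONTRACT_CHECK, contract_fail, CONTRACTS_OFF and max_of are all names made up for the example, not part of any standard or library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical contract macros. Define CONTRACTS_OFF at build time to
   compile the checks away. */
#ifdef CONTRACTS_OFF
#define REQUIRE(cond) ((void)0)
#define ENSURE(cond)  ((void)0)
#else
static void contract_fail(const char *kind, const char *text,
                          const char *file, int line)
{
    fprintf(stderr, "%s failed: %s (%s:%d)\n", kind, text, file, line);
    abort();
}
#define CONTRACT_CHECK(kind, cond, text) \
    ((cond) ? (void)0 : contract_fail(kind, text, __FILE__, __LINE__))
#define REQUIRE(cond) CONTRACT_CHECK("precondition", cond, #cond)
#define ENSURE(cond)  CONTRACT_CHECK("postcondition", cond, #cond)
#endif

/* Returns the larger of the two ints pointed to by a and b. */
int max_of(const int *a, const int *b)
{
    int result;

    REQUIRE(a != NULL);
    REQUIRE(b != NULL);

    result = (*a > *b) ? *a : *b;

    ENSURE(result >= *a && result >= *b);
    return result;
}
```

Keeping the preconditions at the top and the postconditions just before the return gives them roughly their own "section", and the build flag covers Thad's point about code that is enabled only in some build versions.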

Cheers
Michael
--
Still an attentive ear he lent Her speech hath caused this pain
But could not fathom what she meant Easier I count it to explain
She was not deep, nor eloquent. The jargon of the howling main
-- from Lewis Carroll: The Three Usenet Trolls

Nov 14 '05 #24
