What is Expressiveness in a Computer Language

Xah Lee

in March, i posted a essay â€œWhat is Expressiveness in a Computer
Languageâ€, archived at:
http://xahlee.org/perl-python/what_i...esiveness.html

I was informed then that there is a academic paper written on this
subject.

On the Expressive Power of Programming Languages, by Matthias
Felleisen, 1990.
http://www.ccs.neu.edu/home/cobbe/pl...ive-slides.pdf

Has anyone read this paper? And, would anyone be interested in giving a
summary?

thanks.

Xah
xa*@xahlee.org
âˆ‘ http://xahlee.org/

Jun 9 '06

Subscribe Post Reply

669

25338

« First
<
3
11
12
13
14
>

Rob Warnock

Marshall <ma*************@gmail.comwrote:
+---------------
| Joachim Durchholz wrote:
| Actually SQL has references - they are called "primary keys", but they
| are references nevertheless.
|
| I strongly object; this is quite incorrect. I grant you that from the
| 50,000 foot level they appear identical, but they are not.
+---------------

Agreed. The only thing different about "primary" keys from any other
key is uniqueness -- a selection by primary key will return only one
record. Other than that constraint, many databases treat them exactly
the same as non-primary keys [e.g., can form indexes on them, etc.].

+---------------
| To qualify as a reference, there need to be reference and dereference
| operations on the reference datatype; there is no such operation is SQL.
+---------------

Not in "ANSI SQL92", say, but there might be in most SQL databases!
[See below re OIDs. Also, SQL:1999 had a "REF" type that was essentially
and OID.]

+---------------
| Would you say the relational algebra has references?
+---------------

Don't confuse "SQL" & "relational algebra"!! You'll get real
relational algebraists *way* bent out of shape if you do that!

+---------------
| (Some SQL dialects also offer synthetic "ID" fields that are
| guaranteed to remain stable over the lifetime of a record.
|
| Primary keys are updatable; there is nothing special about them.
+---------------

I think he's probably talking about "OIDs" (object IDs). Most
current SQL-based databases provide them, usually as a normally-
invisible "system column" that doesn't show up when you say
"SELECT * FROM", but that *does* appear if you say "SELECT oid, *",
and may be used as a "primary" key even on tables with no actual
primary key:

rpw3=# select * from toy limit 4;
c1 | c2 | c3 | upd
--------+-------+--------------------------------+-----
fall | tape | My Favorite Thanksgiving | 16
xmas | book | My Favorite Christmas | 2
xmas | video | The Grinch who Stole Christmas | 4
summer | book | Unusual 4ths of July | 17
(4 rows)

rpw3=# select oid, * from toy limit 4;
oid | c1 | c2 | c3 | upd
-------+--------+-------+--------------------------------+-----
19997 | fall | tape | My Favorite Thanksgiving | 16
19998 | xmas | book | My Favorite Christmas | 2
19999 | xmas | video | The Grinch who Stole Christmas | 4
20000 | summer | book | Unusual 4ths of July | 17
(4 rows)

rpw3=# select * from toy where oid = 19998;
c1 | c2 | c3 | upd
------+------+-----------------------+-----
xmas | book | My Favorite Christmas | 2
(1 row)

rpw3=# insert into toy values ('fall','book','Glory Road');
INSERT 32785 1

rpw3=# select oid, * from toy where oid = 32785;
oid | c1 | c2 | c3 | upd
-------+------+------+------------+-----
32785 | fall | book | Glory Road | 21
(1 row)

rpw3=#

See <http://www.postgresql.org/docs/8.1/static/datatype-oid.html>
for how PostgreSQL treats OIDs [including some critical limitations].
-Rob

-----
Rob Warnock <rp**@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Jul 13 '06 #601

Marshall

Joe Marshall wrote:

Marshall wrote:

Consider the following Java fragment:

void foo() {
int i = 0;
int j = 0;

// put any code here you want

j = 1;
i = 2;
// check value of j here. It is still 1, no matter what you filled in
above.
// The assignment to i cannot be made to affect the value of j.

}

True, but you have hidden the pointers. Semantically, the identifiers
i and j refer not to integers but to locations that hold integers. The
assignment modifies the location.

What the implementation looks like shouldn't affect how we speak
of the logical model. In the above code, there are no pointers.

By your definition, "pointer" and "variable" are synonyms. That doesn't
seem like a good idea to me. (What if i and j end up in registers?
I have not heard it said that registers have addresses.)

Those two local primitive variables cannot be made to have the same
identity. But you can update them, so this is an example of mutability
without the possibility of identity.

The identity is temporal: You use the same variable name at two
different times. Do you intend for the second `i' to mean the same
variable as the first `i'?

Okay, so "i" and "i" have the same identity. I suppose you could argue
that the language's namespace is an address-space, and variable names
are addresses. At this point, though, you've made the concept of
identity
so broad that it is now necessarily a property of all languages that
use
named values, whether updatable or not. I think you would also have
to call "3" a void pointer by this scheme.

Clearly there is *some* difference between a language which allows
explicit pointers and thus aliasing and one that doesn't. What term
would you use? First-class variables?
Marshall

Jul 13 '06 #602

Marshall

Chris Smith wrote:

Darren New <dn**@san.rr.comwrote:
Chris Smith wrote:
Unless I'm missing your point, I disagree with your disagreement.
Mutability only makes sense because of object identity (in the generic
sense; no OO going on here).
Depends what you mean by "object".

int x = 6; int y = 5; x = y;

I'd say x was mutable, with no "identity" problems involved?

The variable x definitely has identity that's independent of its value.

I'm not sure what you mean by that.

I also see, though, that the majority (so far, I'd
say all) of the potential uses for which it's worth introducing mutation
into an otherwise mutation-free language allow the possibility of
aliasing, which sorta makes me wonder whether this problem is worth
solving.

What about my example of SQL? Mutation, no pointers, no aliasing.
Yet: useful.
Marshall

Jul 13 '06 #603

Chris Smith

Marshall <ma*************@gmail.comwrote:

Chris Smith wrote:
Darren New <dn**@san.rr.comwrote:
Chris Smith wrote:
Unless I'm missing your point, I disagree with your disagreement.
Mutability only makes sense because of object identity (in the generic
sense; no OO going on here).
>
Depends what you mean by "object".
>
int x = 6; int y = 5; x = y;
>
I'd say x was mutable, with no "identity" problems involved?
The variable x definitely has identity that's independent of its value.

I'm not sure what you mean by that.

I mean, simply, that when you can assign a value to a variable, then you
care that it is that variable and not a different one. That's identity
in the normal sense of the word. The code elsewhere in the procedure is
able to access the value of 'x', and that has meaning even though you
don't know what value 'x' has. This has definite implications, and is a
useful concept; for example, it means that the pure lambda calculus no
longer sufficient to express the semantics of the programming language,
but instead something else is required.

What you are asking for is some subset of identity, and I've not yet
succeeded in understanding exactly what it is or what its limits are...
except that so far, it seems to have everything to do with pointers or
aliasing.

I also see, though, that the majority (so far, I'd
say all) of the potential uses for which it's worth introducing mutation
into an otherwise mutation-free language allow the possibility of
aliasing, which sorta makes me wonder whether this problem is worth
solving.

What about my example of SQL? Mutation, no pointers, no aliasing.
Yet: useful.

I'm not yet convinced that this is any different from a language with
standard pointer aliasing. If I have two tables A and B, and a foreign
key from A into B, then I run into the same problems with enforcing
constraints that I would see with a pointer model... when I update a
relation, I need to potentially check every other relation that contains
a foreign key into it, in order to ensure that its constraints are not
violated by that constraint. That's the same thing that is being
pointed out as a negative consequence of aliasing in other languages.
For example, executing:

UPDATE P SET x = 5 WHERE y = 43;

may result in the database having to re-evaluate the constraint that
says that for all P(x, y, z), x must be less than 4 when z = 17. One
difference is that while in general purpose programming languages this
appears to be a daunting task, databases are set up to do these kinds of
things all the time and contain optimizations for it... but the problem
is still the same, and it would still present the same difficulties in
doing formal proofs that running the above UPDATE statement doesn't
violate any invariants.

(If I'm wrong about that, please let me know; I'd very interested if
that's so.)

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation

Jul 14 '06 #604

David Hopwood

Chris Smith wrote:

David Hopwood <da******************@blueyonder.co.ukwrote:

>>This is true, but note that postconditions also need to be efficient
if we are going to execute them.

If checked by execution, yes. In which case, I am trying to get my head
around how it's any more true to say that functional languages are
compilable postconditions than to say the same of imperative languages.

A postcondition must, by definition[*], be a (pure) assertion about the
values that relevant variables will take after the execution of a subprogram.

If a subprogram is intended to have side effects, then its postcondition
can describe the results of the side effects, but must not reexecute them.

[*] E.g. see
<http://www.spatial.maine.edu/~worboys/processes/hoare%20axiomatic.pdf>,
although the term "postcondition" was introduced later than this paper.

--
David Hopwood <da******************@blueyonder.co.uk>

Jul 14 '06 #605

Marshall

Chris Smith wrote:

Marshall <ma*************@gmail.comwrote:
Chris Smith wrote:
Darren New <dn**@san.rr.comwrote:
Chris Smith wrote:
Unless I'm missing your point, I disagree with your disagreement.
Mutability only makes sense because of object identity (in the generic
sense; no OO going on here).

Depends what you mean by "object".

int x = 6; int y = 5; x = y;

I'd say x was mutable, with no "identity" problems involved?
>
The variable x definitely has identity that's independent of its value.
I'm not sure what you mean by that.

I mean, simply, that when you can assign a value to a variable, then you
care that it is that variable and not a different one. That's identity
in the normal sense of the word.

I guess it is, now that you mention it.

The code elsewhere in the procedure is
able to access the value of 'x', and that has meaning even though you
don't know what value 'x' has. This has definite implications, and is a
useful concept; for example, it means that the pure lambda calculus no
longer sufficient to express the semantics of the programming language,
but instead something else is required.

What you are asking for is some subset of identity, and I've not yet
succeeded in understanding exactly what it is or what its limits are...
except that so far, it seems to have everything to do with pointers or
aliasing.

Perhaps it is specifically first-class identity, rather than identity
per se.

I also see, though, that the majority (so far, I'd
say all) of the potential uses for which it's worth introducing mutation
into an otherwise mutation-free language allow the possibility of
aliasing, which sorta makes me wonder whether this problem is worth
solving.
What about my example of SQL? Mutation, no pointers, no aliasing.
Yet: useful.

I'm not yet convinced that this is any different from a language with
standard pointer aliasing. If I have two tables A and B, and a foreign
key from A into B, then I run into the same problems with enforcing
constraints that I would see with a pointer model... when I update a
relation, I need to potentially check every other relation that contains
a foreign key into it, in order to ensure that its constraints are not
violated by that constraint. That's the same thing that is being
pointed out as a negative consequence of aliasing in other languages.

No, that's not the same thing. What you are describing here is
not an aliasing issue, but simply the consequences of allowing
constraints to mention more than one variable.

In our i-and-j example above, suppose there was a constraint
such that i < j. We have to re-check this constraint if we
update either i or j. That's not the same thing as saying
that i and j are aliased.

A foreign key constraint is a multi-variable constraint.
Specifically, a foreign key from table A, attribute a
to table B, attribute b is the constraint:

forall a in A, exists b in B such that a = b.

Note that two variables, A and B, are referenced in
the constraint. In general, any constraint on two
variables will have to be rechecked upon update
to either.

For example, executing:

UPDATE P SET x = 5 WHERE y = 43;

may result in the database having to re-evaluate the constraint that
says that for all P(x, y, z), x must be less than 4 when z = 17.

I don't see any aliasing in this example either.

But consider how much worse this problem is if real aliasing
is possible. We have some pointer variable, and initialize
it with the return value from some function. We don't know
anything about what variable is involved. We update through
the pointer. What constraints must we recheck? Apparently
all of them; unless we have perfect alias analysis, we can't
tell what variables are affected by our update.

One
difference is that while in general purpose programming languages this
appears to be a daunting task, databases are set up to do these kinds of
things all the time and contain optimizations for it... but the problem
is still the same, and it would still present the same difficulties in
doing formal proofs that running the above UPDATE statement doesn't
violate any invariants.

(If I'm wrong about that, please let me know; I'd very interested if
that's so.)

I'm interested to hear your reply.
Marshall

Jul 14 '06 #606

Chris Smith

Marshall <ma*************@gmail.comwrote:

What you are asking for is some subset of identity, and I've not yet
succeeded in understanding exactly what it is or what its limits are...
except that so far, it seems to have everything to do with pointers or
aliasing.

Perhaps it is specifically first-class identity, rather than identity
per se.

As in: "the value of one variable can (be/refer to/depend on) the
identity of another variable"? I can certainly see this as as
reasonable concept to consider.

I'm not yet convinced that this is any different from a language with
standard pointer aliasing. If I have two tables A and B, and a foreign
key from A into B, then I run into the same problems with enforcing
constraints that I would see with a pointer model... when I update a
relation, I need to potentially check every other relation that contains
a foreign key into it, in order to ensure that its constraints are not
violated by that constraint. That's the same thing that is being
pointed out as a negative consequence of aliasing in other languages.

No, that's not the same thing. What you are describing here is
not an aliasing issue, but simply the consequences of allowing
constraints to mention more than one variable.

A foreign key constraint is a multi-variable constraint.
Specifically, a foreign key from table A, attribute a
to table B, attribute b is the constraint:

forall a in A, exists b in B such that a = b.

Note that two variables, A and B, are referenced in
the constraint.

There's confusion here coming from different usages of the word
variable. Let us talk instead of values, and of the abstract structures
that gives them meaning. In both cases (invariants in a hypothetical
imperative language, and in a relational database), the constraints make
reference to these structures of values (relations, for example, or
various kinds of data structures), and not to the individual values or
objects that they contain. In both cases, the problem is not that we
don't know what structures to check to verify the invariant; rather,
it's that we have to check ALL of the values in that structure.

As someone pointed out, this is to be expected in a world of mutable
things with identity that are globally locatable. It is simple fact
that if I tell you "I spoke to Barbara's husband", you may need to trace
down who Barbara's husband is before you could discover that, for
example, maybe I actually spoke to your boss, or to your nephew's best-
friend's father. If databases are capable of modeling these kinds of
relationships (and of course they are), then they are as susceptible to
"aliasing" -- in a logical sense that avoids mention of pointer -- as
anyone else.

I don't see any aliasing in this example either.

Actually, this was probably a bad example. Let's stick to the others
involving relationships between tuples.

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation

Jul 14 '06 #607

George Neuner

On 13 Jul 2006 08:45:49 -0700, "Marshall" <ma*************@gmail.com>
wrote:

>
On the other hand, there is no problem domain for which pointers
are a requirement. I agree they are deucedly convenient, though.

I would argue that pointers/references _are_ a requirement for I/O. I
know of no workable method for interpreting raw bits as meaningful
data other than to overlay a typed template upon them.

Categorically disallowing address manipulation functionally cripples
the language because an important class of programs (system programs)
cannot be written.

Of course, languages can go overboard the other way too. IMO, C did
not need to provide address arithmetic at the language level,
reinterpretable references and array indexing would have sufficed for
any use. Modula 3's type safe view is an example of getting it right.

It is quite reasonable to say "I don't write _____ so I don't need
[whatever language feature enables writing it]". It is important,
however, to be aware of the limitation and make your choice
deliberately.
George
--
for email reply remove "/" from address

Jul 14 '06 #608

Marshall

George Neuner wrote:

On 13 Jul 2006 08:45:49 -0700, "Marshall" <ma*************@gmail.com>
wrote:

On the other hand, there is no problem domain for which pointers
are a requirement. I agree they are deucedly convenient, though.

I would argue that pointers/references _are_ a requirement for I/O. I
know of no workable method for interpreting raw bits as meaningful
data other than to overlay a typed template upon them.

I think I know what you mean. I agree that pointers are necessary
for, e.g., device drivers. So I have to weaken my earlier statement.

Categorically disallowing address manipulation functionally cripples
the language because an important class of programs (system programs)
cannot be written.

That's fair, although I could argue how important systems programming
is these days. (And C/C++ are cock-of-the-walk there anyway.)

Of course, languages can go overboard the other way too. IMO, C did
not need to provide address arithmetic at the language level,
reinterpretable references and array indexing would have sufficed for
any use. Modula 3's type safe view is an example of getting it right.

It is quite reasonable to say "I don't write _____ so I don't need
[whatever language feature enables writing it]". It is important,
however, to be aware of the limitation and make your choice
deliberately.

Agreed.
Marshall

Jul 14 '06 #609

Marshall

Chris Smith wrote:

Marshall <ma*************@gmail.comwrote:

What you are asking for is some subset of identity, and I've not yet
succeeded in understanding exactly what it is or what its limits are...
except that so far, it seems to have everything to do with pointers or
aliasing.
Perhaps it is specifically first-class identity, rather than identity
per se.

As in: "the value of one variable can (be/refer to/depend on) the
identity of another variable"? I can certainly see this as as
reasonable concept to consider.

Yes, I think it's important. We've become so accustomed, in
modern languages, to considering variables as first-class,
that we cannot see any more that many of the problems
attributed to variables (such as aliasing) are actually
problems only with *first-class* variables. The historical
reality that non-first-class variables are associated with
more antique languages (Fortran/pre-OO) makes us shy
away from even considering removing first-class status
from variables. But I think it's an important point in the
design space (although I certainly respect Andreas'
reluctance; to paraphrase, "first class or bust." :-) )

After all, what are the alternatives? Purely-functional
languages remove themselves from a large class of
problems that I consider important: data management.
I have explored the OO path to its bitter end and am
convinced it is not the way. So what is left? Uniqueness
types and logic programming, I suppose. I enjoy logic
programming but it doesn't seem quite right. But notice:
no pointers there! And it doesn't seem to suffer from the
lack. I really don't know enough about uniqueness types
to evaluate them. But again, I already have my proposed
solution: variables but no first-class variables. (And yes,
I'm acutely aware of how problematic that makes
closures. :-()

I'm not yet convinced that this is any different from a language with
standard pointer aliasing. If I have two tables A and B, and a foreign
key from A into B, then I run into the same problems with enforcing
constraints that I would see with a pointer model... when I update a
relation, I need to potentially check every other relation that contains
a foreign key into it, in order to ensure that its constraints are not
violated by that constraint. That's the same thing that is being
pointed out as a negative consequence of aliasing in other languages.
No, that's not the same thing. What you are describing here is
not an aliasing issue, but simply the consequences of allowing
constraints to mention more than one variable.

A foreign key constraint is a multi-variable constraint.
Specifically, a foreign key from table A, attribute a
to table B, attribute b is the constraint:

forall a in A, exists b in B such that a = b.

Note that two variables, A and B, are referenced in
the constraint.

There's confusion here coming from different usages of the word
variable. Let us talk instead of values, and of the abstract structures
that gives them meaning.

I don't think that will work. We cannot discuss constraints, nor
aliasing, without bringing variables in to the picture. (Aliasing
may exist without variables but it is a non-problem then.)

In both cases (invariants in a hypothetical
imperative language, and in a relational database), the constraints make
reference to these structures of values (relations, for example, or
various kinds of data structures), and not to the individual values or
objects that they contain.

I am not certain what this means.

In both cases, the problem is not that we
don't know what structures to check to verify the invariant; rather,
it's that we have to check ALL of the values in that structure.

But not all values in all structures, as you do in the presence
of aliasing.

As someone pointed out, this is to be expected in a world of mutable
things with identity that are globally locatable. It is simple fact
that if I tell you "I spoke to Barbara's husband", you may need to trace
down who Barbara's husband is before you could discover that, for
example, maybe I actually spoke to your boss, or to your nephew's best-
friend's father. If databases are capable of modeling these kinds of
relationships (and of course they are), then they are as susceptible to
"aliasing" -- in a logical sense that avoids mention of pointer -- as
anyone else.

I don't see that they're the same thing. Similar is some ways, yes,
but not the same.

As I said much earlier, the terminology is problematic, and it may
be that we (or just I) simply lack the precision of terms necessary
for the conversation.
Marshall

Jul 14 '06 #610

Joachim Durchholz

Marshall schrieb:

What about my example of SQL? Mutation, no pointers, no aliasing.
Yet: useful.

Sorry, but SQL does have aliasing.

E.g. if you have records that have name="John", surname="Doe", the
statements
SELECT * FROM persons WHERE name = "John"
and
SELECT * FROM persons WHERE name = "Doe"
are aliases of each other.

The alias is actually in the WHERE clause. And this *can* get you into
trouble if you have something that does
UPDATE ... WHERE name = "John"
and
UPDATE ... WHERE surname = "Doe"
e.g. doing something with the Johns, then updating the names of all
Does, and finally restoring the Johns (but not recognizing that changing
the names of all Does might have changed your set of Johns).

Conceptually, this is just the same as having two different access path
to the same memory cell. Or accessing the same global variable through a
call-by-reference parameter and via its global name.

BTW with views, you get not just aliasing but what makes aliasing really
dangerous. Without views, you can simply survey all the queries that you
are working with and lexically compare table and field names to see
whether there's aliasing. With views, the names that you see in a
lexical scope are not enough to determine aliasing.
E.g. if you use a view that builds upon the set of Johns but aren't
aware of that (possibly due to abstraction barriers), and you change the
name field of all Does, then you're altering the view without a chance
to locally catch the bug. That's just as bad as with any pointer
aliasing problem.

Regards,
Jo

Jul 14 '06 #611

Andreas Rossberg

Darren New wrote:

Andreas Rossberg wrote:

>Yes, technically you are right. But this makes a pretty weak notion of
mutability. All stateful data structures had to stay within their
lexical scope, and could never be passed to a function.

Not really. The way Hermes handles this is with destructive assignment.
Each variable holds a value, and you (almost) cannot have multiple
variables referring to the same value.

OK, this is interesting. I don't know Hermes, is this sort of like a
dynamically checked equivalent of linear or uniqueness typing?

If you want to assign Y to X, you use
X := Y
after which Y is considered to be uninitialized. If you want X and Y to
have the same value, you use
X := copy of Y
after which X and Y have the same value but are otherwise unrelated, and
changes to one don't affect the other.

Mh, but if I understand correctly, this seems to require performing a
deep copy - which is well-known to be problematic, and particularly
breaks all kinds of abstractions. Not to mention the issue with
uninitialized variables that I would expect occuring all over the place.
So unless I'm misunderstanding something, this feels like trading one
evil for an even greater one.

- Andreas

Jul 14 '06 #612

Andreas Rossberg

Marshall wrote:

>
After all, what are the alternatives? Purely-functional
languages remove themselves from a large class of
problems that I consider important: data management.

Maybe, but I have yet to see how second-class variables are really more
adequate in dealing with it.

And note that even with second-class state you can still have aliasing
issues - you just need mutable arrays and pass around indices. Keys in
databases are a more general form of the same problem.

I have explored the OO path to its bitter end and am
convinced it is not the way. So what is left? Uniqueness
types and logic programming, I suppose. I enjoy logic
programming but it doesn't seem quite right. But notice:
no pointers there! And it doesn't seem to suffer from the
lack.

Uh, aliasing all over the place! Actually, I think that logic
programming, usually based on deep unification, brings by far the worst
incarnation of aliasing issues to the table.

- Andreas

--
Andreas Rossberg, ro******@ps.uni-sb.de

Jul 14 '06 #613

Joachim Durchholz

Marshall schrieb:

void foo() {
int i = 0;
int j = 0;
j = 1;
i = 2;
// check value of j here. It is still 1, no matter what you filled
// in above.
// The assignment to i cannot be made to affect the value of j.
}

Those two local primitive variables cannot be made to have the same
identity.

Sure. To state it more clearly, they cannot be aliases.

But you can update them, so this is an example of mutability
without the possibility of identity.

You're being a bit sloppy with terminology here. "Identity" in the
phrase above refers to two entities, in the sense of "i and j cannot be
identical".
Identity is actually a property of a single entity, namely that what
remains constant regardless of what you do with the entity.

Regards,
Jo

Jul 14 '06 #614

Joachim Durchholz

Marshall schrieb:

By your definition, "pointer" and "variable" are synonyms. That doesn't
seem like a good idea to me. (What if i and j end up in registers?
I have not heard it said that registers have addresses.)

There are no registers in the JVM ;-P

More specifically, where Joe said "pointer" he really meant "reference".
i refers to a variable, and you really want to make sure that the first
i refers to the same variable as the second i, so whatever is referred
to by i is really an identity.

>>Those two local primitive variables cannot be made to have the same
identity. But you can update them, so this is an example of mutability
without the possibility of identity.
The identity is temporal: You use the same variable name at two
different times. Do you intend for the second `i' to mean the same
variable as the first `i'?

Okay, so "i" and "i" have the same identity.

Vacuous but true.

I suppose you could argue that the language's namespace is an
address-space, and variable names are addresses.

Correct.

At this point, though, you've made the concept of identity so broad
that it is now necessarily a property of all languages that use named
values, whether updatable or not. I think you would also have to call
"3" a void pointer by this scheme.

It's not a void pointer, it's a pointer to an integer, but you're
correct: "3", when written in a program text, is a reference to the
successor of the successor of the successor of zero.

If you find that silly and would like to stick with equality, then
you're entirely correct. Natural numbers are entirely immutable by
definition, and I'm definitely not the first one to observe that
identity and equality are indistinguishable for immutable objects.

Clearly there is *some* difference between a language which allows
explicit pointers and thus aliasing and one that doesn't.

You can have aliasing without pointers; e.g. arrays are fully sufficient.
If i = j, then a [i] and a [j] are aliases of the same object.

(After I observed that, I found it no longer a surprise that array
optimizations are what Fortran compiler teams sink most time into. The
aliasing problems are *exactly* the same as those with pointers in C -
though in practice, the Fortranistas have the advantage that the
compiler will usually know that a [i] and b [j] cannot be aliases of
each other, so while they have the same problems, the solutions give the
Fortranistas more leverage.)

What term would you use? First-class variables?

I think it's more a quasi-continuous spectrum.

On one side, we have alias-free toy languages (no arrays, no pointers,
no call-by-reference, just to name the most common sources of aliasing).

SQL is slightly more dangerous: it does have aliases, but they are
rarely a problem (mostly because views aren't a very common thing, and
even if they are used, they aren't usually so abstract that they really
hide the underlying dependencies).

Next are languages with arrays and call-by-reference. "Pascal without
pointers" would be a candidate.
Here, aliasing occurs, and it can and does hide behind abstraction
barriers, but it's not a serious problem unless the software becomes
*really* large.

The end of the line are languages that use references to mutable data
everywhere. All OO languages are a typical example of that.

Regards,
Jo

Jul 14 '06 #615

Joachim Durchholz

Marshall schrieb:

Joachim Durchholz wrote:
>It's just that I know that it's viable to give up destructive updates.
Giving up pointers is a far more massive restriction.

Oddly, I feel the opposite. While it's true there are many domains
for which purely functional programming is a fine choice, there
are some domains for which it is insufficient. Any kind of data
managament, for example, requires that you be able to update
the information.

Sure, the data itself cannot be managed non-destructively.

However, the programs that operate on the data need not do destructive
updates internally. That way, I can have pointers inside the programs
without giving up aliasing.
(Aliasing is really, really, really important. It's the reason why
mathematicians can handle infinite objects - they copy just the alias,
i.e. the name or description, instead of the object itself. It's one of
the things that make abstraction useful.)

>>Right. To me the response to this clear: give up pointers. Imperative
operations are too useful to give up; indeed they are a requirement
for certain problems.
I don't know any.
In some cases, you need an additional level of conceptual indirection -
instead of *doing* the updates, you write a function that *describes* them.

But then what do you do with that function?

I pass it to an engine that's imperative.

However, that engine doesn't need to do aliasing anymore.

In other words, I'm separating mutability and aliasing, so that they
can't combine and explode.

Let's say I have an
employee database. John Smith just got hired on 1/1/2006 with
a salary of $10,000. I need to record this fact somewhere. How
do I do that without variables? Current-employees is a variable.
Even if I have the space to keep all historical data, so I'm not
deleting anything, I still have to have a variable for the latest
version of the accumulated data. I can solve this without
pointers, but I can't solve it without variables.

As I said elsewhere, the record has an identity even though it isn't
explicit in SQL.
(SQL's data manipulation language is generally not considered half as
"clean" as the query language; I think the core of the problemsis that
DML combines mutability and identity.)

I should like to learn more about these. I have some vague
perception of the existence of linear logic, but not much
else. However, I also already have an excellent solution
to the pointer aliasing problem, so I'm less motivated.

Pointers are just the most obvious form of aliasing. As I said
elsewhere, there are others: array indexing, by-reference parameters, or
even WHERE clauses in SQL statements. In fact, anything that selects an
updatable piece of data from a collection is an alias, and incurs the
same problems as pointers, though they may come in different disguises.

>Actually SQL has references - they are called "primary keys", but they
are references nevertheless.

I strongly object; this is quite incorrect. I grant you that from the
50,000 foot level they appear identical, but they are not. To
qualify as a reference, there need to be reference and dereference
operations on the reference datatype; there is no such operation
is SQL.

Would you say the relational algebra has references?

Yes.

Or, consider the classic prolog ancestor query. Let's say we're
setting up as follows

father(bob, joe).
father(joe, john).

Is "joe" a reference, here? After all, when we do the ancestor
query for john, we'll see his father is joe and then use that to
find joe's father. Keys in SQL are isomorphic to joe in the
above prolog.

Agreed. They are both references ;-P

>(Some SQL dialects also offer synthetic
"ID" fields that are guaranteed to remain stable over the lifetime of a
record.

Primary keys are updatable; there is nothing special about them.

Right - I was wrong with identifying keys and identities. In fact the
identity of an SQL record cannot be directly obtained (unless via those
ID fields).
The records still have identities. It's possible to have two WHERE
clauses that refer to the same record, and if you update the record
using one WHERE clause, the record returned by the other WHERE clause
will have changed, too.

>With a "repeatable read" isolation level, you actually return to a
declarative view of the database: whatever you do with it, you won't see
it until you commit the transaction. (As soon as you commit, the
declarative peace is over and you better watch out that your data
doesn't become inconsistent due to aliasing.)

Alas, transaction isolation levels are a performance hack.
I cannot defend them on any logical basis. (Aside: did you mean
serializable, rather than repeatable read?)

Possibly. There are so many isolation levels that I have to look them up
whenever I want to get the terminology 100% correct.

>The only thing that can really be done about it is not adding it
artificially into a program. In those cases where aliasing is part of
the modelled domain, you really have to carefully inspect all
interactions and never, never, never dream about abstracting it away.

Yes, aliasing introduces a lot of problems. This is one reason
why closures make me nervous.

Me too.
On the other hand, they are so immensely useful.
Simply don't create closures with mutable data in them. Or, to the very
least, make sure that no aliases outside the closure exist.

Regards,
Jo

Jul 14 '06 #616

Marshall

Joachim Durchholz wrote:

Marshall schrieb:
What about my example of SQL? Mutation, no pointers, no aliasing.
Yet: useful.

Sorry, but SQL does have aliasing.

Well. I suppose we do not have an agreed upon definition
of aliasing, so it is hard to evaluate either way. I would
propose using the same one you used for identity:

if there are two variables and modifying one also modifies
the other, then there is aliasing between them.

I avoided mentioning equality to include, for example,
having an array i that is an alias to a subset of array j.
Please feel free to critque this definition, and/or supply
an alternative.

E.g. if you have records that have name="John", surname="Doe", the
statements
SELECT * FROM persons WHERE name = "John"
and
SELECT * FROM persons WHERE name = "Doe"
are aliases of each other.

The alias is actually in the WHERE clause.

Not by my definition, because there is only one variable here.

And this *can* get you into
trouble if you have something that does
UPDATE ... WHERE name = "John"
and
UPDATE ... WHERE surname = "Doe"
e.g. doing something with the Johns, then updating the names of all
Does, and finally restoring the Johns (but not recognizing that changing
the names of all Does might have changed your set of Johns).

The fact that some person might get confused about the
semantics of what they are doing does not indicate aliasing.
It is easy enough to do an analysis of your updates and
understand what will happen; this same analysis is impossible
with two arbitrary pointers, unless one has a whole-program
trace. That strikes me as a significant difference.

Conceptually, this is just the same as having two different access path
to the same memory cell. Or accessing the same global variable through a
call-by-reference parameter and via its global name.

There are similarities, but they are not the same.

BTW with views, you get not just aliasing but what makes aliasing really
dangerous. Without views, you can simply survey all the queries that you
are working with and lexically compare table and field names to see
whether there's aliasing. With views, the names that you see in a
lexical scope are not enough to determine aliasing.
E.g. if you use a view that builds upon the set of Johns but aren't
aware of that (possibly due to abstraction barriers), and you change the
name field of all Does, then you're altering the view without a chance
to locally catch the bug. That's just as bad as with any pointer
aliasing problem.

It is certainly aliasing, but I would not call it "just as bad." There
are
elaborate declarative constraint mechanisms in place, for example.
And the definition of the view itsef is a declarative, semantic
entity. Whereas a pointer is an opaque, semantics-free address.
Marshall

Jul 14 '06 #617

Marshall

Andreas Rossberg wrote:

Marshall wrote:

After all, what are the alternatives? Purely-functional
languages remove themselves from a large class of
problems that I consider important: data management.

Maybe, but I have yet to see how second-class variables are really more
adequate in dealing with it.

And note that even with second-class state you can still have aliasing
issues - you just need mutable arrays and pass around indices. Keys in
databases are a more general form of the same problem.

So for array a, you would claim that "a[5]" is an alias for
(a part of) "a"? That seems to stretch the idea of aliasing
to me. With these two expressions, it is obvious enough
what is going on; with two arbitrary pointers, it is not.

It seems to me the source of the problem is the opacity
of pointers, but perhaps I am missing something.

I have explored the OO path to its bitter end and am
convinced it is not the way. So what is left? Uniqueness
types and logic programming, I suppose. I enjoy logic
programming but it doesn't seem quite right. But notice:
no pointers there! And it doesn't seem to suffer from the
lack.

Uh, aliasing all over the place! Actually, I think that logic
programming, usually based on deep unification, brings by far the worst
incarnation of aliasing issues to the table.

Hmmm. Can you elaborate just a bit?
Marshall

Jul 14 '06 #618

Marshall

Joachim Durchholz wrote:

Marshall schrieb:
void foo() {
int i = 0;
int j = 0;
j = 1;
i = 2;
// check value of j here. It is still 1, no matter what you filled
// in above.
// The assignment to i cannot be made to affect the value of j.
}

Those two local primitive variables cannot be made to have the same
identity.

Sure. To state it more clearly, they cannot be aliases.

Yes.

But you can update them, so this is an example of mutability
without the possibility of identity.

You're being a bit sloppy with terminology here. "Identity" in the
phrase above refers to two entities, in the sense of "i and j cannot be
identical".
Identity is actually a property of a single entity, namely that what
remains constant regardless of what you do with the entity.

It is a fair critque. I'll try to be more precise.
Marshall

Jul 14 '06 #619

Marshall

Joachim Durchholz wrote:

>
You can have aliasing without pointers; e.g. arrays are fully sufficient.
If i = j, then a [i] and a [j] are aliases of the same object.

I am having a hard time with this very broad definition of aliasing.
Would we also say that a[1+1] and a[2] are aliases? It seems
to me, above, that we have only a, and with only one variable
there can be no aliasing.

A further question:

given a 32 bit integer variable x, and offsets i and j (equal as in
the above example) would you say that

x &= (1 << i)
and
x &= (1 << j)

are aliased expressions for setting a particular bit in x?

I am not being facetious; I am trying to understand the limits
of your definition for aliasing.

(After I observed that, I found it no longer a surprise that array
optimizations are what Fortran compiler teams sink most time into. The
aliasing problems are *exactly* the same as those with pointers in C -
though in practice, the Fortranistas have the advantage that the
compiler will usually know that a [i] and b [j] cannot be aliases of
each other, so while they have the same problems, the solutions give the
Fortranistas more leverage.)

I don't understand this paragraph. On the one hand, it seems you
are saying that C and Fortran are identically burdened with the
troubles caused by aliasing, and then a moment later it seems
you are saying the situation is distinctly better with Fortran.

What term would you use? First-class variables?

I think it's more a quasi-continuous spectrum.

On one side, we have alias-free toy languages (no arrays, no pointers,
no call-by-reference, just to name the most common sources of aliasing).

SQL is slightly more dangerous: it does have aliases, but they are
rarely a problem (mostly because views aren't a very common thing, and
even if they are used, they aren't usually so abstract that they really
hide the underlying dependencies).

Next are languages with arrays and call-by-reference. "Pascal without
pointers" would be a candidate.
Here, aliasing occurs, and it can and does hide behind abstraction
barriers, but it's not a serious problem unless the software becomes
*really* large.

The end of the line are languages that use references to mutable data
everywhere. All OO languages are a typical example of that.

Now with this, it appears you are agreeing that SQL has an advantage
vis-a-vis aliasing compared to OO languages. Yes? If so, we are
agreeing on the part I care about, and the specifics of just what
we call aliasing are not so important to me.
Marshall

Jul 14 '06 #620

rossberg

Marshall wrote:

Andreas Rossberg wrote:

And note that even with second-class state you can still have aliasing
issues - you just need mutable arrays and pass around indices. Keys in
databases are a more general form of the same problem.

So for array a, you would claim that "a[5]" is an alias for
(a part of) "a"? That seems to stretch the idea of aliasing
to me.

Not at all, I'd say. In my book, aliasing occurs whenever you have two
different "lvalues" that denote the same mutable cell. I think it would
be rather artificial to restrict the definition to lvalues that happen
to be variables or dereferenced pointers.

So of course, a[i] aliases a[j] when i=j, which in fact is a well-known
problem in some array or matrix routines (e.g. copying array slices
must always consider overlapping slices as special cases).

Uh, aliasing all over the place! Actually, I think that logic
programming, usually based on deep unification, brings by far the worst
incarnation of aliasing issues to the table.

Hmmm. Can you elaborate just a bit?

Logic variables immediately become aliases when you unify them.
Afterwards, after you bind one, you also bind the other - or fail,
because both got already bound the other way round. Unfortunately,
unification also is a deep operation, that is, aliasing can be induced
into variables deeply nested into some terms that happen to get
unified, possibly without you being aware of any of this (the
unification, nor the variable containment). To avoid accidental
aliasing you essentially must keep track of all potential unifications
performed by any given predicate (including transitively, by its
subgoals), which totally runs square to all concerns of modularity and
abstraction.

I've never been much of a logic programmer myself, but our language
group has a dedicated LP and CP background, and people here have
developed very strong opinions about the adequateness of unrestricted
logic variables and particularly unification in practice. I remember
somebody calling Prolog "the worst imperative language ever invented".

- Andreas

Jul 14 '06 #621

Darren New

Andreas Rossberg wrote:

OK, this is interesting. I don't know Hermes, is this sort of like a
dynamically checked equivalent of linear or uniqueness typing?

I'm not sure what linear or uniqueness typing is. It's typestate, and if
I remember correctly the papers I read 10 years ago, the folks at
TJWatson that invented Hermes also invented the concept of typestate.
They at least claim to have coined the term.

It's essentially a dataflow analysis that allows you to do the same
sorts of things that "don't read variables that may not yet have been
assigned to", except that you could annotate that variables change to
the state of "uninitialized" after they've already been initialized.

Mh, but if I understand correctly, this seems to require performing a
deep copy - which is well-known to be problematic, and particularly
breaks all kinds of abstractions.

Um, no, because there are no aliases. There's only one name for any
given value, so there's no "deep copy" problems. A deep copy and a
shallow copy are the same thing. And there are types of values you can
assign but not copy, such as callmessages (which are problematic to copy
for the same reason a stack frame would be problematic to copy).

I believe, internally, that there were cases where the copy was "deep"
and cases where it was "shallow", depending on the surrounding code.
Making a copy of a table and passing it to another process had to be a
"deep" copy (given that a column could contain another table, for
example). Making a copy of a table and using it for read-only purposes
in the same process would likely make a shallow copy of the table.
Iterating over a table and making changes during the iteration made a
copy-on-write subtable, then merged it back into the original table when
it was done the loop, since the high-level semantic definition of
looping over a table is that you iterate over a copy of the table.

The only thing close to aliases are references to some other process's
input ports (i.e., multiple client-side sockets connected to a
server-side socket). If you want to share data (such as a file system or
program library), you put the data in a table in a process, and you hand
out client-side connections to the process. Mostly, you'd define such
connections as accepting a data value (for the file contents) with the
parameter being undefined upon return from the call, and the file name
as being read-only, for example. If you wanted to store the file, you
could just pass a pointer to its data (in the implementation). If you
wanted a copy of it, you would either copy it and pass the pointer, or
you'd pass the pointer with a flag indicating it's copy-on-write, or you
could pass the pointer and have the caller copy it at some point before
returning, depending on what the caller did with it. The semantics were
high-level with the intent to allow the compiler lots of leeway in
implementation, not unlike SQL.

Not to mention the issue with
uninitialized variables that I would expect occuring all over the place.

The typestate tracks this, and prevents you from using uninitialized
variables. If you do a read (say, from a socket) and it throws an "end
of data" exception, the compiler prevents you from using the buffer you
just tried but failed to read.

Indeed, that's the very point of it all. By doing this, you can run
"untrusted" code in the same address space as trusted code, and be
assured that the compiler will prevent the untrusted code from messing
up the trusted code. The predecessor of Hermes (NIL) was designed to let
IBM's customers write efficient networking code and emulations and such
that ran in IBM's routers, without the need for expensive (in
performance or money) hardware yet with the safety that they couldn't
screw up IBM's code and hence cause customer service problems.

So unless I'm misunderstanding something, this feels like trading one
evil for an even greater one.

In truth, it was pretty annoying. But more because you wound up having
to write extensive declarations and compile the declarations before
compiling the code that implements them and such. That you didn't get to
use uninitialized variables was a relatively minor thing, especially
given that many languages nowadays complain about uninitialized
variables, dead code, etc. But for lots of types of programs, it let you
do all kinds of things with a good assurance that they'd work safely and
efficiently. It was really a language for writing operating systems in,
when you get right down to it.

--
Darren New / San Diego, CA, USA (PST)
This octopus isn't tasty. Too many
tentacles, not enough chops.

Jul 14 '06 #622

Joe Marshall

Marshall wrote:

Joe Marshall wrote:
Marshall wrote:
>
Consider the following Java fragment:
>
void foo() {
int i = 0;
int j = 0;
>
// put any code here you want
>
j = 1;
i = 2;
// check value of j here. It is still 1, no matter what you filled in
above.
// The assignment to i cannot be made to affect the value of j.
>
}
True, but you have hidden the pointers. Semantically, the identifiers
i and j refer not to integers but to locations that hold integers. The
assignment modifies the location.

What the implementation looks like shouldn't affect how we speak
of the logical model. In the above code, there are no pointers.

By your definition, "pointer" and "variable" are synonyms. That doesn't
seem like a good idea to me. (What if i and j end up in registers?
I have not heard it said that registers have addresses.)

You probably won't like this, but....

The example you give above is a bit ambiguous as to whether or not it
shows `true' mutation. So what do I mean by `true' mutation? The code
above could be transformed by alpha-renaming into an equivalent version
that does no mutation:

void foo() {
int i = 0;
int j = 0;

// put any code here you want

int jj = 1;
int ii = 2;
// rename all uses of i to ii and j to jj after this point.

}

Once I've done the renaming, we're left with a pure function. I claim
that this sort of `mutation' is uninteresting because it can be
trivially eliminated through renaming. Since we can eliminate it
through renaming, we can use a simplified semantics that does not
involve a `store', and use a substitution model of evaluation.

The more interesting kind of mutation cannot be eliminated through
renaming. The semantic model requires the notion of a stateful
`store', and the model is not referentially transparent: substitution
does not work. These qualities are what make mutation problematic. My
claim is that this kind of mutation cannot be had without some model of
pointers and references.

There is a further distinguishing characteristic: Can I write a
program that detects the mutation (and acts differently depending on
whether there is mutation or not)? It is unclear from your example
above whether you intended this to be possible, but certainly there is
no way for a caller of foo to tell that foo reassigns an internal
variable. If there a procedure is extrinsically a pure function, then
there is no real meaning to any `mutation' that goes on inside; there
is always a way to turn the internal mutation into a pure function
through alpha-renaming.

>
Clearly there is *some* difference between a language which allows
explicit pointers and thus aliasing and one that doesn't. What term
would you use? First-class variables?

My argument to you in re this statement:

Again, I disagree: it is posible to have mutability without
pointers/identity/objects.

is that mutability without a notion of pointers, identity, or objects
is trivially eliminated through alpha renaming and thus isn't the
`mutability' of interest.

Jul 14 '06 #623

Joachim Durchholz

Marshall schrieb:

Joachim Durchholz wrote:
>Marshall schrieb:
>>What about my example of SQL? Mutation, no pointers, no aliasing.
Yet: useful.
Sorry, but SQL does have aliasing.

Well. I suppose we do not have an agreed upon definition
of aliasing, so it is hard to evaluate either way. I would
propose using the same one you used for identity:

if there are two variables and modifying one also modifies
the other, then there is aliasing between them.

I think that's an adequate example.
For a definition, I'd say it's a bit too narrow unless we use a fairly
broad definition for "variable". I.e. in a C context, I'd say that a->b
is a variable, too, as would be foo(blah)->x.

I avoided mentioning equality to include, for example,
having an array i that is an alias to a subset of array j.

This would mean that there's aliasing between, say, a list of
transaction records and the balance of the account (since modifying the
list of transactions will change the balance, unless the object isn't
properly encapsulated).
For purposes of this discussion, it's probably fair to say "that's a
form of aliasing, too", even though it's quite indirect.

>E.g. if you have records that have name="John", surname="Doe", the
statements
SELECT * FROM persons WHERE name = "John"
and
SELECT * FROM persons WHERE name = "Doe"
are aliases of each other.

Arrrrrgh... I made a most braindamaged, stupid mistake here. The second
SELECT should have used the *surname* field, so it would be

> SELECT * FROM persons WHERE surname = "Doe"

Then, if there's a record that has name = "John", surname = "Doe", the
two WHERE clauses have aliasing in the form of overlapping result sets.

>The alias is actually in the WHERE clause.

Not by my definition, because there is only one variable here.

Sorry, my wording was sloppy.

I meant to say that 'in SQL, you identify records via clauses like WHERE
name = "John", i.e. WHERE clauses are a kind of identity'.

This is still not precise - the identity of an SQL record isn't
explicitly accessible (except the nonstandard OID facility that some SQL
engines offer). Nevertheless, they do have an identity, and there's a
possibility of aliasing - if you change all Johns, you may also change a
Doe.

>And this *can* get you into
trouble if you have something that does
UPDATE ... WHERE name = "John"
and
UPDATE ... WHERE surname = "Doe"
e.g. doing something with the Johns, then updating the names of all
Does, and finally restoring the Johns (but not recognizing that changing
the names of all Does might have changed your set of Johns).

The fact that some person might get confused about the
semantics of what they are doing does not indicate aliasing.
It is easy enough to do an analysis of your updates and
understand what will happen; this same analysis is impossible
with two arbitrary pointers, unless one has a whole-program
trace. That strikes me as a significant difference.

Sure. I said that aliases in SQL aren't as bad as in other programs.

Once you get abstraction mixed in, the analysis becomes less
straightforward. In the case of SQL, views are such an abstraction
facility, and they indeed can obscure what you're doing and make the
analysis more difficult. If it's just SQL we're talking about, you
indeed have to look at the whole SQL to check whether there's a view
that may be involved in the queries you're analysing, so the situation
isn't *that* different from pointers - it's just not a problem because
the amount of code is so tiny!

>Conceptually, this is just the same as having two different access path
to the same memory cell. Or accessing the same global variable through a
call-by-reference parameter and via its global name.

There are similarities, but they are not the same.

What are the relevant differences? How does the semantics of a WHERE
clause differ from that of a pointer, in terms of potential aliasing?

My line of argument would be this:
Pointers can be simulated using arrays, so it's no surprise that arrays
can emulate all the aliasing of pointers.
Arrays can be simulated using WHERE (with SELECT and UPDATE), so I'd say
that SQL can emulate all the aliasing of arrays, and hence that of pointers.

>BTW with views, you get not just aliasing but what makes aliasing really
dangerous. Without views, you can simply survey all the queries that you
are working with and lexically compare table and field names to see
whether there's aliasing. With views, the names that you see in a
lexical scope are not enough to determine aliasing.
E.g. if you use a view that builds upon the set of Johns but aren't
aware of that (possibly due to abstraction barriers), and you change the
name field of all Does, then you're altering the view without a chance
to locally catch the bug. That's just as bad as with any pointer
aliasing problem.

It is certainly aliasing, but I would not call it "just as bad." There
are
elaborate declarative constraint mechanisms in place, for example.

Yes, but they can give you unexpected results.
It smells a bit like closing the gates after the horses are out.

On the other hand, *if* there is identity and updates, no matter how we
handle them, there must be *some* way to deal with the problems that
ensue. Having declarative constraints doesn't sound like the worst way
to do that.

And the definition of the view itsef is a declarative, semantic
entity. Whereas a pointer is an opaque, semantics-free address.

Oh, that's a very BCPL view. It was still somewhat true for K&R C, but
it's not really applicable to ANSI C, not at all to C++, and even less
to languages that associate well-encapsulated types with pointers (such
as Ada, Eiffel, or Java).
Unless, of course, you mean "address" if you say "pointer" ;-)

Regards,
Jo

Jul 14 '06 #624

Joachim Durchholz

Marshall schrieb:

Joachim Durchholz wrote:
>You can have aliasing without pointers; e.g. arrays are fully sufficient.
If i = j, then a [i] and a [j] are aliases of the same object.

I am having a hard time with this very broad definition of aliasing.
Would we also say that a[1+1] and a[2] are aliases? It seems
to me, above, that we have only a, and with only one variable
there can be no aliasing.

a[1] is a variable, too. You can assign values to it.

Inside function foo when called as foo(a[1]), it may even be referenced
with yet another name and updated in isolation; foo may not even know
it's an array element. (This is sort-of true in C, and definitely true
in languages that don't have pointer arithmetic. If you pass a[1] to a
call-by-reference parameter in Pascal, the callee has no way to access
the parameter as if it were an array element.)

A further question:

given a 32 bit integer variable x, and offsets i and j (equal as in
the above example) would you say that

x &= (1 << i)
and
x &= (1 << j)

are aliased expressions for setting a particular bit in x?

Yes.

I am not being facetious; I am trying to understand the limits
of your definition for aliasing.

Understood and appreciated.

>(After I observed that, I found it no longer a surprise that array
optimizations are what Fortran compiler teams sink most time into. The
aliasing problems are *exactly* the same as those with pointers in C -
though in practice, the Fortranistas have the advantage that the
compiler will usually know that a [i] and b [j] cannot be aliases of
each other, so while they have the same problems, the solutions give the
Fortranistas more leverage.)

I don't understand this paragraph. On the one hand, it seems you
are saying that C and Fortran are identically burdened with the
troubles caused by aliasing, and then a moment later it seems
you are saying the situation is distinctly better with Fortran.

That's exactly the situation.
Aliasing is present in both language, but in Fortran, the guarantee that
two names aren't aliased are easier to come by.
Even more non-aliasing guarantees exist if the language has a more
expressive type system. In most cases, if you know that two expressions
have different types, you can infer that they cannot be aliases.

Of course, the life of compiler writers if of less importance to us; I
added them only because the reports of Fortran compilers having aliasing
problems due to array indexes made me aware that pointers aren't the
only source of aliasing. That was quite a surprise to me then.

Now with this, it appears you are agreeing that SQL has an advantage
vis-a-vis aliasing compared to OO languages. Yes?

Sure.
I think that making an aliasing bug in SQL is just as easy as in any
pointer-ridden language, but transactions and constraints (plus the
considerably smaller typical size of SQL code) make it far easier to
diagnose any such bugs.

If so, we are agreeing on the part I care about, and the specifics of
just what we call aliasing are not so important to me.

I'm not sure whether I'm getting the same mileage out of this, but the
relevance of a problem is obviously a function on what one is doing on a
day-to-day basis, so I can readily agree with that :-)

Regards,
Jo

Jul 14 '06 #625

Chris Smith

Marshall wrote...

I am having a hard time with this very broad definition of aliasing.
Would we also say that a[1+1] and a[2] are aliases? It seems
to me, above, that we have only a, and with only one variable
there can be no aliasing.

The problem with this (and with the relational one as well) is that the
practical problems don't go away by redefining "variable" to mean larger
collections of values.

given a 32 bit integer variable x, and offsets i and j (equal as in
the above example) would you say that

x &= (1 << i)
and
x &= (1 << j)

are aliased expressions for setting a particular bit in x?

I don't know about Joachim, but yes, I would.

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation

Jul 15 '06 #626

Chris F Clark

Joachim Durchholz wrote:

>
You can have aliasing without pointers; e.g. arrays are fully sufficient.
If i = j, then a [i] and a [j] are aliases of the same object.

"Marshall" <ma*************@gmail.comwrites:

I am having a hard time with this very broad definition of aliasing.
Would we also say that a[1+1] and a[2] are aliases? It seems
to me, above, that we have only a, and with only one variable
there can be no aliasing.

As you can see from the replies, all these things which you do not
consider aliasing are considered aliasing (by most everyone else, in
fact these are exactly what is meant by aliasing). There is good
reason for this. Let us explain this by looking again at the SQL
example that has been kicked around some.

> SELECT * FROM persons WHERE name = "John"
and
SELECT * FROM persons WHERE surname = "Doe"

Some set of records are shared by both select clauses. Those records
are the aliased records. If the set is empty there is no problem. If
we do not update the records from one of the select clauses, we also
have no problem. However, if we update the records for one of the
select clauses and the set of aliased records is not empty, the
records returned by the other select clause have changed. Therefore,
everything we knew to be true about the "other" select clause may or
may not still be true. It's the may not case we are worried about.

For example, in the "imperative Mi-5" Q asks Moneypenny to run the
first query (name = John) to get a list of agents for infiltrating
Spectre. In another department, they're doing sexual reassignments
and updating the database with the second query (surname = Doe) and
making the name become Jane (along with other changes to make the
transformation correct). So, when Q select "John Doe" from the first
list and sends that agent on a mission Q is really sending "Jane Doe"
and Spectre realizes that the agent is one and kills her. Not a happy
result.

In the "functional Mi-5", the sexual reassignment group, makes clones
of the agents reassigned, so that there is now both a "John Doe" and a
"Jane Doe". Of course, the payroll is a little more bloated, so that
when Q says "John Doe" isn't needed for any further assignments, the
"Garbage Collection" department does a reduction in force and gets rid
of "John Doe". The good news is that they didn't send Jane Doe to her
death.

The problem with aliasing, we want the things we knew true in one part
of our program, i.e. the Johns on Q's list are still Johns and not
Janes, to be true after we've run another part of our program. If the
second part of our program can change what was true after the first
part of our program, we can't depend on the results from the first
part of our program, i.e. Q can send Jane Doe into certain death.

The problem is compounded by the complexities of our programs. When Q
selected his list, he didn't know of the department doing the
reassignments (and couldn't know because they were part of a
top-secret project). So, since bureaucracies grow to meet the
increasing needs of the bureaucracy, often the solution is increased
complexity, more regulations and paperwork to fill out, semaphores and
locks to hold on critical data, read-only data, status requests, etc.
All to keep Q from killing Jane by accident. Sometimes they work. the
reassignment department has to wait 30-days for the clearance to
perform their operation in which time John Doe completes the
infiltration of Spectre saves the world from destruction and is ready
for his next assignment.

The key point is that each record (and even each field in each record)
if it can be known by two names, is an alias. It is not sufficient to
talk about "whole" variables as not being aliased if there is some way
to refer to some part of the variable and change that part of the
variable. Thus, a[1+1] is an alias for a[2] if you can have one part
of the code talking about one and another part of the code talking
about the other.

To put it one still final way, consider the following code;
assert(sorted(array a));
a[1] = 2;
assert(sorted(array a)); // is this still true?

Because a[1] is an alias of array a, you cannot be certain that the
2nd assert will not fire (fail) without additional analysis.

assert(sorted(array a));
assert(a[0] <= 2);
assert(2 <= a[2]);
a[1] = 2;
assert(sorted(array a)); // this is still true!

-Chris

Jul 15 '06 #627

Marshall

Joachim Durchholz wrote:

Marshall schrieb:

In some cases, you need an additional level of conceptual indirection -
instead of *doing* the updates, you write a function that *describes* them.
But then what do you do with that function?

I pass it to an engine that's imperative.

However, that engine doesn't need to do aliasing anymore.

In other words, I'm separating mutability and aliasing, so that they
can't combine and explode.

This would seem to be identical to what I am proposing.

Let's say I have an
employee database. John Smith just got hired on 1/1/2006 with
a salary of $10,000. I need to record this fact somewhere. How
do I do that without variables? Current-employees is a variable.
Even if I have the space to keep all historical data, so I'm not
deleting anything, I still have to have a variable for the latest
version of the accumulated data. I can solve this without
pointers, but I can't solve it without variables.

As I said elsewhere, the record has an identity even though it isn't
explicit in SQL.

Hmmmm. What can this mean?

In general, I feel that "records" are not the right conceptual
level to think about.

In any event, I am not sure what you mean by non-explicit
identity. I would say, records in SQL have value, and their
identity is exactly their value. I do not see that they have
any identity outside of their value. We can uniquely identify
any particular record via a key, but a table may have more
than one key, and an update may change the values of one
key but not another. So it is not possible in general to
definitely and uniquely assign a mapping from each record
of a table after an update to each record of the table before
the update, and if you can't do that, then where
is the record identity?

(The situation is even more dire when one considers that
SQL actually uses bag semantics and not set semantics.)
[...]

I should like to learn more about these. I have some vague
perception of the existence of linear logic, but not much
else. However, I also already have an excellent solution
to the pointer aliasing problem, so I'm less motivated.

Pointers are just the most obvious form of aliasing. As I said
elsewhere, there are others: array indexing, by-reference parameters, or
even WHERE clauses in SQL statements. In fact, anything that selects an
updatable piece of data from a collection is an alias, and incurs the
same problems as pointers, though they may come in different disguises.

Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
That is so general that I am concerned it has lost its ability to
identify problems specific to pointers.

Actually SQL has references - they are called "primary keys", but they
are references nevertheless.
I strongly object; this is quite incorrect. I grant you that from the
50,000 foot level they appear identical, but they are not. To
qualify as a reference, there need to be reference and dereference
operations on the reference datatype; there is no such operation
is SQL.

Would you say the relational algebra has references?

Yes.

Or, consider the classic prolog ancestor query. Let's say we're
setting up as follows

father(bob, joe).
father(joe, john).

Is "joe" a reference, here? After all, when we do the ancestor
query for john, we'll see his father is joe and then use that to
find joe's father. Keys in SQL are isomorphic to joe in the
above prolog.

Agreed. They are both references ;-P

Again, by generalizing the term this far, I am concerned with a
loss of precision. If "joe" in the prolog is a references, then
"reference" is just a term for "data" that is being used in a
certain way. The conection with a specfic address space
has been lost in favor of the full domain of the datatype.

(Some SQL dialects also offer synthetic
"ID" fields that are guaranteed to remain stable over the lifetime of a
record.
Primary keys are updatable; there is nothing special about them.

Right - I was wrong with identifying keys and identities. In fact the
identity of an SQL record cannot be directly obtained (unless via those
ID fields).
The records still have identities. It's possible to have two WHERE
clauses that refer to the same record, and if you update the record
using one WHERE clause, the record returned by the other WHERE clause
will have changed, too.

Is this any different from saying that an expression that includes
a variable will produce a different value if the variable changes?

In other words, "i+1" will return a different value if we update i.

It seems odd to me to suggest that "i+1" has identity. I can see
that i has identity, but I would say that "i+1" has only value.

But perhaps the ultimate upshoot of this thread is that my use
of terminology is nonstandard.

With a "repeatable read" isolation level, you actually return to a
declarative view of the database: whatever you do with it, you won't see
it until you commit the transaction. (As soon as you commit, the
declarative peace is over and you better watch out that your data
doesn't become inconsistent due to aliasing.)
Alas, transaction isolation levels are a performance hack.
I cannot defend them on any logical basis. (Aside: did you mean
serializable, rather than repeatable read?)

Possibly. There are so many isolation levels that I have to look them up
whenever I want to get the terminology 100% correct.

Hmmm. Is it that there are so many, or that they are simply not
part of our daily concern? It seems to me there are more different
styles of parameter passing than there are isolation levels, but
I don't usually see (competent) people (such as yourself) getting
call-by-value confused with call-by-reference.

The only thing that can really be done about it is not adding it
artificially into a program. In those cases where aliasing is part of
the modelled domain, you really have to carefully inspect all
interactions and never, never, never dream about abstracting it away.
Yes, aliasing introduces a lot of problems. This is one reason
why closures make me nervous.

Me too.
On the other hand, they are so immensely useful.
Simply don't create closures with mutable data in them. Or, to the very
least, make sure that no aliases outside the closure exist.

Thank you for the interesting exchange.
Marshall

Jul 15 '06 #628

Joachim Durchholz

Marshall schrieb:

Joachim Durchholz wrote:
>As I said elsewhere, the record has an identity even though it isn't
explicit in SQL.

Hmmmm. What can this mean?

In general, I feel that "records" are not the right conceptual
level to think about.

They are, when it comes to aliasing of mutable data. I think it's
justified by the fact that aliased mutable data has a galling tendency
to break abstraction barriers. (More on this on request.)

In any event, I am not sure what you mean by non-explicit
identity.

The identity isn't visible from inside SQL. (Unless there's an OID
facility available, which *is* an explicit identity.)

I would say, records in SQL have value, and their
identity is exactly their value.

Definitely not. You can have two equal records and update just one of
them, yielding non-equal records; by my definition (and by intuition),
this means that the records were equal but not identical.

I do not see that they have
any identity outside of their value. We can uniquely identify
any particular record via a key, but a table may have more
than one key, and an update may change the values of one
key but not another. So it is not possible in general to
definitely and uniquely assign a mapping from each record
of a table after an update to each record of the table before
the update, and if you can't do that, then where
is the record identity?

Such a mapping is indeed possible. Simply extend the table with a new
column, number the columns consecutively, and identify the records via
that column.

But even if you don't do that, there's still identity. It is irrelevant
whether the programs can directly read the value of the identity field;
the adverse effects happen because updates are in-place. (If every
update generated a new record, then we'd indeed have no identity.)

Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.

No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.

There's another aspect here. If two expressions are always aliases to
the same mutable, that's usually easy to determine; this kind of
aliasing is usually not much of a problem.
What's more of a problem are those cases where there's occasional
aliasing. I.e. a[i] and a[j] may or may not be aliases of each other,
depending on the current value of i and j, and *that* is a problem
because the number of code paths to be tested doubles. It's even more of
a problem because testing with random data will usually not uncover the
case where the aliasing actually happens; you have to go around and
construct test cases specifically for the code paths that have aliasing.
Given that references may cross abstraction barriers (actually that's
often the purpose of constructing a reference in the first place), this
means you have to look for your test cases at multiple levels of
software abstraction, and *that* is really, really bad.

That is so general that I am concerned it has lost its ability to
identify problems specific to pointers.

If the reference to "pointers" above means "references", then I don't
know about any pointer problems that cannot be reproduced, in one form
or the other, in any of the other aliasing mechanisms.

Again, by generalizing the term this far, I am concerned with a
loss of precision. If "joe" in the prolog is a references, then
"reference" is just a term for "data" that is being used in a
certain way. The conection with a specfic address space
has been lost in favor of the full domain of the datatype.

Aliasing is indeed a more general idea that goes beyond address spaces.

However, identity and aliasing can be defined in fully abstract terms,
so I welcome this opportunity to get rid of a too-concrete model.

>The records still have identities. It's possible to have two WHERE
clauses that refer to the same record, and if you update the record
using one WHERE clause, the record returned by the other WHERE clause
will have changed, too.

Is this any different from saying that an expression that includes
a variable will produce a different value if the variable changes?

Yes.
Note that the WHERE clause properly includes array indexing (just set up
a table that has continuous numeric primary key, and a single other column).

I.e. I'm not talking about how a[i] is an alias of a[i+1] after updating
i, I'm talking about how a[i] may be an alias of a[j].

It seems odd to me to suggest that "i+1" has identity.

It doesn't (unless it's passed around as a closure, but that's
irrelevant to this discussion).
"i" does have identity. "a[i]" does have identity. "a[i+1]" does have
identity.
Let me say that for purposes of this discussion, if it can be assigned
to (or otherwise mutated), it has identity. (We *can* assign identity to
immutable things, but it's equivalent to equality and not interesting
for this discussion.)

I can see
that i has identity, but I would say that "i+1" has only value.

Agreed.

But perhaps the ultimate upshoot of this thread is that my use
of terminology is nonstandard.

It's somewhat restricted, but not really nonstandard.

>Possibly. There are so many isolation levels that I have to look them up
whenever I want to get the terminology 100% correct.

Hmmm. Is it that there are so many, or that they are simply not
part of our daily concern?

I guess it's the latter. IIRC there are four or five isolation levels.

It seems to me there are more different
styles of parameter passing than there are isolation levels, but
I don't usually see (competent) people (such as yourself) getting
call-by-value confused with call-by-reference.

Indeed.
Though the number of parameter passing mechanisms isn't that large
anyway. Off the top of my head, I could recite just three (by value, by
reference, by name aka lazy), the rest are just combinations with other
concepts (in/out/through, most notably) or a mapping to implementation
details (by reference vs. "pointer by value" in C++, for example).

Regards,
Jo

Jul 15 '06 #629

Marshall

Joachim Durchholz wrote:

Marshall schrieb:
Joachim Durchholz wrote:
As I said elsewhere, the record has an identity even though it isn't
explicit in SQL.
Hmmmm. What can this mean?

In general, I feel that "records" are not the right conceptual
level to think about.

They are, when it comes to aliasing of mutable data. I think it's
justified by the fact that aliased mutable data has a galling tendency
to break abstraction barriers. (More on this on request.)

I can see how pointers, or other kinds of aliasing
(by my more restricted definition) break abstraction barries;
it is hard to abstract something that can change out from
under you uncontrollably.

In any event, I am not sure what you mean by non-explicit
identity.

The identity isn't visible from inside SQL. (Unless there's an OID
facility available, which *is* an explicit identity.)

Agreed about OIDs.

I would say, records in SQL have value, and their
identity is exactly their value.

Definitely not. You can have two equal records and update just one of
them, yielding non-equal records; by my definition (and by intuition),
this means that the records were equal but not identical.

This sort of thing comes up when one has made the mistake of
not defining any keys on one's table and needs to rectify the
situation. It is in fact considered quite a challenge to do.
My preferred technique for such a situation is to create
a new table with the same columns and SELECT DISTINCT ...
INTO ... then recreate the original table. So that way doesn't
fit your model.

How else might you do it? I suppose there are nonstandard
extensions, such as UPDATE ... LIMIT. And in fact there
might be some yucky thing that made it in to the standard
that will work.

But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree? Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)

[The above paragraph is the part of this post that I
really care about; feel free to ignore the rest if it's naive
or annoying or whatever, but please do respond to this.]

I do not see that they have
any identity outside of their value. We can uniquely identify
any particular record via a key, but a table may have more
than one key, and an update may change the values of one
key but not another. So it is not possible in general to
definitely and uniquely assign a mapping from each record
of a table after an update to each record of the table before
the update, and if you can't do that, then where
is the record identity?

Such a mapping is indeed possible. Simply extend the table with a new
column, number the columns consecutively, and identify the records via
that column.

That doesn't work for me. It is one thing to say that for all
tables T, for all elements e in T, e has identity. It is a different
thing to say that for all tables T, there exists a table T' such
that for all elements e in T', e has identity.

But even if you don't do that, there's still identity. It is irrelevant
whether the programs can directly read the value of the identity field;
the adverse effects happen because updates are in-place. (If every
update generated a new record, then we'd indeed have no identity.)

Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.

No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].
A query is simply an expression that makes mention of a variable.
However I would have to admit that if SQL table elements have
identity, it would be more like the array example.

(The model of SQL I use in my head is one with set semantics,
and I am careful to avoid running in to bag semantics by, for
example, always defining at least one key, and using SELECT
DISTINCT where necessary. But it is true that the strict set
semantics in my head are not the wobbly bag semantics in
the standard.)

There's another aspect here. If two expressions are always aliases to
the same mutable, that's usually easy to determine; this kind of
aliasing is usually not much of a problem.
What's more of a problem are those cases where there's occasional
aliasing. I.e. a[i] and a[j] may or may not be aliases of each other,
depending on the current value of i and j, and *that* is a problem
because the number of code paths to be tested doubles. It's even more of
a problem because testing with random data will usually not uncover the
case where the aliasing actually happens; you have to go around and
construct test cases specifically for the code paths that have aliasing.
Given that references may cross abstraction barriers (actually that's
often the purpose of constructing a reference in the first place), this
means you have to look for your test cases at multiple levels of
software abstraction, and *that* is really, really bad.

That is so general that I am concerned it has lost its ability to
identify problems specific to pointers.

If the reference to "pointers" above means "references", then I don't
know about any pointer problems that cannot be reproduced, in one form
or the other, in any of the other aliasing mechanisms.

It seems we agree that, for example, raw C pointers can easily
cause aliasing problems. To me, Java references, which are much
more tightly controlled, are still subject to aliasing problems,
although
in practice the situation is significantly less severe. (It is possible
to have multiple references to a single Java mutable object and
not know it.) I would assume the same is posible with, say, SML
refs.

However the situation in SQL strikes me as being different.
If one is speaking of base tables, and one has two queries,
one has everything one needs to know simply looking at
the queries. If views are involved, one must further examine
the definition of the view.

For example, we might ask in C, if we update a[i], will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables. But even so in Fortran, if we ask
if we update a[i], will a[j] change, and we know we
can't tell unless we know the values of i and j.

In SQL, if we update table T, will a SELECT on table
S change? We know it won't; they are different variables.
But if we update T, will a select on T change? It
certainly might, but I wouldn't call it aliasing, because
the two variables involved are the same variable, T.

Again, by generalizing the term this far, I am concerned with a
loss of precision. If "joe" in the prolog is a references, then
"reference" is just a term for "data" that is being used in a
certain way. The conection with a specfic address space
has been lost in favor of the full domain of the datatype.

Aliasing is indeed a more general idea that goes beyond address spaces.

However, identity and aliasing can be defined in fully abstract terms,
so I welcome this opportunity to get rid of a too-concrete model.

The records still have identities. It's possible to have two WHERE
clauses that refer to the same record, and if you update the record
using one WHERE clause, the record returned by the other WHERE clause
will have changed, too.
Is this any different from saying that an expression that includes
a variable will produce a different value if the variable changes?

Yes.
Note that the WHERE clause properly includes array indexing (just set up
a table that has continuous numeric primary key, and a single other column).

I am still not convinced. It is not sufficient to show how sometimes
you can establish identity, or how you could establish identity based
on some related table; it must be shown how the identity can be
established in general, and I don't think it can. If it can't, then
I claim we are back to expressions that include a variable, and
not aliasing, even by the wider definition.

I.e. I'm not talking about how a[i] is an alias of a[i+1] after updating
i, I'm talking about how a[i] may be an alias of a[j].

It seems odd to me to suggest that "i+1" has identity.

It doesn't (unless it's passed around as a closure, but that's
irrelevant to this discussion).
"i" does have identity. "a[i]" does have identity. "a[i+1]" does have
identity.
Let me say that for purposes of this discussion, if it can be assigned
to (or otherwise mutated), it has identity. (We *can* assign identity to
immutable things, but it's equivalent to equality and not interesting
for this discussion.)

I can see
that i has identity, but I would say that "i+1" has only value.

Agreed.

Cool.

But perhaps the ultimate upshoot of this thread is that my use
of terminology is nonstandard.

It's somewhat restricted, but not really nonstandard.

Whew!

Possibly. There are so many isolation levels that I have to look them up
whenever I want to get the terminology 100% correct.
Hmmm. Is it that there are so many, or that they are simply not
part of our daily concern?

I guess it's the latter. IIRC there are four or five isolation levels.

Yes, four.

It seems to me there are more different
styles of parameter passing than there are isolation levels, but
I don't usually see (competent) people (such as yourself) getting
call-by-value confused with call-by-reference.

Indeed.
Though the number of parameter passing mechanisms isn't that large
anyway. Off the top of my head, I could recite just three (by value, by
reference, by name aka lazy), the rest are just combinations with other
concepts (in/out/through, most notably) or a mapping to implementation
details (by reference vs. "pointer by value" in C++, for example).

Fair enough. I could only remember three, but I was sure someone
else could name more. :-)
Marshall

Jul 16 '06 #630

Joachim Durchholz

Marshall schrieb:

Joachim Durchholz wrote:
>Marshall schrieb:
>>I would say, records in SQL have value, and their
identity is exactly their value.

Definitely not. You can have two equal records and update just one of
them, yielding non-equal records; by my definition (and by intuition),
this means that the records were equal but not identical.

This sort of thing comes up when one has made the mistake of
not defining any keys on one's table and needs to rectify the
situation. It is in fact considered quite a challenge to do.
My preferred technique for such a situation is to create
a new table with the same columns and SELECT DISTINCT ...
INTO ... then recreate the original table. So that way doesn't
fit your model.

I was mentioning multiple equal records just as an example where the
records have an identity that's independent of the record values.

How else might you do it? I suppose there are nonstandard
extensions, such as UPDATE ... LIMIT. And in fact there
might be some yucky thing that made it in to the standard
that will work.

Right, it might be difficult to update multiple records that are exactly
the same. It's not what SQL was intended for, and difficult to do anyway
- I was thinking of LIMIT (I wasn't aware that it's nonstandard), and I
agree that there may be other ways to do it.

However, I wouldn't overvalue that case. The Jane/John Doe example
posted elsewhere in this thread is strictly within the confines of what
SQL was built for, yet there is aliasing.

But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree?

Yup, SQL (particularly its update semantics) aren't relational semantics.
Still, it's SQL we've been talking about.

Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)

Any identity that one could define within relational semantics would be
just equality, so no, it wouldn't be very helpful (unless as part of a
greater framework).
However, that's not really a surprise. Aliasing becomes relevant (even
observable) only when there's updates, and relational algebra doesn't
have updates.

> I do not see that they have
any identity outside of their value. We can uniquely identify
any particular record via a key, but a table may have more
than one key, and an update may change the values of one
key but not another. So it is not possible in general to
definitely and uniquely assign a mapping from each record
of a table after an update to each record of the table before
the update, and if you can't do that, then where
is the record identity?
Such a mapping is indeed possible. Simply extend the table with a new
column, number the columns consecutively, and identify the records via
that column.

That doesn't work for me. It is one thing to say that for all
tables T, for all elements e in T, e has identity. It is a different
thing to say that for all tables T, there exists a table T' such
that for all elements e in T', e has identity.

Let me continue that argument:
Records in T' have identity.
From an SQL point-of-view, there's no difference between the records in
T' and records in other tables, so they must share all properties, in
particular that of having an identity.

(I agree that's not as convincing as seeing aliasing in action. See below.)

>But even if you don't do that, there's still identity. It is irrelevant
whether the programs can directly read the value of the identity field;
the adverse effects happen because updates are in-place. (If every
update generated a new record, then we'd indeed have no identity.)

>>Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].

I'd say WHERE is like [i+2]: neither is valid on its own, it's the
"selector" part of a reference into a table.

A query is simply an expression that makes mention of a variable.
However I would have to admit that if SQL table elements have
identity, it would be more like the array example.

OK.

(The model of SQL I use in my head is one with set semantics,
and I am careful to avoid running in to bag semantics by, for
example, always defining at least one key, and using SELECT
DISTINCT where necessary. But it is true that the strict set
semantics in my head are not the wobbly bag semantics in
the standard.)

Set-vs.-bag doesn't affect SQL's aliasing in general, it was a specific
property of the example that I chose. So you don't have to worry wether
that might be the cause of the misunderstanding.

>>That is so general that I am concerned it has lost its ability to
identify problems specific to pointers.

If the reference to "pointers" above means "references", then I don't
know about any pointer problems that cannot be reproduced, in one form
or the other, in any of the other aliasing mechanisms.

It seems we agree that, for example, raw C pointers can easily
cause aliasing problems. To me, Java references, which are much
more tightly controlled, are still subject to aliasing problems,
although
in practice the situation is significantly less severe. (It is possible
to have multiple references to a single Java mutable object and
not know it.)

Exactly.

I would assume the same is posible with, say, SML refs.

So do I.

However the situation in SQL strikes me as being different.
If one is speaking of base tables, and one has two queries,
one has everything one needs to know simply looking at
the queries. If views are involved, one must further examine
the definition of the view.

Sure. Usually, SQL itself is enough abstract that no additional
abstraction happens; so even if you have aliasing, you usually know
about it.
I.e. when compared to what you were saying about Java: "It is possible
to have multiple references to a single Java mutable object and
not know it.", the "and not know it" part is usually missing.

The picture changes as soon as the SQL statements get hidden behind
layers of abstraction, say result sets and prepared query handles.

.... Um... let me also restate the preconditions for aliasing become a
problem. You need three elements for that:
1) Identities that can be stored in multiple places.
2) Updates.
3) Abstraction (of updatable entities).

Without identities, there cannot be aliasing.
Without updates, the operation that makes aliasing problematic is excluded.
Without abstraction, you may have aliasing but can clearly identify it,
and don't have a problem.

I think the roadblock you're hitting is that you keep looking for
nonlocal trouble to identify spots where SQL might have aliasing; since
SQL isn't used with abstraction often, you come up empty and conclude
there's no aliasing.
My position is that not all aliasing is evil, and that there's a
technical definition: two program entities (l-expressions in C parlance)
are aliases if changing the referenced data through one also changes the
other.

For example, we might ask in C, if we update a[i], will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables.

Even that may be the case.
There's an EQUIVALENCE statement that can explicitly alias array sections.
Also, I suspect that newer versions of Fortran have reference parameters.

But even so in Fortran, if we ask
if we update a[i], will a[j] change, and we know we
can't tell unless we know the values of i and j.

In fact, that's the usual form of aliasing in Fortran.

In SQL, if we update table T, will a SELECT on table
S change? We know it won't; they are different variables.
But if we update T, will a select on T change? It
certainly might, but I wouldn't call it aliasing, because
the two variables involved are the same variable, T.

I fail to see how this is different from the aliasing in a[i] and a[j].

It's just the same kind of aliasing that I have with the statements
SELECT * FROM T WHERE key = $i
and
SELECT * FROM T WHERE key = $j
(where $i and $j are placeholders for variables from program variables).

After running
UPDATE T SET f = 5 WHERE key = $i
not only the first SELECT may give different results, the second may,
too (in those cases where $i and $j happen to have the same value).
I can also construct pointer-like aliasing in SQL. If I have
SELECT * FROM employees WHERE country = "US"
and
SELECT * FROM employees WHERE name = "Doe"
then somebody with a grudge against all US employees running
UPDATE employees SET salary = 0 WHERE country = "US"
will (on purpose or not) also modify the results of the second query.

Not a problem while all three queries are in plain view, but imagine
them wrapped in some opaque object or behind a module interface.

I am still not convinced. It is not sufficient to show how sometimes
you can establish identity, or how you could establish identity based
on some related table; it must be shown how the identity can be
established in general, and I don't think it can. If it can't, then
I claim we are back to expressions that include a variable, and
not aliasing, even by the wider definition.

See above.

I admit that identity cannot be reliably established in SQL. The
identity of a record isn't available (unless via OID extensions).
However, it is there and it can have effect.

Translated to real life, "my best friend" and "my employer" may be
referring to the same identity, even if these descriptions may change in
the future.

> It seems to me there are more different
styles of parameter passing than there are isolation levels, but
I don't usually see (competent) people (such as yourself) getting
call-by-value confused with call-by-reference.
Indeed.
Though the number of parameter passing mechanisms isn't that large
anyway. Off the top of my head, I could recite just three (by value, by
reference, by name aka lazy), the rest are just combinations with other
concepts (in/out/through, most notably) or a mapping to implementation
details (by reference vs. "pointer by value" in C++, for example).

Fair enough. I could only remember three, but I was sure someone
else could name more. :-)

I got a private mail naming more parameter passing mechanisms. It's a
bit difficult define what's a parameter passing mechanism and what's
just an attribute for such a mechanism. If you count all combinations as
an extra mechanism, you can easily get dozens.

Regards,
Jo

Jul 16 '06 #631

Chris F Clark

"Marshall" <ma*************@gmail.comwrote:

In general, I feel that "records" are not the right conceptual
level to think about.

Unfortunately, they are the right level. Actually,the right level
might even be lower, the fields within a record, but that's moving
even farther away from the direction you wish to go. The right level
is the lowest level at which the programmer can see and manipulate
values. Thus, since a programmer can see and manipulate fields within
records in SQL that is the level we need to concern ourselves with
when talking about aliasing in SQL. In other words, since a SELECT
(or UPDATE) statement can talk about specific fields of specific
records within the table (not directly, but reliably, which I'll
explain in a moment) then those are the "primitive objects" in SQL
that can be aliased.

What I meant by "not directly, but reliably":

If you have a fixed database, and you do two selects which specify the
same sets of fields to be selected and the same keys to select records
with, one expects the two selects to return the same values. You do
not necessarily expect the records to be returned in the same order,
but you do expect the same set of records to be returned and the
records to have the same values in the same fields. Further, if you
do an update, you expect certain fields of certain records to change
(and be reflected in subsequent selects).

However, therein lies the rub, if you do a select on some records, and
then an update that changes those records, the records you have from
the select have either changed or show outdated values. If there is
some way to refer to the records you first selected before the update,
then you have an aliasing problem, but maybe one can't do that. One
could also have an aliasing problem, if one were allowed to do two
updates simultaneously, so that one update could changed records in
the middle of the other changing the records. However, I don't know
if that's allowed in the relational algebra either. (I think I finally
see your point, and more on that later.)

I do not see that they have
any identity outside of their value. We can uniquely identify
any particular record via a key, but a table may have more
than one key, and an update may change the values of one
key but not another. So it is not possible in general to
definitely and uniquely assign a mapping from each record
of a table after an update to each record of the table before
the update, and if you can't do that, then where
is the record identity?

I don't know if your point of having two keys within one table amkes a
difference. If one can uniquely identify a record, then it has
identity. The fact that there is another mapping where one may not be
able to uniquely identify the record is not necessarily relevant.

But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree? Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)

Ok, here I think I will explain my understanding of your point. If we
treat each select statement and each update statements as a separate
program, i.e. we can't save records from one select statement to a
later one (and I think in the relational algebra one cannot), then
each single (individual) select statement or update statement is a
functional program. That is we can view a single select statement as
doing a fold over some incoming data, but it cannot change the data.
Similarly, we can view an update statement as creating a new copy of
the table with mutations in it, but the update statement can only
refer to the original immutable table that was input. (Well, that's
atleast how I recall the relational algebra.) It is the second point,
about the semantics of the update statement which is important. There
are no aliasing issues because in the relational algebra one cannot
refer to the mutated values in an update statement, every reference in
an update statement is refering to the unmutated input (again, that's
my recollection). (Note, if my recollection is off, and one can refer
to the mutated records in an update statement, perhaps they can
explain why aliasing is not an issue in it.)

Now for some minor points:

For example, we might ask in C, if we update a[i], will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables.

Unfortunately, your statement about FORTRAN is not true, if a and b
are parameters to a subroutine (or members of a common block, or
equivalenced, or ...), then an update of a[i] might change b[j].

This is in fact, an important point, it is in subroutines (and
subroutine like things), where we have local names for things
(bindings) that are actually names for things which may or may not be
distinct, that aliasing is the primary issue. I don't recall the
relational algebra having subroutines, and even if it did they would
be unimportant, given that one could not refer within a single update
statement to the records that a subroutine changed (since one cannot
refer in an update statement to any record which has been changed, see
the caveats above).

Joachim Durchholz wrote:

Indeed.
Though the number of parameter passing mechanisms isn't that large
anyway. Off the top of my head, I could recite just three (by value, by
reference, by name aka lazy), the rest are just combinations with other
concepts (in/out/through, most notably) or a mapping to implementation
details (by reference vs. "pointer by value" in C++, for example).

Marshall again:

Fair enough. I could only remember three, but I was sure someone
else could name more. :-)

There actual are some more, but very rare, for example there was call-
by-text-string, which is sort of like call-by-name, but allows a
parameter to reach into the calling routine and mess with local
variables therein. Most of the rare ones have really terrible
semantics, and are perhaps best forgotten except to keep future
implementors from choosing them. For example, call-by-text-string is
really easy to implement in a naive interpreter that reparses each
statement at every invocation, but but it exposes the implementation
properties so blatantly that writing a different interpretor is nearly
impossible.

If I recall correctly, there are also some call-by- methods that take
into account networks and addressing space issues--call-by-reference
doesn't work if one cannot see the thing being referenced. Of coruse,
one must then ask whether one considers "remote-procedure-call" and/or
message-passing to be the same thing as a local call.

One last nit, Call-by-value is actually call-by-copy-in-copy-out when
generalized.

There are some principles that one can use to organize the parameter
passing methods. However, the space is much richer than people
commonly know about (and that is actually a good thing, that most
people aren't aware of the richness, simplicity is good).

-Chris

Jul 16 '06 #632

Dr.Ruud

Chris F Clark schreef:

If you have a fixed database, and you do two selects which specify the
same sets of fields to be selected and the same keys to select records
with, one expects the two selects to return the same values.

When your "fixed" means read-only, or (fully) locked, then yes.

Modern databases also have modes without locks, so without special
attention (like maybe your "fixed") the two selects do not necessarily
return the same set of records.

Further, if you
do an update, you expect certain fields of certain records to change
(and be reflected in subsequent selects).

Doesn't your "fixed" block updates?

However, therein lies the rub, if you do a select on some records, and
then an update that changes those records, the records you have from
the select have either changed or show outdated values.

Not necessarily outdated: values can be fixed in time for a purpose.
Example: calculating interest is done once per day, but the whole
process takes more than a day.

Some systems implemented a lock plus requery just before the update, to
check for unacceptable changes in the stored data; this to prevent
having to keep locks while waiting.

If there is
some way to refer to the records you first selected before the update,
then you have an aliasing problem, but maybe one can't do that. One
could also have an aliasing problem, if one were allowed to do two
updates simultaneously, so that one update could changed records in
the middle of the other changing the records.

Some databases allow you to travel back in time: run this query on the
data of 1 year ago. All previous values are kept "behind" the current
value.

--
Affijn, Ruud

"Gewoon is een tijger."

Jul 16 '06 #633

Marshall

Joachim Durchholz wrote:

Marshall schrieb:

But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree?

Yup, SQL (particularly its update semantics) aren't relational semantics.
Still, it's SQL we've been talking about.

Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)

Any identity that one could define within relational semantics would be
just equality, so no, it wouldn't be very helpful (unless as part of a
greater framework).
However, that's not really a surprise. Aliasing becomes relevant (even
observable) only when there's updates, and relational algebra doesn't
have updates.

Good point. Perhaps I should have said "relational algebra +
variables with assignment." It is interesting to consider
assignment vs. the more restricted update operators: insert,
update, delete. I think I see what you mean about the restricted
update operators working on identity.

>Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.
To me, the SQL with the where clause is like i+2, not like a[2].

I'd say WHERE is like [i+2]: neither is valid on its own, it's the
"selector" part of a reference into a table.

Is it possible you're being distracted by the syntax? WHERE is a
binary operation taking a relation and a filter function. I don't
think of filters as being like array indexing; do they appear
analogous to you? (Always a difficult question because often
times A and B share some qualities and not others, and
what does one call that?)

However the situation in SQL strikes me as being different.
If one is speaking of base tables, and one has two queries,
one has everything one needs to know simply looking at
the queries. If views are involved, one must further examine
the definition of the view.

Sure. Usually, SQL itself is enough abstract that no additional
abstraction happens; so even if you have aliasing, you usually know
about it.
I.e. when compared to what you were saying about Java: "It is possible
to have multiple references to a single Java mutable object and
not know it.", the "and not know it" part is usually missing.

Yes!

The picture changes as soon as the SQL statements get hidden behind
layers of abstraction, say result sets and prepared query handles.

... Um... let me also restate the preconditions for aliasing become a
problem. You need three elements for that:
1) Identities that can be stored in multiple places.
2) Updates.
3) Abstraction (of updatable entities).

Without identities, there cannot be aliasing.
Without updates, the operation that makes aliasing problematic is excluded.
Without abstraction, you may have aliasing but can clearly identify it,
and don't have a problem.

Aha! That description works very well for me.

I admit that identity cannot be reliably established in SQL. The
identity of a record isn't available (unless via OID extensions).
However, it is there and it can have effect.

Yes, I think I see what you mean now. This is in part a consequence
of the restricted update operators.

Thanks,

Marshall

Jul 16 '06 #634

Marshall

Chris F Clark wrote:

"Marshall" <ma*************@gmail.comwrote:
In general, I feel that "records" are not the right conceptual
level to think about.

Unfortunately, they are the right level. Actually,the right level
might even be lower, the fields within a record, but that's moving
even farther away from the direction you wish to go. The right level
is the lowest level at which the programmer can see and manipulate
values.

But how is this not always "the bit"?

I do not see that they have
any identity outside of their value. We can uniquely identify
any particular record via a key, but a table may have more
than one key, and an update may change the values of one
key but not another. So it is not possible in general to
definitely and uniquely assign a mapping from each record
of a table after an update to each record of the table before
the update, and if you can't do that, then where
is the record identity?

I don't know if your point of having two keys within one table amkes a
difference. If one can uniquely identify a record, then it has
identity. The fact that there is another mapping where one may not be
able to uniquely identify the record is not necessarily relevant.

The issue with two keys is that the keys may not *agree* on the
mapping before vs. after the update.

For example, we might ask in C, if we update a[i], will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables.

Unfortunately, your statement about FORTRAN is not true, if a and b
are parameters to a subroutine (or members of a common block, or
equivalenced, or ...), then an update of a[i] might change b[j].

Ah, common blocks. How that takes me back!

But your point is a good one; I oversimplified.

Marshall again:
Fair enough. I could only remember three, but I was sure someone
else could name more. :-)

There actual are some more, but very rare, for example there was call-
by-text-string, which is sort of like call-by-name, but allows a
parameter to reach into the calling routine and mess with local
variables therein. Most of the rare ones have really terrible
semantics, and are perhaps best forgotten except to keep future
implementors from choosing them. For example, call-by-text-string is
really easy to implement in a naive interpreter that reparses each
statement at every invocation, but but it exposes the implementation
properties so blatantly that writing a different interpretor is nearly
impossible.

If I recall correctly, there are also some call-by- methods that take
into account networks and addressing space issues--call-by-reference
doesn't work if one cannot see the thing being referenced. Of coruse,
one must then ask whether one considers "remote-procedure-call" and/or
message-passing to be the same thing as a local call.

One last nit, Call-by-value is actually call-by-copy-in-copy-out when
generalized.

There are some principles that one can use to organize the parameter
passing methods. However, the space is much richer than people
commonly know about (and that is actually a good thing, that most
people aren't aware of the richness, simplicity is good).

Fair enough.
Marshall

Jul 16 '06 #635

Chris Smith

We Chris's stick together, as always.

Marshall <ma*************@gmail.comwrote:

Unfortunately, they are the right level. Actually,the right level
might even be lower, the fields within a record, but that's moving
even farther away from the direction you wish to go. The right level
is the lowest level at which the programmer can see and manipulate
values.

But how is this not always "the bit"?

First of all, we should be consistent in speaking about the logical
language semantics, which may not include bits anyway.

That said, if bits were the primitive concept of data in a language,
then we'd be able to get away with talking about higher-level concepts
because we agree to always manipulate a larger structure (a byte, for
example) as a pure value. If we were using bitwise operators, then we
wouldn't be able to get away with it, for example. That said, it's also
quite possible to consider aliasing on higher levels as well; it's just
not possible to point out the lack of aliasing for higher levels of
abstraction, and thus conclude that no aliasing exists. Aliasing is
still possible for entities within those layers of abstraction.

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation

Jul 16 '06 #636

Chris Smith

Marshall <ma*************@gmail.comwrote:

Is it possible you're being distracted by the syntax? WHERE is a
binary operation taking a relation and a filter function. I don't
think of filters as being like array indexing; do they appear
analogous to you? (Always a difficult question because often
times A and B share some qualities and not others, and
what does one call that?)

This is definitely a good question. Nevertheless, I think we're fooling
ourselves if we decide that we've made progress by defining part of the
problem out of existence. The original problem was that when there are
several ways of getting access to something that is identical (rather
than just equal), this causes problems for invariant checking. These
several ways of accessing data are being called 'aliasing' -- I have no
comment on whether that's standard usage or not, because I don't know.
I suspect that aliasing is normally used for something closer to the
physical level. The point is that whatever "aliasing" really means,
what we're discussing is having two paths by which one may access
identical data.

So there are infinitely complex ways in which this can occur. I can
have pointers that are aliased. I can have integers, which by
themselves are just plain values, but can alias each other as indices
into an array. I can have strings that do the same thing for an
associative array. I can, of course, have arbitrarily more complex data
structures with exactly the same issue. I can also have two different
WHERE clauses, which when applied to the same relation yield exactly the
same set of tuples. The problem arises when code is written to update
some value (or set of tuples, in the SQL case, since definitions differ
there) using one pointer/int/string/WHERE clause/etc, and at the same
time an invariant was written using the other pointer/int/WHERE/etc, the
result is that either of:

a) The compiler has to be able to prove that the two are not aliased

b) The compiler has to re-evaluate the invariant from scratch when this
operation is performed.

(Yeah, there's actually a whole continuum between a and b, based on more
limited abilities of the compiler to prove things. I'm oversimplifying,
and I think it's rather benign in this case.)

So in this sense, a WHERE clause is a binary operation in exactly the
same way that an array indexing expression is a binary operation, and
both are capable of aliasing in at least this logical sense.

Now it's certainly true that languages with unrestricted pointers are
less successful at limiting the scope of re-evaluation of invariants.
In the array or SQL relation cases, there's a limited set of value/tuple
modifications that might cause the invariant to need rechecking
(specifically those that point to the same array, or to the relation or
one of its views). A language with unrestricted untyped pointers, if it
doesn't get lucky with the capture analysis, may have to re-evaluate all
invariants in the application on any given assignment. Nevertheless,
the re-evaluation of the invariant still needs to be done.

--
Chris Smith - Lead Software Developer /Technical Trainer
MindIQ Corporation

Jul 16 '06 #637

Joachim Durchholz

Marshall schrieb:

>
Good point. Perhaps I should have said "relational algebra +
variables with assignment." It is interesting to consider
assignment vs. the more restricted update operators: insert,
update, delete.

Actually I see it the other way round: assignment is strictly less
powerful than DML since it doesn't allow creating or destroying
variables, while UPDATE does cover assignment to fields.
(However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
at which point there is not much of a difference.)

>>>>Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].
I'd say WHERE is like [i+2]: neither is valid on its own, it's the
"selector" part of a reference into a table.

Is it possible you're being distracted by the syntax?

I think that's very, very unlikely ;-)

WHERE is a
binary operation taking a relation and a filter function. I don't
think of filters as being like array indexing; do they appear
analogous to you?

Yes.

Filters are just like array indexing: both select a subset of variables
from a collection. In SQL, you select a subset of a table, in a
programming language, you select a subset of an array.

(The SQL selection mechanism is far more flexible in the kinds of
filtering you can apply, while array indexing allows filtering just by
ordinal position. However, the relevant point is that both select things
that can be updated.)

>I admit that identity cannot be reliably established in SQL. The
identity of a record isn't available (unless via OID extensions).
However, it is there and it can have effect.

Yes, I think I see what you mean now. This is in part a consequence
of the restricted update operators.

I don't think so. You can update SQL records any way you want.
The unavailability of a record identity is due to the fact that, well,
it's unavailable in standard SQL.

Regards,
Jo

Jul 16 '06 #638

Marshall

Joachim Durchholz wrote:

Marshall schrieb:

Good point. Perhaps I should have said "relational algebra +
variables with assignment." It is interesting to consider
assignment vs. the more restricted update operators: insert,
update, delete.

Actually I see it the other way round: assignment is strictly less
powerful than DML since it doesn't allow creating or destroying
variables, while UPDATE does cover assignment to fields.

Oh, my.

Well, for all table variables T, there exists some pair of
values v and v', such that we can transition the value of
T from v to v' via assignment, but not by any single
insert, update or delete. The reverse is not true. I
would consider that a solid demonstration of which
one is more powerful. Further, it is my understanding
that your claim of row identity *depends* on the restricted
nature of DML; if the only mutator operation is assignment,
then there is definitely no record identity.

(However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
at which point there is not much of a difference.)

I am not sure what this means. Delete can be expressed in
terms of assignment. (As can insert and update.) (Assignment
can also be expressed in terms of insert and delete.) I
don't know what "new" would be in a value-semantics, relational
world. So I would say it's assignment vs. INSERT+UPDATE+DELETE,
but perhaps I'm not understanding what you mean.

Assignment by itself is complete. Insert and delete together
are complete.

>>>Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.
>>>
To me, the SQL with the where clause is like i+2, not like a[2].
I'd say WHERE is like [i+2]: neither is valid on its own, it's the
"selector" part of a reference into a table.
Is it possible you're being distracted by the syntax?

I think that's very, very unlikely ;-)

Well, I just thought I'd ask. :-)

WHERE is a
binary operation taking a relation and a filter function. I don't
think of filters as being like array indexing; do they appear
analogous to you?

Yes.

Filters are just like array indexing: both select a subset of variables
from a collection.

I can't agree with this wording. A filter produces a collection
value from a collection value. I don't see how variables
enter in to it. One can filter either a collection constant or
a collection variable; if one speaks of filtering a collection
variable, on is really speaking of filtering the collection value
that the variable currently contains; filtering is not an operation
on the variable as such, the way the "address of" operator is.
Note you can't update the result of a filter.

In SQL, you select a subset of a table, in a
programming language, you select a subset of an array.

(The SQL selection mechanism is far more flexible in the kinds of
filtering you can apply, while array indexing allows filtering just by
ordinal position. However, the relevant point is that both select things
that can be updated.)

When you have been saying "select things that can be updated"
I have been assuming you meant that one can derive values
from variables, and that some other operation can update that
variable, causing the expression, if re-evaluated, to produce
a different value. However the phrase also suggests that
you mean that the *result* of the select can *itself* be
updated. Which one do you mean? (Or is there a third
possibility?)

I admit that identity cannot be reliably established in SQL. The
identity of a record isn't available (unless via OID extensions).
However, it is there and it can have effect.
Yes, I think I see what you mean now. This is in part a consequence
of the restricted update operators.

I don't think so. You can update SQL records any way you want.
The unavailability of a record identity is due to the fact that, well,
it's unavailable in standard SQL.

I disagree. The concept of record-identity is quite tenuous, and
depends on specifics of SQL semantics. (In fact I'm not entirely
convinced it's definitely there even in SQL.) It certainly does not
hold in, say, set theory. Set elements do not have identity; they
only have value. If we have a set-semantics language that
supports set variables with assignment, we still do not have
element-identity.
Marshall

Jul 16 '06 #639

Chris Smith

David Hopwood <da******************@blueyonder.co.ukwrote:

Chris Smith wrote:
If checked by execution, yes. In which case, I am trying to get my head
around how it's any more true to say that functional languages are
compilable postconditions than to say the same of imperative languages.

A postcondition must, by definition[*], be a (pure) assertion about the
values that relevant variables will take after the execution of a subprogram.

Okay, my objection was misplaced, then, as I misunderstood the original
statement. I had understood it to mean that programs written in pure
functional languages carry no additional information except for their
postconditions.

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation

Jul 16 '06 #640

Chris Smith

Darren New <dn**@san.rr.comwrote:

I'm not sure what linear or uniqueness typing is. It's typestate, and if
I remember correctly the papers I read 10 years ago, the folks at
TJWatson that invented Hermes also invented the concept of typestate.
They at least claim to have coined the term.

Coining the term is one thing, but I feel pretty confident that the idea
was not invented in 1986 with the Hermes language, but rather far
earlier. Perhaps they may have invented the concept of considering it
any different from other applications of types, though. I still have
trouble delineating how to consider "typestate" different from ordinary
types in formal terms, unless I make the formal model complex enough to
include some kind of programmer-defined identifiers such as variables.
The distinction is not at all relevant to common type system models
based on the lambda calculus.

While acknowledging, on the one hand, that the word "typestate" is used
to describe this, I also object that types have *always* been assigned
to expressions in differing type environments. Otherwise, it would be
impossible to assign types to lambda abstractions in the simply typed
lambda calculus, the simplest of all generally studied type systems.
What is being named here is the overcoming of a limitation that
programming language designers imposed upon themselves, whether from not
understanding the theoretical research or not believing it important, I
don't know.

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation

Jul 17 '06 #641

Zoltan Somogyi

Andreas Rossberg <ro******@ps.uni-sb.dewrites:

>Uh, aliasing all over the place! Actually, I think that logic
programming, usually based on deep unification, brings by far the worst
incarnation of aliasing issues to the table.

I agree that deep unification, as implemented in Prolog, brings with it
lots of aliasing problems. However, these problems have been recognized
from the seventies, and people have tried to solve them. There have
been a huge variety of mode systems added to various dialects of Prolog
over the years, which each attempt to tackle (various parts of) the aliasing
problem, none of them fully successfully in my view, since their designers
usually actually *wanted* to retain at least *some* of the expressive power
of the logic variable.

Some non-Prolog logic languages have departed from this approach, the earliest
being the Relational Language. To tie this message to another thread, the
Relational Language had a strict mode system that is as identical as possible
to the notion of typestate in NIL and Hermes given the limitations of
comparing declarative apples with imperative oranges, but significantly
earlier, 1979 vs 1986 IIRC.

Marshall wrote:

>I have explored the OO path to its bitter end and am
convinced it is not the way. So what is left? Uniqueness
types and logic programming, I suppose. I enjoy logic
programming but it doesn't seem quite right. But notice:
no pointers there! And it doesn't seem to suffer from the
lack.

Of course, the main logic programming language today that disallows the use
of unification for arbitrary aliasing is Mercury. It enforces this through
a strong mode system. Our motivation for the design of this mode system
was precisely to eliminate the problems Andreas identified above. However
it has also turned out to be an excellent foundation for Mercury's notion of
unique modes, its equivalent of uniqueness types, which Mercury uses to
express I/O.

Zoltan Somogyi <zs@cs.mu.OZ.AUhttp://www.cs.mu.oz.au/~zs/
Department of Computer Science and Software Engineering, Univ. of Melbourne

Jul 17 '06 #642

Joachim Durchholz

Marshall schrieb:

Joachim Durchholz wrote:
>Marshall schrieb:
>>Good point. Perhaps I should have said "relational algebra +
variables with assignment." It is interesting to consider
assignment vs. the more restricted update operators: insert,
update, delete.
Actually I see it the other way round: assignment is strictly less
powerful than DML since it doesn't allow creating or destroying
variables, while UPDATE does cover assignment to fields.

Oh, my.

Well, for all table variables T, there exists some pair of
values v and v', such that we can transition the value of
T from v to v' via assignment, but not by any single
insert, update or delete.

I fail to see an example that would support such a claim.

On the other hand, UPDATE can assign any value to any field of any
record, so it's doing exactly what an assignment does. INSERT/DELETE can
create resp. destroy records, which is what new and delete operators
would do.

I must really be missing the point.

Further, it is my understanding
that your claim of row identity *depends* on the restricted
nature of DML; if the only mutator operation is assignment,
then there is definitely no record identity.

Again, I fail to connect.

I and others have given aliasing examples that use just SELECT and UPDATE.

>(However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
at which point there is not much of a difference.)

I am not sure what this means. Delete can be expressed in
terms of assignment. (As can insert and update.)

INSERT cannot be expressed in terms of assignment. INSERT creates a new
record; there's no way that assignment in a language like C can create a
new data structure!
The same goes for DELETE.

(Assignment can also be expressed in terms of insert and delete.)

Agreed.

I also realize that this makes it a bit more difficult to nail down the
nature of identity in a database. It's certainly not storage location:
if you DELETE a record and then INSERT it with the same values, it may
be allocated somewhere entirely else, and our intuition would say it's
not "the same" (i.e. not identical). (In a system with OID, it would
even be impossible to recreate such a record, since it would have a
different OID. I'm not sure whether this makes OID systems better or
worse at preserving identity, but that's just a side track.)

Since intuition gives me ambivalent results here, I'll go back to my
semiformal definition (and take the opportunity to make it a bit more
precise):
Two path expressions (lvalues, ...) are aliases if and only if the
referred-to values compare equal, and if they stay equal after applying
any operation to the referred-to value through either of the path
expressions.

In the context of SQL, this means that identity isn't the location where
the data is stored. It's also not the values stored in the record -
these may change, including key data. SQL record identity is local, it
can be defined from one operation to the next, but there is no such
thing as a global identity that one can memorize and look up years
later, without looking at the intermediate states of the store.

It's a gross concept, now that I think about it. Well, or at least
rather alien for us programmers, who are used to taking the address of a
variable to get a tangible identity that will stay stable over time.

On the other hand, variable addresses as tangible identities don't hold
much water anyway.
Imagine data that's written out to disk at program end, and read back
in. Further imagine that while the data is read into main memory,
there's a mechanism that redirects all further reads and writes to the
file into the read-in copy in memory, i.e. whenever any program changes
the data, all other programs see the change, too.
Alternatively, think about software agents that move from machine to
machine, carrying their data with them. They might be communicating with
each other, so they need some means of establishing identity
("addressing") the memory buffers that they use for communication.

I don't know what "new" would be in a value-semantics, relational
world.

It would be INSERT.

Um, my idea of "value semantics" is associated with immutable values.
SQL with INSERT/DELETE/UPDATE certainly doesn't match that definition.

So by my definition, SQL doesn't have value semantics, by your
definition, it would have value semantics but updates which are enough
to create aliasing problems, so I'm not sure what point you're making
here...

>Filters are just like array indexing: both select a subset of variables
from a collection.

I can't agree with this wording. A filter produces a collection
value from a collection value. I don't see how variables
enter in to it.

A collection can consist of values or variables.

And yes, I do think that WHERE is a selection over a bunch of variables
- you can update records after all, so they are variables! They don't
have a name, at least none which is guaranteed to be constant over their
lifetime, but they can be mutated!

One can filter either a collection constant or
a collection variable; if one speaks of filtering a collection
variable, on is really speaking of filtering the collection value
that the variable currently contains; filtering is not an operation
on the variable as such, the way the "address of" operator is.
Note you can't update the result of a filter.

If that's your definition of a filter, then WHERE is not a filter,
simple as that.

>In SQL, you select a subset of a table, in a
programming language, you select a subset of an array.

(The SQL selection mechanism is far more flexible in the kinds of
filtering you can apply, while array indexing allows filtering just by
ordinal position. However, the relevant point is that both select things
that can be updated.)

When you have been saying "select things that can be updated"
I have been assuming you meant that one can derive values
from variables, and that some other operation can update that
variable, causing the expression, if re-evaluated, to produce
a different value.

That's what I meant.

However the phrase also suggests that
you mean that the *result* of the select can *itself* be
updated.

The "that" in "things that can be updated" refers to the selected
things. I'm not sure how this "that" could be interpreted to refer to
the selection as a whole (is my understanding of English really that bad?)

Which one do you mean? (Or is there a third
possibility?)

I couldn't tell - I wouldn't have thought that there are even two
possibilities.

Regards,
Jo

Jul 17 '06 #643

Rob Warnock

Joachim Durchholz <jo@durchholz.orgwrote:
+---------------
| INSERT cannot be expressed in terms of assignment. INSERT creates a new
| record; there's no way that assignment in a language like C can create a
| new data structure! The same goes for DELETE.
+---------------

Well, what about "malloc()" & "free()"? I mean, if your
"database" is a simple linked list, then INSERT is just:

ROW *p = malloc(sizeof *p);
p->field1 = value1;
p->field2 = value2;
...
p->fieldN = valueN;
database = cons(p, database); /* Common Lisp's CONS */

and DELETE is just:

ROW *p = find_if(predicate_function, database); /* CL's FIND-IF */
database = delete(p, database); /* CL's DELETE */
free(p);

[There are single-pass methods, of course, but...]
-Rob

-----
Rob Warnock <rp**@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Jul 17 '06 #644

Joachim Durchholz

Chris Smith schrieb:

David Hopwood <da******************@blueyonder.co.ukwrote:
>Chris Smith wrote:
>>If checked by execution, yes. In which case, I am trying to get my head
around how it's any more true to say that functional languages are
compilable postconditions than to say the same of imperative languages.

A postcondition must, by definition[*], be a (pure) assertion about the
values that relevant variables will take after the execution of a subprogram.

Okay, my objection was misplaced, then, as I misunderstood the original
statement. I had understood it to mean that programs written in pure
functional languages carry no additional information except for their
postconditions.

Oh, but that's indeed correct for pure functional languages. (Well, they
*might* carry additional information anyway, but it's not a requirement
to make it a programming language.)

The answer that I have for your original objection may be partial, but
here goes anyway:

Postconditions cannot easily capture all side effects.
E.g. assume there's a file-open routine but no way to test whether a
given file handle was ever opened (callers are supposed to test the
return code from the open() call).
In an imperative language, that's a perfectly valid (though possibly not
very good) library design.
Now the postcondition would have to say something like "from now on,
read() and write() are valid on the filehandle that I returned". I know
of no assertion language that can express such temporal relationships,
and even if there is (I'm pretty sure there is), I'm rather sceptical
that programmers would be able to write correct assertions, or correctly
interpret them - temporal logic offers several traps for the unwary.
(The informal postcondition above certainly isn't complete, since it
doesn't cover close(); I shunned the effort to get a wording that would
correctly cover sequences like open()-close()-open(), or
open()-close()-close()-open().)

Regards,
Jo

Jul 17 '06 #645

Joachim Durchholz

Rob Warnock schrieb:

Joachim Durchholz <jo@durchholz.orgwrote:
+---------------
| INSERT cannot be expressed in terms of assignment. INSERT creates a new
| record; there's no way that assignment in a language like C can create a
| new data structure! The same goes for DELETE.
+---------------

Well, what about "malloc()" & "free()"?

These do exactly what assignment cannot do.

The correspondence between SQL and C would be:
INSERT - malloc() (create a new data structure)
DELETE - free() (destroy a data structure)
UPDATE - assignment (change data within a data structure)

Regards,
Jo

Jul 17 '06 #646

Chris Smith

Joachim Durchholz <jo@durchholz.orgwrote:

I fail to see an example that would support such a claim.

On the other hand, UPDATE can assign any value to any field of any
record, so it's doing exactly what an assignment does. INSERT/DELETE can
create resp. destroy records, which is what new and delete operators
would do.

I must really be missing the point.

I *think* I understand Marshall here. When you are saying "assignment",
you mean assignment to values of attributes within tuples of the cell.
When Marshall is saying "assignment", he seems to mean assigning a
completely new *table* value to a relation; i.e., wiping out the entire
contents of the relation and replacing it with a whole new set of
tuples. Your assignment is indeed less powerful than DML, whereas
Marshall's assignment is more powerful than DML.

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation

Jul 17 '06 #647

Marshall

Chris Smith wrote:

Joachim Durchholz <jo@durchholz.orgwrote:
I fail to see an example that would support such a claim.

On the other hand, UPDATE can assign any value to any field of any
record, so it's doing exactly what an assignment does. INSERT/DELETE can
create resp. destroy records, which is what new and delete operators
would do.

I must really be missing the point.

I *think* I understand Marshall here. When you are saying "assignment",
you mean assignment to values of attributes within tuples of the cell.
When Marshall is saying "assignment", he seems to mean assigning a
completely new *table* value to a relation; i.e., wiping out the entire
contents of the relation and replacing it with a whole new set of
tuples. Your assignment is indeed less powerful than DML, whereas
Marshall's assignment is more powerful than DML.

Exactly.
Marshall

Jul 17 '06 #648

Marshall

Joachim Durchholz wrote:

Marshall schrieb:
Joachim Durchholz wrote:
Marshall schrieb:
Good point. Perhaps I should have said "relational algebra +
variables with assignment." It is interesting to consider
assignment vs. the more restricted update operators: insert,
update, delete.
Actually I see it the other way round: assignment is strictly less
powerful than DML since it doesn't allow creating or destroying
variables, while UPDATE does cover assignment to fields.
Oh, my.

Well, for all table variables T, there exists some pair of
values v and v', such that we can transition the value of
T from v to v' via assignment, but not by any single
insert, update or delete.

I fail to see an example that would support such a claim.

variable T : unary relation of int
T = { 1, 2, 3 }; // initialization
T := { 4, 5 }; // assignment

The above transition of the value of T cannot be be
done by any one single insert, update or delete.
Two would suffice, however. (In fact, any assignement
can be modeled at a full delete followed by an insert
of the new value.)

On the other hand, UPDATE can assign any value to any field of any
record,

Yes.

so it's doing exactly what an assignment does.

No. The variable is the table, not the records. Relations are not
arrays.
Records are not lvalues.

INSERT/DELETE can
create resp. destroy records, which is what new and delete operators
would do.

I must really be missing the point.

Further, it is my understanding
that your claim of row identity *depends* on the restricted
nature of DML; if the only mutator operation is assignment,
then there is definitely no record identity.

Again, I fail to connect.

I and others have given aliasing examples that use just SELECT and UPDATE.

Sure, but update's semantics are defined in a per-record way,
which is consistent with record identity. Assignment's isn't.

(However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
at which point there is not much of a difference.)
I am not sure what this means. Delete can be expressed in
terms of assignment. (As can insert and update.)

INSERT cannot be expressed in terms of assignment. INSERT creates a new
record; there's no way that assignment in a language like C can create a
new data structure!
The same goes for DELETE.

I was intendind to be discussing a hypothetical relation-based
language,
so while I generally agree with you statement about C, I don't see
how it applies.

(Assignment can also be expressed in terms of insert and delete.)

Agreed.

I also realize that this makes it a bit more difficult to nail down the
nature of identity in a database.

I would propose that variables have identity, and values do not.
In part this is via the supplied definition of identity, in which, when
you change one thing, if something else changes as well, they
share identity. Since one cannot change values, they necessarily
lack identity.

It's certainly not storage location:
if you DELETE a record and then INSERT it with the same values, it may
be allocated somewhere entirely else, and our intuition would say it's
not "the same" (i.e. not identical).

Well, it would depend on how our intuition had been primed. If it
was via implementation techniques in low level languages, we
might reach a different conclusion than if our intuition was primed
via logical models and relation theory.

(In a system with OID, it would
even be impossible to recreate such a record, since it would have a
different OID. I'm not sure whether this makes OID systems better or
worse at preserving identity, but that's just a side track.)

OIDs are something of a kludge, and they break set semantics.

Since intuition gives me ambivalent results here, I'll go back to my
semiformal definition (and take the opportunity to make it a bit more
precise):
Two path expressions (lvalues, ...) are aliases if and only if the
referred-to values compare equal, and if they stay equal after applying
any operation to the referred-to value through either of the path
expressions.

Alas, this leaves me confused. I don't see how a path expression
(in this case, SELECT ... WHERE) can be an l-value. You cannot
apply imperative operations to the result. (Also I think the use
of equality here is too narrow; it is only necessary to show that
two things both change, not that they change in the same way.)

I was under the impression you agred that "i+2" was not
a "path expression". If our hypothetical language lacks record
identity, then I would say that any query is simply an expression
that returns a value, as in "i+2."

In the context of SQL, this means that identity isn't the location where
the data is stored. It's also not the values stored in the record -
these may change, including key data. SQL record identity is local, it
can be defined from one operation to the next, but there is no such
thing as a global identity that one can memorize and look up years
later, without looking at the intermediate states of the store.

Yes, however all of this depends on record identity.

It's a gross concept, now that I think about it. Well, or at least
rather alien for us programmers, who are used to taking the address of a
variable to get a tangible identity that will stay stable over time.

It is certaily alien if one is not used to relation semantics, which
is the default case.

On the other hand, variable addresses as tangible identities don't hold
much water anyway.
Imagine data that's written out to disk at program end, and read back
in. Further imagine that while the data is read into main memory,
there's a mechanism that redirects all further reads and writes to the
file into the read-in copy in memory, i.e. whenever any program changes
the data, all other programs see the change, too.
Alternatively, think about software agents that move from machine to
machine, carrying their data with them. They might be communicating with
each other, so they need some means of establishing identity
("addressing") the memory buffers that they use for communication.

These are exactly why content-based addressing is so important.
Location addressing depends on an address space, and this
concept does not distribute well.

I don't know what "new" would be in a value-semantics, relational
world.

It would be INSERT.

Um, my idea of "value semantics" is associated with immutable values.
SQL with INSERT/DELETE/UPDATE certainly doesn't match that definition.

Sorry, I was vague. Compare, in OOP, the difference between a value
object and a "regular" object.

So by my definition, SQL doesn't have value semantics, by your
definition, it would have value semantics but updates which are enough
to create aliasing problems, so I'm not sure what point you're making
here...

Filters are just like array indexing: both select a subset of variables
from a collection.
I can't agree with this wording. A filter produces a collection
value from a collection value. I don't see how variables
enter in to it.

A collection can consist of values or variables.

And yes, I do think that WHERE is a selection over a bunch of variables
- you can update records after all, so they are variables! They don't
have a name, at least none which is guaranteed to be constant over their
lifetime, but they can be mutated!

We seem to have slipped back from the hypothetical relation language
with only assignement back to SQL.

One can filter either a collection constant or
a collection variable; if one speaks of filtering a collection
variable, on is really speaking of filtering the collection value
that the variable currently contains; filtering is not an operation
on the variable as such, the way the "address of" operator is.
Note you can't update the result of a filter.

If that's your definition of a filter, then WHERE is not a filter,
simple as that.

Fair enough! Can you correct my definition of filter, though?
I am still unaware of the difference.

In SQL, you select a subset of a table, in a
programming language, you select a subset of an array.

(The SQL selection mechanism is far more flexible in the kinds of
filtering you can apply, while array indexing allows filtering just by
ordinal position. However, the relevant point is that both select things
that can be updated.)
When you have been saying "select things that can be updated"
I have been assuming you meant that one can derive values
from variables, and that some other operation can update that
variable, causing the expression, if re-evaluated, to produce
a different value.

That's what I meant.

However the phrase also suggests that
you mean that the *result* of the select can *itself* be
updated.

The "that" in "things that can be updated" refers to the selected
things. I'm not sure how this "that" could be interpreted to refer to
the selection as a whole (is my understanding of English really that bad?)

Your English is extraordinary. I could easily conclude that you
were born in Boston and educated at Harvard, and either have
Germanic ancestry or have simply adopted a Germanic name
out of whimsy. If English is not your native tongue, there is no
way to detect it.

Argh, late for dropping off my daughter at school now. Must run.
Sorry if I was a bit unclear due to being rushed.
Marshall

Jul 17 '06 #649

Darren New

Chris Smith wrote:

Darren New <dn**@san.rr.comwrote:

>>I'm not sure what linear or uniqueness typing is. It's typestate, and if
I remember correctly the papers I read 10 years ago, the folks at
TJWatson that invented Hermes also invented the concept of typestate.
They at least claim to have coined the term.

Coining the term is one thing, but I feel pretty confident that the idea
was not invented in 1986 with the Hermes language, but rather far
earlier.

Yes. However, the guys who invented Hermes didn't come up with it out of
the blue. It was around (in NIL - Network Implementation Language) for
years beforehand. I read papers about these things in graduate school,
but I don't know where my photocopies are.

NIL was apparently quite successful, but a niche language, designed by
IBM for programming IBM routers. Hermes was an attempt years later to
take the same successful formula and turn it into a general-purpose
programming system, which failed (I believe) for the same reason that a
general purpose operating system that can't run C programs will fail.

Perhaps they may have invented the concept of considering it
any different from other applications of types, though.

From what I can determine, the authors seem to imply that typestate is
dataflow analysis modified in (at least) two ways:

1) When control flow joins, the new typestate is the intersection of
typestates coming into the join, where as dataflow analysis doesn't
guarantee that. (They imply they think dataflow analysis is allowed to
say "the variable might or might not be initialized here", while
typestate would ensure the variable is uninitialized.)

2) The user has control over the typestate, so the user can (for
exmaple) assert a variable is uninitialized at some point, and by doing
so, make it so.

How this differs from theoretical lambda types and all I couldn't say.

What is being named here is the overcoming of a limitation that
programming language designers imposed upon themselves, whether from not
understanding the theoretical research or not believing it important, I
don't know.

I believe there's also a certain level of common-senseness needed to
make a language even marginally popular. :-) While it's possible that
there's really no difference between type and typestate at the
theoretical level, I think most practical programmers would have trouble
wrapping their head around that, just as programming in an entirely
recursive pattern when one is used to looping can be disorienting.

--
Darren New / San Diego, CA, USA (PST)
This octopus isn't tasty. Too many
tentacles, not enough chops.

Jul 17 '06 #650

Similar topics