Bytes | Software Development & Data Engineering Community

Boost process and C

Hi,

Is there any group in the manner of the C++ Boost group that works on
the evolution of the C language? Or is there any group that performs an
equivalent function?

Thanks,
-vs

Apr 29 '06
Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
CBFalconer wrote:
we******@gmail.com wrote:
> CBFalconer wrote:
... snip ...
>> The last time I took an (admittedly cursory) look at Bstrlib, I
>> found it cursed with non-portabilities
>
> You perhaps would like to name one?

I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767.
[snip]
[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.
You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to its extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.


Ok, so you can name a single application of such a thing right?
it allows in an efficient way for all desirable encoding scenarios,
and it avoids any wrap-around anomalies causing under-allocations.


What anomalies? Are these a consequence of using signed long, or
size_t?


I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.
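The wrap-around hazard described here can be sketched; the concrete widths below (64-bit long, 32-bit size_t) are an assumed platform, modeled with fixed-width types so the conversion is visible everywhere:

```c
#include <stdint.h>

/* Sketch of the under-allocation hazard: the length type ("long",
   modeled as int64_t) is wider than size_t (modeled as uint32_t).
   The widths are illustrative of a platform where long > size_t. */
static uint32_t length_malloc_sees(int64_t requested) {
    /* This is the implicit conversion that happens at malloc(len):
       the value wraps modulo 2^32. */
    return (uint32_t) requested;
}
```

For a requested length of 2^32 + 99, malloc would see only 99 bytes and happily succeed, handing back a buffer far smaller than the caller believes it has.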
If I tried to use size_t I would give up a significant amount of
safety and design features (or else I would have to put more entries
into the header, making it less efficient).


If you only need a single "special" marker value (for which you were
perhaps using -1), you could consider using ~(size_t) 0.
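The suggested marker is simply the largest size_t value; a minimal sketch (the BAD_LEN name and helper are illustrative, not from any library):

```c
#include <stddef.h>
#include <stdint.h>

/* ~(size_t)0, equivalently (size_t)-1 or SIZE_MAX, is the largest
   size_t value; since no real object can be that large, it can serve
   as a single in-band "invalid length" sentinel. */
#define BAD_LEN (~(size_t) 0)

static int len_is_valid(size_t n) {
    return n != BAD_LEN;
}
```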


For the mlen, I need one value that indicates a write-protected string
(that can be unprotected) and one that indicates a constant (that can
never be unprotected). The slen has to be of the same type as mlen,
and so in order to check for potential errors, I set it to -1 to
indicate that it has been deterministically set to an invalid value.
Of course I could just isolate a handful of values, but this
makes the error space extremely small, which reduces your chances of
finding accidental full corruptions, and removes a useful debugging
mechanism (where you could pass around useful information through
negative values.)
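This encoding can be sketched as a single check function. The field names mlen and slen follow Bstrlib's header, but the sentinel conventions and the helper below are illustrative, not Bstrlib's actual code:

```c
/* Illustrative Bstrlib-style header: mlen <= 0 encodes a protection
   state, slen < 0 marks a deterministically-invalidated string. */
struct tagbstring {
    int mlen;             /* allocated size; <= 0 => not writable now */
    int slen;             /* string length;  <  0 => invalid / error  */
    unsigned char *data;
};

/* One pass of signed comparisons rejects constants, write-protected
   strings, and corrupted headers together: the whole negative range
   acts as an error trap, not a single sentinel value. */
static int b_writable(const struct tagbstring *b) {
    return b != 0 && b->data != 0
        && b->mlen > 0 && b->slen >= 0 && b->slen < b->mlen;
}
```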
Things will go wrong for at most one possible string length, but that's
more than can be said for using int.
Huh? You *WANT* more erroneous scenarios. You want the mechanism to
require a somewhat tighter form of correctness, with it otherwise
leading to the thing stopping or feeding back detectable errors.
If you have only a small error trap, random behaviour will not fall
into it.
But whatever the difference in efficiency, surely correctness and safety
first, efficiency second has to be the rule for a general-purpose
library?


It *IS* correct and safe. (And it's fast, and easy to use/understand
and powerful and portable and secure ...) What are you talking about?
I'll take it you have never tried to use or understand Bstrlib either.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #201
On Tue, 02 May 2006 23:20:47 +0200, regis <re***@dil.univ-mrs.fr> wrote:
without operator overloading, how about just an infix notation
for 2-ary functions (with, e.g., functions evaluated left to right,
all with the same priority) ?

typedef struct Vect { double x, y; } Vect;

infix Vect Vect_Sub (Vect u, Vect v) {
    return (Vect) { .x = u.x - v.x, .y = u.y - v.y };
}
infix Vect Vect_Scale (double lambda, Vect u) {
    return (Vect) { .x = lambda * u.x, .y = lambda * u.y };
}
infix double Vect_Dot (Vect u, Vect v) {
    return u.x * v.x + u.y * v.y;
}
int main (void) {
    Vect u, v, w, p, q, r, s, t;
    ...
    t = ((v Vect_Sub u) Vect_Dot (w Vect_Sub v))
        Vect_Scale (p Vect_Sub q Vect_Sub r Vect_Sub s);
    ...
}


No, please. This looks strangely familiar if you know LISP :P

Plus, it doesn't really work for functions with an arbitrary number of
arguments, and this creates an inconsistency in the elegantly simple
syntax of C.

May 4 '06 #202
On 2006-05-03, REH <me@you.com> wrote:

"Ben C" <sp******@spam.eggs> wrote in message
news:sl*********************@bowser.marioworld...
On 2006-05-03, REH <sp******@stny.rr.com> wrote:

Ben C wrote:
In C, builtin types are passed around by value and space for them
doesn't need to be allocated or freed.

Um, the same is true for C++.
Yes of course, I never intended to imply that it wasn't.

The point I was making was that operator overloading doesn't mix so
easily with things that might need to be allocated and freed manually--
i.e. objects of user-defined types. You start needing constructors and
destructors, which C++ (but not C) has.


Why? And why do you think objects of user-defined types have to be
"allocated and freed manually"?


They don't _have_ to be, but they _might_ be.

One of the "features" of C is that the programmer has control over
memory allocation and de-allocation.

Usually in practice this just means a lot of bugs and crashes; but there
are good reasons for it too: you can write domain-specific allocators
that are more efficient and/or tunable in the amount of space or time
they use, instead of relying on a general-purpose allocator or
garbage-collector all the time.

The programmer also might implement things like shallow-copy and
copy-on-write.

Somehow all of these things need to happen when an expression like this
is evaluated:

string a = b + c;

In C++ the basic mechanism you use for this is constructors. For example
the string copy constructor might set up a shallow copy-on-write copy.
Someone has to write the code for that. If the programmer writes it, and
it's not just part of the framework, then it has to get implicitly
called.
struct foo {
    int x, y;
};

foo operator+ (const foo& a, const foo& b)
// or if you are of the "I hate references" camp: foo operator+ (foo a, foo b)
{
    const foo z = {a.x + b.x, a.y + b.y};
    return z;
}

foo x = {1, 2};
foo y = {3, 4};
foo z = x + y;

simplistic, but no constructors.


Yes exactly, and AFAIK the kind of operator-overloading that has been
proposed for C is something like this-- it's fine for structs
representing things like complex numbers (that are a few words long and
don't contain pointers).

But this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

On its own it's not enough; with the extra workarounds you need, you end
up with C++ (or some other kind of "octopus made by nailing four extra
legs onto a dog").
May 4 '06 #203
On 2006-05-04, we******@gmail.com <we******@gmail.com> wrote:
Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
> CBFalconer wrote:
>> we******@gmail.com wrote:
>> > CBFalconer wrote:
>> ... snip ...
>> >> The last time I took an (admittedly cursory) look at Bstrlib, I
>> >> found it cursed with non-portabilities
>> >
>> > You perhaps would like to name one?
>>
>> I took another 2 minute look, and was immediately struck by the use
>> of int for sizes, rather than size_t. This limits reliably
>> available string length to 32767.
[snip]
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.
Ok, so you can name a single application of such a thing right?


No, but I don't assume that everything I can't name an example of
doesn't exist.
> it allows in an efficient way for all desirable encoding scenarios,
> and it avoids any wrap-around anomalies causing under-allocations.


What anomalies? Are these a consequence of using signed long, or
size_t?


I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.


OK, I think I understand that part now.
> If I tried to use size_t I would give up a significant amount of
> safety and design features (or else I would have to put more entries
> into the header, making it less efficient).


If you only need a single "special" marker value (for which you were
perhaps using -1), you could consider using ~(size_t) 0.


For the mlen, I need one value that indicates a write-protected string
(that can be unprotected) and one that indicates a constant (that can
never be unprotected). The slen has to be of the same type as mlen,
and so in order to check for potential errors, I set it to -1 to
indicate that it has been deterministically set to an invalid value.
Of course I could just isolate a handful of values, but this
makes the error space extremely small, which reduces your chances of
finding accidental full corruptions, and removes a useful debugging
mechanism (where you could pass around useful information through
negative values.)
Things will go wrong for at most one possible string length, but that's
more than can be said for using int.


Huh? You *WANT* more erroneous scenarios.[..]


Sorry, I was unclear; I meant "that's better than you can say of the
situation if you use int".
But whatever the difference in efficiency, surely correctness and safety
first, efficiency second has to be the rule for a general-purpose
library?


It *IS* correct and safe. (And it's fast, and easy to use/understand
and powerful and portable and secure ...)


I have nothing against Bstrlib.
What are you talking about?
What if int is bigger than size_t?
I'll take it you have never tried to use or understand Bstrlib either.


No I'd never heard of it.
May 4 '06 #204
we******@gmail.com wrote:
Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
CBFalconer wrote:
we******@gmail.com wrote:
> CBFalconer wrote:
... snip ...
>> The last time I took an (admittedly cursory) look at Bstrlib, I
>> found it cursed with non-portabilities
> You perhaps would like to name one?
I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767.

[snip]
[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.
You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to its extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,

I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.


Ok, so you can name a single application of such a thing right?


Handling an RTF document that you will be writing to a variable length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document in to the database so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.
it allows in an efficient way for all desirable encoding scenarios,
and it avoids any wrap-around anomalies causing under-allocations.

What anomalies? Are these a consequence of using signed long, or
size_t?


I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.


Is an extra byte (or word, or double word) for a flags field really that
big an overhead?

<snip>
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

May 4 '06 #205
Giorgos Keramidas wrote:
On Tue, 02 May 2006 23:20:47 +0200, regis <re***@dil.univ-mrs.fr> wrote:
without operator overloading, how about just an infix notation
for 2-ary functions (with, e.g., functions evaluated left to right,
all with the same priority) ?

typedef struct Vect { double x, y; } Vect;

infix Vect Vect_Sub (Vect u, Vect v) {
    return (Vect) { .x = u.x - v.x, .y = u.y - v.y };
}
infix Vect Vect_Scale (double lambda, Vect u) {
    return (Vect) { .x = lambda * u.x, .y = lambda * u.y };
}
infix double Vect_Dot (Vect u, Vect v) {
    return u.x * v.x + u.y * v.y;
}
int main (void) {
    Vect u, v, w, p, q, r, s, t;
    ...
    t = ((v Vect_Sub u) Vect_Dot (w Vect_Sub v))
        Vect_Scale (p Vect_Sub q Vect_Sub r Vect_Sub s);
    ...
}


No, please. This looks strangely familiar if you know LISP :P

Plus, it doesn't really work for functions with an arbitrary number of
arguments, and this creates an inconsistency in the elegantly simple
syntax of C.


I know no infix scheme for functions in Lisp.
In Lisp, this would look like:

(Vect_Scale
    (Vect_Dot
        (Vect_Sub v u)
        (Vect_Sub w v))
    (Vect_Sub_va p q r s))

which is much like it looks in C without infix notation:

Vect_Scale (
    Vect_Dot (
        Vect_Sub (v, u),
        Vect_Sub (w, v)
    ),
    Vect_Sub_va (p, q, r, s, ARGS_END)
);

May 4 '06 #206
Ben C wrote:

Yes exactly, and AFAIK the kind of operator-overloading that has been
proposed for C is something like this-- it's fine for structs
representing things like complex numbers (that are a few words long and
don't contain pointers).

But this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

Why not?

Suppose Matrix A, B, C;

C = A + B;

Your operator + function would allocate the space, add the matrix to a
linked list of matrices that allows unused ones to be garbage
collected, and return the result.

Or, instead of taking all this trouble you could just use the GC and
forget about destructors. All intermediate results would be
automatically garbage collected.
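A minimal sketch of that idea in C, with hypothetical names (mat_temp, mat_add, mat_collect; error checks omitted): every temporary the operator function allocates is threaded onto a list, so all intermediates can be swept in one call instead of being destructed individually:

```c
#include <stdlib.h>

/* Every temporary matrix is registered on an intrusive list so
   intermediate results can be collected in one sweep. */
typedef struct Matrix {
    struct Matrix *next_temp;  /* intrusive list of temporaries */
    int rows, cols;
    double *elem;
} Matrix;

static Matrix *temp_list = NULL;

static Matrix *mat_temp(int rows, int cols) {
    Matrix *m = malloc(sizeof *m);            /* error checks omitted */
    m->rows = rows;
    m->cols = cols;
    m->elem = calloc((size_t) rows * cols, sizeof *m->elem);
    m->next_temp = temp_list;                 /* register for collection */
    temp_list = m;
    return m;
}

/* What "C = A + B" would lower to under the proposal. */
static Matrix *mat_add(const Matrix *a, const Matrix *b) {
    Matrix *c = mat_temp(a->rows, a->cols);
    for (int i = 0; i < a->rows * a->cols; i++)
        c->elem[i] = a->elem[i] + b->elem[i];
    return c;
}

/* Collect all temporaries created since the last sweep. */
static void mat_collect(void) {
    while (temp_list) {
        Matrix *next = temp_list->next_temp;
        free(temp_list->elem);
        free(temp_list);
        temp_list = next;
    }
}
```

A real GC makes the sweep automatic; the explicit mat_collect here just marks where the intermediate results would die.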
On its own it's not enough; with the extra workarounds you need, you end
up with C++ (or some other kind of "octopus made by nailing four extra
legs onto a dog").


The crucial point in this is to know when to stop. There are NO
constructors/destructors in C, and none of the proposed extensions
proposes that.

Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for the
fools!

If you feel that operator overloading would not solve the problem for
matrix addition, then you will have to devise other means of doing that.

The GC, however, is an ELEGANT solution to all these problems. We would
have the ease of use of C++ with its automatic destructors, WITHOUT
PAYING THE PRICE in language and compiler complexity.

This last point is important: compiler complexity increases the effort
that the language implementor must do and increases the "bug surface".

The module that handles the operator overloading in lcc-win32 is 1732
lines long, including all comments and lines that contain just a '{'
or a '}'.

The compiled operators module is 11K machine code. All the extensions of
lcc-win32 are conceptually simple, even if operator overloading is the
most complex one. The others like generic functions are much simpler to
implement.

jacob
May 4 '06 #207
Flash Gordon wrote:

Is an extra byte (or word, or double word) for a flags field really that
big an overhead?


Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits of info... For
programs that use strings extensively, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, especially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.
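The space argument can be made concrete with two candidate headers; the layouts are illustrative and exact sizes depend on padding, so the only portable claim is that the signed-int layout is never larger:

```c
#include <stddef.h>

/* Signed int lengths encode the protection/error states in-band... */
struct hdr_signed { int mlen, slen; unsigned char *data; };

/* ...versus size_t lengths plus a separate flags word. */
struct hdr_flags  { size_t mlen, slen; unsigned flags; unsigned char *data; };
```

On a typical LP64 system the first is 16 bytes and the second 32 (the flags word forces padding before the pointer), which is the per-string multiplier at issue when thousands of small strings are alive at once.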
May 4 '06 #208
jacob navia wrote:
Flash Gordon a écrit :

Is an extra byte (or word, or double word) for a flags field really
that big an overhead?


Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits of info... For
programs that use strings extensively, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, especially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.


I've yet to see software where short strings made up a significant
portion of the memory footprint, such that the memory saved by avoiding
the flags would be of real use. Of course, such applications might exist.

Personally I would say that using negative lengths was asking for
problems because at some point a negative length will be checked without
first changing it to positive.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc
May 4 '06 #209
On 2006-05-04, we******@gmail.com <we******@gmail.com> wrote:
Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
> CBFalconer wrote:
>> we******@gmail.com wrote:
>> > CBFalconer wrote:
>> ... snip ...
>> >> The last time I took an (admittedly cursory) look at Bstrlib, I
>> >> found it cursed with non-portabilities
>> >
>> > You perhaps would like to name one?
>>
>> I took another 2 minute look, and was immediately struck by the use
>> of int for sizes, rather than size_t. This limits reliably
>> available string length to 32767.


[snip]
>> [...] I did find an explanation and
>> justification for this. Conceded, such a size is probably adequate
>> for most usage, but the restriction is not present in standard C
>> strings.

> You're going to need to concede on more grounds than that. There is a
> reason many UNIX systems tried to add an ssize_t type, and why TR 24731
> has added rsize_t to its extension. (As a side note, I strongly
> suspect that Microsoft, in fact, added this whole rsize_t thing to TR
> 24731 when they realized that Bstrlib, or things like it, actually has
> far better real-world safety because of its use of ints for string
> lengths.) Using a long would be incorrect since there are some systems
> where a long value can exceed a size_t value (and thus lead to falsely
> sized mallocs.) There is also the matter of trying to codify
> read-only and constant strings and detecting errors efficiently
> (negative lengths fit the bill.) Using ints is the best choice
> because at worst it's giving up things (super-long strings) that nobody
> cares about,


I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.


Ok, so you can name a single application of such a thing right?
> it allows in an efficient way for all desirable encoding scenarios,
> and it avoids any wrap-around anomalies causing under-allocations.


What anomalies? Are these a consequence of using signed long, or
size_t?


I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.


If it's longer than the maximum size_t value, you probably can't have it
anyway, so there's no point in being able to represent it.

Silly encoding tricks buy you nothing; just use another field with bit
flags.
> If I tried to use size_t I would give up a significant amount of
> safety and design features (or else I would have to put more entries
> into the header, making it less efficient).


If you only need a single "special" marker value (for which you were
perhaps using -1), you could consider using ~(size_t) 0.


For the mlen, I need one value that indicates a write-protected string
(that can be unprotected) and one that indicates a constant (that can
never be unprotected). The slen has to be of the same type as mlen,
and so in order to check for potential errors, I set it to -1 to
indicate that it has been deterministically set to an invalid value.
Of course I could just isolate a handful of values, but this
makes the error space extremely small, which reduces your chances of
finding accidental full corruptions,


This shouldn't be left to chance anyway, pretending that it can be
caught invites disaster when inevitably one of the cases comes up when
it _doesn't_ get caught.
May 4 '06 #210
Flash Gordon wrote:
we******@gmail.com wrote:
Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
CBFalconer wrote:
> we******@gmail.com wrote:
>> CBFalconer wrote:
> ... snip ...
>>> The last time I took an (admittedly cursory) look at Bstrlib, I
>>> found it cursed with non-portabilities
>> You perhaps would like to name one?
> I took another 2 minute look, and was immediately struck by the use
> of int for sizes, rather than size_t. This limits reliably
> available string length to 32767.
[snip]

> [...] I did find an explanation and
> justification for this. Conceded, such a size is probably adequate
> for most usage, but the restriction is not present in standard C
> strings.
You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to its extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.


Ok, so you can name a single application of such a thing right?


Handling an RTF document that you will be writing to a variable length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document in to the database so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.


So now name the platform where its *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.
it allows in an efficient way for all desirable encoding scenarios,
and it avoids any wrap-around anomalies causing under-allocations.
What anomalies? Are these a consequence of using signed long, or
size_t?


I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.


Is an extra byte (or word, or double word) for a flags field really that
big an overhead?


I need two *bits* for flags, and I want large ranges to catch errors in
the scalar fields (this is a *safe* library). An extra struct entry is
the wrong way to do this because it doesn't help me catch errors in the
scalar fields, and it's space-inefficient.

ssize_t would have been a reasonable *functional* choice, but it's not
standard. size_t is no good because it can't go negative. long int is
no good because there are plenty of real platforms where long int is
larger than size_t. int solves all the main real problems, and as a
bonus the compiler is designed to make sure it's the fastest scalar
primitive available.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #211
REH

Ben C wrote:
Why? And why do you think objects of user-defined types have to be
"allocated and freed manually"?
They don't _have_ to be, but they _might_ be.

One of the "features" of C is that the programmer has control over
memory allocation and de-allocation.

Yes, C++ has this same "feature." Memory allocation is completely
under control of the programmer.

Usually in practice this just means a lot of bugs and crashes; but there
are good reasons for it too: you can write domain-specific allocators
that are more efficient and/or tunable in the amount of space or time
they use, instead of relying on a general-purpose allocator or
garbage-collector all the time. C++ does not do GC, nor are you required to use any "general-purpose"
allocator.

The programmer also might implement things like shallow-copy and
copy-on-write.

Somehow all of these things need to happen when an expression like this
is evaluated:

string a = b + c;

In C++ the basic mechanism you use for this is constructors. For example
the string copy constructor might set up a shallow copy-on-write copy.
Someone has to write the code for that. If the programmer writes it, and
it's not just part of the framework, then it has to get implicitly
called.


Yes, the programmer can write a constructor to do this. He does not
have to.
struct foo {
    int x, y;
};

foo operator+ (const foo& a, const foo& b)
// or if you are of the "I hate references" camp: foo operator+ (foo a, foo b)
{
    const foo z = {a.x + b.x, a.y + b.y};
    return z;
}

foo x = {1, 2};
foo y = {3, 4};
foo z = x + y;

simplistic, but no constructors.


Yes exactly, and AFAIK the kind of operator-overloading that has been
proposed for C is something like this-- it's fine for structs
representing things like complex numbers (that are a few words long and
don't contain pointers).

But this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

On its own it's not enough; with the extra workarounds you need, you end
up with C++ (or some other kind of "octopus made by nailing four extra
legs onto a dog").


I still don't get your point.

REH

May 4 '06 #212
Flash Gordon wrote:
jacob navia wrote:
Flash Gordon wrote:
Is an extra byte (or word, or double word) for a flags field really
that big an overhead?
Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits of info... For
programs that use strings extensively, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, especially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.


I've yet to see software where short strings made up a significant
portion of the memory footprint and saving the memory that avoiding the
flags would be of real use. Of course, such applications might exist.


Any program that reads words from a language dictionary, like a
spell checker, a word puzzle solver/creator, or a spam filter. For
dictionaries the size of the English-language dictionary, these kinds
of applications can typically push the L2 cache of your CPU pretty
hard.
Personally I would say that using negative lengths was asking for
problems because at some point a negative length will be checked without
first changing it to positive.


I think you miss the point. If the string length is negative then it
is erroneous. That's the point of it. A negative amount of allocated
memory, on the other hand, I use to indicate that the memory is not
legally modifiable at the moment, and 0 to mean that it is never
modifiable. The point is that the library blocks erroneous action due
to intentionally or unintentionally bad header values in the same
test. So it reduces overhead, while increasing safety and
functionality at the same time.

You know, you can actually read the explanation of all this in the
documentation if you care to do so.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #213
Jordan Abel wrote:
On 2006-05-04, we******@gmail.com <we******@gmail.com> wrote:
Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
> CBFalconer wrote:
>> we******@gmail.com wrote:
>> > CBFalconer wrote:
>> ... snip ...
>> >> The last time I took an (admittedly cursory) look at Bstrlib, I
>> >> found it cursed with non-portabilities
>> >
>> > You perhaps would like to name one?
>>
>> I took another 2 minute look, and was immediately struck by the use
>> of int for sizes, rather than size_t. This limits reliably
>> available string length to 32767.

[snip]

>> [...] I did find an explanation and
>> justification for this. Conceded, such a size is probably adequate
>> for most usage, but the restriction is not present in standard C
>> strings.

> You're going to need to concede on more grounds than that. There is a
> reason many UNIX systems tried to add a ssize_t type, and why TR 24731
> has added rsize_t to their extension. (As a side note, I strongly
> suspect the Microsoft, in fact, added this whole rsize_t thing to TR
> 24731 when they realized that Bstrlib, or things like it, actually has
> far better real world safety because its use of ints for string
> lengths.) Using a long would be incorrect since there are some systems
> where a long value can exceed a size_t value (and thus lead to falsely
> sized mallocs.) There is also the matter of trying to codify
> read-only and constant strings and detecting errors efficiently
> (negative lengths fit the bill.) Using ints is the best choice
> because at worst it's giving up things (super-long strings) that nobody
> cares about,

I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.
Ok, so you can name a single application of such a thing right?
> it allows in an efficient way for all desirable encoding scenarios,
> and it avoids any wrap around anomalies causing under-allocations.

What anomalies? Are these a consequence of using signed long, or
size_t?


I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so an attempt to malloc or realloc with such a size
would wrap around to some value that would just make it screw up. And
if I used a size_t, then there would be no simple space of encodings
that can catch errors, constants and write-protected strings.


If it's longer than the maximum size_t value, you probably can't have it
anyway, so there's no point in being able to represent it.


Huh?
Silly encoding tricks buy you nothing, just use another field with bit
flags.


If I do that, I lose space, speed, and error detection. I see it as
buying me a whole hell of a lot actually.
> If I tried to use size_t I would give up a significant amount of
> safety and design features (or else I would have to put more entries
> into the header, making it less efficient).

If you only need a single "special" marker value (for which you were
perhaps using -1), you could consider using ~(size_t) 0.


For the mlen, I need one value that indicates a write protected string
(that can be unprotected) and one the indicates a constant (that can
never be unprotected). The slen has to be of the same type as mlen,
and so in order to check for potential errors, I set it to -1 to
indicate that it has been deterministically set to an invalid value.
Of course I could just isolate a handful of values, but that would
make the error space extremely small, which reduces your chances of
catching accidental full corruptions.


This shouldn't be left to chance anyway, pretending that it can be
caught invites disaster when inevitably one of the cases comes up when
it _doesn't_ get caught.


Uhh ... that's the situation we have with basically all other string
libraries in existence for C *today*. My library and TR 24731 are the
only ones to attempt to catch these errors *before* any undefined
scenario occurs. In practice this means that a greater percentage of
corruption errors are simply caught in your normal error handling.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #214
On 2006-05-04, jacob navia <ja***@jacob.remcomp.fr> wrote:
Ben C a écrit :

Yes exactly, and AFAIK the kind of operator-overloading that has been
proposed for C is something like this-- it's fine for structs
representing things like complex numbers (that are a few words long and
don't contain pointers).

But this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

Why not?

Suppose Matrix A,B,C;

C = A+B;

Your operator + function would allocate the space, add the matrix to a
linked list of matrices that allows unused ones to be GC'd, and
return the result.


A reference to them presumably.

Yes indeed, if you have a garbage collector (and references) there is no
problem.

That's why I say operator-overloading works well in languages where
the framework manages storage for you (e.g. in Python, and apparently
lcc-extended C).

[snip]
Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for the
fools!
And what about left-shifting iostreams :)
If you feel that operator overloading would not solve the problem for
matrices addition, then you will have to devise other means of doing that.

The GC however, is an ELEGANT solution to all these problems. We would
have the ease of use of C++ with its automatic destructors, WITHOUT
PAYING THE PRICE in language and compiler complexity.

This last point is important: compiler complexity increases the effort
that the language implementor must do and increases the "bug surface".


Yes of course. Although I would say, why not leave poor C alone and
start a new language? Or just use a different language that already
exists... there are a lot out there.

I often get the feeling there's a lot of pain and complexity in C++ that
could have been avoided if it hadn't started out trying to be compatible
with C.
May 4 '06 #215
On 2006-05-04, REH <sp******@stny.rr.com> wrote:

Ben C wrote:
Usually in practice this just means a lot of bugs and crashes; but there
are good reasons for it too: you can write domain-specific allocators
that are more efficient and/or tunable in the amount of space or time
they use, instead of relying on a general-purpose allocator or
garbage-collector all the time.
C++ does not do GC, nor are you required to use any "general-purpose"
allocator.
Yes I know. But you do get constructors, destructors and references, so
you can fit explicit memory management "under the hood" of operator
overloading.
The programmer also might implement things like shallow-copy and
copy-on-write.

Somehow all of these things need to happen when an expression like this
is evaluated:

string a = b + c;

In C++ the basic mechanism you use for this is constructors. For example
the string copy constructor might set up a shallow copy-on-write copy.
Someone has to write the code for that. If the programmer writes it, and
it's not just part of the framework, then it has to get implicitly
called.

Yes, the programmer can write a constructor to do this. He does not
have to.
I don't know of a way to do it without a constructor (for a
shallow-copied copy-on-write string class).

[snip] I still don't get your point.


Show me the string example, and hopefully either you will get my point
or I will get yours :)
May 4 '06 #216
On 2006-05-04, we******@gmail.com <we******@gmail.com> wrote:
Jordan Abel wrote:
On 2006-05-04, we******@gmail.com <we******@gmail.com> wrote:
> Ben C wrote:
>> On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
>> > CBFalconer wrote:
>> >> we******@gmail.com wrote:
>> >> > CBFalconer wrote:
>> >> ... snip ...
>> >> >> The last time I took an (admittedly cursory) look at Bstrlib, I
>> >> >> found it cursed with non-portabilities
>> >> >
>> >> > You perhaps would like to name one?
>> >>
>> >> I took another 2 minute look, and was immediately struck by the use
>> >> of int for sizes, rather than size_t. This limits reliably
>> >> available string length to 32767.
>>
>> [snip]
>>
>> >> [...] I did find an explanation and
>> >> justification for this. Conceded, such a size is probably adequate
>> >> for most usage, but the restriction is not present in standard C
>> >> strings.
>>
>> > You're going to need to concede on more grounds than that. There is a
>> > reason many UNIX systems tried to add a ssize_t type, and why TR 24731
>> > has added rsize_t to their extension. (As a side note, I strongly
>> > suspect the Microsoft, in fact, added this whole rsize_t thing to TR
>> > 24731 when they realized that Bstrlib, or things like it, actually has
>> > far better real world safety because its use of ints for string
>> > lengths.) Using a long would be incorrect since there are some systems
>> > where a long value can exceed a size_t value (and thus lead to falsely
>> > sized mallocs.) There is also the matter of trying to codify
>> > read-only and constant strings and detecting errors efficiently
>> > (negative lengths fit the bill.) Using ints is the best choice
>> > because at worst it's giving up things (super-long strings) that nobody
>> > cares about,
>>
>> I think it's fair to expect the possibility of super-long strings in a
>> general-purpose string library.
>
> Ok, so you can name a single application of such a thing right?
>
>> > it allows in an efficient way for all desirable encoding scenarios,
>> > and it avoids any wrap around anomalies causing under-allocations.
>>
>> What anomalies? Are these a consequence of using signed long, or
>> size_t?
>
> I am describing what int does (*BOTH* the encoding scenarios and
> avoiding anomalies). Using a long int would allow for arithmetic on
> numbers that exceed the maximum value of size_t on some systems (that
> actually *exist*), so when there was an attempt to malloc or realloc on
> such sizes, there would be a wrap around to some value that would just
> make it screw up. And if I used a size_t, then there would be no
> simple space of encodings that can catch errors, constants and write
> protected strings.
If it's longer than the maximum size_t value, you probably can't have it
anyway, so there's no point in being able to represent it.


Huh?


size_t has to be able to represent the size of any object. To have
a string longer than its maximum value you have to have an array of
characters longer than that maximum value - which you can't have.
Silly encoding tricks buy you nothing, just use another field with bit
flags.


If I do that, I lose space, speed, and error detection. I see it as
buying me a whole hell of a lot actually.


Space and speed are cheap these days.

Even if you have a million strings, that's still only four megabytes
saved. If you make a million calls, that's still only a few million
cycles saved.

It does _not_ buy you error detection in general, and a false sense of
safety can be dangerous.

Probably the best thing to do to prevent errors would be to make
everything use your API, and make sure your functions don't have bugs.
Once you have that, the only possible source of errors is bit rot, and
you can't do anything about that.
May 4 '06 #217
On 2006-05-04, jacob navia <ja***@jacob.remcomp.fr> wrote:
It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!


The addition operator on dates would work _exactly_ the same way as the
addition operator on pointers - you can subtract two of them, or add one
to a number (representing an interval)

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems where
it's not trivial, could call localtime, modify tm_sec, and then call
mktime]
May 4 '06 #218
Jordan Abel a écrit :
On 2006-05-04, jacob navia <ja***@jacob.remcomp.fr> wrote:
It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!

The addition operator on dates would work _exactly_ the same way as the
addition operator on pointers - you can subtract two of them, or add one
to a number (representing an interval)

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems where
it's not trivial, could call localtime, modify tm_sec, and then call
mktime]


Yes adding a number to a date makes sense, but I was speaking about
adding two dates!
May 4 '06 #219
we******@gmail.com wrote:
Flash Gordon wrote:
we******@gmail.com wrote:
Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
> CBFalconer wrote:
>> we******@gmail.com wrote:
>>> CBFalconer wrote:
>> ... snip ...
>>>> The last time I took an (admittedly cursory) look at Bstrlib, I
>>>> found it cursed with non-portabilities
>>> You perhaps would like to name one?
>> I took another 2 minute look, and was immediately struck by the use
>> of int for sizes, rather than size_t. This limits reliably
>> available string length to 32767.
[snip]

>> [...] I did find an explanation and
>> justification for this. Conceded, such a size is probably adequate
>> for most usage, but the restriction is not present in standard C
>> strings.
> You're going to need to concede on more grounds than that. There is a
> reason many UNIX systems tried to add a ssize_t type, and why TR 24731
> has added rsize_t to their extension. (As a side note, I strongly
> suspect the Microsoft, in fact, added this whole rsize_t thing to TR
> 24731 when they realized that Bstrlib, or things like it, actually has
> far better real world safety because its use of ints for string
> lengths.) Using a long would be incorrect since there are some systems
> where a long value can exceed a size_t value (and thus lead to falsely
> sized mallocs.) There is also the matter of trying to codify
> read-only and constant strings and detecting errors efficiently
> (negative lengths fit the bill.) Using ints is the best choice
> because at worst it's giving up things (super-long strings) that nobody
> cares about,
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.
Ok, so you can name a single application of such a thing right?

Handling an RTF document that you will be writing to a variable length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document in to the database so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.


So now name the platform where it's *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.


If the DOS port hadn't been dropped then depending on the compiler we
might have hit this. A significant portion of the SW I'm thinking of
originated on DOS, so it could have hit it.
> it allows in an efficient way for all desirable encoding scenarios,
> and it avoids any wrap around anomalies causing under-allocations.
What anomalies? Are these a consequence of using signed long, or
size_t?
I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and write
protected strings.

Is an extra byte (or word, or double word) for a flags field really that
big an overhead?


I need two *bits* for flags, and I want large ranges to catch errors in
the scalar fields (this is a *safe* library). An extra struct entry is
the wrong way to do this because it doesn't help me catch errors in the
scalar fields, and it's space-inefficient.

ssize_t would have been a reasonable *functional* choice, but it's not
standard. size_t is no good because it can't go negative. long int is
no good because there are plenty of real platforms where long int is
larger than size_t. int solves all the main real problems, and as a
bonus the compiler is designed to make sure it's the fastest scalar
primitive available.


Strangely enough, when a previous developer on the code I'm dealing with
thought he could limit size to a "valid" range and assert if it was out
of range, we found that the asserts kept getting triggered. However, it
was always triggered incorrectly because the size was actually valid! So
I'll stick to not artificially limiting sizes. If the administrator of a
server the SW is installed on wants, then s/he can use system-specific
means to limit the size of a process.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

May 4 '06 #220
we******@gmail.com wrote:
Flash Gordon wrote:
jacob navia wrote:
Flash Gordon a écrit :
Is an extra byte (or word, or double word) for a flags field really
that big an overhead?
Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits info... For
programs that use extensively strings, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, specially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.

I've yet to see software where short strings made up a significant
portion of the memory footprint and saving the memory that avoiding the
flags would be of real use. Of course, such applications might exist.


Any program that reads words from any language dictionary. Like a
spell checker, or a word puzzle solver/creator, or a spam filter. For
dictionaries the size of the english language dictionary, these kinds
of applications can typically push the L2 cache of your CPU pretty
hard.


I never said they didn't exist. However, a typical dictionary + the
structures is not going to fit in my L2 cache anyway. However, the
subset of it that is likely to be actually in use is probably an order
of magnitude smaller and so could easily fit in with the extra overhead.
Alternatively, one could go to conventional C strings and have a bigger
chance of it fitting since they only have a 1 byte overhead compared to
probably an 8 byte overhead (4 byte int for length, 4 byte int for
memory block size) that it sounds like your library has. Even if your
library only has a 4 byte overhead it is still larger!
Personally I would say that using negative lengths was asking for
problems because at some point a negative length will be checked without
first changing it to positive.


I think you miss the point. If the string length is negative then it
is erroneous. That's the point of it. A negative value for the amount
of memory allocated, on the other hand, I use to indicate that the
memory is not legally modifiable at the moment, and a value of 0 means
that it is never modifiable. The point is that the library blocks
erroneous action due to intentionally or unintentionally bad header
values in the same test. So it reduces overhead, while increasing safety
and functionality at the same time.


If you are trying to detect corruption then you should also be checking
that the length is not longer than the memory block, so you should be
doing more than one comparison anyway. Then you can easily check if any
unused flag bits are non-0.
You know, you can actually read the explanation of all this in the
documentation if you care to do so.


Probably true.

It may well be that the performance gain is worth it for the
applications people use your library for. If so then fine, but the
limitation means it is not worth me migrating to it.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

May 4 '06 #221
>>> In "higher-level" languages
which are further abstracted from the implementation, it's attractive to
remove this distinction-- Python for example achieves this well. But I'm
not convinced of the wisdom of the hybrid, C with operator overloading.

I am certain that the conservative option just puts brakes to the
development of the language


I agree. You need brakes.

Having said that I'm all for trying these things out in projects like
lcc-win32.


I've been playing around with extending C to be more "high-level" using the
TinyCC compiler. The license of TinyCC is GPL and it runs on win32 and linux
so I think it makes a better base for experiments that one wants to
distribute. TinyCC is located at http://www.tinycc.org and, for those
curious, my experiments are at http://www.tinycx.org.

-Ben Hinkle
May 4 '06 #222
jacob navia wrote:
Jordan Abel a écrit :
On 2006-05-04, jacob navia <ja***@jacob.remcomp.fr> wrote:
It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!

The addition operator on dates would work _exactly_ the same way as
the addition operator on pointers - you can subtract two of them, or
add one to a number (representing an interval)

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems
where it's not trivial, could call localtime, modify tm_sec, and then
call mktime]


Yes adding a number to a date makes sense, but I was speaking about
adding two dates!


What will the date be in 4 years, 2 months and 5 days from today?

Adding something other than a number can make a lot of sense. Adding two
real dates together doesn't, I agree.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc
May 4 '06 #223
On 2006-05-04, Flash Gordon <sp**@flash-gordon.me.uk> wrote:
jacob navia wrote:
Jordan Abel a écrit :
On 2006-05-04, jacob navia <ja***@jacob.remcomp.fr> wrote:

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!
The addition operator on dates would work _exactly_ the same way as
the addition operator on pointers - you can subtract two of them, or
add one to a number (representing an interval)

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems
where it's not trivial, could call localtime, modify tm_sec, and then
call mktime]


Yes adding a number to a date makes sense, but I was speaking about
adding two dates!


What will the date be in 4 years, 2 months and 5 days from today?

Adding something other than a number can make a lot of sense. Adding two
real dates together doesn't, I agree.


My first thought was "represent it as a number of seconds", but then
I realized - how many seconds in two months? Or in any number of years
not a multiple of four?

Should there be another type to represent intervals of time in such
a way? Or use struct tm? 0 for the year would mean either no years or
1900 depending on the context
May 4 '06 #224

jacob navia wrote:
Ben C a écrit (regarding operator overloading)

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

Why not?

Suppose Matrix A,B,C;

C = A+B;


<snip>
Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"


Do you also propose to avoid using '*' to represent
matrix multiplication on the basis that matrix multiplication is not
commutative?

May 4 '06 #225
>Ben C wrote:
[much snippage]
The programmer also might implement things like shallow-copy and
copy-on-write.

Somehow all of these things need to happen when an expression like this
is evaluated:

string a = b + c;
Or, comparably (but considerably more complicated):

String a = ((b + c) - "zog") * " ";

where string "addition" means "concatenate" (the usual definition
for string addition), "subtraction" means "remove the first copy
of the target string, if there is one", and string "multiplication"
means "repeatedly insert this string". Hence if b and c hold "xyz"
and "ogle" respectively, the "sum" is "xyzogle", subtracting "zog"
yields "xyle", and multiplying by " " yields "x y l e " (including
the final space).
In C++ the basic mechanism you use for this is constructors. For example
the string copy constructor might set up a shallow copy-on-write copy.
Indeed. Suppose the String data structure is much like Paul Hsieh's
favorite, but perhaps with a few more bells and whistles (I have not
looked at his implementation):

struct StringBuffer;

struct String {
char *bytes; /* the bytes (if any) in the string */
size_t slen; /* the length of the string */
struct StringBuffer *buf; /* the underlying buffer (may be shared) */
struct String *next; /* linked list in case of shared references */
};

struct StringBuffer {
char *base; /* base address of buffer */
size_t bufsize; /* size of buffer */
size_t refcnt; /* number of references to this buffer */
struct String *firstref; /* head of reference chain */
};

This gives us functions that, in C, might look like:

/* "Copy" a string: return a new reference to an existing string */
struct String *String_copy(struct String *old) {
struct String *new = xmalloc(sizeof *new);
/* xmalloc is just malloc plus panic-if-out-of-space */

/* copy the underlying string's info */
new->bytes = old->bytes;
new->slen = old->slen;
new->buf = old->buf;

/* insert at head of the buffer's reference chain */
new->next = new->buf->firstref;
new->buf->refcnt++;
new->buf->firstref = new;
return new; /* non-void function: must return the new reference */
}

In this case, making a second copy of a very long string is
quite cheap. So is making a sub-string out of an existing
string:

/*
* Shrink a string by removing "frontoff" characters from the
* front, and "backoff" characters from the back. The frontoff
* may be negative to extend the string back to its original length
* although typically exactly one will be zero (remove head or
* tail part of string). The backoff must be nonnegative
* (because tail parts of buffers are not necessarily valid).
*/
void String_shrink(struct String *s, int frontoff, int backoff) {
    if (frontoff) {
        if (frontoff < 0) {
            size_t maxshrink = s->bytes - s->buf->base;

            /* NB: this can be optimized to fall into the "else" */
            frontoff = -frontoff;
            if (frontoff > maxshrink)
                frontoff = maxshrink;
            s->slen += frontoff;
            s->bytes -= frontoff;
        } else {
            if (s->slen < frontoff)
                frontoff = s->slen;
            s->slen -= frontoff;
            s->bytes += frontoff;
        }
    }
    if (backoff) {
        if (backoff < 0)
            panic("bad call to String_shrink");
        if (s->slen < backoff)
            backoff = s->slen;
        s->slen -= backoff;
    }
}

Now, of course, in order to *modify* the *contents* of a string,
we have to check whether the string is shared, and if so, "break"
the sharing:

/* inline */ struct String *String_preptomod(struct String *s) {
return (s->buf->refcnt == 1) ? s : String_private_copy(s);
}

[without complicated C++ style mechanisms,] this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

On its own it's not enough; with the extra workarounds you need, you end
up with C++ (or some other kind of "octopus made by nailing four extra
legs onto a dog").

In article <11**********************@i40g2000cwc.googlegroups .com>
REH <sp******@stny.rr.com> wrote:I still don't get your point.


OK: so write the "operator" functions for +, -, and * above and
tell us what happens to any intermediate copies of the String
structures that are created by each addition, subtraction, and
multiply.

Show us the code, and "we" (Ben C and I, perhaps) will show you
where you have re-invented the (detailed and hairy) C++ mechanisms
(or, contrariwise, have assumed that your underlying language has
garbage collection, so that temporary objects can be created and
then thrown away without calling "constructor" and "destructor"
functions on references, reference-copies, etc.; if you do have
constructors and destructors, you also have to decide whether such
functions can or must be "virtual" or not, and so on).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
May 4 '06 #226
Bill Pursell a écrit :
jacob navia wrote:
Ben C a écrit (regarding operator overloading)
But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).


Why not?

Suppose Matrix A,B,C;

C = A+B;

<snip>
Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

Do you also propose to avoid using '*' to represent
matrix multiplication on the basis that matrix multiplication is not
commutative?


Matrix multiplication is a multiplication. Granted, not commutative but
a multiplication. In other languages, for instance APL that has an
operator for matrix multiplication, there are TWO signs: one ('*') for
normal multiplication, and another (a box enclosing another sign) to
denote matrix multiplication to clearly distinguish both operations.

Of course this is a matter more of taste but in the case of strings
there isn't any mathematical operation performed in those strings. Not
even a set operation. Take for instance

"Hello World" - "World"

Is the result "Hello " ???

Are we adding or subtracting things?

Surely not.

I want to introduce operator overloading into C but I am not for ANY
application of operator overloading. It has been pointed out that
overloading could lead to excessive construction of temporaries, which
would be far more efficiently handled in normal C syntax with careful code.

This is not a problem for small structures, but it could be a show
stopper for large structures like matrices for instance, where
efficiency would be more important than syntactic sugar.
Another problem that bothers me (and is so far unsolved) is the problem
of taking the address of an operator function. What should be the syntax
in that case?

For instance:

int128 operator+(int128 a,int128 b);

typedef int128 (*i128add)(int128 a, int128 b);

i128add = operator+(i128 a,i128 b); /// This ?

jacob
May 4 '06 #227
CBFalconer <cb********@yahoo.com> wrote:
And, if you write the library in truly portable C, without any
silly extensions and/or entanglements, you just compile the library
module. All the compiler vendor need to do is meet the
specifications of the C standard.

Simple, huh?


That all depends on the license under which the source code was
released. Linking a bunch of C libraries under various licenses can
involve non-trivial amounts of legal hassle to ensure compliance.

Also, there's something to be said for having features built into the
standard library. Besides making things easier from a legal point of
view, it means you can spend that much less time evaluating multiple
solutions, since most of the time, you'll just use the implementation
already available in the standard library.

I know it's unpopular around these parts to utter such heresy, but I,
for one, would love it if the standard C library included support for
smarter strings, hash tables, and linked lists.

Then again, I'm certainly NOT advocating these things should be added
to the standard C library. I recognize C for what it is, and use it
where it's appropriate. There are other languages that offer those
features. But that doesn't stop me from wanting those features in C.
May 4 '06 #228
jacob navia <ja***@jacob.remcomp.fr> writes:
[...]
Besides, I think that using the addition operator to "add" strings is
an ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE.


<OT>
Assuming the existence of operator overloading (which, once again,
standard C, the topic of this newsgroup, does not have), using "+" for
string concatenation makes at least as much sense as using "<<" and
">>" for I/O. (I know you haven't advocated that either, but it's
established practice in C++.) And I really don't have much problem
with the idea of a "+" operator being non-commutative -- just as a "*"
operator for matrices would be non-commutative. If you don't like it,
by all means don't use it -- but if you provide operator overloading
in your compiler, users *will* use it in ways that you don't like.

The point of operator overloading is to provide a notational shorthand
for something that could be expressed equivalently but more verbosely
using function calls. It isn't to provide something that absolutely
must follow the rules of mathematics. What would a mathematician
unfamiliar with computer programming think of "x = x + 1"?
</OT>

<WAY_OT>
Ada has a separate operator, "&", for array concatenation.
</WAY_OT>

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
May 4 '06 #229
Chris Torek wrote:
[horrible string "math" snipped]

OK: so write the "operator" functions for +, -, and * above and
tell us what happens to any intermediate copies of the String
structures that are created by each addition, subtraction, and
multiply.

Show us the code, and "we" (Ben C and I, perhaps) will show you
where you have re-invented the (detailed and hairy) C++ mechanisms
(or, contrariwise, have assumed that your underlying language has
garbage collection, so that temporary objects can be created and
then thrown away without calling "constructor" and "destructor"
functions on references, reference-copies, etc.; if you do have
constructors and destructors, you also have to decide whether such
functions can or must be "virtual" or not, and so on).


Chris:

1) Strings are NOT a good application for operator overloading, as I have
argued in another thread in this same discussion. lcc-win32 does NOT
support addition of strings nor any math operation with them.

a+b != b+a
"Hello" + "World" != "World" + "Hello"

2) Operator overloading does NOT need any constructors, nor destructors
nor the GC if we use small objects:

int128 a,b,c,d;

a = (b+c)/(b-d);

This will be translated by lcc-win32 to

tmp1 = operator+(b,c);
tmp2 = operator-(b,d);
tmp3 = operator/(tmp1,tmp2);
a = tmp3;

The temporary values are automatically allocated in the stack.

Of course if you have interior pointers those intermediate structures
must be registered so that the storage can be freed. This is solved, as
you say, with a GC. lcc-win32 offers a GC in the standard distribution,
and allows you to have the best of both worlds: the ease of C++ destructors
that take care of memory management WITHOUT PAYING THE PRICE of C++
complexity.

If you do not want the GC, just make a linked list with all the
allocations you make in the "constructor" (say in the new_string()
function) and periodically clean them up.
May 4 '06 #230
jacob navia wrote:

The crucial point in this is to know when to stop. There are NO
constructors/destructors in C, and none of the proposed extensions
proposes that.

Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE.


If C were to have a string type and operator overloading and it didn't
have '+' for strings, the first thing people would do is write one! It
may be syntactic sugar, but it's very convenient sugar.

--
Ian Collins.
May 4 '06 #231
In article <44***********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates.


mid_date = (start_date + end_date) / 2;

-- Richard
May 4 '06 #232
Richard Tobin wrote:
In article <44***********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates.

mid_date = (start_date + end_date) / 2;

-- Richard


Excuse me but what does it mean

Sep-25-1981 + Dec-22-2000

If you figure out what THAT means then please explain.

You obviously meant:
mid_date = (end_date - start_date)/2

The *subtraction* of two dates yields a time interval
May 4 '06 #233
Ian Collins wrote:
jacob navia wrote:
The crucial point in this is to know when to stop. There are NO
constructors/destructors in C, and none of the proposed extensions
proposes that.

Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE.

If C were to have a string type and operator overloading and it didn't
have '+' for strings, the first thing people would do is write one! It
may be syntactic sugar, but it's very convenient sugar.


Well, in this same thread Chris Torek posted this:

String a = ((b + c) - "zog") * " ";

where string "addition" means "concatenate" (the usual definition
for string addition), "subtraction" means "remove the first copy
of the target string, if there is one", and string "multiplication"
means "repeatedly insert this string". Hence if b and c hold "xyz"
and "ogle" respectively, the "sum" is "xyzogle", subtracting "zog"
yields "xyle", and multiplying by " " yields "x y l e " (including
the final space).

:-)
May 4 '06 #234
Flash Gordon wrote:
we******@gmail.com wrote:
Flash Gordon wrote:
we******@gmail.com wrote:
Ben C wrote:
> On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
>> CBFalconer wrote:
>>> we******@gmail.com wrote:
>>>> CBFalconer wrote:
>>> ... snip ...
>>>>> The last time I took an (admittedly cursory) look at Bstrlib, I
>>>>> found it cursed with non-portabilities
>>>> You perhaps would like to name one?
>>> I took another 2 minute look, and was immediately struck by the use
>>> of int for sizes, rather than size_t. This limits reliably
>>> available string length to 32767.
> [snip]
>
>>> [...] I did find an explanation and
>>> justification for this. Conceded, such a size is probably adequate
>>> for most usage, but the restriction is not present in standard C
>>> strings.
>> You're going to need to concede on more grounds than that. There is a
>> reason many UNIX systems tried to add a ssize_t type, and why TR 24731
>> has added rsize_t to their extension. (As a side note, I strongly
>> suspect that Microsoft, in fact, added this whole rsize_t thing to TR
>> 24731 when they realized that Bstrlib, or things like it, actually has
>> far better real world safety because of its use of ints for string
>> lengths.) Using a long would be incorrect since there are some systems
>> where a long value can exceed a size_t value (and thus lead to falsely
>> sized mallocs.) There is also the matter of trying to codify
>> read-only and constant strings and detecting errors efficiently
>> (negative lengths fit the bill.) Using ints is the best choice
>> because at worst it's giving up things (super-long strings) that nobody
>> cares about,
> I think it's fair to expect the possibility of super-long strings in a
> general-purpose string library.
Ok, so you can name a single application of such a thing right?
Handling an RTF document that you will be writing to a variable length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document in to the database so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.
So now name the platform where it's *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.


If the DOS port hadn't been dropped then depending on the compiler we
might have hit this. A significant portion of the SW I'm thinking of
originated on DOS, so it could have hit it.


Oh ... I think of DOS as exactly the case where this *can't* happen.
Single objects in 16bit DOS have a size limit of 64K (size_t is just
unsigned which is 16 bits), so these huge RTF files you are talking
about *have* to be streamed, or split over multiple allocations
anyways.
>> it allows in an efficient way for all desirable encoding scenarios,
>> and it avoids any wrap around anomalies causing under-allocations.
> What anomalies? Are these a consequence of using signed long, or
> size_t?
I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and write
protected strings.
Is an extra byte (or word, or double word) for a flags field really that
big an overhead?


I need two *bits* for flags, and I want large ranges to catch errors in
the scalar fields (this is a *safe* library). An extra struct entry is
the wrong way to do this because it doesn't help me catch errors in the
scalar fields, and it's space inefficient.

ssize_t would have been a reasonable *functional* choice, but it's not
standard. size_t is no good because it can't go negative. long int is
no good because there are plenty of real platforms where long int is
larger than size_t. int solves all the main real problems, and as a
bonus the compiler is designed to make sure it's the fastest scalar
primitive available.


Strangely enough, when a previous developer on the code I'm dealing with
thought he could limit size to a "valid" range and assert if it was out
of range we found that the asserts kept getting triggered. However, it
was always triggered incorrectly because the size was actually valid!


And how is this connected with Bstrlib? The library comes with a test
that, if you run it in a 16-bit environment, will exercise length
overflowing. So you have some reasonable assurance that Bstrlib does
not make obvious mistakes with size computations.
[...] So I'll stick to not artificially limiting sizes.
And how do you deal with the fact that the language limits your sizes
anyways?
[...] If the administrator of a
server the SW is installed on wants then s/he can use system specific
means to limit the size of a process.


What? You think the administrator is in charge of how the compiler
works?

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #235
In article <44**************@jacob.remcomp.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
mid_date = (start_date + end_date) / 2;
Excuse me but what does it mean

Sep-25-1981 + Dec-22-2000
Just because the sum of two dates is not a date doesn't mean that
it doesn't mean anything.
You obviously meant:

mid_date = (end_date - start_date)/2
No I didn't. That is something completely different.
The *subtraction* of two dates yields a time interval


True, and (end_date - start_date) / 2 would give me half the interval
between the dates, but that is not what I wanted. I wanted the
average of the dates, which is a date.

(Sep-25-1981 + Dec-22-2000) / 2 would be the date mid-way between
Sep-25-1981 and Dec-22-2000, just as (45 + 78) / 2 is the integer
mid-way between 45 and 78.

-- Richard
May 4 '06 #236
Richard Tobin wrote:
In article <44**************@jacob.remcomp.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:

mid_date = (start_date + end_date) / 2;


Excuse me but what does it mean

Sep-25-1981 + Dec-22-2000

Just because the sum of two dates is not a date doesn't mean that
it doesn't mean anything.

You obviously meant:

mid_date = (end_date - start_date)/2

No I didn't. That is something completely different.

The *subtraction* of two dates yields a time interval

True, and (end_date - start_date) / 2 would give me half the interval
between the dates, but that is not what I wanted. I wanted the
average of the dates, which is a date.

(Sep-25-1981 + Dec-22-2000) / 2 would be the date mid-way between
Sep-25-1981 and Dec-22-2000, just as (45 + 78) / 2 is the integer
mid-way between 45 and 78.

-- Richard


Ahh ok, you mean then

mid_date = startdate + (end_date-start_date)/2

A date + a time interval is a date later than the start date.
May 4 '06 #237
Flash Gordon wrote:
we******@gmail.com wrote:
Flash Gordon wrote:
jacob navia wrote:
Flash Gordon wrote:
> Is an extra byte (or word, or double word) for a flags field really
> that big an overhead?
Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits info... For
programs that use extensively strings, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, specially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.
I've yet to see software where short strings made up a significant
portion of the memory footprint and saving the memory that avoiding the
flags would be of real use. Of course, such applications might exist.
Any program that reads words from any language dictionary. Like a
spell checker, or a word puzzle solver/creator, or a spam filter. For
dictionaries the size of the english language dictionary, these kinds
of applications can typically push the L2 cache of your CPU pretty
hard.


I never said they didn't exist.


I think the point is that there are *many* such applications. In fact I
would be suspicious of anyone who claimed to be an experienced
programmer who hasn't *written* one of these.
[...] However, a typical dictionary + the
structures is not going to fit in my L2 cache anyway. However, the
subset of it that is likely to be actually in use is probably an order
of magnitude smaller and so could easily fit in with the extra overhead.
It's more *likely* if the data is compacted. Another way of saying
this: any overflowing data set with a locality bias will perform
monotonically better with how well it fits in the cache. I.e.,
everything you save improves some percentage of performance.
Alternatively, one could go to conventional C strings and have a bigger
chance of it fitting since they only have a 1 byte overhead compared to
probably an 8 byte overhead (4 byte int for length, 4 byte int for
memory block size) that it sounds like your library has. Even if your
library only has a 4 byte overhead it is still larger!
Yes, but you eat a huge additional O(strlen) penalty for very *many*
typical operations. So Bstrlib makes the trade off where the more
common scenarios are faster.
Personally I would say that using negative lengths was asking for
problems because at some point a negative length will be checked without
first changing it to positive.


I think you miss the point. If the string length is negative then it
is erroneous. That's the point of it. A negative amount of allocated
memory, on the other hand, I use to indicate that the memory is not
legally modifiable at the moment, and 0 to mean that it is never
modifiable. The point is that the library blocks erroneous action,
due to intentionally or unintentionally bad header values, in the
same test. So it reduces overhead while increasing safety and
functionality at the same time.


If you are trying to detect corruption then you should also be checking
that the length is not longer than the memory block, so you should be
doing more than one comparison anyway.


Yes, it does that as well. So you really are talking out of your ass.
This is in the first couple pages of the documentation, and strewn
throughout the source code.
[...] Then you can easily check if any unused flag bits are non-0.


Yes, this is an alternative -- but it's less safe and slower, so why
would I do it this way?
You know, you can actually read the explanation of all this in the
documentation if you care to do so.


Probably true.

It may well be that the performance gain is worth it for the
applications people use your library for. If so then fine, but the
limitation means it is not worth me migrating to it.


Probably not true. But you won't look at it anyways, so I won't waste
my breath.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #238
In article <44***********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
mid_date = (start_date + end_date) / 2;
Ahh ok, you mean then

mid_date = startdate + (end_date-start_date)/2


Your attitude is baffling. You deny that adding dates makes sense,
and when I post an example where adding dates makes perfect sense, you
respond by asserting that I mean some other expression that achieves
that same effect. The mere fact that you were able to post another
expression with the same meaning refutes your original claim.

-- Richard
May 4 '06 #239
In article <e3**********@pc-news.cogsci.ed.ac.uk>, I wrote:
Just because the sum of two dates is not a date doesn't mean that
it doesn't mean anything.


Just in case anyone has not noticed, this is really just a re-run of
pointer addition with dates instead of pointers.

The reason for not allowing (date|pointer) addition is not that it
doesn't make sense, but that the gain isn't worth the mechanism
required.

-- Richard
May 4 '06 #240
we******@gmail.com wrote:
Flash Gordon wrote:
we******@gmail.com wrote:
Flash Gordon wrote:
we******@gmail.com wrote:
> Ben C wrote:
>> On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
>>> CBFalconer wrote:
>>>> we******@gmail.com wrote:
>>>>> CBFalconer wrote:
>>>> ... snip ...
>>>>>> The last time I took an (admittedly cursory) look at Bstrlib, I
>>>>>> found it cursed with non-portabilities
>>>>> You perhaps would like to name one?
>>>> I took another 2 minute look, and was immediately struck by the use
>>>> of int for sizes, rather than size_t. This limits reliably
>>>> available string length to 32767.
>> [snip]
>>
>>>> [...] I did find an explanation and
>>>> justification for this. Conceded, such a size is probably adequate
>>>> for most usage, but the restriction is not present in standard C
>>>> strings.
>>> You're going to need to concede on more grounds than that. There is a
>>> reason many UNIX systems tried to add a ssize_t type, and why TR 24731
>>> has added rsize_t to their extension. (As a side note, I strongly
>>> suspect that Microsoft, in fact, added this whole rsize_t thing to TR
>>> 24731 when they realized that Bstrlib, or things like it, actually has
>>> far better real world safety because of its use of ints for string
>>> lengths.) Using a long would be incorrect since there are some systems
>>> where a long value can exceed a size_t value (and thus lead to falsely
>>> sized mallocs.) There is also the matter of trying to codify
>>> read-only and constant strings and detecting errors efficiently
>>> (negative lengths fit the bill.) Using ints is the best choice
>>> because at worst it's giving up things (super-long strings) that nobody
>>> cares about,
>> I think it's fair to expect the possibility of super-long strings in a
>> general-purpose string library.
> Ok, so you can name a single application of such a thing right?
Handling an RTF document that you will be writing to a variable length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document in to the database so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.
So now name the platform where it's *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.

If the DOS port hadn't been dropped then depending on the compiler we
might have hit this. A significant portion of the SW I'm thinking of
originated on DOS, so it could have hit it.


Oh ... I think of DOS as exactly the case where this *can't* happen.
Single objects in 16bit DOS have a size limit of 64K (size_t is just
unsigned which is 16 bits), so these huge RTF files you are talking
about *have* to be streamed, or split over multiple allocations
anyways.


Strangely enough there have been ways of having objects larger than 64K
in DOS. At least, given a 386 and some extensions.
>>> it allows in an efficient way for all desirable encoding scenarios,
>>> and it avoids any wrap around anomalies causing under-allocations.
>> What anomalies? Are these a consequence of using signed long, or
>> size_t?
> I am describing what int does (*BOTH* the encoding scenarios and
> avoiding anomalies). Using a long int would allow for arithmetic on
> numbers that exceed the maximum value of size_t on some systems (that
> actually *exist*), so when there was an attempt to malloc or realloc on
> such sizes, there would be a wrap around to some value that would just
> make it screw up. And if I used a size_t, then there would be no
> simple space of encodings that can catch errors, constants and write
> protected strings.
Is an extra byte (or word, or double word) for a flags field really that
big an overhead?
I need two *bits* for flags, and I want large ranges to catch errors in
the scalar fields (this is a *safe* library). An extra struct entry is
the wrong way to do this because it doesn't help me catch errors in the
scalar fields, and it's space inefficient.

ssize_t would have been a reasonable *functional* choice, but it's not
standard. size_t is no good because it can't go negative. long int is
no good because there are plenty of real platforms where long int is
larger than size_t. int solves all the main real problems, and as a
bonus the compiler is designed to make sure it's the fastest scalar
primitive available.

Strangely enough, when a previous developer on the code I'm dealing with
thought he could limit size to a "valid" range and assert if it was out
of range we found that the asserts kept getting triggered. However, it
was always triggered incorrectly because the size was actually valid!


And how is this connected with Bstrlib? The library comes with a test
that, if you run it in a 16-bit environment, will exercise length
overflowing. So you have some reasonable assurance that Bstrlib does
not make obvious mistakes with size computations.


You are assuming I won't want an object larger than can be represented
in an int. That is an artificial limitation.
[...] So I'll stick to not artificially limiting sizes.


And how do you deal with the fact that the language limits your sizes
anyways?


You are artificially reducing the limit below what the language allows
for. The language is not artificially reducing it below what the
language allows.
[...] If the administrator of a
server the SW is installed on wants then s/he can use system specific
means to limit the size of a process.


What? You think the administrator is in charge of how the compiler
works?


No, but the SW I'm dealing with is run on systems where the
administrator can limit process size, maximum CPU usage and lots of
other good stuff. Or the administrator can leave it unlimited (i.e.
limited by available resources). You really should try an OS that gives
real power and flexibility one day.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

May 4 '06 #241
Richard Tobin wrote:
In article <44**************@jacob.remcomp.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
mid_date = (start_date + end_date) / 2;

Excuse me but what does it mean

Sep-25-1981 + Dec-22-2000


Just because the sum of two dates is not a date doesn't mean that
it doesn't mean anything.
You obviously meant:

mid_date = (end_date - start_date)/2


No I didn't. That is something completely different.
The *subtraction* of two dates yields a time interval


True, and (end_date - start_date) / 2 would give me half the interval
between the dates, but that is not what I wanted. I wanted the
average of the dates, which is a date.

(Sep-25-1981 + Dec-22-2000) / 2 would be the date mid-way between
Sep-25-1981 and Dec-22-2000, just as (45 + 78) / 2 is the integer
mid-way between 45 and 78.

-- Richard


Adding date values is nonsense. Subtracting one date from another to
yield integer days between two dates is very handy. Adding (or
subtracting) integer days to (or from) a date yielding a date is handy
too. Look at this ..

set century on // prints 1981 instead of 81
dbeg = ctod("09/25/1981") // convert character string to date type
dend = ctod("12/22/2000")
diff = dend - dbeg // 7028 days between two dates
? dbeg, dend, diff
dmid = dbeg + diff / 2 // begin date + 3514 days, yielding date type
? dmid // 05/10/1991

... in xBase, the language of dBASE, FoxPro, Clipper and xHarbour. While
C is my favorite language, my employer pays for xBase. I have a hobby
project to translate some of the more useful xBase stuff into C.

Note that ? is a print command in xBase. It prints a leading newline and
then the values of its arguments, separated by a space character.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
May 5 '06 #242
REH

"Ben C" <sp******@spam.eggs> wrote in message
news:sl*********************@bowser.marioworld...
Yes I know. But you do get constructors, destructors and references, so
you can fit explicit memory management "under the hood" of operator
overloading.
I can understand people's dislike of references (though I don't agree with
the reasons), but what is wrong with constructors and destructors?

Show me the string example, and hopefully either you will get my point
or I will get yours :)


I'd rather understand what you think is wrong with constructors. My
previous example can be written with constructors and will generate code
that is as efficient, if not more so, than without.

REH
May 5 '06 #243
Ed Jensen wrote:
CBFalconer <cb********@yahoo.com> wrote:
And, if you write the library in truly portable C, without any
silly extensions and/or entanglements, you just compile the library
module. All the compiler vendor need to do is meet the
specifications of the C standard.

Simple, huh?
That all depends on the license under which the source code was
released. Linking a bunch of C libraries under various licenses can
involve non-trivial amounts of legal hassle to ensure compliance.


If you publish your source under GPL, there is very little chance
of conflicts. In the case of things I have originated, all you
have to do is contact me to negotiate other licenses. I can be
fairly reasonable on months with an 'R' in them.

Also, there's something to be said for having features built into the
standard library. Besides making things easier from a legal point of
view, it means you can spend that much less time evaluating multiple
solutions, since most of the time, you'll just use the implementation
already available in the standard library.

I know it's unpopular around these parts to utter such heresy, but I,
for one, would love it if the standard C library included support for
smarter strings, hash tables, and linked lists.
No, there is nothing wrong with expanding the standard library.
Nothing forces anyone to use such components anyhow. There is
provision in the standard for "future library expansion". This is
a far cry from bastardizing the language with overloaded operators
and peculiar non-standard syntax, as recommended by some of the
unwashed.

Then again, I'm certainly NOT advocating these things should be added
to the standard C library. I recognize C for what it is, and use it
where it's appropriate. There are other languages that offer those
features. But that doesn't stop me from wanting those features in C.


Go ahead and advocate. I would certainly like to see at least
strlcpy/cat in the next standard, with gets removed, and possibly
my own hashlib and ggets added. What all of those things are is
completely described in terms of the existing C standards, so the
decisions can be fairly black and white.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
May 5 '06 #244
Chris Torek wrote:
.... snip ...
Indeed. Suppose the String data structure is much like Paul Hsieh's
favorite, but perhaps with a few more bells and whistles (I have not
looked at his implementation):


Neither have I, beyond a cursory glance. However I did see that
the fundamental object involved is a struct, which contains a
length, a capacity, and a pointer to actual string data as an array
of char. This is an organization that has been in use for many
years in GNU Pascal. There are still awkwardnesses in its use,
such as the equivalent of a union of two strings, and how to handle
the capacity value. GPC does this by making such a union an actual
structure, with separate fields. But, by and large, it is a
familiar organization.

Any of these so-called advanced organizations has to give up
something, be it code compactness, efficiency, or something else.
There are very few limitations to the null terminated string, which
is why it has endured. There are, however, many traps for the
unwary. This is the hallmark of virtually all C code.

You pays your money and you takes your choice.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
May 5 '06 #245
Flash Gordon wrote:
we******@gmail.com wrote:
Flash Gordon wrote:
we******@gmail.com wrote:
Flash Gordon wrote:
> we******@gmail.com wrote:
>> Ben C wrote:
>>> On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
>>>> CBFalconer wrote:
>>>>> we******@gmail.com wrote:
>>>>>> CBFalconer wrote:
>>>>> ... snip ...
>>>>>>> The last time I took an (admittedly cursory) look at Bstrlib, I
>>>>>>> found it cursed with non-portabilities
>>>>>> You perhaps would like to name one?
>>>>> I took another 2 minute look, and was immediately struck by the use
>>>>> of int for sizes, rather than size_t. This limits reliably
>>>>> available string length to 32767.
>>> [snip]
>>>
>>>>> [...] I did find an explanation and
>>>>> justification for this. Conceded, such a size is probably adequate
>>>>> for most usage, but the restriction is not present in standard C
>>>>> strings.
>>>> You're going to need to concede on more grounds than that. There is a
>>>> reason many UNIX systems tried to add a ssize_t type, and why TR 24731
>>>> has added rsize_t to their extension. (As a side note, I strongly
>>>> suspect that Microsoft, in fact, added this whole rsize_t thing to TR
>>>> 24731 when they realized that Bstrlib, or things like it, actually has
>>>> far better real-world safety because of its use of ints for string
>>>> lengths.) Using a long would be incorrect since there are some systems
>>>> where a long value can exceed a size_t value (and thus lead to falsely
>>>> sized mallocs.) There is also the matter of trying to codify
>>>> read-only and constant strings and detecting errors efficiently
>>>> (negative lengths fit the bill.) Using ints is the best choice
>>>> because at worst it's giving up things (super-long strings) that nobody
>>>> cares about,
>>> I think it's fair to expect the possibility of super-long strings in a
>>> general-purpose string library.
>> OK, so you can name a single application of such a thing, right?
> Handling an RTF document that you will be writing to a variable length
> record in a database. Yes, I do have good reason for doing this. No, I
> can't stream the document in to the database so I do have to have it all
> in memory. Yes, RTF documents are encoded as text. Yes, they can be
> extremely large, especially if they have graphics embedded in them
> encoded as text.
So now name the platform where it's *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.
If the DOS port hadn't been dropped then depending on the compiler we
might have hit this. A significant portion of the SW I'm thinking of
originated on DOS, so it could have hit it.


Oh ... I think of DOS as exactly the case where this *can't* happen.
Single objects in 16-bit DOS have a size limit of 64K (size_t is plain
unsigned int, which is 16 bits), so these huge RTF files you are talking
about *have* to be streamed, or split over multiple allocations,
anyway.


Strangely enough there have been ways of having objects larger than 64K
in DOS. At least, given a 386 and some extensions.


For actual storage, you need go no further than an 8086, which could be
equipped with up to 640K of memory without issue. But of course,
that's not what's at issue here. It's a question of what size_t is on
those platforms. In all the 16-bit mode compilers I am aware of,
size_t (and int) is a 16-bit unsigned integer, which means, per the C
standard, that a single object cannot be larger than 64K. This is a
real issue when you realize that if you perform a strcat on two strings
each longer than 32K, you get an undefined result (because the C
specification is just as worthless in this respect).

If you want to use the 32 bit instruction x86 sets and a DOS extender,
you can use one of the 32 bit compilers, but here size_t is a 32 bit
unsigned integer (as is int.)

Perhaps you might want to refrain from chiming in about things you know
very little about; I mean seriously, are *YOU* trying to tell *ME* how
DOS works? Are you kidding me?
>>>> it allows in an efficient way for all desirable encoding scenarios,
>>>> and it avoids any wrap-around anomalies causing under-allocations.
>>> What anomalies? Are these a consequence of using signed long, or
>>> size_t?
>> I am describing what int does (*BOTH* the encoding scenarios and
>> avoiding anomalies). Using a long int would allow for arithmetic on
>> numbers that exceed the maximum value of size_t on some systems (that
>> actually *exist*), so when there was an attempt to malloc or realloc on
>> such sizes, there would be a wrap around to some value that would just
>> make it screw up. And if I used a size_t, then there would be no
>> simple space of encodings that can catch errors, constants and write
>> protected strings.
> Is an extra byte (or word, or double word) for a flags field really that
> big an overhead?
I need two *bits* for flags, and I want large ranges to catch errors in
the scalar fields (this is a *safe* library). An extra struct entry is
the wrong way to do this because it doesn't help me catch errors in the
scalar fields, and it's space-inefficient.

ssize_t would have been a reasonable *functional* choice, but it's not
standard. size_t is no good because it can't go negative. long int is
no good because there are plenty of real platforms where long int is
larger than size_t. int solves all the main real problems, and as a
bonus the compiler is designed to make sure it's the fastest scalar
primitive available.
Strangely enough, when a previous developer on the code I'm dealing with
thought he could limit size to a "valid" range and assert if it was out
of range, we found that the asserts kept getting triggered. However, it
was always triggered incorrectly because the size was actually valid!


And how is this connected with Bstrlib? The library comes with a test
that, if you run it in a 16-bit environment, will exercise length
overflow. So you have some reasonable assurance that Bstrlib does
not make obvious mistakes with size computations.


You are assuming I won't want an object larger than can be represented
in an int. That is an artificial limitation.


size_t is also a similar artificial limitation. The fact that arrays
can only take certain kinds of scalars as index parameters is also an
artificial limitation. But it turns out that basically every language,
and every array-like or string-like type (with the notable exceptions
of Lua and Python), has a similar kind of limitation.
[...] So I'll stick to not artificially limiting sizes.


And how do you deal with the fact that the language limits your sizes
anyways?


You are artificially reducing the limit below what the language allows
for. The language is not artificially reducing it below what the
language allows.


One of these statements is circular reasoning. See if you can figure
out which one it is.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 5 '06 #246
jacob navia <ja***@jacob.remcomp.fr> wrote:
Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE.


Quaternions must have come as a shock to you. Or does a*b != b*a somehow
make more sense to you than a+b != b+a?

Richard
May 5 '06 #247
On 2006-05-04, Richard Tobin <ri*****@cogsci.ed.ac.uk> wrote:
In article <44***********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
>mid_date = (start_date + end_date) / 2;

Ahh ok, you mean then

mid_date = startdate + (end_date-start_date)/2


Your attitude is baffling. You deny that adding dates makes sense,
and when I post an example where adding dates makes perfect sense, you
respond by asserting that I mean some other expression that achieves
that same effect. The mere fact that you were able to post another
expression with the same meaning refutes your original claim.


Mr Navia's attitude makes sense if you think of dates in "homogeneous
coordinates".

It's common in 3D graphics to use 4-vectors to represent positions and
directions. A position has a 1 in its last element, and a direction has a
0.

I say directions, but the vectors are not necessarily normalized, so
they are "directions with magnitude".

Positions implicitly mean "the place you get to if you start at the
origin and add the 3D part of the vector".

Directions-with-magnitude are not implicitly based at the origin. You
can add a d-with-m to a position to get to a new position.

[a0, a1, a2, 1] + [m0, m1, m2, 0] = [b0, b1, b2, 1]

If we do this as a 4D vector add, the result ends up correctly with a 1
in the 4th element-- it's a position.

Other implementation conveniences arise from this approach-- you can use
the last column of a 4D matrix to represent a translation. Applying the
matrix to a vector will rotate and then translate positions, but will
just rotate and not translate d-with-ms, because the 0 in the 4th
element will select out the last column in the matrix multiply.

Using this system, you should be able to do everything with straight 4D
matrix arithmetic, and if you ever end up with a 2 or a -1, or anything
that isn't 0 or 1 in the 4th element of a vector, you've done something
wrong.

Adding two positions, for example, gives you a 2 in that 4th element.
And, thinking of it geometrically, it doesn't make a lot of sense
because positions are implicitly "translations from the origin", so you
can't translate one position from another position.

Well, we can represent time in a 1D space and use 2D "homogeneous
coordinates":

[100, 0] means "100 seconds forwards"
[-100, 0] means "100 seconds ago"
[100, 1] means "100 seconds since 1970-01-01T00:00"

In exactly the same way we distinguish between a length of time, and a
length of time that implicitly starts at the origin.

start_date + (end_date - start_date) / 2

doesn't generate any invalid last-elements in any intermediate results,
but

(start_date + end_date) / 2

does.

In Python's datetime module, subtracting two dates returns a "timedelta"
object, which can be added to a date. But two dates cannot be added.

This seems a sensible way to do it, and if you wanted to do it in C++, I
think you'd overload global operators, not member function operators:

Timedelta operator-(const Date& a, const Date& b);
Date operator+(const Date& a, const Timedelta& delta);
Timedelta operator+(const Timedelta& a, const Timedelta& b);

etc. You could make a perfectly usable system this way, and I'd say that
using operators for dates is no more or less sane or insane than using
them for matrices and vectors.
May 5 '06 #248
On 2006-05-05, REH <me@you.com> wrote:

"Ben C" <sp******@spam.eggs> wrote in message
news:sl*********************@bowser.marioworld...
Yes I know. But you do get constructors, destructors and references, so
you can fit explicit memory management "under the hood" of operator
overloading.
I can understood people's dislike of references (though I don't agree with
the reasons), but what is wrong with constructors and destructors?


Nothing, I like constructors and destructors.
May 5 '06 #249
jacob navia <ja***@jacob.remcomp.fr> wrote:
2) Operator overloading does NOT need any constructors, nor destructors
nor the GC if we use small objects:

int128 a,b,c,d;

a = (b+c)/(b-d);


You keep repeating this as one of the prime examples (in fact, the only
consistent example) of why overloading is so useful in your suite. Don't
you realise that C99 allows any implementation to define any size
integers without requiring overloading at all?

Richard
May 5 '06 #250
