Boost process and C

we******@gmail.com wrote:

Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
CBFalconer wrote:
we******@gmail.com wrote:
> CBFalconer wrote:
... snip ...
>> The last time I took an (admittedly cursory) look at Bstrlib, I
>> found it cursed with non-portabilities
> You perhaps would like to name one?
I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767.

[snip]
[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.
Your going to need to conceed on more grounds than that. There is a
reason many UNIX systems tried to add a ssize_t type, and why TR 24731
has added rsize_t to their extension. (As a side note, I strongly
suspect the Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real world safety because its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst its giving up things (super-long strings) that nobody
cares about,

I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.

Ok, so you can name a single application of such a thing right?

Handling an RTF document that you will be writing to a variable length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document in to the database so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.

it allows in an efficient way for all desirable encoding scenarios,
and it avoids any wrap around anomolies causing under-allocations.

What anomalies? Are these a consequence of using signed long, or
size_t?

I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomolies), Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and write
protected strings.

Is an extra byte (or word, or double word) for a flags field really that
big an overhead?

<snip>
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

Inviato da X-Privat.Org - Registrazione gratuita http://www.x-privat.org/join.php

May 4 '06 #205

regis

Giorgos Keramidas wrote:

On Tue, 02 May 2006 23:20:47 +0200, regis <re***@dil.univ-mrs.fr> wrote:
without operator overloading, how about just an infix notation
for 2-ary functions (with, e.g., functions evaluated left to right,
all with the same priority) ?

typedef struct Vect { double x, y; } Vect;

infix Vect Vect_Sub (Vect u, Vect v) {
return (Vect) { .x= u.x - v.x, .y= u.y - v.y };
}
infix Vect Vect_Scale (double lambda, Vect u) {
return (Vect) { .x= lambda*u.x, .y= lambda*u.y };
}
infix double Vect_Dot (Vect u, Vect v) {
return u.x * v.x + u.y * v.y;
}
int main (void) {
Vect u, v, w, p, q, r, s, t;
...
t= ((v Vect_Sub u) Vect_Dot (w Vect_Sub v))
Vect_Scale (p Vect_Sub q Vect_Sub r Vect_Sub s);
...
}

No, please. This looks strangely familiar if you know LISP :P

Plus, it doesn't really work for functions with an arbitrary number of
arguments, and this creates an inconsistency in the elegantly simple
syntax of C.

I know no infix scheme for functions in Lisp.
In Lisp, This would look like:

(Vect_Scale
(Vect_Dot
(Vect_Sub v u)
(Vect_Sub w v)
)
(Vect_Sub_va p q r s)
)

which is much like it looks in C without infix notation:

Vect_Scale (
Vect_Dot (
Vect_Sub (u,v),
Vect_Sub (w,v)
),
Vect_Sub_va (p, q, r, s, ARGS_END)
);

May 4 '06 #206

Ben C a écrit :

Yes exactly, and AFAIK the kind of operator-overloading that has been
proposed for C is something like this-- it's fine for structs
representing things like complex numbers (that are a few words long and
don't contain pointers).

But this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

Why not?

Suppose Matrix A,B,C;

C = A+B;

Your operator + function would allocate the space, add the matrix to a
possible linked lists of matrices that allows to GC unused ones, and
return the results.

Or, instead of taking all this trouble you could just use the GC and
forget about destructors. All intermediate results would be
automatically garbage collected.
On its own it's not enough; with the extra workarounds you need, you end
up with C++ (or some other kind of "octopus made by nailing four extra
legs onto a dog").

The crucial point in this is to know when to stop. There are NO
constructors/destructors in C, and none of the proposed extensions
proposes that.

Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for the
fools!

If you feel that operator overloading would not solve the problem for
matrices addition, then you will have to devise other means of doing that.

The GC however, is an ELEGANT solution to all this problems. We would
have the easy of use of C++ with its automatic destructors, WITHOUT
PAYING THE PRICE in language and compiler complexity.

This last point is important: compiler complexity increases the effort
that the language implementor must do and increases the "bug surface".

The module that handles the operator overloading in lcc-win32 is 1732
lines long, including all commentaries and lines that contain just a '{'
or a '}'.

The compiled operators module is 11K machine code. All the extensions of
lcc-win32 are conceptually simple, even if operator overloading is the
most complex one. The others like generic functions are much simpler to
implement.

jacob

May 4 '06 #207

Flash Gordon a écrit :

Is an extra byte (or word, or double word) for a flags field really that
big an overhead?

Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits info... For
programs that use extensively strings, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, specially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.

May 4 '06 #208

jacob navia wrote:

Flash Gordon a écrit :

Is an extra byte (or word, or double word) for a flags field really
that big an overhead?

Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits info... For
programs that use extensively strings, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, specially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.

I've yet to see software where short strings made up a significant
portion of the memory footprint and saving the memory that avoiding the
flags would be of real use. Of course, such applications might exist.

Personally I would say that using negative lengths was asking for
problems because at some point a negative length will be checked without
first changing it to positive.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

May 4 '06 #209

On 2006-05-04, we******@gmail.com <we******@gmail.com> wrote:

Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
> CBFalconer wrote:
>> we******@gmail.com wrote:
>> > CBFalconer wrote:
>> ... snip ...
>> >> The last time I took an (admittedly cursory) look at Bstrlib, I
>> >> found it cursed with non-portabilities
>> >
>> > You perhaps would like to name one?
>>
>> I took another 2 minute look, and was immediately struck by the use
>> of int for sizes, rather than size_t. This limits reliably
>> available string length to 32767.

[snip]
>> [...] I did find an explanation and
>> justification for this. Conceded, such a size is probably adequate
>> for most usage, but the restriction is not present in standard C
>> strings.

> Your going to need to conceed on more grounds than that. There is a
> reason many UNIX systems tried to add a ssize_t type, and why TR 24731
> has added rsize_t to their extension. (As a side note, I strongly
> suspect the Microsoft, in fact, added this whole rsize_t thing to TR
> 24731 when they realized that Bstrlib, or things like it, actually has
> far better real world safety because its use of ints for string
> lengths.) Using a long would be incorrect since there are some systems
> where a long value can exceed a size_t value (and thus lead to falsely
> sized mallocs.) There is also the matter of trying to codify
> read-only and constant strings and detecting errors efficiently
> (negative lengths fit the bill.) Using ints is the best choice
> because at worst its giving up things (super-long strings) that nobody
> cares about,

I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.

Ok, so you can name a single application of such a thing right?
> it allows in an efficient way for all desirable encoding scenarios,
> and it avoids any wrap around anomolies causing under-allocations.

What anomalies? Are these a consequence of using signed long, or
size_t?

I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomolies), Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and write
protected strings.

If it's longer than the maximum size_t value, you probably can't have it
anyway, so there's no point in being able to represent it.

Silly encoding tricks buy you nothing, just use another field with bit
flags.

> If I tried to use size_t I would give up a significant amount of
> safety and design features (or else I would have to put more entries
> into the header, making it less efficient).

If you only need a single "special" marker value (for which you were
perhaps using -1), you could consider using ~(size_t) 0.

For the mlen, I need one value that indicates a write protected string
(that can be unprotected) and one the indicates a constant (that can
never be unprotected). The slen has to be of the same type as mlen,
and so in order to check for potential errors, I set it to -1 to
indicate that it has been deterministically set to an invalid value.
Of course I could just isolate a handful of values, however, but this
makes the error space extremely small, which reduces your chances of
finding accidental full corruptions,

This shouldn't be left to chance anyway, pretending that it can be
caught invites disaster when inevitably one of the cases comes up when
it _doesn't_ get caught.

May 4 '06 #210

websnarf

Flash Gordon wrote:

we******@gmail.com wrote:
Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
CBFalconer wrote:
> we******@gmail.com wrote:
>> CBFalconer wrote:
> ... snip ...
>>> The last time I took an (admittedly cursory) look at Bstrlib, I
>>> found it cursed with non-portabilities
>> You perhaps would like to name one?
> I took another 2 minute look, and was immediately struck by the use
> of int for sizes, rather than size_t. This limits reliably
> available string length to 32767.
[snip]

> [...] I did find an explanation and
> justification for this. Conceded, such a size is probably adequate
> for most usage, but the restriction is not present in standard C
> strings.
Your going to need to conceed on more grounds than that. There is a
reason many UNIX systems tried to add a ssize_t type, and why TR 24731
has added rsize_t to their extension. (As a side note, I strongly
suspect the Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real world safety because its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst its giving up things (super-long strings) that nobody
cares about,
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.

Ok, so you can name a single application of such a thing right?

Handling an RTF document that you will be writing to a variable length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document in to the database so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.

So now name the platform where its *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.

it allows in an efficient way for all desirable encoding scenarios,
and it avoids any wrap around anomolies causing under-allocations.
What anomalies? Are these a consequence of using signed long, or
size_t?

I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomolies), Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and write
protected strings.

Is an extra byte (or word, or double word) for a flags field really that
big an overhead?

I need two *bits* for flags, and I want large ranges to catch errors in
the scalar fields (this is a *safe* library). An extra struct entry is
the wrong way to do this because it doesn't help my catch errors in the
scalar fields, and its space inefficient.

ssize_t would have been a reasonable *functional* choice, but its not
standard. size_t is no good because it can't go negative. long int is
no good because there are plenty of real platforms where long int is
larger than size_t. int solves all the main real problems, and as a
bonus the compiler is designed to make sure its the fastest scalar
primitive available.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #211

REH

Ben C wrote:

Why? And why do you think objects of user-defined types have to be
"allocated and freed manually"?
They don't _have_ to be, but they _might_ be.

One of the "features" of C is that the programmer has control over
memory allocation and de-allocation.

Yes, C++ has this same "feature." Memory allocation is completely
under control of the programmer.

Usually in practice this just means a lot of bugs and crashes; but there
are good reasons for it too: you can write domain-specific allocators
that are more efficient and/or tunable in the amount of space or time
they use, instead of relying on a general-purpose allocator or
garbage-collector all the time. C++ does not do GC, nor are you required to use any "general-purpose"
allocator.

The programmer also might implement things like shallow-copy and
copy-on-write.

Somehow all of these things need to happen when an expression like this
is evaluated:

string a = b + c;

In C++ the basic mechanism you use for this is constructors. For example
the string copy constructor might set up a shallow copy-on-write copy.
Someone has to write the code for that. If the programmer writes it, and
it's not just part of the framework, then it has to get implicitly
called. Yes, the programmer can write a constructor to do this. He does not
have to.

struct foo {
int x, y;
};

foo operator+ (const foo& a, const foo& b)
// for it you are of the "I hate references" camp: foo operator+ (foo a,
foo b)
{
const foo z = {a.x + b.x, a.y + b.y};
return z;
}

foo x = {1, 2};
foo y = {3, 4};
foo z = x + y;

simplistic, but no constructors.

Yes exactly, and AFAIK the kind of operator-overloading that has been
proposed for C is something like this-- it's fine for structs
representing things like complex numbers (that are a few words long and
don't contain pointers).

But this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

On its own it's not enough; with the extra workarounds you need, you end
up with C++ (or some other kind of "octopus made by nailing four extra
legs onto a dog").

I still don't get your point.

REH

May 4 '06 #212

websnarf

Flash Gordon wrote:

jacob navia wrote:
Flash Gordon a écrit :
Is an extra byte (or word, or double word) for a flags field really
that big an overhead?
Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits info... For
programs that use extensively strings, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, specially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.

I've yet to see software where short strings made up a significant
portion of the memory footprint and saving the memory that avoiding the
flags would be of real use. Of course, such applications might exist.

Any program that reads words from any language dictionary. Like a
spell checker, or a word puzzle solver/creator, or a spam filter. For
dictionaries the size of the english language dictionary, these kinds
of applications can typically push the L2 cache of your CPU pretty
hard.
Personally I would say that using negative lengths was asking for
problems because at some point a negative length will be checked without
first changing it to positive.

I think you miss the point. If the string length is negative then it
is erroneous. That's the point of it. But the amount of memory
allocated being negative, I use to indicate that the memory is not
legally modifiable at the moment, and being 0 meaning that its not
modifiable ever. The point being that the library blocks erroneous
action due to intentionally or unintentionally having bad header values
in the same test. So it reduces overhead, while increasing safety and
functionality at the same time.

You know, you can actually read the explanation of all this in the
documentation if you care to do so.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #213

websnarf

Jordan Abel wrote:

On 2006-05-04, we******@gmail.com <we******@gmail.com> wrote:
Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
> CBFalconer wrote:
>> we******@gmail.com wrote:
>> > CBFalconer wrote:
>> ... snip ...
>> >> The last time I took an (admittedly cursory) look at Bstrlib, I
>> >> found it cursed with non-portabilities
>> >
>> > You perhaps would like to name one?
>>
>> I took another 2 minute look, and was immediately struck by the use
>> of int for sizes, rather than size_t. This limits reliably
>> available string length to 32767.

[snip]

>> [...] I did find an explanation and
>> justification for this. Conceded, such a size is probably adequate
>> for most usage, but the restriction is not present in standard C
>> strings.

> Your going to need to conceed on more grounds than that. There is a
> reason many UNIX systems tried to add a ssize_t type, and why TR 24731
> has added rsize_t to their extension. (As a side note, I strongly
> suspect the Microsoft, in fact, added this whole rsize_t thing to TR
> 24731 when they realized that Bstrlib, or things like it, actually has
> far better real world safety because its use of ints for string
> lengths.) Using a long would be incorrect since there are some systems
> where a long value can exceed a size_t value (and thus lead to falsely
> sized mallocs.) There is also the matter of trying to codify
> read-only and constant strings and detecting errors efficiently
> (negative lengths fit the bill.) Using ints is the best choice
> because at worst its giving up things (super-long strings) that nobody
> cares about,

I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.
Ok, so you can name a single application of such a thing right?
> it allows in an efficient way for all desirable encoding scenarios,
> and it avoids any wrap around anomolies causing under-allocations.

What anomalies? Are these a consequence of using signed long, or
size_t?

I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomolies), Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and write
protected strings.

If it's longer than the maximum size_t value, you probably can't have it
anyway, so there's no point in being able to represent it.

Huh?
Silly encoding tricks buy you nothing, just use another field with bit
flags.

If I do that, I lose space, speed, and error detection. I see it as
buying me a whole hell of a lot actually.

> If I tried to use size_t I would give up a significant amount of
> safety and design features (or else I would have to put more entries
> into the header, making it less efficient).

If you only need a single "special" marker value (for which you were
perhaps using -1), you could consider using ~(size_t) 0.

For the mlen, I need one value that indicates a write protected string
(that can be unprotected) and one the indicates a constant (that can
never be unprotected). The slen has to be of the same type as mlen,
and so in order to check for potential errors, I set it to -1 to
indicate that it has been deterministically set to an invalid value.
Of course I could just isolate a handful of values, however, but this
makes the error space extremely small, which reduces your chances of
finding accidental full corruptions,

This shouldn't be left to chance anyway, pretending that it can be
caught invites disaster when inevitably one of the cases comes up when
it _doesn't_ get caught.

Uhh ... that's the situation we have with basically all other string
libraries in existence for C *today*. My library and TR 24731 are the
only ones to attempt to catch these errors *before* any undefined
scenario occurs. In practice this means that a greater percentage of
corruption errors are simply caught in your normal error handling.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #214

Ben C

On 2006-05-04, jacob navia <ja***@jacob.remcomp.fr> wrote:

Ben C a écrit :

Yes exactly, and AFAIK the kind of operator-overloading that has been
proposed for C is something like this-- it's fine for structs
representing things like complex numbers (that are a few words long and
don't contain pointers).

But this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

Why not?

Suppose Matrix A,B,C;

C = A+B;

Your operator + function would allocate the space, add the matrix to a
possible linked lists of matrices that allows to GC unused ones, and
return the results.

A reference to them presumably.

Yes indeed, if you have a garbage collector (and references) there is no
problem.

That's why I say operator-overloading works well in languages where
the framework manages storage for you (e.g. in Python, and apparently
lcc-extended C).

[snip]
Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for the
fools!
And what about left-shifting iostreams :)
If you feel that operator overloading would not solve the problem for
matrices addition, then you will have to devise other means of doing that.

The GC however, is an ELEGANT solution to all this problems. We would
have the easy of use of C++ with its automatic destructors, WITHOUT
PAYING THE PRICE in language and compiler complexity.

This last point is important: compiler complexity increases the effort
that the language implementor must do and increases the "bug surface".

Yes of course. Although I would say, why not leave poor C alone and
start a new language? Or just use a different language that already
exists... there are a lot out there.

I often get the feeling there's a lot of pain and complexity in C++ that
could have been avoided if it hadn't started out trying to be compatible
with C.

May 4 '06 #215

Ben C

On 2006-05-04, REH <sp******@stny.rr.com> wrote:

Ben C wrote:
Usually in practice this just means a lot of bugs and crashes; but there
are good reasons for it too: you can write domain-specific allocators
that are more efficient and/or tunable in the amount of space or time
they use, instead of relying on a general-purpose allocator or
garbage-collector all the time.
C++ does not do GC, nor are you required to use any "general-purpose"
allocator.
Yes I know. But you do get constructors, destructors and references, so
you can fit explicit memory management "under the hood" of operator
overloading.
The programmer also might implement things like shallow-copy and
copy-on-write.

Somehow all of these things need to happen when an expression like this
is evaluated:

string a = b + c;

In C++ the basic mechanism you use for this is constructors. For example
the string copy constructor might set up a shallow copy-on-write copy.
Someone has to write the code for that. If the programmer writes it, and
it's not just part of the framework, then it has to get implicitly
called.

Yes, the programmer can write a constructor to do this. He does not
have to.
I don't know of a way to do it without a constructor (for a
shallow-copied copy-on-write string class).

[snip] I still don't get your point.

Show me the string example, and hopefully either you will get my point
or I will get yours :)

May 4 '06 #216

On 2006-05-04, we******@gmail.com <we******@gmail.com> wrote:

Jordan Abel wrote:
On 2006-05-04, we******@gmail.com <we******@gmail.com> wrote:
> Ben C wrote:
>> On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
>> > CBFalconer wrote:
>> >> we******@gmail.com wrote:
>> >> > CBFalconer wrote:
>> >> ... snip ...
>> >> >> The last time I took an (admittedly cursory) look at Bstrlib, I
>> >> >> found it cursed with non-portabilities
>> >> >
>> >> > You perhaps would like to name one?
>> >>
>> >> I took another 2 minute look, and was immediately struck by the use
>> >> of int for sizes, rather than size_t. This limits reliably
>> >> available string length to 32767.
>>
>> [snip]
>>
>> >> [...] I did find an explanation and
>> >> justification for this. Conceded, such a size is probably adequate
>> >> for most usage, but the restriction is not present in standard C
>> >> strings.
>>
>> > Your going to need to conceed on more grounds than that. There is a
>> > reason many UNIX systems tried to add a ssize_t type, and why TR 24731
>> > has added rsize_t to their extension. (As a side note, I strongly
>> > suspect the Microsoft, in fact, added this whole rsize_t thing to TR
>> > 24731 when they realized that Bstrlib, or things like it, actually has
>> > far better real world safety because its use of ints for string
>> > lengths.) Using a long would be incorrect since there are some systems
>> > where a long value can exceed a size_t value (and thus lead to falsely
>> > sized mallocs.) There is also the matter of trying to codify
>> > read-only and constant strings and detecting errors efficiently
>> > (negative lengths fit the bill.) Using ints is the best choice
>> > because at worst its giving up things (super-long strings) that nobody
>> > cares about,
>>
>> I think it's fair to expect the possibility of super-long strings in a
>> general-purpose string library.
>
> Ok, so you can name a single application of such a thing right?
>
>> > it allows in an efficient way for all desirable encoding scenarios,
>> > and it avoids any wrap around anomolies causing under-allocations.
>>
>> What anomalies? Are these a consequence of using signed long, or
>> size_t?
>
> I am describing what int does (*BOTH* the encoding scenarios and
> avoiding anomolies), Using a long int would allow for arithmetic on
> numbers that exceed the maximum value of size_t on some systems (that
> actually *exist*), so when there was an attempt to malloc or realloc on
> such sizes, there would be a wrap around to some value that would just
> make it screw up. And if I used a size_t, then there would be no
> simple space of encodings that can catch errors, constants and write
> protected strings.
If it's longer than the maximum size_t value, you probably can't have it
anyway, so there's no point in being able to represent it.

Huh?

size_t has to be able to represent the size of any object. to have
a string longer than its maximum value you have to have an array of
characters longer than that maximum value - which you can't have.

Silly encoding tricks buy you nothing, just use another field with bit
flags.

If I do that, I lose space, speed, and error detection. I see it as
buying me a whole hell of a lot actually.

Space and speed are cheap these days.

Even if you have a million strings, that's still only four megabytes
saved. If you make a million calls, that's still only a few million
cycles saved.

It does _not_ buy you error detection in general, and a false sense of
safety can be dangerous.

Probably the best thing to do to prevent errors would be make everything
use your API, and make sure your functions don't have bugs. Once you
have that, the only possible source of errors is bit rot, and you can't
do anything about that.

May 4 '06 #217

On 2006-05-04, jacob navia <ja***@jacob.remcomp.fr> wrote:

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!

The addition operator on dates would work _exactly_ the same way as the
addition operator on pointers - you can subtract two of them, or add one
to a number (representing an interval)

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems where
it's not trivial, could call localtime, modify tm_secs, and then call
mktime]

May 4 '06 #218

Jordan Abel a écrit :

On 2006-05-04, jacob navia <ja***@jacob.remcomp.fr> wrote:
It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!

The addition operator on dates would work _exactly_ the same way as the
addition operator on pointers - you can subtract two of them, or add one
to a number (representing an interval)

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems where
it's not trivial, could call localtime, modify tm_secs, and then call
mktime]

Yes adding a number to a date makes sense, but I was speaking about
adding two dates!

May 4 '06 #219

we******@gmail.com wrote:

Flash Gordon wrote:
we******@gmail.com wrote:
Ben C wrote:
On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
> CBFalconer wrote:
>> we******@gmail.com wrote:
>>> CBFalconer wrote:
>> ... snip ...
>>>> The last time I took an (admittedly cursory) look at Bstrlib, I
>>>> found it cursed with non-portabilities
>>> You perhaps would like to name one?
>> I took another 2 minute look, and was immediately struck by the use
>> of int for sizes, rather than size_t. This limits reliably
>> available string length to 32767.
[snip]

>> [...] I did find an explanation and
>> justification for this. Conceded, such a size is probably adequate
>> for most usage, but the restriction is not present in standard C
>> strings.
> Your going to need to conceed on more grounds than that. There is a
> reason many UNIX systems tried to add a ssize_t type, and why TR 24731
> has added rsize_t to their extension. (As a side note, I strongly
> suspect the Microsoft, in fact, added this whole rsize_t thing to TR
> 24731 when they realized that Bstrlib, or things like it, actually has
> far better real world safety because its use of ints for string
> lengths.) Using a long would be incorrect since there are some systems
> where a long value can exceed a size_t value (and thus lead to falsely
> sized mallocs.) There is also the matter of trying to codify
> read-only and constant strings and detecting errors efficiently
> (negative lengths fit the bill.) Using ints is the best choice
> because at worst its giving up things (super-long strings) that nobody
> cares about,
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.
Ok, so you can name a single application of such a thing right?

Handling an RTF document that you will be writing to a variable length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document in to the database so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.

So now name the platform where its *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.

If the DOS port hadn't been dropped then depending on the compiler we
might have hit this. A significant portion of the SW I'm thinking of
originated on DOS, so it could have hit it.

> it allows in an efficient way for all desirable encoding scenarios,
> and it avoids any wrap around anomolies causing under-allocations.
What anomalies? Are these a consequence of using signed long, or
size_t?
I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomolies), Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and write
protected strings.

Is an extra byte (or word, or double word) for a flags field really that
big an overhead?

I need two *bits* for flags, and I want large ranges to catch errors in
the scalar fields (this is a *safe* library). An extra struct entry is
the wrong way to do this because it doesn't help my catch errors in the
scalar fields, and its space inefficient.

ssize_t would have been a reasonable *functional* choice, but its not
standard. size_t is no good because it can't go negative. long int is
no good because there are plenty of real platforms where long int is
larger than size_t. int solves all the main real problems, and as a
bonus the compiler is designed to make sure its the fastest scalar
primitive available.

Strangely enough, when a previous developer on the code I'm dealing with
thought he could limit size to a "valid" range an assert if it was out
of range we found that the asserts kept getting triggered. However, it
was always triggered incorrectly because the size was actually valid! So
I'll stick to not artificially limiting sizes. If the administrator of a
server the SW is installed on wants then s/he can use system specific
means to limit the size of a process.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

Inviato da X-Privat.Org - Registrazione gratuita http://www.x-privat.org/join.php

May 4 '06 #220

we******@gmail.com wrote:

Flash Gordon wrote:
jacob navia wrote:
Flash Gordon a écrit :
Is an extra byte (or word, or double word) for a flags field really
that big an overhead?
Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits info... For
programs that use extensively strings, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, specially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision. I've yet to see software where short strings made up a significant
portion of the memory footprint and saving the memory that avoiding the
flags would be of real use. Of course, such applications might exist.

Any program that reads words from any language dictionary. Like a
spell checker, or a word puzzle solver/creator, or a spam filter. For
dictionaries the size of the english language dictionary, these kinds
of applications can typically push the L2 cache of your CPU pretty
hard.

I never said they didn't exist. However, a typical dictionary + the
structures is not going to fit in my L2 cache anyway. However, the
subset of it that is likely to be actually in use is probably an order
of magnitude smaller and so could easily fit in with the extra overhead.
Alternatively, one could go to conventional C strings and have a bigger
chance of it fitting since they only have a 1 byte overhead compared to
probably an 8 byte overhead (4 byte int for lenght, 4 byte int for
memory block size) that it sounds like your library has. Even if your
library only has a 4 byte overhead it is still larger!

Personally I would say that using negative lengths was asking for
problems because at some point a negative length will be checked without
first changing it to positive.

I think you miss the point. If the string length is negative then it
is erroneous. That's the point of it. But the amount of memory
allocated being negative, I use to indicate that the memory is not
legally modifiable at the moment, and being 0 meaning that its not
modifiable ever. The point being that the library blocks erroneous
action due to intentionally or unintentionally having bad header values
in the same test. So it reduces overhead, while increasing safety and
functionality at the same time.

If you are trying to detect corruption then you should also be checking
that the length is not longer than the memory block, so you should be
doing more than one comparison anyway. Then you can easily check if any
unused flag bits are non-0.
You know, you can actually read the explanation of all this in the
documentation if you care to do so.

Probably true.

It may well be that the performance gain is worth it for the
applications people use your library for. If so then fine, but the
limitation means it is not worth me migrating to it.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

Inviato da X-Privat.Org - Registrazione gratuita http://www.x-privat.org/join.php

May 4 '06 #221

Ben Hinkle

>>> In "higher-level" languages

which are further abstracted from the implementation, it's attractive to
remove this distinction-- Python for example achieves this well. But I'm
not convinced of the wisdom of the hybrid, C with operator overloading.

I am certain that the conservative option just puts brakes to the
development of the language

I agree. You need brakes.

Having said that I'm all for trying these things out in projects like
lcc-win32.

I've been playing around with extending C to be more "high-level" using the
TinyCC compiler. The license of TinyCC is GPL and it runs on win32 and linux
so I think it makes a better base for experiments that one wants to
distribute. TinyCC is located at http://www.tinycc.org and, for those
curious, my experiments are at http://www.tinycx.org.

-Ben Hinkle

May 4 '06 #222

jacob navia wrote:

Jordan Abel a écrit :
On 2006-05-04, jacob navia <ja***@jacob.remcomp.fr> wrote:
It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!

The addition operator on dates would work _exactly_ the same way as
the addition operator on pointers - you can subtract two of them, or
add one to a number (representing an interval)

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems
where it's not trivial, could call localtime, modify tm_secs, and then
call mktime]

Yes adding a number to a date makes sense, but I was speaking about
adding two dates!

What will the date be in 4 years, 2 months and 5 days from today?

Adding something other than a number can make a lot of sense. Adding to
real dates doesn't I agree.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

May 4 '06 #223

On 2006-05-04, Flash Gordon <sp**@flash-gordon.me.uk> wrote:

jacob navia wrote:
Jordan Abel a écrit :
On 2006-05-04, jacob navia <ja***@jacob.remcomp.fr> wrote:

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!
The addition operator on dates would work _exactly_ the same way as
the addition operator on pointers - you can subtract two of them, or
add one to a number (representing an interval)

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems
where it's not trivial, could call localtime, modify tm_secs, and then
call mktime]

Yes adding a number to a date makes sense, but I was speaking about
adding two dates!

What will the date be in 4 years, 2 months and 5 days from today?

Adding something other than a number can make a lot of sense. Adding to
real dates doesn't I agree.

My first thought was "represent it as a number of seconds", but then
I realized - how many seconds in two months? Or in any number of years
not a multiple of four?

Should there be another type to represent intervals of time in such
a way? Or use struct tm? 0 for the year would mean either no years or
1900 depending on the context

May 4 '06 #224

Bill Pursell

jacob navia wrote:

Ben C a écrit (regarding operator overloading)

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

Why not?

Suppose Matrix A,B,C;

C = A+B;

<snip>
Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

Do you also propose to avoid using '*' to represent
matrix multiplication on the basis that matrix multiplication is not
commutative?

May 4 '06 #225

Chris Torek

>Ben C wrote:
[much snippage]

The programmer also might implement things like shallow-copy and
copy-on-write.

Somehow all of these things need to happen when an expression like this
is evaluated:

string a = b + c;
Or, comparably (but considerably more complicated):

String a = ((b + c) - "zog") * " ";

where string "addition" means "concatenate" (the usual definition
for string addition), "subtraction" means "remove the first copy
of the target string, if there is one", and string "multiplication"
means "repeatedly insert this string". Hence if b and c hold "xyz"
and "ogle" respectively, the "sum" is "xyzogle", subtracting "zog"
yields "xyle", and multiplying by " " yields "x y l e " (including
the final space).
In C++ the basic mechanism you use for this is constructors. For example
the string copy constructor might set up a shallow copy-on-write copy.
Indeed. Suppose the String data structure is much like Paul Hsieh's
favorite, but perhaps with a few more bells and whistles (I have not
looked at his implementation):

struct StringBuffer;

struct String {
char *bytes; /* the bytes (if any) in the string */
size_t slen; /* the length of the string */
struct StringBuffer *buf; /* the underlying buffer (may be shared) */
struct String *next; /* linked list in case of shared references */
};

struct StringBuffer {
char *base; /* base address of buffer */
size_t bufsize; /* size of buffer */
size_t refcnt; /* number of references to this buffer */
struct String *firstref; /* head of reference chain */
};

This gives us functions that, in C, might look like:

/* "Copy" a string: return a new reference to an existing string */
struct String *String_copy(struct String *old) {
struct String *new = xmalloc(sizeof *new);
/* xmalloc is just malloc plus panic-if-out-of-space */

/* copy the underlying string's info */
new->bytes = old->bytes;
new->slen = old->slen;
new->buf = old->buf;

/* insert into list, remembering new ref */
new->next = old;
new->buf->refcnt++;
new->buf->firstref = new;
}

In this case, making a second copy of a very long string is
quite cheap. So is making a sub-string out of an existing
string:

/*
* Shrink a string by removing "frontoff" charcters from the
* front, and "backoff" characters from the back. The frontoff
* may be negative to extend the string back to its original length
* although typically exactly one will be zero (remove head or
* tail part of string). The backoff must be nonnegative
* (because tail parts of buffers are not necessarily valid).
*/
void String_shrink(struct String *s, int frontoff, int backoff) {
if (frontoff) {
if (frontoff < 0) {
size_t maxshrink = s->bytes - s->buf->base;

/* NB: this can be optimized to fall into the "else" */
frontoff = -frontoff;
if (frontoff > maxshrink)
frontoff = maxshrink;
s->len += frontoff;
s->bytes -= frontoff;
} else {
if (s->slen < frontoff)
frontoff = s->len;
s->len -= frontoff;
s->bytes += frontoff;
}
}
if (backoff) {
if (backoff < 0)
panic("bad call to String_shrink");
if (s->len < backoff)
backoff = s->len;
s->len -= backoff;
}
}

Now, of course, in order to *modify* the *contents* of a string,
we have to check whether the string is shared, and if so, "break"
the sharing:

/* inline */ struct String *String_preptomod(struct String *s) {
return (s->buf->refcnt == 1) ? s : String_private_copy(s);
}

[without complicated C++ style mechanisms,] this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

On its own it's not enough; with the extra workarounds you need, you end
up with C++ (or some other kind of "octopus made by nailing four extra
legs onto a dog").

In article <11**********************@i40g2000cwc.googlegroups .com>
REH <sp******@stny.rr.com> wrote:I still don't get your point.

OK: so write the "operator" functions for +, -, and * above and
tell us what happens to any intermediate copies of the String
structures that are created by each addition, subtraction, and
multiply.

Show us the code, and "we" (Ben C and I, perhaps) will show you
where you have re-invented the (detailed and hairy) C++ mechanisms
(or, contrariwise, have assumed that your underlying language has
garbage collection, so that temporary objects can be created and
then thrown away without calling "constructor" and "destructor"
functions on references, reference-copies, etc.; if you do have
constructors and destructors, you also have to decide whether such
functions can or must be "virtual" or not, and so on).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

May 4 '06 #226

Bill Pursell a écrit :

jacob navia wrote:
Ben C a écrit (regarding operator overloading)
But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

Why not?

Suppose Matrix A,B,C;

C = A+B;

<snip>
Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

Do you also propose to avoid using '*' to represent
matrix multiplication on the basis that matrix multiplication is not
commutative?

Matrix multiplication is a multiplication. Granted, not commutative but
a multiplication. In other languages, for instance APL that has an
operator for matrix multiplication, there are TWO signs: one ('*') for
normal multiplication, and another (a box enclosing another sign) to
denote matrix multiplication to clearly distinguish both operations.

Of course this is a matter more of taste but in the case of strings
there isn't any mathematical operation performed in those strings. Not
even a set operation. Take for instance

"Hello World" - "World"

Is the result "Hello " ???

Are we adding or subtracting things?

Surely not.

I want to introduce operator overloading into C but I am not for ANY
application of operator overloading. It has been pointed out that
overloading could lead to excessive temporaries construction, that would
be far more efficiently handled in a normal C syntax with careful code.

This is not a problem for small structures, but it could be a show
stopper for large structures like matrices for instance, where
efficiency would be more important that syntactic sugar.
Another problem that bothers me (and is so far unsolved) is the problem
of taking the address of an operator function. What should be the syntax
in that case?

For instance:

int128 operator+(int128 a,int128 b);

typedef int128 (*i128add)(int128 a, int128 b);

i128add = operator+(i128 a,i128 b); /// This ?

jacob

May 4 '06 #227

Ed Jensen

CBFalconer <cb********@yahoo.com> wrote:

And, if you write the library in truly portable C, without any
silly extensions and/or entanglements, you just compile the library
module. All the compiler vendor need to do is meet the
specifications of the C standard.

Simple, huh?

That all depends on the license under which the source code was
released. Linking a bunch of C libraries under various licenses can
involve non-trivial amounts of legal hassle to ensure compliance.

Also, there's something to be said for having features built into the
standard library. Besides making things easier from a legal point of
view, it means you can spend that much less time evaluating multiple
solutions, since most of the time, you'll just use the implementation
already available in the standard library.

I know it's unpopular around these parts to utter such heresy, but I,
for one, would love it if the standard C library included support for
smarter strings, hash tables, and linked lists.

Then again, I'm certainly NOT advocating these things should be added
to the standard C library. I recognize C for what it is, and use it
where it's appropriate. There are other languages that offer those
features. But that doesn't stop me from wanting those features in C.

May 4 '06 #228

Keith Thompson

jacob navia <ja***@jacob.remcomp.fr> writes:
[...]

Besides, I think that using the addition operator to "add" strings is
an ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE.

<OT>
Assuming the existence of operator overloading (which, once again,
standard C, the topic of this newsgroup, does not have), using "+" for
string concatenation makes at least sense as much as using "<<" and
">>" for I/O. (I know you haven't advocated that either, but it's
established practice in C++.) And I really don't have much problem
with the idea of a "+" operator being non-commutative -- just as a "*"
operator for matrices would be non-commutative. If you don't like it,
by all means don't use it -- but if you provide operator overloading
in your compiler, users *will* use it in ways that you don't like.

The point of operator overloading is to provide a notational shorthand
for something that could be expressed equivalently but more verbosely
using function calls. It isn't to provide something that absolutely
must follow the rules of mathematics. What would a mathematician
unfamiliar with computer programming think of "x = x + 1"?
</OT>

<WAY_OT>
Ada has a separate operator, "&", for array concatenation.
</WAY_OT>

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

May 4 '06 #229

Chris Torek a écrit :
[horrible string "math" snipped]

OK: so write the "operator" functions for +, -, and * above and
tell us what happens to any intermediate copies of the String
structures that are created by each addition, subtraction, and
multiply.

Show us the code, and "we" (Ben C and I, perhaps) will show you
where you have re-invented the (detailed and hairy) C++ mechanisms
(or, contrariwise, have assumed that your underlying language has
garbage collection, so that temporary objects can be created and
then thrown away without calling "constructor" and "destructor"
functions on references, reference-copies, etc.; if you do have
constructors and destructors, you also have to decide whether such
functions can or must be "virtual" or not, and so on).

Chris:

1) Operator overloading is NOT a good application for strings, as I have
argued in another thread in this same discussion. lcc-win32 does NOT
support addition of strings nor any math operation with them.

a+b != b+a
"Hello" + "World != "World" + "Hello"

2) Operator overloading does NOT need any constructors, nor destructors
nor the GC if we use small objects:

int128 a,b,c,d;

a = (b+c)/(b-d);

This will be translated by lcc-win32 to

tmp1 = operator+(b,c);
tmp2 = operator-(b,d);
tmp3 = operator/(tmp1,tmp2);
a = tmp3;

The temporary values are automatically allocated in the stack.

Of course if you have interior pointers those intermediate structures
must be registered so that the storage can be freed. This is solved, as
you say, with a GC. lcc-win32 offers a GC in the standard distribution,
and allows to have the best of both worlds: the easy of C++ destructors
that take care of memory managemnt WITHOUT PAYING THE PRICE of C++
complexity.

If you do not want the GC, just make a linked list with all the
allocations you make in the "constructor" (say in the new_string()
function) and periodically clean them up.

May 4 '06 #230

Ian Collins

jacob navia wrote:

The crucial point in this is to know when to stop. There are NO
constructors/destructors in C, and none of the proposed extensions
proposes that.

Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE.

If C were to have a string type and operator overloading and it didn't
have '+' for strings, the first thing people would do is write one! It
may be syntactic sugar, but it's very convenient sugar.

--
Ian Collins.

May 4 '06 #231

In article <44***********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates.

mid_date = (start_date + end_date) / 2;

-- Richard

May 4 '06 #232

Richard Tobin a écrit :

In article <44***********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates.

mid_date = (start_date + end_date) / 2;

-- Richard

Excuse me but what does it mean

Sep-25-1981 + Dec-22-2000

If you figure out what THAT means then please explain.

You obviously meant:
mid_date = (end_date - start_date)/2

The *subtraction* of two dates yields a time interval

May 4 '06 #233

Ian Collins a écrit :

jacob navia wrote:
The crucial point in this is to know when to stop. There are NO
constructors/destructors in C, and none of the proposed extensions
proposes that.

Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE.

If C were to have a string type and operator overloading and it didn't
have '+' for strings, the first thing people would do is write one! It
may be syntactic sugar, but it's very convenient sugar.

Well, in this same thread Chris Torek posted this:

String a = ((b + c) - "zog") * " ";

where string "addition" means "concatenate" (the usual definition
for string addition), "subtraction" means "remove the first copy
of the target string, if there is one", and string "multiplication"
means "repeatedly insert this string". Hence if b and c hold "xyz"
and "ogle" respectively, the "sum" is "xyzogle", subtracting "zog"
yields "xyle", and multiplying by " " yields "x y l e " (including
the final space).

:-)

May 4 '06 #234

websnarf

Flash Gordon wrote:

we******@gmail.com wrote:
Flash Gordon wrote:
we******@gmail.com wrote:
Ben C wrote:
> On 2006-05-03, we******@gmail.com <we******@gmail.com> wrote:
>> CBFalconer wrote:
>>> we******@gmail.com wrote:
>>>> CBFalconer wrote:
>>> ... snip ...
>>>>> The last time I took an (admittedly cursory) look at Bstrlib, I
>>>>> found it cursed with non-portabilities
>>>> You perhaps would like to name one?
>>> I took another 2 minute look, and was immediately struck by the use
>>> of int for sizes, rather than size_t. This limits reliably
>>> available string length to 32767.
> [snip]
>
>>> [...] I did find an explanation and
>>> justification for this. Conceded, such a size is probably adequate
>>> for most usage, but the restriction is not present in standard C
>>> strings.
>> Your going to need to conceed on more grounds than that. There is a
>> reason many UNIX systems tried to add a ssize_t type, and why TR 24731
>> has added rsize_t to their extension. (As a side note, I strongly
>> suspect the Microsoft, in fact, added this whole rsize_t thing to TR
>> 24731 when they realized that Bstrlib, or things like it, actually has
>> far better real world safety because its use of ints for string
>> lengths.) Using a long would be incorrect since there are some systems
>> where a long value can exceed a size_t value (and thus lead to falsely
>> sized mallocs.) There is also the matter of trying to codify
>> read-only and constant strings and detecting errors efficiently
>> (negative lengths fit the bill.) Using ints is the best choice
>> because at worst its giving up things (super-long strings) that nobody
>> cares about,
> I think it's fair to expect the possibility of super-long strings in a
> general-purpose string library.
Ok, so you can name a single application of such a thing right?
Handling an RTF document that you will be writing to a variable length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document in to the database so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.
So now name the platform where its *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.

If the DOS port hadn't been dropped then depending on the compiler we
might have hit this. A significant portion of the SW I'm thinking of
originated on DOS, so it could have hit it.

Oh ... I think of DOS as exactly the case where this *can't* happen.
Single objects in 16bit DOS have a size limit of 64K (size_t is just
unsigned which is 16 bits), so these huge RTF files you are talking
about *have* to be streamed, or split over multiple allocations
anyways.

>> it allows in an efficient way for all desirable encoding scenarios,
>> and it avoids any wrap around anomolies causing under-allocations.
> What anomalies? Are these a consequence of using signed long, or
> size_t?
I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomolies), Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and write
protected strings.
Is an extra byte (or word, or double word) for a flags field really that
big an overhead?

I need two *bits* for flags, and I want large ranges to catch errors in
the scalar fields (this is a *safe* library). An extra struct entry is
the wrong way to do this because it doesn't help my catch errors in the
scalar fields, and its space inefficient.

ssize_t would have been a reasonable *functional* choice, but its not
standard. size_t is no good because it can't go negative. long int is
no good because there are plenty of real platforms where long int is
larger than size_t. int solves all the main real problems, and as a
bonus the compiler is designed to make sure its the fastest scalar
primitive available.

Strangely enough, when a previous developer on the code I'm dealing with
thought he could limit size to a "valid" range an assert if it was out
of range we found that the asserts kept getting triggered. However, it
was always triggered incorrectly because the size was actually valid!

And how is this connected with Bstrlib? The library comes with a test
that, if you run in a 16 bit environment, will exercise length
overflowing. So you have some reasonable assurance that Bstrlib does
not make obvious mistakes with size computations.
[...] So I'll stick to not artificially limiting sizes.
And how do you deal with the fact that the language limits your sizes
anyways?
[...] If the administrator of a
server the SW is installed on wants then s/he can use system specific
means to limit the size of a process.

What? You think the adminstrator is in charge of how the compiler
works?

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #235

In article <44**************@jacob.remcomp.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:

mid_date = (start_date + end_date) / 2;
Excuse me but what does it mean

Sep-25-1981 + Dec-22-2000
Just because the sum of two dates is not a date doesn't mean that
it doesn't mean anything.
You obviously meant:

mid_date = (end_date - start_date)/2
No I didn't. That is something completely different.
The *subtraction* of two dates yields a time interval

True, and (end_date - start_date) / 2 would give me half the interval
between the dates, but that is not what I wanted. I wanted the
average of the dates, which is a date.

(Sep-25-1981 + Dec-22-2000) / 2 would be the date mid-way between
Sep-25-1981 and Dec-22-2000, just as (45 + 78) / 2 is the integer
mid-way between 45 and 78.

-- Richard

May 4 '06 #236

Richard Tobin a écrit :

In article <44**************@jacob.remcomp.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:

mid_date = (start_date + end_date) / 2;

Excuse me but what does it mean

Sep-25-1981 + Dec-22-2000

Just because the sum of two dates is not a date doesn't mean that
it doesn't mean anything.

You obviously meant:

mid_date = (end_date - start_date)/2

No I didn't. That is something completely different.

The *subtraction* of two dates yields a time interval

True, and (end_date - start_date) / 2 would give me half the interval
between the dates, but that is not what I wanted. I wanted the
average of the dates, which is a date.

(Sep-25-1981 + Dec-22-2000) / 2 would be the date mid-way between
Sep-25-1981 and Dec-22-2000, just as (45 + 78) / 2 is the integer
mid-way between 45 and 78.

-- Richard

Ahh ok, you mean then

mid_date = startdate + (end_date-start_date)/2

A date + a time interval is a date later than the start date.

May 4 '06 #237

websnarf

Flash Gordon wrote:

we******@gmail.com wrote:
Flash Gordon wrote:
jacob navia wrote:
Flash Gordon a écrit :
> Is an extra byte (or word, or double word) for a flags field really
> that big an overhead?
Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits info... For
programs that use extensively strings, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, specially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.
I've yet to see software where short strings made up a significant
portion of the memory footprint and saving the memory that avoiding the
flags would be of real use. Of course, such applications might exist.
Any program that reads words from any language dictionary. Like a
spell checker, or a word puzzle solver/creator, or a spam filter. For
dictionaries the size of the english language dictionary, these kinds
of applications can typically push the L2 cache of your CPU pretty
hard.

I never said they didn't exist.

I think the point is that there are *many* such application. In fact I
would be suspicious of anyone who claimed to be an experienced
programmer who hasn't *written* one of these.
[...] However, a typical dictionary + the
structures is not going to fit in my L2 cache anyway. However, the
subset of it that is likely to be actually in use is probably an order
of magnitude smaller and so could easily fit in with the extra overhead.
Its more *likely* if the data is compacted. Another way of saying
this, is that for any overflowing data set with a locality bias with
perform better monotonically with how well it fits in the cache. I.e.,
everything you save improves some percentage of performance.
Alternatively, one could go to conventional C strings and have a bigger
chance of it fitting since they only have a 1 byte overhead compared to
probably an 8 byte overhead (4 byte int for lenght, 4 byte int for
memory block size) that it sounds like your library has. Even if your
library only has a 4 byte overhead it is still larger!
Yes, but you eat a huge additional O(strlen) penality for very *many*
typical operations. So Bstrlib makes the trade off where the more
common scenarios are faster.

Personally I would say that using negative lengths was asking for
problems because at some point a negative length will be checked without
first changing it to positive.

I think you miss the point. If the string length is negative then it
is erroneous. That's the point of it. But the amount of memory
allocated being negative, I use to indicate that the memory is not
legally modifiable at the moment, and being 0 meaning that its not
modifiable ever. The point being that the library blocks erroneous
action due to intentionally or unintentionally having bad header values
in the same test. So it reduces overhead, while increasing safety and
functionality at the same time.

If you are trying to detect corruption then you should also be checking
that the length is not longer than the memory block, so you should be
doing more than one comparison anyway.

Yes, it does that as well. So you really are talking out of your ass.
This is in the first couple pages of the documentation, and strewn
throughout the source code.
[...] Then you can easily check if any unused flag bits are non-0.

Yes, this is an alternative -- but its less safe and slower, so why
would I do it this way?

You know, you can actually read the explanation of all this in the
documentation if you care to do so.

Probably true.

It may well be that the performance gain is worth it for the
applications people use your library for. If so then fine, but the
limitation means it is not worth me migrating to it.

Probably not true. But you won't look at it anyways, so I won't waste
my breath.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

May 4 '06 #238

In article <44***********************@news.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:

mid_date = (start_date + end_date) / 2;
Ahh ok, you mean then

mid_date = startdate + (end_date-start_date)/2

Your attitude is baffling. You deny that adding dates makes sense,
and when I post an example where adding dates makes perfect sense, you
respond by asserting that I mean some other expression that achieves
that same effect. The mere fact that you were able to post another
expression with the same meaning refutes your original claim.

-- Richard

May 4 '06 #239

In article <e3**********@pc-news.cogsci.ed.ac.uk>, I wrote:

Just because the sum of two dates is not a date doesn't mean that
it doesn't mean anything.

Just in case anyone has not noticed, this is really just a re-run of
pointer addition with dates instead of pointers.

The reason for not allowing (date|pointer) addition is not that it
doesn't make sense, but that the gain isn't worth the mechanism
required.

-- Richard

May 4 '06 #240