Increasing efficiency in C

jacob navia

As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.

This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.

A more efficient representation is:

struct string {
    size_t length;
    char data[];
};

The length operation becomes just a memory read.
This would considerably speed up programs. The basic
idea is to use a string type that is length prefixed and
allows run-time checking against UB (undefined
behavior).
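A minimal sketch of this first representation, assuming C99. In C99 a `char data[];` declared as the last member of a struct is a *flexible array member* (not a variable length array). Such a struct is normally allocated with `malloc`, reserving room for the characters after the header. The helper names `string_new` and `string_length` are hypothetical, and keeping a trailing `'\0'` is an added assumption so that `data` still works with `<string.h>`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length-prefixed string: header and characters share one allocation. */
struct string {
    size_t length;  /* characters in use, excluding the trailing '\0' */
    char data[];    /* C99 flexible array member */
};

/* Hypothetical constructor: copies len bytes from src.
   Returns NULL if the allocation fails. */
static struct string *string_new(const char *src, size_t len)
{
    assert(src != NULL);
    struct string *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->length = len;
    memcpy(s->data, src, len);
    s->data[len] = '\0';  /* assumption: keep a terminator for <string.h> */
    return s;
}

/* O(1): one memory read instead of a scan for '\0'. */
static size_t string_length(const struct string *s)
{
    return s->length;
}
```

Because the characters follow the header in the same block, growing such a string means `realloc`ing the whole struct, which is exactly the resizing problem raised about this representation.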

Comparing strings is also sped up because, when
testing for equality, the first length comparison may
tell the whole story with just a couple of
memory reads.
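The equality shortcut can be sketched as follows, assuming an indirect counted string `struct cstr` (a hypothetical name) whose `data` points at `length` bytes. Differing lengths settle the question immediately; equal lengths fall back to a single `memcmp` with no terminator scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical counted string; data points at `length` bytes. */
struct cstr {
    size_t length;
    const char *data;
};

/* Equality test that checks the lengths first. */
static bool cstr_equal(struct cstr a, struct cstr b)
{
    assert(a.data != NULL && b.data != NULL);
    if (a.length != b.length)
        return false;               /* decided without touching the text */
    return memcmp(a.data, b.data, a.length) == 0;
}
```

When the lengths match, the full byte comparison is still needed, so the benefit depends on how often the compared strings differ in length.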

A string like the one described above is not able to
resize itself. Any pointers to it would cease to be valid
when it is resized if the memory allocator is forced to
move memory around. The block where that string was
allocated is bounded by other blocks in memory, and
it is not possible to resize it.

A pointer (an indirect representation) costs sizeof(void *)
but allows strings to be resized without invalidating the pointers
to them.

struct string {
    size_t length;
    char *data;
};
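A sketch of why the indirect form helps, assuming a hypothetical `string_append` helper: `realloc` may move the character buffer, but only the `data` member changes, so pointers to the `struct string` itself stay valid. Pointers into the old character buffer do not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Indirect representation: the header stays put, only data may move. */
struct string {
    size_t length;
    char *data;     /* heap buffer of length + 1 bytes, or NULL before the first append */
};

/* Hypothetical helper: appends len bytes from src.
   Returns 0 on success, -1 on allocation failure (s is left unchanged).
   Size overflow checks are omitted for brevity. */
static int string_append(struct string *s, const char *src, size_t len)
{
    assert(s != NULL && src != NULL);
    char *p = realloc(s->data, s->length + len + 1);  /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL)
        return -1;
    memcpy(p + s->length, src, len);
    s->length += len;
    p[s->length] = '\0';
    s->data = p;    /* the buffer may have moved; the struct has not */
    return 0;
}
```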

There is no compelling reason to choose one or the other.
It depends on the application. In any case, the standard
library could be complemented by
Strcmp
Strcpy
etc., all using length prefixed strings.
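As one hedged illustration of what a length-aware Strcmp could look like (the name `cs_compare` and the `struct cstr` type are hypothetical, not an existing library API), a three-way comparison can compare the common prefix with `memcmp` and let the shorter string sort first on a tie.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical counted string; data points at `length` bytes. */
struct cstr {
    size_t length;
    const char *data;
};

/* Returns <0, 0 or >0 like strcmp. memcmp compares bytes as unsigned char,
   matching strcmp's ordering; embedded '\0' bytes are compared too. */
static int cs_compare(struct cstr a, struct cstr b)
{
    assert(a.data != NULL && b.data != NULL);
    size_t n = a.length < b.length ? a.length : b.length;
    int r = memcmp(a.data, b.data, n);
    if (r != 0)
        return r;
    /* common prefix is equal: the shorter string sorts first */
    return (a.length > b.length) - (a.length < b.length);
}
```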

Syntactic sugar.

I have added some sugar to this coffee. I always liked coffee
with a bit of sugar. I find it too acidic without it.

Current strings are accessed using the [ ] notation. Shouldn't
these strings have the same privilege?

The language extension I propose is that the user has the right to
define the operation [ ] for any data type he/she wishes.

Not a big deal for today's compilers.

Length checked strings can then use:

String s;
....
s[2] = 'a';
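Standard C has no operator overloading, so as a rough, hypothetical approximation of what `s[2] = 'a';` would mean under this proposal, a checked accessor can return a pointer to the element. The `String` layout and the `string_at` name are assumptions for illustration; here an out-of-range index trips an `assert`, which is compiled out when `NDEBUG` is defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counted string type, echoing the post's `String`. */
typedef struct {
    size_t length;
    char *data;
} String;

/* Checked element access: roughly what s[i] could expand to. */
static char *string_at(String *s, size_t i)
{
    assert(s != NULL && i < s->length);  /* bounds check */
    return &s->data[i];
}
```

With it, the assignment above would be spelled `*string_at(&s, 2) = 'a';`. The proposed extension would amount to letting the compiler accept `s[2]` as shorthand for that call.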

I think I am proposing the obvious.

Do you agree?

jacob

Nov 14 '05 #1


Mike Wahler

"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:c2**********@news-reader4.wanadoo.fr...

> The language extension I propose is that the user has the right to
> define the operation [ ] for any data type he/she wishes.
>
> Not a big deal for today's compilers.
>
> Length checked strings can then use:
>
> String s;
> ...
> s[2] = 'a';
>
> I think I am proposing the obvious.

I think you're proposing C++. Rather than try to 'reinvent' it,
I just use it.

-Mike


jacob navia

"Mike Wahler" <mk******@mkwahler.net> wrote in message
news:LM*******************@newsread2.news.pas.earthlink.net...

> "jacob navia" <ja***@jacob.remcomp.fr> wrote in message
> news:c2**********@news-reader4.wanadoo.fr...
> I think you're proposing C++. Rather than try to 'reinvent' it,
> I just use it.

Well I can't use it Mike.

Just too complex.

Default instantiated template traits?

No thanks. Just characters. What I like about C is that it is not
"object oriented".

It is not oriented at all. It is the programmer that puts
the orientation of the program.

C++ has good ideas, but the complexity of the whole is
so staggering that it is actually a reminder of where
not knowing when to stop leads.

The crux of the matter is knowing when to stop. When a feature
becomes a nuisance and doesn't simplify the task, it is better
to drop it.

Syntactic sugar can lead to caries in the teeth. I said I like
a *bit* of sugar. Not five spoonfuls, you see?

I want a bit of sugar in my coffee and not some coffee in my
sugar.

jacob


Mike Wahler

"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:c2**********@news-reader1.wanadoo.fr...

> "Mike Wahler" <mk******@mkwahler.net> wrote in message
> news:LM*******************@newsread2.news.pas.earthlink.net...
>> "jacob navia" <ja***@jacob.remcomp.fr> wrote in message
>> news:c2**********@news-reader4.wanadoo.fr...
>> I think you're proposing C++. Rather than try to 'reinvent' it,
>> I just use it.
>
> Well I can't use it Mike.

Whatever.

> Just too complex.

You needn't use all of it. You seem to be wanting
a 'real' string type, which C++ has, and it's not
at all difficult to use. Actually, if such a type
is needed, imo that's a good enough reason to use C++
(even if everything else is only the common subset
of the two languages).

> Default instantiated template traits?

So don't use 'em.

> No thanks. Just characters. What I like about C is that it is not
> "object oriented".

C++ does not require OO design. This is a very common misconception.

> It is not oriented at all. It is the programmer that puts
> the orientation of the program.

Right. Which is why I find C++ useful for very many things
(and C as well, and other languages too).

> C++ has good ideas, but the complexity of the whole is
> so staggering that it is actually a reminder of where
> not knowing when to stop leads.

I use the parts I find useful, discard the rest.

> The crux of the matter is knowing when to stop. When a feature
> becomes a nuisance and doesn't simplify the task, it is better
> to drop it.

Or ignore it. Simple, huh?

> Syntactic sugar can lead to caries in the teeth. I said I like
> a *bit* of sugar. Not five spoonfuls, you see?

Ever hear of self-discipline? Anything can be abused.

> I want a bit of sugar in my coffee and not some coffee in my
> sugar.

Who's got control of the spoon, you or someone else? :-)

I'll stop now, I don't want to be accused of language
advocacy in a group about a different language. (It's
probably too late, though :-))

-Mike


Charles Harrison Caudill

jacob navia <ja***@jacob.remcomp.fr> wrote:

> As everybody knows, C uses a zero delimited unbounded
> pointer for its representation of strings.
>
> This is extremely inefficient because at each query of the
> length of the string, the computer starts an unbounded
> memory scan searching for a zero that ends the string.

Correction: strcmp is inefficient. I'd like to see someone produce a more
efficient model for strings, even using assembly.

> A more efficient representation is:
>
> struct string {
>     size_t length;
>     char data[];
> };

I'm not really sure what the question is; you seem to have asked and answered
it all in one...

> The language extension I propose is that the user has the right to
> define the operation [ ] for any data type he/she wishes.

foo[x] is the value held in the array foo at location x, not string foo at
location x. Strings don't even exist in C, just memory.

> Length checked strings can then use:
>
> String s;
> ...
> s[2] = 'a';
>
> I think I am proposing the obvious.

See C++.

--
Harrison Caudill | .^ www.hypersphere.org
Computer Science & Physics Double Major | | Me*Me=1
Georgia Institute of Technology | v' I'm just a normal guy


Leor Zolman

On Wed, 3 Mar 2004 23:07:25 +0100, "jacob navia" <ja***@jacob.remcomp.fr>
wrote:

> As everybody knows, C uses a zero delimited unbounded
> pointer for its representation of strings.

Zero terminated, anyway.

> This is extremely inefficient because at each query of the
> length of the string, the computer starts an unbounded
> memory scan searching for a zero that ends the string.

It is "extremely inefficient" only if you're continuously recalculating the
length. For applications where you're not, it is extremely efficient.
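A small illustration of the distinction, using a hypothetical `count_char` function: computing the length once, before the loop, costs one scan, whereas writing `i < strlen(s)` in the loop condition can rescan the string on every iteration (unless the compiler manages to hoist the call).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts occurrences of c in s, scanning for the terminator only once. */
static size_t count_char(const char *s, char c)
{
    assert(s != NULL);
    size_t n = strlen(s);   /* one scan, result reused below */
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i] == c)
            count++;
    return count;
}
```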

> A more efficient representation is:
>
> struct string {
>     size_t length;
>     char data[];
> };

What is that [] about? That's not a legal definition. Are you implying that
a fixed-length array implementation (with an actual size in there) is an
improvement in any significant way over a simple char *? I don't think so.

> The length operation becomes just a memory read.
> This would considerably speed up programs. The basic
> idea is to use a string type that is length prefixed and
> allows run-time checking against UB (undefined
> behavior).

Now it is starting to sound like Java.

> Comparing strings is also sped up because, when
> testing for equality, the first length comparison may
> tell the whole story with just a couple of
> memory reads.

Perhaps a bit; but on average, inequality is determined pretty quickly the
conventional way, and equality would actually take /more/ time to
determine. But yes, you might net a teeny bit of an improvement.

> A string like the one described above is not able to
> resize itself.

Are we talking about the one with the fixed-length array, or the version
with the mysterious empty brackets? Either way, /nothing/ in C can "resize
itself"...

> Any pointers to it would cease to be valid
> when it is resized if the memory allocator is forced to
> move memory around. The block where that string was
> allocated is bounded by other blocks in memory, and
> it is not possible to resize it.
>
> A pointer (an indirect representation) costs sizeof(void *)
> but allows strings to be resized without invalidating the pointers
> to them.
>
> struct string {
>     size_t length;
>     char *data;
> };

This is the classic first C++ class implementation exercise. Thinking about
it yields some good fundamental principles about class design. But, to
achieve a true performance benefit in a string service, it ultimately
requires tailoring the string implementation to the specific circumstances
in which it will be used. There's no magic bullet; the "irrational
exuberance" surrounding the rise-and-fall of reference counted Standard C++
string implementations is a case in point.

> There is no compelling reason to choose one or the other.
> It depends on the application. In any case, the standard
> library could be complemented by
> Strcmp
> Strcpy
> etc., all using length prefixed strings.
>
> Syntactic sugar.
>
> I have added some sugar to this coffee. I always liked coffee
> with a bit of sugar. I find it too acidic without it.
>
> Current strings are accessed using the [ ] notation. Shouldn't
> these strings have the same privilege?
>
> The language extension I propose is that the user has the right to
> define the operation [ ] for any data type he/she wishes.
>
> Not a big deal for today's compilers.

Might be a bigger deal for the C Standards committee ;-)

> Length checked strings can then use:
>
> String s;
> ...
> s[2] = 'a';
>
> I think I am proposing the obvious.
>
> Do you agree?

Mike's right: use C++.
-leor

Leor Zolman
BD Software
le**@bdsoft.com
www.bdsoft.com -- On-Site Training in C/C++, Java, Perl & Unix
C++ users: Download BD Software's free STL Error Message
Decryptor at www.bdsoft.com/tools/stlfilt.html


Leor Zolman

On Wed, 03 Mar 2004 22:54:20 GMT, "Mike Wahler" <mk******@mkwahler.net>
wrote:

> "jacob navia" <ja***@jacob.remcomp.fr> wrote in message
> news:c2**********@news-reader1.wanadoo.fr...
>
>> "Mike Wahler" <mk******@mkwahler.net> wrote in message
>> news:LM*******************@newsread2.news.pas.earthlink.net...
>>> "jacob navia" <ja***@jacob.remcomp.fr> wrote in message
>>> news:c2**********@news-reader4.wanadoo.fr...
>>> I think you're proposing C++. Rather than try to 'reinvent' it,
>>> I just use it.
>>
>> Well I can't use it Mike.
>
> Whatever.
>> Just too complex.
>
> You needn't use all of it. You seem to be wanting
> a 'real' string type, which C++ has, and it's not
> at all difficult to use. Actually, if such a type
> is needed, imo that's a good enough reason to use C++
> (even if everything else is only the common subset
> of the two languages).

Jacob: There's even a common term used to describe C++ when used only for
those features that are a direct "clean-up" of messiness left over from C's
need to be backward-compatible with earlier incarnations of itself: "A
Better C". I don't think the C++ string class is typically lumped in with
those features (which I'm reluctant to go into here since this isn't a C++
group), but I'd think some more about it before discarding the idea of
using C++ as "C plus strings." I even wrote a CUJ article to help folks
with this very issue:
http://www.bdsoft.com/resources/thinking.html
(The title has "STL" in it, but the article is really about migration of
char *-based code to using strings)
-leor



jacob navia

"Leor Zolman" <le**@bdsoft.com> wrote in message
news:7t********************************@4ax.com...

> On Wed, 3 Mar 2004 23:07:25 +0100, "jacob navia" <ja***@jacob.remcomp.fr>
> wrote:
>> As everybody knows, C uses a zero delimited unbounded
>> pointer for its representation of strings.
> zero terminated, anyway.

Yes Sir!
Zero terminated and surely NOT zero delimited. What a deep
difference :-)

>> This is extremely inefficient because at each query of the
>> length of the string, the computer starts an unbounded
>> memory scan searching for a zero that ends the string.
> It is "extremely inefficient" only if you're continuously recalculating
> the length.

Obviously. And this is a very common use, haven't you
noticed it?

> For applications where you're not, it is extremely efficient.

Sorry, but this string was once constructed, and the
length was known. Why not keep this information?

What about the security?

What about the failure modes of unbounded pointers?

>> A more efficient representation is:
>>
>> struct string {
>>     size_t length;
>>     char data[];
>> };
> What is that [] about? That's not a legal definition.

C99 introduces variable length arrays. This is standard
notation.

> Are you implying that
> a fixed-length array implementation (with an actual size in there) is an
> improvement in any significant way over a simple char *?

Yes.

1 Length operation is trivial
2 Comparisons for equality are cheaper when the length
  of the strings differ. You never know this in C strings
  and you have to start scanning for that zero...
3 Bounds checked strings can be implemented.

> I don't think so.

Well. I think so for the reasons above. Can you
maybe go into those reasons in detail?

>> The length operation becomes just a memory read.
>> This would considerably speed up programs. The basic
>> idea is to use a string type that is length prefixed and
>> allows run-time checking against UB (undefined
>> behavior).
> Now it is starting to sound like Java.

In matters of languages I do not despise any. I am
sorry, I like C but I am not a zealot, and I see
C's problems and weaknesses. A bad string type
is the reason for many bugs we could really get
rid of.

>> Comparing strings is also sped up because, when
>> testing for equality, the first length comparison may
>> tell the whole story with just a couple of
>> memory reads.
> Perhaps a bit; but on average, inequality is determined pretty quickly the
> conventional way, and equality would actually take /more/ time to
> determine. But yes, you might net a teeny bit of an improvement.

And also net a big security improvement...

>> A string like the one described above is not able to
>> resize itself.
> Are we talking about the one with the fixed-length array, or the version
> with the mysterious empty brackets? Either way, /nothing/ in C can "resize
> itself"...

Sorry, I thought realloc was part of C...

> This is the classic first C++ class implementation exercise. Thinking about
> it yields some good fundamental principles about class design.

Maybe, but I do not want any class design. There are no classes
in C. I want strings for holding text. As I said, no
default instantiated template traits. Just chars please.

> But, to
> achieve a true performance benefit in a string service, it ultimately
> requires tailoring the string implementation to the specific circumstances
> in which it will be used. There's no magic bullet; the "irrational
> exuberance" surrounding the rise-and-fall of reference counted Standard C++
> string implementations is a case in point.

Yes, each application has its own needs. That's why I would
propose that the user writes many specialized string
structures that share a common description.

Length delimited strings are infinitely extensible with other
features.

> Mike's right: use C++.

I answered that to Mike. See my answer in a parallel thread.
I think C is the last non-object-oriented language around.
That makes it very interesting.

jacob
http://www.cs.virginia.edu/~lcc-win32


Leor Zolman

On Thu, 4 Mar 2004 01:19:26 +0100, "jacob navia" <ja***@jacob.remcomp.fr>
wrote:

> "Leor Zolman" <le**@bdsoft.com> wrote in message
> news:7t********************************@4ax.com...
>> On Wed, 3 Mar 2004 23:07:25 +0100, "jacob navia" <ja***@jacob.remcomp.fr>
>> wrote:
>>> As everybody knows, C uses a zero delimited unbounded
>>> pointer for its representation of strings.
>> zero terminated, anyway.
>
> Yes Sir!
> Zero terminated and surely NOT zero delimited. What a deep
> difference :-)

I think of delimiters as a matched set, terminated as asymmetric. Just
seemed off to use it there, but yes, I'm sure everyone knew what you meant.

>>> This is extremely inefficient because at each query of the
>>> length of the string, the computer starts an unbounded
>>> memory scan searching for a zero that ends the string.
>> It is "extremely inefficient" only if you're continuously recalculating
>> the length.
>
> Obviously. And this is a very common use, haven't you
> noticed it?

Knowing something about how the strings are going to be used is precisely
what drives the design decision of which flavor to use. When there's going
to be a lot of repeated length testing, that fact may contribute to a
decision against my using plain old char *'s /in that application/.

>> For applications where you're not, it is extremely efficient.
> Sorry, but this string was once constructed, and the
> length was known. Why not keep this information?

Keeping and maintaining it has a spatial and temporal cost. Is it always
justified? Sometimes, probably. Usually? Always?

> What about the security?
>
> What about the failure modes of unbounded pointers?

C doesn't provide any automatic protection for these things. The spirit of
C is to let the programmer program them if they're needed. Period.

>>> A more efficient representation is:
>>>
>>> struct string {
>>>     size_t length;
>>>     char data[];
>>> };
>> What is that [] about? That's not a legal definition.
>
> C99 introduces variable length arrays. This is standard
> notation.

Darn, I'm really going to actually have to write a piece of code using VLAs
some day, so I can at least recognize them when they get used (blush). But
the problem is, I don't like them ;-)

>> Are you implying that
>> a fixed-length array implementation (with an actual size in there) is an
>> improvement in any significant way over a simple char *?
> Yes.
>
> 1 Length operation is trivial
> 2 Comparisons for equality are cheaper when the length
>   of the strings differ. You never know this in C strings
>   and you have to start scanning for that zero...

...or the first mismatch. If you happen to know that enough of your strings
will be identical for their first several characters /and/ be of different
lengths for this to make a significant difference, you'd have good reason
to use your implementation /in that application/.

> 3 Bounds checked strings can be implemented.

They can, but lots of things /can/ be implemented, it is just that C has no
pretense of supporting such things at the core language level. Neither does
C++, for that matter.

>> I don't think so.
> Well. I think so for the reasons above. Can you
> maybe go into those reasons in detail?

I'm not compelled to, no.

>>> The length operation becomes just a memory read.
>>> This would considerably speed up programs. The basic
>>> idea is to use a string type that is length prefixed and
>>> allows run-time checking against UB (undefined
>>> behavior).
>> Now it is starting to sound like Java.
>
> In matters of languages I do not despise any. I am
> sorry, I like C but I am not a zealot, and I see
> C's problems and weaknesses. A bad string type
> is the reason for many bugs we could really get
> rid of.

Nor do I despise Java (I've even written an article, still available
online somewhere, outlining why I believe Java makes a great "first"
programming language.) But hand-holding features are just /not/ in C's job
description, I'm sorry.

>>> Comparing strings is also sped up because, when
>>> testing for equality, the first length comparison may
>>> tell the whole story with just a couple of
>>> memory reads.
>> Perhaps a bit; but on average, inequality is determined pretty quickly the
>> conventional way, and equality would actually take /more/ time to
>> determine. But yes, you might net a teeny bit of an improvement.
>
> And also net a big security improvement...

Which you may or may not want to pay for.

>>> A string like the one described above is not able to
>>> resize itself.
>> Are we talking about the one with the fixed-length array, or the version
>> with the mysterious empty brackets? Either way, /nothing/ in C can "resize
>> itself"...
>
> Sorry, I thought realloc was part of C...

What I'm saying is that nothing "resizes itself", there has to be user code
to recognize the need, dispatch to the appropriate functions, etc. At any
given point in a design, a C programmer can choose whether or not to do
that stuff. She may choose not to, for reasons that make all the sense in
the world for that application. She may not want that overhead forced upon her.

>> This is the classic first C++ class implementation exercise. Thinking about
>> it yields some good fundamental principles about class design.
>
> Maybe, but I do not want any class design. There are no classes
> in C. I want strings for holding text. As I said, no
> default instantiated template traits. Just chars please.

I'm not trying to force class design down your throat, I'm just saying
that "black-box" string management is always going to be either prejudicial
to some quality of the data being operated upon, or middle-of-the-road and
thus probably not optimal for /your/ situation, whatever that may be; it
can't be sufficiently general-purpose and really efficient for some special
case...

>> But, to
>> achieve a true performance benefit in a string service, it ultimately
>> requires tailoring the string implementation to the specific circumstances
>> in which it will be used. There's no magic bullet; the "irrational
>> exuberance" surrounding the rise-and-fall of reference counted Standard C++
>> string implementations is a case in point.
>
> Yes, each application has its own needs. That's why I would
> propose that the user writes many specialized string
> structures that share a common description.
>
> Length delimited strings are infinitely extensible with other
> features.
>> Mike's right: use C++.
>
> I answered that to Mike. See my answer in a parallel thread.
> I think C is the last non-object-oriented language around.
> That makes it very interesting.
>
> jacob
> http://www.cs.virginia.edu/~lcc-win32

Okay. Good luck in your quest,
-leor



Mike Wahler

"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:c2**********@news-reader4.wanadoo.fr...

>> Are we talking about the one with the fixed-length array, or the version
>> with the mysterious empty brackets? Either way, /nothing/ in C can "resize
>> itself"...
>
> Sorry, I thought realloc was part of C...

Ever notice that the return value from 'realloc()' is
often not the same as the one returned by the original 'malloc()' or
'calloc()'? 'realloc()' often (in my experience almost
always, except in the cases of 'trivial' size increases)
does a new allocation followed by a copy and a deallocation.
Not really a 'resizing'.
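The practical consequence can be sketched with a hypothetical `grow` helper: because `realloc` may return a different address, its result must replace the old pointer (via a temporary, so the original block is not leaked if the call fails), and any other pointers into the old block must be treated as invalid afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grows *buf to newsize bytes.
   Returns 0 on success, -1 on failure (*buf is then unchanged and still valid). */
static int grow(char **buf, size_t newsize)
{
    assert(buf != NULL);
    char *tmp = realloc(*buf, newsize);
    if (tmp == NULL)
        return -1;
    *buf = tmp;     /* may or may not equal the old address */
    return 0;
}
```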
-Mike


Arthur J. O'Dwyer

On Wed, 3 Mar 2004, jacob navia wrote:

> As everybody knows, C uses a [null-terminated string model]
> for its representation of strings. A more efficient representation is:
<snip>
> struct string {
>     size_t length;
>     char *data;
> };
>
> There is no compelling reason to choose one or the other.

Except the aforementioned efficiency reasons, of course. ;-)
So far, so good; but here you start to go downhill.

> It depends on the application. In any case, the standard
> library could be complemented by
> Strcmp
> Strcpy
> etc., all using length prefixed strings.

Except for the fact that nobody in the world would accept that
kind of library bloat in C0x. I don't even like the dozens of
transcendental and date-manipulation functions in C90. :-D Besides,
if there's one thing a "simple" language *doesn't* need, it's two
different and incompatible implementations of one fundamental
concept (i.e., "string").

> Syntactic sugar.

Not if it's accomplished by making the programmer do the bookkeeping
on those new library functions, it's not. I don't want to have to
remember what kind of string I'm using! Let the computer do it!
(This is one of the reasons I hate Java's library model: I don't care
whether I'm using a TreeFoo or a ListFoo or a HashFoo, I just want
some kind of Foo. They're all equally capable; why should I be
forced to keep books on which one I'm using at the moment?)

> The language extension I propose is that the user has the right to
> define the operation [ ] for any data type he/she wishes.

I.e., you're proposing to take a C++ compiler and strip it of
most of the goodies. Okay, but that won't be the C language any
more. Consider simplicity and orthogonality: you really think you
should be able to re-define the semantics of [] but not * or ->?
That's foolishness waiting to happen.

> Do you agree?

Of course not!

But remember where I said "so far, so good"? What you *should*
have concluded, there, was that since the "Pascal-style" string
model is so superior to the "C-style" string model, that wouldn't
it be neat if somebody implemented a C compiler that used the
Pascal model internally?!
That is, the *programmer* would still see a completely conforming
standard C implementation; but when he writes

a[i] = strlen(a);

where a typical C compiler would assemble the equivalent of
[completely untested and not-real code, but you get the point:]

; strlen(a)
MOV AX, 0
L1: MOV BX, a[AX]
INC AX
JNZ L1
SUB AX, a
DEC AX
; a[i]
MOV BX, a
ADD BX, i
; =
MOV [BX], AX

this hypothetical "Pascal-style" implementation could do the
much faster

; strlen(a)
MOV AX, [internal_a]
; a[i]
MOV BX, [internal_a+4]
ADD BX, i
; =
MOV [BX], AX

Of course, this would require a *lot* of thinking-out ahead of
time, and a *lot* of compiler support (including clever workarounds
for users' trying to memcpy() over strings, or storing strings in
unions, or simply using 'malloc' in creative ways)... but it would
be really neat IMHO if you could get it to work.
It would certainly be a bigger and more widely interesting challenge
than simply re-implementing half of C++.

-Arthur


Chris Torek

In article <news:c2**********@news-reader4.wanadoo.fr>
jacob navia <ja***@jacob.remcomp.fr> writes:

> In matters of languages I do not despise any. I am
> sorry, I like C but I am not a zealot, and I see
> C's problems and weaknesses. A bad string type
> is the reason for many bugs we could really get
> rid of.

I do not think C's "string" data format is necessarily "bad", merely
"limited". The counted-strings other languages have used have their
own advantages and drawbacks.

Perhaps the biggest problem (as it were) with C is that it provides
such a limited built-in syntax for *generating* these anonymous
arrays. Anything inside double quotes is, ignoring the exception
for initializers, one of these special anonymous arrays. In C99
we at least can create anonymous "struct"s:

struct counted_string { size_t len; char *data; };
...
func((struct counted_string){sizeof "foo" - 1, "foo"});

Of course, as shown here, you cannot even use flexible array members
(initializing a variant of counted_string with "char data[]" instead
of "char *data" appears to be invalid). It might be nice if one
could get the above without resorting to preprocessor tricks.
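For comparison, the preprocessor route might look like this sketch. The `CS` macro and `cs_char` helper are hypothetical names, and `data` is made `const char *` here (a small deviation from the post). The `"" lit` concatenation is a common trick that makes the macro fail to compile unless its argument is a string literal, which matters because `sizeof` applied to a `char *` would yield the pointer size instead of the string length.

```c
#include <assert.h>
#include <stddef.h>

struct counted_string { size_t len; const char *data; };

/* Write the literal once; derive both length and pointer from it.
   "" lit only compiles when lit is itself a string literal. */
#define CS(lit) ((struct counted_string){ sizeof("" lit) - 1, "" lit })

/* Checked character access for values produced by CS. */
static char cs_char(struct counted_string s, size_t i)
{
    assert(i < s.len);
    return s.data[i];
}
```

With this, the call above becomes `func(CS("foo"));`, at the cost of the macro Chris would rather avoid.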

(On the other hand, if you want Lisp, you know where to find it. :-) )
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.


those who know me have no need of my name

in comp.lang.c i read:

> since the "Pascal-style" string
> model is so superior to the "C-style" string model, that wouldn't
> it be neat if somebody implemented a C compiler that used the
> Pascal model internally?!
> That is, the *programmer* would still see a completely conforming
> standard C implementation;
<snip>
> Of course, this would require a *lot* of thinking-out ahead of
> time, and a *lot* of compiler support (including clever workarounds
> for users' trying to memcpy() over strings, or storing strings in
> unions, or simply using 'malloc' in creative ways)...

or even the bog simple: a[strlen(a)-2] = 0;
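At the library level the same hazard is easy to show with a hypothetical counted `struct string` wrapping an ordinary buffer: the idiom above shortens the C string, but the cached length does not notice until something recomputes it (here a hypothetical `string_resync`).

```c
#include <assert.h>
#include <string.h>

struct string {
    size_t length;
    char *data;
};

/* Recompute the cached length from the terminator; needed after any
   direct write into data that moves the '\0'. */
static void string_resync(struct string *s)
{
    assert(s != NULL && s->data != NULL);
    s->length = strlen(s->data);
}
```

A compiler keeping lengths behind the programmer's back would have to catch such writes automatically, on every store through a `char *`.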



Old Wolf

> As everybody knows, C uses a zero delimited unbounded
> pointer for its representation of strings.

Pointers cannot be bounded or unbounded, as such.
String constants are represented by an array. C arrays
are bounded (but it is not required for these bounds
to ever be checked).
C library functions expect a pointer into an array
or some other piece of allocated memory, which contains
a zero-terminator. Calling a C library function with
anything else is a programming error.

> This is extremely inefficient because at each query of the
> length of the string, the computer starts an unbounded
> memory scan searching for a zero that ends the string.

It is inefficient to repeatedly calculate the length of the
string, as you say. However most of us are clever enough to
not do that.

> A more efficient representation is:
>
> struct string {
>     size_t length;
>     char data[];
> };

Efficient? Apart from requiring C99 (harder to get a compiler
for than C++), it uses a lot more memory and is slower at
runtime because you have to do extra checks and updates every
time you modify the string. Also it cannot be declared
statically or allocated using malloc (if my understanding of
C99 VLA is correct). This is not my idea of efficient.

If you meant "char data[N]" for some N, then it wastes
even more memory and limits your string size.

> The length operation becomes just a memory read.
> This would considerably speed up programs.

Programs which check length a lot and do not do much else
with the string, possibly. This is a small subset of programs.

> The basic idea is to use a string type that is length
> prefixed and allows run-time checking against UB (undefined
> behavior).

How slow and inefficient. I prefer to prevent UB by coding
correctly. Also, how do you propose to check against UB
this way? Anyone can introduce UB by modifying the string
directly, unless you also enforce all modifications to it
to be via a library (how slow).

> Comparing strings is also sped up because, when
> testing for equality, the first length comparison may
> tell the whole story with just a couple of
> memory reads.
And if they were of equal length you have wasted a comparison.
If your application is the sort where you test string equality
so much that it is important, you would probably introduce
some other method of equality checking (e.g. a hash).
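A sketch of that idea, assuming a hypothetical `struct hstr` that caches a hash when the string is created. FNV-1a is used only as a simple, well-known example hash; equal hashes still require a byte comparison because different strings can collide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counted string that caches a hash at creation time. */
struct hstr {
    size_t length;
    uint32_t hash;
    const char *data;
};

/* 32-bit FNV-1a over n bytes. */
static uint32_t fnv1a(const char *p, size_t n)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)p[i];
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

static struct hstr hstr_make(const char *p, size_t n)
{
    assert(p != NULL);
    struct hstr s = { n, fnv1a(p, n), p };
    return s;
}

/* Different hashes or lengths prove inequality cheaply; equal hashes
   still need the byte comparison. */
static bool hstr_equal(const struct hstr *a, const struct hstr *b)
{
    return a->hash == b->hash
        && a->length == b->length
        && memcmp(a->data, b->data, a->length) == 0;
}
```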

Most instances of "comparing strings" are actually interested
in which one comes first alphabetically, for which your
counted string is slower.
> A string like the one described above is not able to
> resize itself. Any pointers to it would cease to be valid
> when it is resized if the memory allocator is forced to
> move memory around. The block where that string was
> allocated is bounded by other blocks in memory, and
> it is not possible to resize it.

This seems to be a problem with any string representation
(including C's built-in one).

> A pointer (an indirect representation) costs sizeof(void *)
> but allows strings to be resized without invalidating the pointers
> to them.
>
> struct string {
>     size_t length;
>     char *data;
> };
How does this allow resizing without invalidating pointers?
I presume you have something in mind like this:

char *ptr = str.data + 6;
resize_string(&str);
/* keep using ptr... */

To make a string bigger you have to get new memory (There is no
portable way of getting more memory at the same location because,
as you pointed out before, there might be something else already
using the desired memory).
There is no compelling reason to choose one or the other.
It depends on the application.
Congratulations, some sense. Wisely, the standard library
does not make either choice, leaving it up to the programmer
to do what is most efficient for his/her program.
In any case, the standard
library could be complemented by
Strcmp
Strcpy
etc., all using length prefixed strings.
What bloat. The C library is big enough as it is.
Syntactic sugar.

I have added some sugar to this coffee. I always liked coffee
with a bit of sugar. I feel that is too acid without it.

Current strings are used using the [ ] notation. This strings
could have the same privilege isn't it?

The language extension I propose is that the user has the right to
define the operation [ ] for any data type he/she wishes.

Not a big deal for today's compilers.

Length checked strings can then use:

String s;
...
s[2] = 'a';

I think I am proposing the obvious.
I think you are proposing bounds-checked arrays. The standard
allows you to implement this already. Why not go and do it
and then advertise your compiler as supporting bounds checking.
In fact, why not write a library as you have proposed, and package
it with your compiler. Then people can choose if it suits them or not.
Do you agree?

Not really, no. You seem to be making big assumptions about
the rest of the world's programming requirements.

Nov 14 '05 #14

Richard Bos

"jacob navia" <ja***@jacob.remcomp.fr> wrote:

As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.

This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.

A more efficient representation is:

struct string {
size_t length;
char data[];
};
Possibly efficient in time, for certain programs, but almost certainly
inefficient in storage. The problem is two-pronged: either your strings
are large relative to your size_t, or they aren't. In the first case,
you will, sooner or later, run into a string that simply won't fit
inside a size_t.
Ok, so you need a size_t that is large enough to contain every possible
string length on the system, large enough to address all of your memory.
But in that case, you'll hit the second case: now, you're using four
bytes, maybe eight, to address a single string, AOT C's one-byte null
terminator. Not a problem, perhaps, if all your strings are dozens of
kilobytes, but most applications seem to use lots and lots of small
strings, every single one potentially costing you seven bytes extra, and
only a few large ones.
You might get away with this on systems where sizeof(size_t) ==
sizeof('\0'). IOW, an embedded device, most likely. But how many strings
does a microwave oven need, anyway? The only application I can think of
that makes this worthwhile is a mobile phone, where you may need quite a
few strings, most of them static.
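Richard's storage argument can be made concrete. The sketch below uses the flexible-array-member layout from the original post; the helper names `c_string_size` and `counted_size` are hypothetical. It counts only the bytes of the string itself and ignores allocator overhead.

```c
#include <stddef.h>

/* The layout from the original post (C99 flexible array member). */
struct string {
    size_t length;
    char data[];
};

/* Bytes needed for a NUL-terminated C string of n characters. */
size_t c_string_size(size_t n)
{
    return n + 1;                                /* characters plus the '\0' */
}

/* Bytes needed for a counted string of n characters (no terminator).
   offsetof gives the header size, usually sizeof(size_t). */
size_t counted_size(size_t n)
{
    return offsetof(struct string, data) + n;
}
```

On a typical 64-bit system, where the header is 8 bytes, a 5-character string needs 6 bytes as a C string and 13 as a counted string. Those are the seven extra bytes mentioned above.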
The length operation becomes just a memory read.
This would considerably speed the programs. The basic
idea is to use a string type that is length prefixed and
allows run-time checking against UB: undefined
behavior.
How? What if I want a 30-char array, of which the first 20 to 29 chars
contain a string, and the last char contains a checksum? Modifying the
checksum wouldn't be undefined behaviour at all, but it would write
beyond data[length-1].
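Richard's checksum example can be sketched as follows. The `struct buffer` type and both setter functions are hypothetical, and the fixed 30-char array mirrors his scenario. A check against the *string* length rejects the legitimate write to the checksum slot, while a check against the *array* capacity allows it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counted buffer recording both the string length and
   the size of the underlying array. */
struct buffer {
    size_t length;     /* characters that belong to the string  */
    size_t capacity;   /* characters the array can really hold  */
    char data[30];
};

/* Checks against the string length: rejects any index >= length. */
bool set_in_string(struct buffer *b, size_t i, char c)
{
    if (i >= b->length)
        return false;
    b->data[i] = c;
    return true;
}

/* Checks against the array capacity: rejects only real overflows. */
bool set_in_array(struct buffer *b, size_t i, char c)
{
    if (i >= b->capacity)
        return false;
    b->data[i] = c;
    return true;
}
```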
Comparing strings is speeded up also because when
testing for equality, the first length comparison tells
maybe the whole story with just a couple of
memory reads.
Erm... no. No, it doesn't at all. Because, you see, "aaa" < "zzzzz", but
"zzz" < "aaaaa". Length has _nothing_ to do with the lexicographical
ordering of strings, except when you've already compared the contents of
the strings and found them identical up to the length of the shortest.
In fact, I think you'll be hard pushed indeed to beat the efficiency of

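int strcmp(const char *s1, const char *s2)
{
    /* advance while both strings agree and neither has ended */
    while (*s1 == *s2 && *s1) { s1++; s2++; }
    /* the Standard compares the differing characters as unsigned char */
    return *(const unsigned char *)s1 - *(const unsigned char *)s2;
}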

using your length-indicated string.
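To illustrate Richard's point about ordering, here is a minimal sketch of a strcmp-like comparison for counted strings. `struct cstr` and `cstr_compare` are hypothetical names. The common prefix is compared first, and the lengths decide the result only when that prefix is identical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view: a length plus a pointer to the bytes. */
struct cstr {
    size_t length;
    const char *data;
};

/* Lexicographic order: compare the shared prefix, then fall back to length. */
int cstr_compare(struct cstr a, struct cstr b)
{
    size_t n = a.length < b.length ? a.length : b.length;
    int r = memcmp(a.data, b.data, n);     /* memcmp compares as unsigned char */
    if (r != 0)
        return r;
    if (a.length < b.length)
        return -1;
    return a.length > b.length;            /* 1 if a is longer, else 0 */
}
```

For both of Richard's pairs the answer comes from `memcmp`, not from the lengths: "aaa" sorts before "zzzzz", and "zzz" sorts after "aaaaa".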
Syntactic sugar.
Syntactic sugar causes cancer of the semi-colon.
I have added some sugar to this coffee. I always liked coffee
with a bit of sugar. I feel that is too acid without it.
YM bitter, I suspect. And that shows that truly civilised people drink
tea, without sugar, and program in C, without ++ :-)
Current strings are used using the [ ] notation. This strings
could have the same privilege isn't it?

No, they couldn't. Puzzle for you: how would you extend the
interoperation between strings and char pointers, using your
length-strings? You can't point a char * inside one, because if you do,
you've lost track of its length and there isn't a null terminator to
help you find it...
And of course, without pointer arithmetic, array indexing is impossible
in the current Standard. You'd need to define an explicit exception for
your length-strings.
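One common answer to the "pointer inside a counted string" puzzle is to carry a length with the interior pointer. The sketch below assumes a hypothetical `struct view` and `view_sub`. It only illustrates the idea and is not how lcc-win32 or any particular library does it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "view": a pointer into someone else's characters plus a
   length, so a substring can be referenced without copying and without
   needing a '\0' at its end. */
struct view {
    const char *ptr;
    size_t len;
};

/* Returns the part of v starting at 'from' and spanning at most 'count'
   characters; out-of-range requests are clamped instead of overrunning. */
struct view view_sub(struct view v, size_t from, size_t count)
{
    struct view r;
    if (from > v.len)
        from = v.len;
    if (count > v.len - from)
        count = v.len - from;
    r.ptr = v.ptr + from;
    r.len = count;
    return r;
}
```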

Richard

Nov 14 '05 #15

Richard Bos

Leor Zolman <le**@bdsoft.com> wrote:

Jacob: There's even a common term used to describe C++ when used only for
those features that are a direct "clean-up" of messiness left over from C's
need to be backward-compatible with earlier incarnations of itself: "A
Better C".

Common, and used, by whom? C++ programmers, I suspect. _My_ term for
such a hybrid monstrum would be "A Bloody Mess".

If you want C++, be a man and program in C++. Don't go pretend that
you're almost using C.

Richard

Nov 14 '05 #16

Dik T. Winter

In article <Pi***********************************@unix49.andrew.cmu.edu> "Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
....

But remember where I said "so far, so good"? What you *should*
have concluded, there, was that since the "Pascal-style" string
model is so superior to the "C-style" string model, that wouldn't
it be neat if somebody implemented a C compiler that used the
Pascal model internally?!

Eh? The "Pascal-style"? In the only Pascal I ever used (on the CDC
Cyber, the original Pascal from ETH Zuerich), a string was implemented
as a sequence of characters, and that was it. Nearly the same as in C,
except that there was no terminator.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Nov 14 '05 #17

jacob navia

"Richard Bos" <rl*@hoekstra-uitgeverij.nl> a écrit dans le message de
news:40****************@news.individual.net...

"jacob navia" <ja***@jacob.remcomp.fr> wrote:
As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.

This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.

A more efficient representation is:

struct string {
size_t length;
char data[];
};
Possibly efficient in time, for certain programs, but almost certainly
inefficient in storage. The problem is two-pronged: either your strings
are large relative to your size_t, or they aren't. In the first case,
you will, sooner or later, run into a string that simply won't fit
inside a size_t.

In lcc-win32 a size_t can contain up to 4GB strings.
I think that searching for the terminating zero in a 1GB string would
not be very efficient anyway... :-)

Ok, so you need a size_t that is large enough to contain every possible
string length on the system, large enough to address all of your memory.
Yes, normally size_t would do it.
But in that case, you'll hit the second case: now, you're using four
bytes, maybe eight, to address a single string, AOT C's one-byte null
terminator. Not a problem, perhaps, if all your strings are dozens of
kilobytes, but most applications seem to use lots and lots of small
strings, every single one potentially costing you seven bytes extra, and
only a few large ones.
There is no free lunch.
Yes, it will cost at least size_t bytes more than a zero terminated string.
So what?
Many applications can afford this.

The length operation becomes just a memory read.
This would considerably speed the programs. The basic
idea is to use a string type that is length prefixed and
allows run-time checking against UB: undefined
behavior.

How? What if I want a 30-char array, of which the first 20 to 29 chars
contain a string, and the last char contains a checksum? Modifying the
checksum wouldn't be undefined behaviour at all, but it would write
beyond data[length-1].

You can implement check sums in my schema. Anyway, I am proposing
string handling, where normal zero terminated strings are assumed.

Comparing strings is speeded up also because when
testing for equality, the first length comparison tells
maybe the whole story with just a couple of
memory reads.

Erm... no. No, it doesn't at all. Because, you see, "aaa" < "zzzzz", but
"zzz" < "aaaaa". Length has _nothing_ to do with the lexicographical
ordering of strings, except when you've already compared the contents of
the strings and found them identical up to the length of the shortest.

Please read carefully. I said
"When comparing strings for equality"

and not strcmp !!!

strcmp gives as a result a lexicographical ordering. This is NOT
NEEDED when I just want to know if a == b.
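Here is a minimal sketch of the equality-only test jacob describes, reusing a hypothetical `struct cstr` (length plus pointer). When the lengths differ, the function answers without reading any characters. When the lengths are equal, it still has to compare the bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view: a length plus a pointer to the bytes. */
struct cstr {
    size_t length;
    const char *data;
};

/* Equality only: different lengths settle it immediately; equal lengths
   fall back to a byte-by-byte comparison. */
bool cstr_equal(struct cstr a, struct cstr b)
{
    if (a.length != b.length)
        return false;
    return memcmp(a.data, b.data, a.length) == 0;
}
```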

Syntactic sugar.

Syntactic sugar causes cancer of the semi-colon.

In great doses YES. (See my answer as to why I do not use C++ in a
parallel thread)

In small doses sugar is useful.

It comes down to the dosage, you see?

It is a question of knowing when to stop.

Current strings are used using the [ ] notation. This strings
could have the same privilege isn't it?

No, they couldn't. Puzzle for you: how would you extend the
interoperation between strings and char pointers, using your
length-strings?

I have started a library in lcc-win32 that makes exactly that.
It can be done.

You can't point a char * inside one, because if you do,
you've lost track of its length and there isn't a null terminator to
help you find it...
All the strings are null terminated to keep interoperability
with existing code.
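A counted string that also keeps a terminator might be built like the sketch below. `string_from` is a hypothetical constructor, not the lcc-win32 library's API. Storing the `'\0'` costs one more byte per string, but it lets `s->data` go straight to existing C functions. A `char *` into the middle of the data still works as a C string, though it no longer carries the count.

```c
#include <stdlib.h>
#include <string.h>

/* Counted string that also stores a trailing '\0', so that s->data can
   be handed to existing functions expecting a C string. */
struct string {
    size_t length;
    char data[];
};

/* Hypothetical constructor: copies src, records its length, and keeps the
   terminator. Returns NULL if allocation fails. */
struct string *string_from(const char *src)
{
    size_t n = strlen(src);
    struct string *s = malloc(sizeof *s + n + 1);   /* +1 for the '\0' */
    if (s == NULL)
        return NULL;
    s->length = n;
    memcpy(s->data, src, n + 1);                    /* copies the '\0' too */
    return s;
}
```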
And of course, without pointer arithmetic, array indexing is impossible
in the current Standard. You'd need to define an explicit exception for
your length-strings.

Yes. I have done that.

jacob

Nov 14 '05 #18

Tor Husabø

Dik T. Winter wrote:

In article <Pi***********************************@unix49.andrew.cmu.edu> "Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
...
> But remember where I said "so far, so good"? What you *should*
> have concluded, there, was that since the "Pascal-style" string
> model is so superior to the "C-style" string model, that wouldn't
> it be neat if somebody implemented a C compiler that used the
> Pascal model internally?!

Eh? The "Pascal-style"? In the only Pascal I ever used (on the CDC
Cyber, the original Pascal from ETH Zuerich), a string was implemented
as a sequence of characters, and that was it. Nearly the same as in C,
except that there was no terminator.

Meaning you had to keep track of the length in a separate variable?
Sounds unlikely...

Nov 14 '05 #19

jacob navia

"Old Wolf" <ol*****@inspire.net.nz> a écrit dans le message de
news:84**************************@posting.google.com...

As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.
Pointers cannot be bounded or unbounded, as such.
String constants are represented by an array. C arrays
are bounded (but it is not required for these bounds
to ever be checked).

A bounded pointer has size/limits information associated with it. For
example:

fread(buffer,1,100,file);

the buffer pointer is implicitly bounded by the 1,100 arguments.

Or

int process(int datalength, char *data);

C library functions expect a pointer into an array
or some other piece of allocated memory, which contains
a zero-terminator. Calling a C library function with
anything else is a programming error.
Yes. I do not discuss that this is an error. My proposition goes
into avoiding it.

This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.

It is inefficient to repeatedly calculate the length of the
string, as you say. However most of us are clever enough to
not do that.
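This sketch shows the repeated-length pitfall Old Wolf refers to, with two hypothetical functions. Calling `strlen` in the loop condition rescans the string on every iteration, so the loop is quadratic. Computing the length once keeps it linear. Some compilers can hoist the call themselves, but only when they can prove the string is not modified inside the loop, which is not the case here.

```c
#include <ctype.h>
#include <string.h>

/* strlen in the condition rescans the whole string each time round. */
void upcase_slow(char *s)
{
    for (size_t i = 0; i < strlen(s); i++)
        s[i] = (char)toupper((unsigned char)s[i]);
}

/* The length is computed once, so the loop does one pass over s. */
void upcase_fast(char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++)
        s[i] = (char)toupper((unsigned char)s[i]);
}
```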

Then you keep the length and the string in separate variables.
You have to name each string length that you use more than
once, and never mix them up. This is very error prone and
doesn't fit into structured programming.

Why do we use:
struct s{
int age;
bool sex;
char *Name;
} Employee;

instead of
int ageEmployee1, int sexEmployee1, char *NameEmployee1???
Having a structure eases the way you program and avoids errors!

A more efficient representation is:

struct string {
size_t length;
char data[];
};

Efficient? Apart from requiring C99 (harder to get a compiler
for than C++) , it uses a lot more memory and is slower at
runtime because you have to do extra checks and updates every
time you modify the string.

This is important for security reasons. A length check is just 4 assembly
instructions at most!
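The run-time check jacob has in mind amounts to a single comparison against the stored length before each access. Here is a minimal sketch with a hypothetical `cstr_at` accessor. The exact number of machine instructions depends on the compiler and target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counted-string view: a length plus a pointer to the bytes. */
struct cstr {
    size_t length;
    const char *data;
};

/* Checked read: one comparison against the stored length guards the
   access. Returns false instead of reading out of range. */
bool cstr_at(struct cstr s, size_t i, char *out)
{
    if (i >= s.length)
        return false;
    *out = s.data[i];
    return true;
}
```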

The length operation becomes just a memory read.
This would considerably speed the programs.
Programs which check length a lot and do not do much else
with the string, possibly. This is a small subset of programs.

Sorry, but the length operation is used VERY often.

The basic idea is to use a string type that is length
prefixed and allows run-time checking against UB: undefined
behavior.

How slow and inefficient. I prefer to prevent UB by coding
correctly.

And you never make mistakes of course. You are Superman.
Also, how do you propose to check against UB
this way. Anyone can introduce UB by modifying the string
directly, unless you also enforce all modifications to it
to be via a library. (how slow).

Strings should not be modified directly. Very simple.
Maybe 0.0000000001 seconds slower, but much faster to
develop.
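"Strings should not be modified directly" implies that every change goes through library functions, which keep the length, capacity and terminator consistent. The sketch below assumes a hypothetical growable `struct gstring` with a doubling growth policy, and it ignores `size_t` overflow for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; callers never write to data directly. */
struct gstring {
    size_t length;
    size_t capacity;   /* bytes allocated for data, including the '\0' */
    char *data;
};

/* Appends n bytes from src; returns 0 on success, -1 if out of memory. */
int gstring_append(struct gstring *s, const char *src, size_t n)
{
    if (s->length + n + 1 > s->capacity) {
        size_t cap = s->capacity ? s->capacity : 16;
        while (cap < s->length + n + 1)
            cap *= 2;                         /* grow geometrically */
        char *p = realloc(s->data, cap);      /* realloc(NULL, ...) acts as malloc */
        if (p == NULL)
            return -1;                        /* the old string is left untouched */
        s->data = p;
        s->capacity = cap;
    }
    memcpy(s->data + s->length, src, n);
    s->length += n;
    s->data[s->length] = '\0';
    return 0;
}
```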

Comparing strings is speeded up also because when
testing for equality, the first length comparison tells
maybe the whole story with just a couple of
memory reads.

And if they were equal length you have wasted a comparison.

This is around 2-3 assembly instructions...
At 2 GHz it is 0.000000000something seconds.
This seems to be a problem with any string representation
(including C's builtin one)
A pointer ( an indirect representation) costs a sizeof(void *)
but allows to resize strings without invalidating the pointers
to them.

struct string {
size_t length;
char *data;
};
How does this allow resizing without invalidating pointers?

This is a misunderstanding. I assume you have pointers to the string itself,
not into the middle of the data.
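The indirect layout can be sketched as follows. `string_resize` is a hypothetical helper that pads new space with blanks. Only `s->data` may move, so any `struct string *` held elsewhere still refers to a valid header. A `char *` into the old data, as in Old Wolf's example, may dangle after the call.

```c
#include <stdlib.h>
#include <string.h>

/* The indirect layout from the post: the header stays put, the bytes move. */
struct string {
    size_t length;
    char *data;
};

/* Hypothetical resize: reallocates only the character buffer, pads any new
   space with blanks, and keeps a '\0' terminator. Returns -1 on failure. */
int string_resize(struct string *s, size_t new_length)
{
    char *p = realloc(s->data, new_length + 1);
    if (p == NULL)
        return -1;
    if (new_length > s->length)
        memset(p + s->length, ' ', new_length - s->length);
    p[new_length] = '\0';
    s->data = p;
    s->length = new_length;
    return 0;
}
```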

In any case, the standard
library could be complemented by
Strcmp
Strcpy
etc., all using length prefixed strings.
What bloat. The C library is big enough as it is.

I am sorry but there are functions for calculating the
complex square root but not for using a reasonable
string type.

C string type is completely outdated. Yes, you can use
it in small machines but it is too ERROR PRONE!
I think you are proposing bounds-checked arrays. The standard
allows you to implement this already. Why not go and do it
and then advertise your compiler as supporting bounds checking.
Well, I am doing that, but the intent here is to make people
aware that these problems should be solved in a general fashion.

I do not want to just add a "special" solution but to see if
we can solve a general problem in a collective way i.e.
in the way of standards.
In fact, why not write a library as you have proposed, and package
it with your compiler. Then people can choose if it suits them or not.

Yes, I am doing that as a "proof of concept"

Do you agree?

Not really, no. You seem to be making big assumptions about
the rest of the world's programming requirements.

Yes.
I assume that security is important, that extreme efficiency is not
that important, and that you can spare size_t bytes for the length

jacob

Nov 14 '05 #20

Nils Petter Vaskinn

On Thu, 04 Mar 2004 13:49:44 +0100, jacob navia wrote:

"Richard Bos" <rl*@hoekstra-uitgeverij.nl> a écrit dans le message de
news:40****************@news.individual.net...

But in that case, you'll hit the second case: now, you're using four
bytes, maybe eight, to address a single string, AOT C's one-byte null
terminator. Not a problem, perhaps, if all your strings are dozens of
kilobytes, but most applications seem to use lots and lots of small
strings, every single one potentially costing you seven bytes extra,
and only a few large ones.

There is no free lunch.
Yes, it will cost at least size_t bytes more than a zero terminated
string. So what?
Many applications can afford this.

There it is.

Many != all.

Your proposal would impose that cost on all applications. Leaving things
as they are would allow applications to store the length alongside the
string if they choose to. (or link against a library that provides such a
string type and replacement string manipulation functions)

--
NPV

"the large print giveth, and the small print taketh away"
Tom Waits - Step right up

Nov 14 '05 #21

Dan Pop

In <c2**********@news-reader4.wanadoo.fr> "jacob navia" <ja***@jacob.remcomp.fr> writes:

As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.
If I didn't already know what a C string is, I wouldn't have been able
to make any sense out of this sentence.
This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.

It doesn't hurt to use your common sense in validating your opinions.
If C strings were "extremely inefficient", that would have been a much
bigger problem 30 years ago, when computers were orders of magnitude
slower than today. Yet, no one produced a fix then. No alternate
string libraries designed and implemented for C since then have
acquired any kind of popularity.

Since C programmers aren't the last people to care about efficiency,
what conclusion can you draw?

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de

Nov 14 '05 #22

Richard Bos

Tor Husabø <to***@student.hf.uio.no> wrote:

Dik T. Winter wrote:
Eh? The "Pascal-style"? In the only Pascal I ever used (on the CDC
Cyber, the original Pascal from ETH Zuerich), a string was implemented
as a sequence of characters, and that was it. Nearly the same as in C,
except that there was no terminator.

Meaning you had to keep track of the length in a separate variable?

Meaning original Pascal programs had to be written for fixed-length
strings, and each different fixed length would need a different set of
functions. Ditto for arrays of other types. And verily, it suckéd
mightily, and there was much wailing and gnashing of teeth. For which
conformant array schemas were invented. And verily, it did still suck
mightily, but not nearly as mightily as before.

Richard

Nov 14 '05 #23

Richard Bos

"jacob navia" <ja***@jacob.remcomp.fr> wrote:

"Richard Bos" <rl*@hoekstra-uitgeverij.nl> a écrit dans le message de
news:40****************@news.individual.net...
"jacob navia" <ja***@jacob.remcomp.fr> wrote:

struct string {
size_t length;
char data[];
};
But in that case, you'll hit the second case: now, you're using four
bytes, maybe eight, to address a single string, AOT C's one-byte null
terminator. Not a problem, perhaps, if all your strings are dozens of
kilobytes, but most applications seem to use lots and lots of small
strings, every single one potentially costing you seven bytes extra, and
only a few large ones.

There is no free lunch.
Yes, it will cost at least size_t bytes more than a zero terminated string.
So what?
Many applications can afford this.

This is the attitude that gave us Access and Outlook. And people wonder
why modern programs are bloated and sluggardly...
idea is to use a string type that is length prefixed and
allows run-time checking against UB: undefined
behavior.

How? What if I want a 30-char array, of which the first 20 to 29 chars
contain a string, and the last char contains a checksum? Modifying the
checksum wouldn't be undefined behaviour at all, but it would write
beyond data[length-1].

You can implement check sums in my schema.

Not if you want to use run-time checks on the _string_ length rather
than on the _array_ length.

Current strings are used using the [ ] notation. This strings
could have the same privilege isn't it?

No, they couldn't. Puzzle for you: how would you extend the
interoperation between strings and char pointers, using your
length-strings?

I have started a library

Library, schmiblary. In C, it is a simple matter of assigning a pointer.
Who wants to use a schema that only makes serious programming more
difficult?

And frankly, that sums up my reaction to this idea. It would make
programming with strings in C considerably more hassle-ful, to, as far
as I can tell, an entirely ephemeral speed advantage, which is more than
likely to be offset by the more complicated and time-consuming code
needed to handle these strings in any other circumstance than finding
their length - an operation which isn't performed as often as you'd
think, in quality code.

Richard

Nov 14 '05 #24

jacob navia

"Dan Pop" <Da*****@cern.ch> a écrit dans le message de
news:c2**********@sunnews.cern.ch...

In <c2**********@news-reader4.wanadoo.fr> "jacob navia" <ja***@jacob.remcomp.fr> writes:
As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.
If I didn't already know what a C string is, I wouldn't have been able
to make any sense out of this sentence.

Zero delimited means a zero is written in the last byte to signal the end of
the string, Dan.
I wanted to emphasize "unbounded" because there is no way to know if
the zero is not there where the pointer will end pointing to...

This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.
It doesn't hurt to use your common sense in validating your opinions.
If C strings were "extremely inefficient", that would have been a much
bigger problem 30 years ago, when computers were orders of magnitude
slower than today. Yet, no one produced a fix then. No alternate
string libraries designed and implemented for C since then have
acquired any kind of popularity.

There are many Dan. Just search in Google and you will find zig libraries
that implement this with different emphasis in different objectives.
The objective of this discussion is to see why the *language* doesn't
support any other schema for implementing strings.
Since C programmers aren't the last people to care about efficiency,
what conclusion can you draw?

Since language support doesn't encourage the use of bounded pointers
C string handling is much more error prone than it should be.

Never had the traps because of the missing zero?

The failure modes of the string functions in the library like strcpy
are just horrible. Memory corruption is guaranteed unless a zero
is found...
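A common mitigation for the `strcpy` failure mode is a copy that is told the destination size and never writes past it. The sketch below is a hypothetical `copy_bounded`, modeled loosely on BSD's `strlcpy`, which is not part of ISO C. It still assumes `src` is terminated; it only protects the destination.

```c
#include <stddef.h>
#include <string.h>

/* Never writes more than dst_size bytes and always terminates dst when
   dst_size > 0. Returns strlen(src), so a result >= dst_size signals that
   the copy was truncated. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t n = strlen(src);
    if (dst_size > 0) {
        size_t k = n < dst_size - 1 ? n : dst_size - 1;
        memcpy(dst, src, k);
        dst[k] = '\0';
    }
    return n;
}
```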

Nov 14 '05 #25

Leor Zolman

On Thu, 04 Mar 2004 11:05:07 GMT, rl*@hoekstra-uitgeverij.nl (Richard Bos)
wrote:

Leor Zolman <le**@bdsoft.com> wrote:
Jacob: There's even a common term used to describe C++ when used only for
those features that are a direct "clean-up" of messiness left over from C's
need to be backward-compatible with earlier incarnations of itself: "A
Better C".
Common, and used, by whom? C++ programmers, I suspect.

Well, yes...why would C programmers bother coming up with pet names for
/any/ particular usage style of C++?
_My_ term for
such a hybrid monstrum would be "A Bloody Mess".

If you want C++, be a man and program in C++. Don't go pretend that
you're almost using C.
I think it makes perfect sense, if you're going to make the move from C to
C++, to start by learning one feature (or just a few features) at a time.
What's wrong with the first one to focus on being, say, std::string ?
-leor

Richard

Nov 14 '05 #26

Nils Petter Vaskinn

On Thu, 04 Mar 2004 15:23:52 +0100, jacob navia wrote:

There are many Dan. Just search in Google and you will find zig libraries
that implement this with different emphasis in different objectives.
The objective of this discussion is to see why the *language* doesn't
support any other schema for implementing strings.

Because then the language and/or compiler writers would have to choose one
of those implementations, instead of allowing the language user the choice.

Having one or more string implementations with replacement string handling
functions can be a good idea while choosing one as part of the language or
standard library isn't.


Nov 14 '05 #27

Dan Pop

In <hE*******************@news2.e.nsc.no> Tor Husabø <to***@student.hf.uio.no> writes:

Dik T. Winter wrote:
In article <Pi***********************************@unix49.andrew.cmu.edu> "Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
...
> But remember where I said "so far, so good"? What you *should*
> have concluded, there, was that since the "Pascal-style" string
> model is so superior to the "C-style" string model, that wouldn't
> it be neat if somebody implemented a C compiler that used the
> Pascal model internally?!

Eh? The "Pascal-style"? In the only Pascal I ever used (on the CDC
Cyber, the original Pascal from ETH Zuerich), a string was implemented
as a sequence of characters, and that was it. Nearly the same as in C,
except that there was no terminator.

Meaning you had to keep track of the length in a separate variable?
Sounds unlikely...

Meaning that the string length was declared explicitly (the size of the
character array containing the string) or implicitly (the number of
characters in the string literal) and it was the compiler that kept
track of it. Also meaning that the original Pascal strings were a
lot less flexible than C strings. Particular Pascal implementations
(e.g. Turbo Pascal) extended the flexibility of the Pascal strings,
by introducing a character count.
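The Turbo Pascal approach is usually described as a one-byte count stored in element 0, followed by up to 255 characters. Below is a minimal C emulation under that assumption. `pstring` and `pstring_set` are hypothetical names. It also shows the trade-off discussed earlier: a one-byte count is cheap but caps the length at 255.

```c
#include <string.h>

/* Element 0 holds the count; elements 1..255 hold the characters. */
typedef unsigned char pstring[256];

/* Stores src into p, truncating to 255 characters; returns the count. */
unsigned pstring_set(pstring p, const char *src)
{
    size_t n = strlen(src);
    if (n > 255)
        n = 255;
    p[0] = (unsigned char)n;
    memcpy(p + 1, src, n);
    return (unsigned)n;
}
```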

The canonical example of language using counted strings is BASIC, not
Pascal. But, even in BASIC's case, it was not part of the (original)
language specification.

Dan

Nov 14 '05 #28

Dan Pop

In <c2**********@news-reader1.wanadoo.fr> "jacob navia" <ja***@jacob.remcomp.fr> writes:

"Dan Pop" <Da*****@cern.ch> a écrit dans le message de
news:c2**********@sunnews.cern.ch...
In <c2**********@news-reader4.wanadoo.fr> "jacob navia"<ja***@jacob.remcomp.fr> writes:
>As everybody knows, C uses a zero delimited unbounded
>pointer for its representation of strings.

If I didn't already know what a C string is, I wouldn't have been able
to make any sense out of this sentence.

Zero delimited means a zero is written in the last byte to signal the end of
the string Dan

Nope, it means a zero before the string and another zero after. Delimited
is not a synonym for terminated.
I wanted to emphasize "unbounded" because there is no way to know if
the zero is not there where the pointer will end pointing to...
You don't know where the pointer will end pointing to. Your wording
simply didn't make any sense to anyone but you.

The representation of a string in C is the sequence of characters, up to
and including the null terminator. No kind of pointer is involved in the
representation of a C string.

>This is extremely inefficient because at each query of the
>length of the string, the computer starts an unbounded
>memory scan searching for a zero that ends the string.

It doesn't hurt to use your common sense in validating your opinions.
If C strings were "extremely inefficient", that would have been a much
bigger problem 30 years ago, when computers were orders of magnitude
slower than today. Yet, no one produced a fix then. No alternate
string libraries designed and implemented for C since then have
acquired any kind of popularity. ^^^^

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There are many Dan. Just search in Google and you will find zig libraries
that implement this with different emphasis in different objectives.

Are you reading impaired or what? Which of them qualifies as popular?
The objective of this discussion is to see why the *language* doesn't
support any other schema for implementing strings.
No other scheme proved to be better in a GENERAL PURPOSE context.
As you admit yourself, the alternate libraries are designed for well
defined goals, rather than as universal replacements for the C strings.

And the very existence of these libraries proves that the C language DOES
support alternate schemes. So, your point is moot.

Since C programmers aren't the last people to care about efficiency,
what conclusion can you draw?

Since language support doesn't encourage the use of bounded pointers
C string handling is much more error prone than it should be.

1. This is not a performance issue.

2. This is a *general* problem of C: most C features are error prone in
the hands of the incompetent.
Never had the traps because of the missing zero?
Nope.
The failure modes of the string functions in the library like strcpy
are just horrible. Memory corruption is guaranteed unless a zero
is found...

Dynamic memory allocation has exactly the same problems: write beyond
a dynamically allocated memory block (in either direction) and memory
corruption will (most likely) bite you, sooner or later. What is your
better replacement for malloc and friends?

C is a sharp tool *by design*. People who can't use sharp tools or are
afraid of them, should not use C. There are plenty of other programming
languages designed for them so there is no need to turn C into a less
sharp tool (and, therefore less effective in the hands of the competent
programmers) and annoy C's *intended* user base.

There are many ways in which C needs to be extended, but adding more
string formats is not one of them. You're wasting your time trying to
fix something that isn't broken.

Dan

Nov 14 '05 #29

Rob Thorpe

"Mike Wahler" <mk******@mkwahler.net> wrote in message news:<wm*******************@newsread2.news.pas.earthlink.net>...

"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:c2**********@news-reader1.wanadoo.fr...

"Mike Wahler" <mk******@mkwahler.net> a écrit dans le message de
news:LM*******************@newsread2.news.pas.earthlink.net...
"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:c2**********@news-reader4.wanadoo.fr...
I think you're proposing C++. Rather than try to 'reinvent' it,
I just use it.

Well I can't use it Mike.

Whatever.
Just too complex.

You needn't use all of it. You seem to be wanting
a 'real' string type, which C++ has, and it's not
at all difficult to use. Actually, if such a type
is needed, imo that's a good enough reason to use C++
(even if everything else is only the common subset
of the two languages).
Default instantiated template traits?

So don't use 'em.
I use the parts I find useful, discard the rest.
The crux of the matter is knowing when to stop. When a feature
becomes a nuisance, and doesn't simplify the task it is better
to drop it.

Or ignore it. Simple, huh?

The problem is you just can't ignore them, because on a project of any
size others will start to use them.

Do you think it's practical to use a subset of C++ for anything
outside of your own personal code, or maybe code developed by a couple
of people? Maybe in an organisation with very strict procedures.

Nov 14 '05 #30

Alan Balmer

On Wed, 3 Mar 2004 23:07:25 +0100, "jacob navia"
<ja***@jacob.remcomp.fr> wrote:

This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.
But asking for length is not the only thing we do with strings, or
even the most common.
A more efficient representation is:

struct string {
size_t length;
char data[];

Perhaps more efficient for some uses, but less efficient for others.

As you probably know, there are languages which use this type of
string representation. And of course, there is nothing to prevent you
writing your own string-handling library based on this representation.
In fact, that would be an excellent exercise.
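A minimal sketch of the exercise Alan suggests, assuming C99 (for the flexible array member `char data[]`). The names string_new, string_length and string_equal are hypothetical, not taken from any existing library. It shows the two gains claimed for this representation: the length is a single field read, and strings of different lengths compare unequal without touching their characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Length-prefixed string using a C99 flexible array member. */
struct string {
    size_t length;
    char data[];            /* not zero-terminated; length says how much is valid */
};

/* Hypothetical constructor: copies n bytes from src (which must point to
   at least n valid bytes) into one new block. Returns NULL if malloc fails. */
struct string *string_new(const char *src, size_t n)
{
    struct string *s = malloc(sizeof *s + n);
    if (s == NULL)
        return NULL;
    s->length = n;
    memcpy(s->data, src, n);
    return s;
}

/* The length query is one memory read instead of a scan for '\0'. */
size_t string_length(const struct string *s)
{
    assert(s != NULL);
    return s->length;
}

/* Equality: differing lengths settle the question without reading data. */
bool string_equal(const struct string *a, const struct string *b)
{
    assert(a != NULL && b != NULL);
    if (a->length != b->length)
        return false;
    return memcmp(a->data, b->data, a->length) == 0;
}
```

The flip side is the one raised in the original post: header and characters share one malloc block, so growing the string needs realloc, which may move the block and leave any other pointers to it dangling.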

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 14 '05 #31

Ben Pfaff

rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:

You might get away with this on systems where sizeof(size_t) ==
sizeof('\0'). IOW, an embedded device, most likely.

sizeof(size_t) == sizeof('\0') is very common. For example, it
is true on most "32-bit" systems.

Maybe you meant sizeof(size_t) == sizeof(char).
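Ben's correction turns on the fact that a character constant has type int in C (in C++ it has type char). A small check, assuming a C11 compiler for `static_assert`; the function name show_sizes is only illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A character constant such as '\0' has type int in C, so this holds
   on every conforming C implementation. */
static_assert(sizeof('\0') == sizeof(int), "character constants are ints in C");

/* sizeof(char) is 1 by definition; sizeof(size_t) is implementation-defined. */
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

/* Prints the sizes that vary between platforms. */
void show_sizes(void)
{
    printf("sizeof('\\0')   = %zu\n", sizeof('\0'));
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
}
```

The printed values depend on the platform; on a typical 32-bit system all three are 4, which is why sizeof(size_t) == sizeof('\0') is so common there.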
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x1f6},*p=
b,x,i=24;for(;p+=!*p;*p/=4)switch(x=*p&3)case 0:{return 0;for(p--;i--;i--)case
2:{i++;if(1)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

Nov 14 '05 #32

CBFalconer

"Dik T. Winter" wrote:

"Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
...
But remember where I said "so far, so good"? What you *should*
have concluded, there, was that since the "Pascal-style" string
model is so superior to the "C-style" string model, that wouldn't
it be neat if somebody implemented a C compiler that used the
Pascal model internally?!

Eh? The "Pascal-style"? In the only Pascal I ever used (on the
CDC Cyber, the original Pascal from ETH Zuerich), a string was
implemented as a sequence of characters, and that was it. Nearly
the same as in C, except that there was no terminator.

And that all "strings" have fixed size, known to the compiler at
compile time. They also are indexed 1..length. This still
applies to ISO7185 level 0, but ISO10206 has an extended string
type.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #33

Derk Gwen

"jacob navia" <ja***@jacob.remcomp.fr> wrote:
# As everybody knows, C uses a zero delimited unbounded
# pointer for its representation of strings.
#
# This is extremely inefficient because at each query of the
# length of the string, the computer starts an unbounded
# memory scan searching for a zero that ends the string.
#
# A more efficient representation is:

Been there, someone else's done that, and all I got is this lousy T-Shirt
with Tcl_Obj* on it.

There are lots of libraries that provide counted strings, and you can
link them with your code.

--
Derk Gwen http://derkgwen.250free.com/html/index.html
God's a skeeball fanatic.

Nov 14 '05 #34

jacob navia

"Ben Pfaff" <bl*@cs.stanford.edu> a écrit dans le message de
news:87************@pfaff.stanford.edu...

--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x1f6},*p=
b,x,i=24;for(;p+=!*p;*p/=4)switch(x=*p&3)case 0:{return 0;for(p--;i--;i--)case
2:{i++;if(1)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

This is wrong Ben, since you get a "missing return value" warning :-)

I would change the sig to this:
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x1f6},*p=
b,x,i=24;for(;p+=!*p;*p/=4)switch(x=*p&3)case 0:{return 0;for(p--;i--;i--)case
2:{i++;if(1)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}return 0;}

See the return 0 at the end???

jacob

Nov 14 '05 #35

Ben Pfaff

"jacob navia" <ja***@jacob.remcomp.fr> writes:

"Ben Pfaff" <bl*@cs.stanford.edu> a écrit dans le message de
news:87************@pfaff.stanford.edu...
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x1f6},*p=
b,x,i=24;for(;p+=!*p;*p/=4)switch(x=*p&3)case 0:{return 0;for(p--;i--;i--)case
2:{i++;if(1)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

This is wrong Ben, since you get a "missing return value" warning :-)

So? It's C99.
--
"Am I missing something?"
--Dan Pop

Nov 14 '05 #36

Peter Ammon

jacob navia wrote:

As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.

<snip>

I don't understand why everyone is comparing this to C++. The obvious
parallel is to Pascal, which used exactly this sort of string
representation.

--
Pull out a splinter to reply.

Nov 14 '05 #37

jacob navia

I don't understand why everyone is comparing this to C++. The obvious
parallel is to Pascal, which used exactly this sort of string
representation.

Yes. Indeed. And it was a good thing, since bounds-checked strings
are NECESSARY, especially in the development phase of the program.

jacob
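Standard C offers no way for the user to define [ ] for a new type, so the usual substitute for jacob's `s[2] = 'a'` is a pair of accessor functions that check the index against the stored length. A sketch using the indirect (pointer-based) representation from the original post, with hypothetical names string_get, string_set and string_append; the append shows why that representation can be resized without invalidating pointers to the string header itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Indirect representation: the header can stay at a fixed address
   while only the character buffer moves. */
struct string {
    size_t length;
    char *data;      /* not zero-terminated */
};

/* Checked read, the function-call stand-in for c = s[i].
   Returns false, leaving *out unchanged, when i is out of range. */
bool string_get(const struct string *s, size_t i, char *out)
{
    assert(s != NULL && out != NULL);
    if (i >= s->length)
        return false;
    *out = s->data[i];
    return true;
}

/* Checked write, the stand-in for s[i] = c. */
bool string_set(struct string *s, size_t i, char c)
{
    assert(s != NULL);
    if (i >= s->length)
        return false;
    s->data[i] = c;
    return true;
}

/* Appends n bytes, growing the buffer with realloc. Pointers to *s stay
   valid even if the buffer moves. On failure the string is unchanged. */
bool string_append(struct string *s, const char *src, size_t n)
{
    assert(s != NULL);
    if (n == 0)
        return true;
    if (n > SIZE_MAX - s->length)
        return false;                       /* new size would overflow */
    char *p = realloc(s->data, s->length + n);
    if (p == NULL)
        return false;                       /* old buffer is still valid */
    memcpy(p + s->length, src, n);
    s->data = p;
    s->length += n;
    return true;
}
```

The price is a function call and an extra argument per access instead of the bracket syntax; whether that is acceptable is exactly what the thread is arguing about.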

Nov 14 '05 #38

Default User

Dan Pop wrote:

Particular Pascal implementations
(e.g. Turbo Pascal) extended the flexibility of the Pascal strings,
by introducing a character count.

As I recall, Turbo Pascal stored the size of the string in the first
character slot, thus limiting the max string length to 255 (on the
DOS-based platform I used).

Not very handy if you needed longer strings than that.
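A rough C model of the Turbo Pascal short string described here, assuming a 256-byte buffer with the length stored in element 0. The names shortstring, shortstring_from_c and shortstring_length are illustrative; the point is that the 255-character ceiling falls straight out of the one-byte length field:

```c
#include <assert.h>
#include <string.h>

/* Byte 0 holds the length, bytes 1..255 hold the characters. */
typedef unsigned char shortstring[256];

/* Converts a C string; returns 0 on success, or -1 if src is longer
   than 255 characters (dst is then left untouched). */
int shortstring_from_c(shortstring dst, const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src);
    if (n > 255)
        return -1;
    dst[0] = (unsigned char)n;
    memcpy(dst + 1, src, n);
    return 0;
}

/* The length query is one byte read, but the type caps it at 255. */
size_t shortstring_length(const shortstring s)
{
    return s[0];
}
```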
Brian Rodenborn

Nov 14 '05 #39

Arthur J. O'Dwyer

On Thu, 4 Mar 2004, Dik T. Winter wrote:

"Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
...
But remember where I said "so far, so good"? What you *should*
have concluded, there, was that since the "Pascal-style" string
model is so superior to the "C-style" string model, that wouldn't
it be neat if somebody implemented a C compiler that used the
Pascal model internally?!

Eh? The "Pascal-style"? In the only Pascal I ever used (on the CDC
Cyber, the original Pascal from ETH Zuerich), a string was implemented
as a sequence of characters, and that was it. Nearly the same as in C,
except that there was no terminator.

For the record and FWIW, I was thinking of Turbo Pascal's strings;
TP3.0 was one of the first languages I learned.

-Arthur

Nov 14 '05 #40

Arthur J. O'Dwyer

On Thu, 4 Mar 2004, Rob Thorpe wrote:

"Mike Wahler" <mk******@mkwahler.net> wrote...
"jacob navia" <ja***@jacob.remcomp.fr> wrote...
"Mike Wahler" <mk******@mkwahler.net> a écrit...
>
> I think you're proposing C++. Rather than try to 'reinvent' it,
> I just use it.

Well I can't use it Mike. [...] Just too complex.
I use the parts I find useful, discard the rest.
The crux of the matter is knowing when to stop. When a feature
becomes a nuisance, and doesn't simplify the task it is better
to drop it.
Or ignore it. Simple, huh?

The problem is you just can't ignore them, because on a project of
any size others will start to use them.

Solution 1: Don't care what the other guy is doing; that's *his*
part of the project, and you only need to know the interfaces. Make
sure the interfaces are all written using standard C types and
passing mechanisms, so that modules can talk to each other with
some sort of reliability.
Solution 2: Tell the other guy up front that he shouldn't use
the complicated parts of C++. Or better, get your boss to tell
him.
Solution 3: Tell the other guy that he can't use C++, period.
Then *you* use C++, and compile it so that it links with C code.
Link it with the other guy's module (written in nice readable C).
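Solutions 1 and 3 both rely on module interfaces written in plain C types. The conventional way to make such a header usable from both C and C++ is an `extern "C"` block guarded by `__cplusplus`. A sketch with a hypothetical header mylib.h and function count_char, shown as one listing for brevity:

```c
/* ---- mylib.h: hypothetical interface header ---- */
#ifndef MYLIB_H
#define MYLIB_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" {            /* give these functions C linkage under C++ */
#endif

/* Only standard C types cross the module boundary. */
size_t count_char(const char *s, int c);

#ifdef __cplusplus
}
#endif

#endif /* MYLIB_H */

/* ---- mylib.c: implementation compiled as C ---- */
#include <assert.h>

/* Counts occurrences of c in the zero-terminated string s. */
size_t count_char(const char *s, int c)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == (char)c)
            n++;
    return n;
}
```

A C compiler never sees the `extern "C"` lines, while a C++ compiler including the same header gets C linkage, so the two sides can be linked together.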
Do you think it's practical to use a subset of C++ for anything
outside of your own personal code, or maybe code developed by a couple
of people? Maybe in an organisation with very strict procedures.

Eh, probably not. ;-) But I think it's useless to say that
C++ is less practical than C; it's going to get used anyway,
because it really does make some things easier. std::string and
std::map are my friends, and I *do* use them when it's appropriate.
The main reason I don't use C++ for everything is that the STL
methods have such weird names; I have to keep a reference open
on my desktop whenever I'm doing anything with std::map!

-Arthur,
heretic

Nov 14 '05 #41

jacob navia

----- Original Message -----
From: "Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu>
Newsgroups: comp.lang.c
Sent: Thursday, March 04, 2004 9:31 PM
Subject: Re: Increasing efficiency in C

The main reason I don't use C++ for everything is that the STL
methods have such weird names; I have to keep a reference open
on my desktop whenever I'm doing anything with std::map!

-Arthur,
heretic

I am writing map_string. Will take a function and return a string built with
the results of applying the function to each character.
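Going only by the description above (take a function, return a new string), map_string might look roughly like the sketch below. The signature is an assumption, not jacob's actual code, and the caller is assumed to free the result:

```c
#include <assert.h>
#include <ctype.h>      /* for toupper in the usage example */
#include <stdlib.h>
#include <string.h>

/* Returns a newly malloc'd string whose i-th character is f(s[i]),
   or NULL if allocation fails. The caller must free the result. */
char *map_string(const char *s, int (*f)(int))
{
    assert(s != NULL && f != NULL);
    size_t n = strlen(s);
    char *result = malloc(n + 1);
    if (result == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        result[i] = (char)f((unsigned char)s[i]);  /* ctype-safe argument */
    result[n] = '\0';
    return result;
}
```

For example, `map_string("hello", toupper)` returns a newly allocated "HELLO". If f ever returns 0 for a nonzero character, the result will look shorter than the input.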

I would like to see your specs. It *can* be written in C!

jacob

Nov 14 '05 #42

Roc

> Dan Pop wrote:

Particular Pascal implementations
(e.g. Turbo Pascal) extended the flexibility of the Pascal strings,
by introducing a character count.

As I recall, Turbo Pascal stored the size of the string in the first
character slot, thus limiting the max string length to 255 (on the
DOS-based platform I used).

Not very handy if you needed longer strings than that.
Brian Rodenborn

As I recall, that was more convention for arrays than it was stipulated by
the language, wasn't it?

Nov 14 '05 #43

Alan Balmer

On Thu, 4 Mar 2004 21:48:12 +0100, "jacob navia"
<ja***@jacob.remcomp.fr> wrote:

----- Original Message -----
From: "Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu>
Newsgroups: comp.lang.c
Sent: Thursday, March 04, 2004 9:31 PM
Subject: Re: Increasing efficiency in C

The main reason I don't use C++ for everything is that the STL
methods have such weird names; I have to keep a reference open
on my desktop whenever I'm doing anything with std::map!

-Arthur,
heretic

I am writing map_string. Will take a function and return a string built with
the results of applying the function to each character.

I would like to see your specs. It *can* be written in C!

IIRC, that's an example in Koenig's "pitfalls" book.

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 14 '05 #44

Mike Wahler

"Peter Ammon" <ge******@splintermac.com> wrote in message
news:aD*******************@newssvr25.news.prodigy.com...

jacob navia wrote:
As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.

<snip>

I don't understand why everyone is comparing this to C++. The obvious
parallel is to Pascal, which used exactly this sort of string
representation.

I believe I was the first in this thread to mention C++.
I did so 1) because I'm familiar with it, 2) because
Jacob seems to be clamoring for the 'safety' and
'intelligence' which is built into C++'s 'std::string'
type.

As for his remarks about C, it seems he wants to put
training wheels on a Harley-Davidson racing motorcycle.
No thanks. I'd certainly wear a helmet (take precautions),
but *I* will decide how far I should lean into the turns.

I could have easily said BASIC instead. I used C++ as
an example, not necessarily as a 'cure-all' (I actually
use C far more often than other languages in my production
work, for a variety of reasons).
-Mike

Nov 14 '05 #45

jacob navia

"Mike Wahler" <mk******@mkwahler.net> a écrit dans le message de
news:ug*******************@newsread1.news.pas.earthlink.net...

As for his remarks about C, it seems he wants to put
training wheels on a Harley-Davidson racing motorcycle.
Yes. You got that 100%.

As you may know, even Harley-Davidson drivers weren't born
knowing how to drive those beasts.

They have to learn as anybody else.

At one time we were all beginners, weren't we?

Training wheels are very useful since they allow you to train
yourself on the machine without doing any
harm.

Undefined behavior, passing red lights without stopping
and all kinds of bad driving are to be actively eliminated.

This requires more training.
No thanks. I'd certainly wear a helmet (take precautions),
but *I* will decide how far I should lean into the turns.
Yes. But when you lean too far the machine should have
a safety net, shouldn't it?

I know a computer crash is harmless compared to
a Harley Davidson crash, at least, nothing serious
happens to you even if you crash at full C speed.
:-)

I could have easily said BASIC instead. I used C++ as
an example, not necessarily as a 'cure-all' (I actually
use C far more often than other languages in my production
work, for a variety of reasons).

Well, I think that when driving a computer the
machine should have a safe environment. You can
drive even a Harley Davidson safely.

Especially with a fast machine, it is easy to lean too far,
as you know.

I prefer safer environments. Risk taking is boring in the
end. Why keep bugs around for years?

Above all:

Why can't C be conceived as an evolving language like
any other?

Are we stuck with those strings forever or what?

jacob

Nov 14 '05 #46

jacob navia

"Alan Balmer" <al******@att.net> a écrit dans le message de
news:rg********************************@4ax.com...

On Thu, 4 Mar 2004 21:48:12 +0100, "jacob navia"
<ja***@jacob.remcomp.fr> wrote:
I would like to see your specs. It *can* be written in C!

IIRC, that's an example in Koenig's "pitfalls" book.

C is a pitfall then.

I know this opinion is widespread among some people.
Especially in C++ circles :-)

Why can't map be written in C?

I wrote my first map for a Lisp interpreter that I wrote in
C around 1990.

The idea that map can't be written in C is absurd. You pass
to a function each element in a sequence. You can obtain (in
one of the possible versions) a similar list or vector, that is
a map of the function applied to each char.

Of course THAT map does not have all the bells and whistles
of C++ and that is precisely the point. It can be written
in C, not in C++.

Of course C can't write C++. That is precisely what
makes C interesting.

A mapping function is not that complicated.
Apply a function to a container in sequence.

Very simple.
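The general description here (apply a function to each element of a container in sequence) can be written generically in C by passing element sizes explicitly and working through `void *`. A sketch; map_array and half are hypothetical names, and the element function writes its result through a pointer so that source and destination element types may differ:

```c
#include <assert.h>
#include <stddef.h>

/* Calls f(&dst[i], &src[i]) for each of the n elements. Sizes are passed
   explicitly because C has no templates. */
void map_array(void *dst, size_t dst_size,
               const void *src, size_t src_size, size_t n,
               void (*f)(void *out, const void *in))
{
    assert(f != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        f(d + i * dst_size, s + i * src_size);
}

/* Example element function: reads an int, writes half of it as a double. */
static void half(void *out, const void *in)
{
    *(double *)out = *(const int *)in / 2.0;
}
```

The cost of the generic version is the loss of type checking: nothing stops a caller from passing sizes that do not match what f expects.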

jacob

Nov 14 '05 #47

Martin Ambuhl

Default User wrote:

Dan Pop wrote:

Particular Pascal implementations
(e.g. Turbo Pascal) extended the flexibility of the Pascal strings,
by introducing a character count.

As I recall, Turbo Pascal stored the size of the string in the first
character slot, thus limiting the max string length to 255 (on the
DOS-based platform I used).

Not very handy if you needed longer strings than that.

This was the scheme implemented earlier in a number of languages. For
example, the MU-BASIC provided with RT-11 (for working scientists to
write real-time programs on the PDP-11 with a minimum of learning costs)
had string variables with element 0 containing the size and the indices
of the characters being 1-255.
MU-BASIC also provided virtual arrays, wherein an array's elements might
be stored on the disk rather than in memory and arrays that _had_ to be
in memory could be kept there.

Nov 14 '05 #48

Arthur J. O'Dwyer

On Thu, 4 Mar 2004, jacob navia wrote:

"Alan Balmer" <al******@att.net> a écrit...
On Thu, 4 Mar 2004 21:48:12 +0100, "jacob navia" wrote:

I would like to see your specs. It *can* be written in C!
IIRC, that's an example in Koenig's "pitfalls" book.

C is a pitfall then.

I know this opinion is widespread among some people.
Especially in C++ circles :-)

Why can't map be written in C?

:) You missed the point. Your pitfall was not in your assumption
that "X can always be written in C," but rather in your assumption
that "because A does not use C for X, he must think that X *cannot*
be done in C."
Here's an example of a situation in which I have used C++, even
though it *could* be done in C. Note that this is a throw-away
program, not a big application:

Given a phone number composed of decimal digits, and a dictionary
of English words such as /usr/dict, produce a list of plausible
mnemonics for the number according to a standard telephone keypad.
E.g., given the input number "278487," the program would produce a
list including "Arthur," "2-rug-up," and "2-suits."

This IMHO is much easier to hack together when given the building
blocks of std::string and std::map, than it would be in C.
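The core test in this mnemonic problem, whether a candidate spells the number on a standard keypad (2=abc, 3=def, 4=ghi, 5=jkl, 6=mno, 7=pqrs, 8=tuv, 9=wxyz), is short in plain C. A sketch with hypothetical function names; it deliberately leaves out the part where std::string and std::map help, namely reading the dictionary and trying every way of splitting the number into words:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns the keypad digit for a letter, or '\0' for anything else. */
static char keypad_digit(int c)
{
    static const char *const keys[] = {
        "abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"
    };
    c = tolower((unsigned char)c);
    for (int k = 0; k < 8; k++)
        if (c != '\0' && strchr(keys[k], c) != NULL)
            return (char)('2' + k);
    return '\0';
}

/* True if word spells number exactly. Letters map through the keypad,
   digits stand for themselves and '-' separators are skipped. */
bool spells_number(const char *word, const char *number)
{
    assert(word != NULL && number != NULL);
    for (; *word != '\0'; word++) {
        char d;
        if (*word == '-')
            continue;
        if (isdigit((unsigned char)*word))
            d = *word;
        else
            d = keypad_digit(*word);
        if (d == '\0' || d != *number)
            return false;       /* also stops at the end of number */
        number++;
    }
    return *number == '\0';     /* every digit must be used */
}
```

With this, `spells_number("Arthur", "278487")`, `spells_number("2-rug-up", "278487")` and `spells_number("2-suits", "278487")` all return true.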

<snip> The idea that map can't be written in C is absurd. You pass
to a function each element in a sequence. You can obtain (in
one of the possible versions) a similar list or vector, that is
a map of the function applied to each char.

That's not std::map. std::map is a container that MAPS things
onto other things. The functional-programming function 'map'
is something else entirely, and is probably duplicated somewhere
in C++'s <algorithm> or <functional> headers, I dunno and I duncare.

-Arthur

Nov 14 '05 #49

Arthur J. O'Dwyer

On Thu, 4 Mar 2004, jacob navia wrote:

I am writing map_string. Will take a function and return a string
built with the results of applying the function to each character.

You *do* realize this is a one-liner, right?

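/* Copy s into d, replacing each character c with f(c). */
void mapstr(char *d, const char *s, int (*f)(int))
{
    while (*s)
        *d++ = (char)f((unsigned char)*s++); /* ctype functions need unsigned char values */
    *d = '\0';
}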

Possible enhancements: Allow 'mapstr(s,t,0)' as a synonym
for 'strcpy(s,t)'. Control for the possibility that f(k)
equals zero for some k!=0. If d is NULL, allocate and return
space for the resulting string using 'malloc' or a static buffer.
<OT>
Implement 'foldl' and/or 'foldr' over strings (although 'foldr'
would probably be silly, and both really require template
programming to be useful, which C doesn't have).
</OT>

-Arthur

Nov 14 '05 #50
