Bytes | Software Development & Data Engineering Community

Increasing efficiency in C

As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.

This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.

A more efficient representation is:

struct string {
    size_t length;
    char data[];
};

The length operation becomes just a memory read.
This would considerably speed up programs. The basic
idea is to use a string type that is length prefixed and
allows run-time checking against UB (undefined
behavior).
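
A minimal sketch of how such a length-prefixed string could be allocated and queried, assuming C99 flexible array members. The helper names `string_new` and `string_length` are hypothetical and not part of any standard library; keeping a trailing zero is an extra choice made here so the bytes can still be handed to existing C functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Length-prefixed string with a flexible array member (C99). */
struct string {
    size_t length;
    char data[];
};

/* Hypothetical helper: allocate a string holding a copy of the n bytes
   at src. A trailing zero is kept so data can still be passed to C
   library calls. Returns NULL if the size overflows or allocation fails. */
struct string *string_new(const char *src, size_t n)
{
    assert(src != NULL);
    if (n > SIZE_MAX - sizeof(struct string) - 1)
        return NULL;
    struct string *s = malloc(sizeof *s + n + 1);
    if (s == NULL)
        return NULL;
    s->length = n;
    memcpy(s->data, src, n);
    s->data[n] = '\0';
    return s;
}

/* Length query: a single memory read, no scan for a terminator. */
size_t string_length(const struct string *s)
{
    assert(s != NULL);
    return s->length;
}
```

The whole object lives in one allocation, so a single free() releases both the header and the characters.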

Comparing strings is also sped up because, when
testing for equality, the first length comparison may
tell the whole story with just a couple of
memory reads.
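
The equality test could look roughly like this. It is only a sketch: `string_equal` is a hypothetical name, and for brevity the struct here points at its bytes (with a `const` pointer so literals can be used) instead of embedding them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout for this sketch: a length plus a pointer to the bytes. */
struct string {
    size_t length;
    const char *data;
};

/* Hypothetical helper: strings of different length can never be equal,
   so the bytes are compared only when the two lengths match. */
bool string_equal(const struct string *a, const struct string *b)
{
    assert(a != NULL && b != NULL);
    if (a->length != b->length)
        return false;
    return a->length == 0 || memcmp(a->data, b->data, a->length) == 0;
}
```

When the lengths match, the cost is still linear in the length; the saving applies to the mismatched-length case.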

A string like the one described above is not able to
resize itself. Any pointers to it would cease to be valid
when it is resized if the memory allocator is forced to
move memory around. The block where that string was
allocated is bounded by other blocks in memory, and
it is not always possible to resize it in place.

A pointer (an indirect representation) costs a sizeof(void *)
but allows strings to be resized without invalidating the pointers
to them.

struct string {
    size_t length;
    char *data;
};
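
A sketch of why the indirect layout survives resizing: only the buffer behind `data` may move, while the struct itself stays where it is. `string_append` is a hypothetical helper, and the trailing zero is again an assumption made for compatibility with existing C functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct string {
    size_t length;
    char *data;
};

/* Hypothetical helper: append n bytes from src. realloc may move the
   buffer, but the address of *s does not change, so pointers to the
   struct remain valid. Returns 0 on success, -1 on allocation failure
   (the string is left unchanged in that case). */
int string_append(struct string *s, const char *src, size_t n)
{
    assert(s != NULL && src != NULL);
    char *p = realloc(s->data, s->length + n + 1);
    if (p == NULL)
        return -1;
    memcpy(p + s->length, src, n);
    s->length += n;
    p[s->length] = '\0';
    s->data = p;
    return 0;
}
```

An empty string starts as `{ 0, NULL }`; realloc with a null pointer behaves like malloc, so the first append allocates the buffer.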

There is no compelling reason to choose one or the other.
It depends on the application. In any case, the standard
library could be complemented by
Strcmp
Strcpy
etc., all using length prefixed strings.

Syntactic sugar.

I have added some sugar to this coffee. I always liked coffee
with a bit of sugar. I feel it is too acidic without it.

Current strings are accessed using the [ ] notation. These strings
could have the same privilege, couldn't they?

The language extension I propose is that the user has the right to
define the operation [ ] for any data type he/she wishes.

Not a big deal for today's compilers.

Length checked strings can then use:

String s;
....
s[2] = 'a';
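
Standard C has no user-defined [ ] operator, so the closest equivalent today is an accessor function. The following is a sketch of the check that an overloaded `s[2] = 'a'` would have to perform; `string_set` is a hypothetical name, and returning false on a bad index is just one possible policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct string {
    size_t length;
    char *data;
};

/* Hypothetical checked equivalent of s[i] = c: refuses to write
   outside the string instead of invoking undefined behavior. */
bool string_set(struct string *s, size_t i, char c)
{
    assert(s != NULL);
    if (i >= s->length)
        return false;
    s->data[i] = c;
    return true;
}
```

The proposed extension would let the compiler generate this call from the subscript syntax; the check itself would be the same.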

I think I am proposing the obvious.

Do you agree?

jacob
Nov 14 '05

On Thu, 4 Mar 2004, Rob Thorpe wrote:

"Mike Wahler" <mk******@mkwah ler.net> wrote...
"jacob navia" <ja***@jacob.re mcomp.fr> wrote...
"Mike Wahler" <mk******@mkwah ler.net> a écrit...
>
> I think you're proposing C++. Rather than try to 'reinvent' it,
> I just use it.

Well I can't use it Mike. [...] Just too complex.
I use the parts I find useful, discard the rest.
The crux of the matter is knowing when to stop. When a feature
becomes a nuisance, and doesn't simplify the task it is better
to drop it.
Or ignore it. Simple, huh?


The problem is you just can't ignore them, because on a project of
any size others will start to use them.


Solution 1: Don't care what the other guy is doing; that's *his*
part of the project, and you only need to know the interfaces. Make
sure the interfaces are all written using standard C types and
passing mechanisms, so that modules can talk to each other with
some sort of reliability.
Solution 2: Tell the other guy up front that he shouldn't use
the complicated parts of C++. Or better, get your boss to tell
him.
Solution 3: Tell the other guy that he can't use C++, period.
Then *you* use C++, and compile it so that it links with C code.
Link it with the other guy's module (written in nice readable C).
Do you think it's practical to use a subset of C++ for anything
outside of your own personal code, or maybe code developed by a couple
of people? Maybe in an organisation with very strict procedures.


Eh, probably not. ;-) But I think it's useless to say that
C++ is less practical than C; it's going to get used anyway,
because it really does make some things easier. std::string and
std::map are my friends, and I *do* use them when it's appropriate.
The main reason I don't use C++ for everything is that the STL
methods have such weird names; I have to keep a reference open
on my desktop whenever I'm doing anything with std::map!

-Arthur,
heretic
Nov 14 '05 #41

----- Original Message -----
From: "Arthur J. O'Dwyer" <aj*@nospam.and rew.cmu.edu>
Newsgroups: comp.lang.c
Sent: Thursday, March 04, 2004 9:31 PM
Subject: Re: Increasing efficiency in C

The main reason I don't use C++ for everything is that the STL
methods have such weird names; I have to keep a reference open
on my desktop whenever I'm doing anything with std::map!

-Arthur,
heretic


I am writing map_string. It will take a function and return a string built with
the results of applying the function to each character.

I would like to see your specs. It *can* be written in C!

jacob
Nov 14 '05 #42
Roc
> Dan Pop wrote:
Particular Pascal implementations
(e.g. Turbo Pascal) extended the flexibility of the Pascal strings,
by introducing a character count.


As I recall, Turbo Pascal stored the size of the string in the first
character slot, thus limiting the max string length to 255 (on the
DOS-based platform I used).

Not very handy if you needed longer strings than that.
Brian Rodenborn


As I recall, that was more a convention for arrays than something stipulated by
the language, wasn't it?
Nov 14 '05 #43
On Thu, 4 Mar 2004 21:48:12 +0100, "jacob navia"
<ja***@jacob.re mcomp.fr> wrote:

----- Original Message -----
From: "Arthur J. O'Dwyer" <aj*@nospam.and rew.cmu.edu>
Newsgroups: comp.lang.c
Sent: Thursday, March 04, 2004 9:31 PM
Subject: Re: Increasing efficiency in C

The main reason I don't use C++ for everything is that the STL
methods have such weird names; I have to keep a reference open
on my desktop whenever I'm doing anything with std::map!

-Arthur,
heretic


I am writing map_string. It will take a function and return a string built with
the results of applying the function to each character.

I would like to see your specs. It *can* be written in C!

IIRC, that's an example in Koenig's "pitfalls" book.

--
Al Balmer
Balmer Consulting
re************* ***********@att .net
Nov 14 '05 #44

"Peter Ammon" <ge******@splin termac.com> wrote in message
news:aD******** ***********@new ssvr25.news.pro digy.com...
jacob navia wrote:
<snip: full quote of the original post>


I don't understand why everyone is comparing this to C++. The obvious
parallel is to Pascal, which used exactly this sort of string
representation.


I believe I was the first in this thread to mention C++.
I did so 1) because I'm familiar with it, 2) because
Jacob seems to be clamoring for the 'safety' and
'intelligence' which is built into C++'s 'std::string'
type.

As for his remarks about C, it seems he wants to put
training wheels on a Harley-Davidson racing motorcycle.
No thanks. I'd certainly wear a helmet (take precautions),
but *I* will decide how far I should lean into the turns.

I could have easily said BASIC instead. I used C++ as
an example, not necessarily as a 'cure-all' (I actually
use C far more often than other languages in my production
work, for a variety of reasons).
-Mike
Nov 14 '05 #45

"Mike Wahler" <mk******@mkwah ler.net> a écrit dans le message de
news:ug******** ***********@new sread1.news.pas .earthlink.net. ..

> As for his remarks about C, it seems he wants to put
> training wheels on a Harley-Davidson racing motorcycle.

Yes. You got that 100%.

As you may know, even Harley-Davidson drivers weren't born
knowing how to drive those beasts.

They have to learn like anybody else.

At one time we were all beginners, weren't we?

Training wheels are very useful since they allow you to train
yourself on the machine without doing any
harm.

Undefined behavior, passing red lights without stopping
and all kinds of bad driving are to be actively eliminated.

This requires more training.

> No thanks. I'd certainly wear a helmet (take precautions),
> but *I* will decide how far I should lean into the turns.

Yes. But when you lean too far the machine should have
a safety net, shouldn't it?

I know a computer crash is harmless compared to
a Harley-Davidson crash; at least, nothing serious
happens to you even if you crash at full C speed.
:-)

> I could have easily said BASIC instead. I used C++ as
> an example, not necessarily as a 'cure-all' (I actually
> use C far more often than other languages in my production
> work, for a variety of reasons).

Well, I think that when driving a computer the
machine should have a safe environment. You can
drive even a Harley-Davidson safely.

Especially with a fast machine it is easy to lean too far,
as you know.

I prefer safer environments. Risk taking is boring in the
end. Why keep bugs around for years?

Above all:

Why can't C be conceived as an evolving language like
any other?

Are we stuck with those strings forever or what?

jacob
Nov 14 '05 #46

"Alan Balmer" <al******@att.n et> a écrit dans le message de
news:rg******** *************** *********@4ax.c om...
On Thu, 4 Mar 2004 21:48:12 +0100, "jacob navia"
<ja***@jacob.re mcomp.fr> wrote:
I would like to see your specs. It *can* be written in C!

IIRC, that's an example in Koenig's "pitfalls" book.


C is a pitfall then.

I know this opinion is widespread among some people.
Especially in C++ circles :-)

Why can't map be written in C?

I wrote my first map for a Lisp interpreter that I wrote in
C around 1990.

The idea that map can't be written in C is absurd. You pass
to a function each element in a sequence. You can obtain (in
one of the possible versions) a similar list or vector, that is
a map of the function applied to each char.

Of course THAT map does not have all the bells and whistles
of C++ and that is precisely the point. It can be written
in C, not in C++.

Of course C can't write C++. That is precisely what
makes C interesting.

A mapping function is not that complicated.
Apply a function to a container in sequence.

Very simple.
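
A minimal sketch of such a mapping function over the characters of an ordinary C string. The signature is an assumption (the actual map_string specs are not given in this thread); the callback type matches `toupper` and `tolower` from <ctype.h>.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed signature: returns a newly allocated string built by applying
   fn to each character of src, or NULL if allocation fails.
   The caller frees the result. */
char *map_string(const char *src, int (*fn)(int))
{
    assert(src != NULL && fn != NULL);
    size_t n = strlen(src);
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (char)fn((unsigned char)src[i]);
    out[n] = '\0';
    return out;
}
```

For example, `map_string("abc", toupper)` yields a new string containing "ABC".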

jacob


Nov 14 '05 #47
Default User wrote:
Dan Pop wrote:

Particular Pascal implementations
(e.g. Turbo Pascal) extended the flexibility of the Pascal strings,
by introducing a character count.

As I recall, Turbo Pascal stored the size of the string in the first
character slot, thus limiting the max string length to 255 (on the
DOS-based platform I used).

Not very handy if you needed longer strings than that.


This was the scheme implemented earlier in a number of languages. For
example, the MU-BASIC provided with RT-11 (for working scientists to
write real-time programs on the PDP-11 with a minimum of learning costs)
had string variables with element 0 containing the size and the indices
of the characters being 1-255.
MU-BASIC also provided virtual arrays, wherein an array's elements might
be stored on the disk rather than in memory and arrays that _had_ to be
in memory could be kept there.
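
The one-byte length scheme described above (size in element 0, characters in elements 1 to 255) can be sketched in C as follows. This only illustrates the layout and its 255-character ceiling; it is not taken from Turbo Pascal or MU-BASIC, and the names `pstring` and `pstring_from_c` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte 0 holds the length, bytes 1..255 hold the characters. */
typedef unsigned char pstring[256];

/* Hypothetical helper: copy a C string into p. Returns the number of
   characters dropped because they did not fit in the 255-character
   limit imposed by the single length byte. */
size_t pstring_from_c(pstring p, const char *src)
{
    assert(p != NULL && src != NULL);
    size_t n = strlen(src);
    size_t kept = n > 255 ? 255 : n;
    p[0] = (unsigned char)kept;
    memcpy(p + 1, src, kept);
    return n - kept;
}
```

Widening the prefix to a size_t, as in the original post, removes the ceiling at the cost of a few more bytes per string.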
Nov 14 '05 #48

On Thu, 4 Mar 2004, jacob navia wrote:

"Alan Balmer" <al******@att.n et> a écrit...
On Thu, 4 Mar 2004 21:48:12 +0100, "jacob navia" wrote:

I would like to see your specs. It *can* be written in C!
IIRC, that's an example in Koenig's "pitfalls" book.


C is a pitfall then.

I know this opinion is widespread among some people.
Especially in C++ circles :-)

Why can't map be written in C?


:) You missed the point. Your pitfall was not in your assumption
that "X can always be written in C," but rather in your assumption
that "because A does not use C for X, he must think that X *cannot*
be done in C."
Here's an example of a situation in which I have used C++, even
though it *could* be done in C. Note that this is a throw-away
program, not a big application:

Given a phone number composed of decimal digits, and a dictionary
of English words such as /usr/dict, produce a list of plausible
mnemonics for the number according to a standard telephone keypad.
E.g., given the input number "278487," the program would produce a
list including "Arthur," "2-rug-up," and "2-suits."

This IMHO is much easier to hack together when given the building
blocks of std::string and std::map, than it would be in C.
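
For comparison, the keypad-matching core of that problem can be sketched in plain C. This is not the program described above: it covers only the test of whether one word spells a digit string, leaves out the dictionary lookup and the splitting into several words, and assumes a character set where 'a' to 'z' are contiguous (as in ASCII). The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Keypad digit for a letter (abc=2, def=3, ..., wxyz=9),
   or 0 for anything that is not a letter. */
static char keypad_digit(char c)
{
    static const char digits[] = "22233344455566677778889999"; /* a..z */
    int lower = tolower((unsigned char)c);
    if (lower < 'a' || lower > 'z')
        return 0;
    return digits[lower - 'a'];
}

/* True if word spells exactly the digits in number. */
bool word_matches_number(const char *word, const char *number)
{
    assert(word != NULL && number != NULL);
    size_t i = 0;
    for (; word[i] != '\0' && number[i] != '\0'; i++)
        if (keypad_digit(word[i]) != number[i])
            return false;
    return word[i] == '\0' && number[i] == '\0';
}
```

With this, "Arthur" matches "278487" and "suits" matches "78487". The remaining work (indexing the dictionary by digit string) is where a ready-made associative container saves the most effort.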

<snip> The idea that map can't be written in C is absurd. You pass
to a function each element in a sequence. You can obtain (in
one of the possible versions) a similar list or vector, that is
a map of the function applied to each char.


That's not std::map. std::map is a container that MAPS things
onto other things. The functional-programming function 'map'
is something else entirely, and is probably duplicated somewhere
in C++'s <algorithm> or <functional> headers, I dunno and I duncare.

-Arthur

Nov 14 '05 #49

"Dan Pop" <Da*****@cern.c h> a écrit dans le message de
news:c2******** **@sunnews.cern .ch...
In <c2**********@n ews-reader1.wanadoo .fr> "jacob navia" <ja***@jacob.re mcomp.fr> writes:
"Dan Pop" <Da*****@cern.c h> a écrit dans le message de
news:c2******* ***@sunnews.cer n.ch...
In <c2**********@n ews-reader4.wanadoo .fr> "jacob navia"<ja***@jacob.r emcomp.fr> writes:
I wanted to emphasize "unbounded" because there is no way to know if
the zero is not there where the pointer will end pointing to...


You don't know where the pointer will end pointing to. Your wording
simply didn't make any sense to anyone but you.

The representation of a string in C is the sequence of characters, up to
and including the null terminator. No kind of pointer is involved in the
representation of a C string.


Wow Dan, this is news for me. No kind of pointer?

Not even a char * as it seems?

Strange. Are all those prototypes in string.h wrong?

I would file a defect report.

Just do not exaggerate Dan. Let's keep cool ok?

I am speaking about a naked char * that points to the
start of a sequence of bytes that should end with a
terminating zero.

By definition of the data structure, its length is
not known, and the same scan must be repeated
each time we access the length.

More serious, the failure modes are quite horrible.

In writing mode a wild pointer is like a loaded
machine gun, ready to start shooting around
at random. Pieces of the program, essential data
like the return address are wiped by the gun,
without any way for the system to stop it.

The program is in an indeterminate state,
depending on the direction the machine gun was
shooting.

Ahh. How nice. We are fearful. We risk that but
it works you see?

*I* do not make any mistakes, you say.

Well Dan, just keep cool.

I am not afraid to admit that I make mistakes.
I am not a star programmer. I am a run-of-the-mill
brain that gets bored of always taking this
new dangerous turn. Damn it. Can't the machine
do it for me?

You say:
It doesn't hurt to use your common sense in validating your opinions.
If C strings were "extremely inefficient", that would have been a much
bigger problem 30 years ago, when computers were orders of magnitude
slower than today. Yet, no one produced a fix then. No alternate
string libraries designed and implemented for C since then have
acquired any kind of popularity.

There are many, Dan. Just search in Google and you will find umpteen libraries
that implement this with different emphasis on different objectives.


Are you reading impaired or what? Which of them qualifies as popular?


Well, Microsoft proposed one recently. And there are several.
I can't tell you which are "popular" since I am not doing
that kind of research. But they are surely used.
The objective of this discussion is to see why the *language* doesn't
support any other schema for implementing strings.


No other scheme proved to be better in a GENERAL PURPOSE context.
As you admit yourself, the alternate libraries are designed for well
defined goals, rather than as universal replacements for the C strings.


Safety was one of the more widespread goals. I am trying to
build checked strings into lcc-win32. I think that a more
debuggable environment is easier to work with.
And the very existence of these libraries proves that the C language DOES
support alternate schemes. So, your point is moot.

The language doesn't support it.

I repeat that length prefixed strings should be easy to
use: name[2] should do what it is supposed to.

My whole point is that data structure development should
be opened up to the C user, who should be able to
specify data types that follow special rules he/she defines.

For instance you could add a "flags" field to the standard
length prefixed string, and implement read only
strings, or time stamp based data, or whatever.
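
A sketch of the read-only idea, assuming a `flags` member is added to the struct and every write goes through an accessor. The names `STRING_READONLY` and `string_set` are hypothetical, and nothing here can stop code from bypassing the accessor and writing through `data` directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STRING_READONLY 0x1u  /* hypothetical flag bit */

struct string {
    size_t length;
    unsigned flags;
    char *data;
};

/* Hypothetical checked write: fails on a read-only string or on an
   out-of-range index, and leaves the string untouched in both cases. */
bool string_set(struct string *s, size_t i, char c)
{
    assert(s != NULL);
    if ((s->flags & STRING_READONLY) || i >= s->length)
        return false;
    s->data[i] = c;
    return true;
}
```

Other bits of the same field could carry further user-defined rules in the same way.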

The language should allow people to write programs
that handle data structures in the way that suits them
best.

C is not object oriented but we all use lists, stacks, hash tables
in our everyday programming.
Since C programmers aren't the last people to care about efficiency,
what conclusion can you draw?


Since language support doesn't encourage the use of bounded pointers
C string handling is much more error prone than it should be.


1. This is not a performance issue.


No, this is a human performance issue. People get bored with
details. Computers do not.
People use computers to do repetitive work. Why can't
we use the computer to check for mistakes?

Your answer is:
2. This is a *general* problem of C: most C features are error prone in
the hands of the incompetent.

You are competent, Dan. Surely more than me.

I belong to the other ones.
The ones that make mistakes. I am not afraid of saying this,
maybe because I think knowing this is the start of
knowledge.

When you realize your mistakes you can start learning.
Only then.
Never had the traps because of the missing zero?


Nope.


:-)

Of course not, Dan. Sure. I believe you 100%.
The failure modes of the string functions in the library like strcpy
are just horrible. Memory corruption is guaranteed unless a zero
is found...


Dynamic memory allocation has exactly the same problems: write beyond
a dynamically allocated memory block (in either direction) and memory
corruption will (most likely) bite you, sooner or later. What is your
better replacement for malloc and friends?


The garbage collector. I wrote one for my Lisp interpreter
in the 90s, and I have adapted Mr Boehm's work to lcc-win32.

The GC is much better than malloc/free. But I know, that's
another discussion ...
C is a sharp tool *by design*. People who can't use sharp tools or are
afraid of them, should not use C. There are plenty of other programming
languages designed for them so there is no need to turn C into a less
sharp tool (and, therefore less effective in the hands of the competent
programmers) and annoy C's *intended* user base.

I want it to be sharper, Dan. C is not sharp enough
with all those bugs that creep into programs.
You can't be sure of a tool if it is not designed to
be sharp and safe.

You don't take the knife by the edge, do you?

A knife is a sharp tool by its very nature.

But it can only be used because
you do not touch the edge, isn't that so?

That blunt side, which provides safety for your hand,
makes for a usable knife. Without it, using a knife
means cutting your fingers :-)
There are many ways in which C needs to be extended, but adding more
string formats is not one of them. You're wasting your time trying to
fix something that isn't broken.


That is the start. A better string library would be an achievement.

Nothing spectacular, and very simple.
Nov 14 '05 #50
