473,903 Members | 3,336 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Increasing efficiency in C

As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.

This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.

A more efficient representation is:

struct string {
size_t length;
char data[];
};

The length operation becomes just a memory read.
This would considerably speed the programs. The basic
idea is to use a string type that is length prefixed and
allows run-time checking against UB: undefined
behavior.

Comparing strings is speeded up also because when
testing for equality, the first length comparison tells
maybe the whole story with just a couple of
memory reads.

A string like the one described above is not able to
resize itself. Any pointers to it would cease to be valid
when it is resized if the memory allocator is forced to
move memory around. The block where that string was
allocated is bounded by another blocks in memory, and
it is not possible to resize it.

A pointer ( an indirect representation) costs a sizeof(void *)
but allows to resize strings without invalidating the pointers
to them.

struct string {
size_t length;
char *data;
};

There is no compelling reason to choose one or the other.
It depends on the application. In any case, the standard
library could be complemented by
Strcmp
Strcpy
etc., all using length prefixed strings.

Syntactic sugar.

I have added some sugar to this coffee. I always liked coffee
with a bit of sugar. I feel that is too acid without it.

Current strings are used using the [ ] notation. This strings
could have the same privilege isn't it?

The language extension I propose is that the user has the right to
define the operation [ ] for any data type he/she wishes.

Not a big deal for today's compilers.

Length checked strings can then use:

String s;
....
s[2] = 'a';

I think I am proposing the obvious.

Do you agree?

jacob
Nov 14 '05
100 3674

On Thu, 4 Mar 2004, jacob navia wrote:

I am writing map_string. Will take a function and return a string
built with the results of applying the function to each character.


You *do* realize this is a one-liner, right?

void mapstr(char *d, char *s, int(*f)(int))
{
while (*s)
*d++ = f(*s++);
*d = '\0';
return;
}

Possible enhancements: Allow 'mapstr(s,t,0)' as a synonym
for 'strcpy(s,t)'. Control for the possibility that f(k)
equals zero for some k!=0. If d is NULL, allocate and return
space for the resulting string using 'malloc' or a static buffer.
<OT>
Implement 'foldl' and/or 'foldr' over strings (although 'foldr'
would probably be silly, and both really require template
programming to be useful, which C doesn't have).
</OT>

-Arthur
Nov 14 '05 #51
On Thu, 4 Mar 2004 23:13:54 +0100, "jacob navia"
<ja***@jacob.re mcomp.fr> wrote:

"Alan Balmer" <al******@att.n et> a écrit dans le message de
news:rg******* *************** **********@4ax. com...
On Thu, 4 Mar 2004 21:48:12 +0100, "jacob navia"
<ja***@jacob.re mcomp.fr> wrote:
>I would like to see your specs. It *can* be written in C!
>

IIRC, that's an example in Koenig's "pitfalls" book.


C is a pitfall then.

I know this opinion is widespread among some people.
Specially in C++ circles :-)

Why can't be map be written in C?


You misunderstand. Of course it can be written in C, as illustrated in
Koenig's book. Who said otherwise?

As for the book title, "Traps and Pitfalls" books and articles have
been written for a number of languages and end-user programs. It does
not mean that C itself is a pitfall, just that there are pitfalls that
programmers can fall into, particularly novice programmers. This is
true of any language.

--
Al Balmer
Balmer Consulting
re************* ***********@att .net
Nov 14 '05 #52

"Dan Pop" <Da*****@cern.c h> wrote in message

The representation of a string in C is the sequence of characters, up > to and including the null terminator. No kind of pointer is involved in the representation of a C string.
This leads to the array pointer equivalence issue. Enough to say that
strings are passed around internally in C as naked char *s. [ better libraries ] Are you reading impaired or what? Which of them qualifies as
popular?
This is the crux of the issue. Every man and his dog writes a new C string
library. Part of the reason is that it's something a newbie programmer can
do, part is that it is so obvious that efficiency can be improved by passing
a length along with a pointer. However you need a standards body to make the
library stick, plus an end to the C convention that
foo("bar");
passes a char * to function foo.
There are many ways in which C needs to be extended, but adding > more string formats is not one of them. You're wasting your time trying to fix something that isn't broken.

I dunno. For my purposes C strings are perfectly adequate because all I
usually need to do is a trivial bit of text programming, maybe to get a
filename from a user or to print out a high score table. However if I was
writing a performance-critical webpage server that spits out html pages at a
great rate, then I might well want something better. A standard library of
high-perfomance string functions would then be nice.
Nov 14 '05 #53

"Arthur J. O'Dwyer" <aj*@nospam.and rew.cmu.edu> a écrit dans le message de
news:Pi******** *************** ************@un ix43.andrew.cmu .edu...

On Thu, 4 Mar 2004, jacob navia wrote:

I am writing map_string. Will take a function and return a string
built with the results of applying the function to each character.
You *do* realize this is a one-liner, right?

void mapstr(char *d, char *s, int(*f)(int))
{
while (*s)
*d++ = f(*s++);
*d = '\0';
return;
}

Possible enhancements: Allow 'mapstr(s,t,0)' as a synonym
for 'strcpy(s,t)'.


Interesting idea. The function is the identity function
by default. In the case of a copy, the function is

int fn(int c) { return c; } // identity function.

The call is optimized away of course, but it is still consistent.
Control for the possibility that f(k)
equals zero for some k!=0.
Using length prefixed strings this is not a problem.
Actually embedded zeroes can appear in a length
delimited string anywhere. They do not have any
meaning any more. Zeros are that: zeroes. There is
no "special" character.
If d is NULL, allocate and return
space for the resulting string using 'malloc' or a static buffer.


With length prefixed strings we know that the
length of the result is fixed, and allocation doesn't
imply a memory scan.
Nov 14 '05 #54

"Arthur J. O'Dwyer" <aj*@nospam.and rew.cmu.edu> a écrit dans le message de
news:Pi******** *************** ************@un ix43.andrew.cmu .edu...

Given a phone number composed of decimal digits, and a dictionary
of English words such as /usr/dict, produce a list of plausible
mnemonics for the number according to a standard telephone keypad.
E.g., given the input number "278487," the program would produce a
list including "Arthur," "2-rug-up," and "2-suits."


Sorry but i do not get it.

What algorithm you use for mapping 278487 to "Arthur" ??

It is not that evident. I mean not the code in C++ but
an english description of a way of getting "Arthur"
from 278487 and not "Dan" or "whatever" :-)

P.S. Please note that the programmer doesn't know your
phone number of course :-)
Nov 14 '05 #55


jacob navia wrote:

[ much snippage ]
The objective of this discussion is to see why the *language* doesn't
support any other schema for implementing strings.
No other scheme proved to by better in a GENERAL PURPOSE context.
As you admit yourself, the alternate libraries are designed for well
defined goals, rather than as universal replacements for the C strings.



Safety was one of the more widespread goals. I am trying to
build checked strings into lcc-win32. I think that a more
debuggable environment is easier to work with.

And the very existence of these libraries proves that the C language DOES
support alternate schemes. So, your point is moot.



The language doesn't support it.


At the risk of being in over my depth, the language
supports any string representation you may dream up.
The "standard libraries" support one particular
reprsentation. This implementation was, I guess,
chosen by the standards committee in order to
be backwards compatible with the large number
of C-programs which were in existence prior to 89.

One is free to re-create all of the standard routines using
a different convention for representing strings
(e.g. char-count in s[0], some "magic cookie" means
to flash the character on an IE browser, etc., etc.)
and put this into an implementation-specific library
using different names than the standard specifies for
the "standard" library functions. That is currently
supported by the language, is it not?

[ more snipped ]

My whole point is that data structure development should
be opened up to the C user that should be able to
specify data types that follow special rules he/she defines.


It's already there. However, by convention, almost all people
use the routines in the standard. As Dan has said, there
have been attempts to do this in the past, but not
accepted by the vast majority of C programmers,
nor ported from one implementation to the next.

[ much snipped ]



That is the start. A better string library would be an achievement.

Nothing spectacular, and very simple.



But would it be backward compatible with tons of
code already out there? This may be an
intractible problem.

--
Ñ
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #56
In article <40************ ****@news.indiv idual.net> rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:
=?ISO-8859-1?Q?Tor_Husab=F 8?= <to***@student. hf.uio.no> wrote:
Dik T. Winter wrote:
Eh? The "Pascal-style"? In the only Pascal I ever used (on the CDC
Cyber, the original Pascal from ETH Zuerich), a string was implemented
as a sequence of characters, and that was it. Nearly the same as in C,
except that there was no terminator.
Meaning you had to keep track of the length in a separate variable?


Meaning original Pascal programs had to be written for fixed-length
strings, and each different fixed length would need a different set of
functions. Ditto for arrays of other types. And verily, it suckéd
mightily, and there was much wailing and gnashing of teeth.


The much wailing and gnashing of teeth was largely at the fixed array
length of other types than characters (the length was part of the type).
I have seen the implementing done of a numerical library in Pascal.
(Luckily I was only partly involved.)
Forwhich
conformant array schemas were invented. And verily, it did still suck
mightily, but not nearly as mightily as before.


The version of Pascal from Minnesota University was much better than the
original, and also much better than the successor... But of course,
Pascal was written at a time when string handling was still minimal.

I have used only one language where an array was really a first-class
citizen: Algol 68, with all kinds of nifty possibilities. Like with:
'mode' 'complex' = 'struct'('real' re, im);
'complex' a[1:n, 1:m];
you could select:
a[,2].re
which would be an array reference to the real parts of the second
column of a, and you could use it anyplace where an array of reals
was needed. But I digress.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Nov 14 '05 #57

On Fri, 5 Mar 2004, jacob navia wrote:

"Arthur J. O'Dwyer" <aj*@nospam.and rew.cmu.edu> a écrit...

Given a phone number composed of decimal digits, and a dictionary
of English words such as /usr/dict, produce a list of plausible
mnemonics for the number according to a standard telephone keypad.
E.g., given the input number "278487," the program would produce a
list including "Arthur," "2-rug-up," and "2-suits."


Sorry but i do not get it.

What algorithm you use for mapping 278487 to "Arthur" ??


Hmm. I would have expected my explanation to be a little opaque
to a Russian native, perhaps, but I honestly expected the French
telephone to follow the American mold [see diagram]. Do your telephones
have any letters on the keys? Which ones?
The standard American telephone keypad:

ABC DEF
1 2 3

GHI JKL MNO
4 5 6

PRS TUV WXY
7 8 9

OPER
* 0 #

-Arthur
Nov 14 '05 #58
Ben Pfaff <bl*@cs.stanfor d.edu> wrote:
rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:
You might get away with this on systems where sizeof(size_t) ==
sizeof('\0'). IOW, an embedded device, most likely.


sizeof(size_t) == sizeof('\0') is very common. For example, it
is true on most "32-bit" systems.

Maybe you meant sizeof(size_t) == sizeof(char).


Er, yes, of course I did <smacks self>.

Richard
Nov 14 '05 #59
"jacob navia" <ja***@jacob.re mcomp.fr> wrote:
"Mike Wahler" <mk******@mkwah ler.net> a écrit dans le message de
news:ug******** ***********@new sread1.news.pas .earthlink.net. ..

As for his remarks about C, it seems he wants to put
training wheels on a Harley-Davidson racing motorcycle.


Yes. You got that 100%.

As you may know, even Harley-Davidson drivers weren't born
knowing how to drive those beasts.

They have to learn as anybody else.


Yeah. But they don't do so on a full-powered Harley, they do so on a
250cc Toyota. If you really tried to put training wheels on a Harley,
you'd be found dead the day after. In four different dumpsters. You just
be grateful you're dealing with programmers, not with Hell's Angels -
the worst you can expect here is ridicule, and the realisation that you
don't know what the blazes you're on about.

Richard
Nov 14 '05 #60

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
8668
by: Good Man | last post by:
Hi everyone I'm using the "MySQL Administrator" program to keep tabs on the health of a web system i am developing. I think it's nice to have quick (gui) feedback on the query cache, memory variables, and other status variables. I've noticed that one of the status variables, "Aborted_connects" has been increasing steadily. This is defined by MySQL as "Number of tries to connect to the MySQL server that failed". I googled around a...
6
1843
by: Matik | last post by:
Hello all, I've following problem. Please forgive me not posting script, but I think it won't help anyway. I've a table, which is quite big (over 5 milions records). Now, this table contains one field (varchar), which contains some data in the chain. Now, there is a view on this table, to present the data to user. The
92
4143
by: Dave Rudolf | last post by:
Hi all, Normally, I would trust that the ANSI libraries are written to be as efficient as possible, but I have an application in which the majority of the run time is calling the acos(...) method. So now I start to wonder how that method, and others in the math.h library, are implemented. Dave
1
2287
by: Tomás | last post by:
dynamic_cast can be used to obtain a pointer or to obtain a reference. If the pointer form fails, then you're left with a null pointer. If the reference form fails, then an exception is thrown. Would "Feed1" or "Feed2" be preferable in the following: #include <iostream>
335
11990
by: extrudedaluminiu | last post by:
Hi, Is there any group in the manner of the C++ Boost group that works on the evolution of the C language? Or is there any group that performs an equivalent function? Thanks, -vs
19
2939
by: vamshi | last post by:
Hi all, This is a question about the efficiency of the code. a :- int i; for( i = 0; i < 20; i++ ) printf("%d",i); b:- int i = 10;
9
3330
by: OldBirdman | last post by:
Efficiency I've never stumbled on any discussion of efficiency of various methods of coding, although I have found posts on various forums where individuals were concerned with efficiency. I'm not concerned when dealing with user typing, but I am if a procedure is called by a query. Does the VBA compiler generate "in-line" code for some apparent function calls? For example, y = Abs(x) might be compiled as y = x & mask. The string...
4
8277
by: Rahul B | last post by:
Hi, I was getting the error: sqlcode: -911 sqlstate: 40001 , which is "The maximum number of lock requests has been reached for the database." So i increased the locklist size to 200 from the default value of 100. I wanted to know what other effects it will have on the database? Like, will the performance reduce, if the locklist size is 200 and 120 locks are on it as compared to when the locklist size is 130 and 120
4
6362
by: =?Utf-8?B?cmFuZHkxMjAw?= | last post by:
Visual Studio 2005, C# WinForms application: Here’s the question: How can I increase the standard 1 MB stack size of the UI thread in a C# WinForms application? Here’s why I ask: I’ve inherited some code that at the view (User Interface) layer kicks off a background worker thread. At the service layer (think CAB service layer), there’s quite a lot of the following:
0
11279
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10872
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10981
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10499
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9675
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5893
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
6085
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4725
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
3323
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.