473,388 Members | 1,335 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,388 software developers and data experts.

String Buffers?

Hello Newsgroup:

I'm not a C programmer, though I've dabbled on and off through the years. (wish
I could justify doing more in C/C++ because it is enjoyable, just highly time
consuming compared to other languages)

Mostly when I use 'C' it's just to fork a process set-ID or send a signal to
another process, etc... real basic stuff. I'm not seasoned in it and still
find the syntax of structs and arrays confusing.

Anyhow, I was surprised my books don't discuss the area of basic, mundane
string/append buffers.

Classic example being something that accumulates text (like an expat
handler in perl might need to do)

I wrote a linked list / string buffer as a pet project, mainly because
I would like to get better at C, in case I ever have the opportunity
to use C.

It's linked list is in "blocks", tracks the TAIL of the list rather then the
head to make appends faster, and it doesn't need to call strlen() on the whole
string each time an append is required (it only needs strlen() to get the length
of the new part)

For review/manglement/flames and entanglements:

http://geniegate.com/other/slbuf/

These compile on a UNIX machine. Using the 'time' command, under *ideal*
conditions, (larger strings with many many appends and a decent sized block
size) a combination linked list with a stringbuffer APPEARS to run about 10
times faster than realloc + strcat. (but, in less then ideal conditions, it can
end up running slower, low block sizes really slow it down)

The "sample program" is a utility that slurps stdin into a stringbuffer translating
[<>&"] into their respective HTML entities and spits out a crude HTML document
of the text. Not exactly a useful program (would of course be best to spit it
out in chunks) but it does illustrate using the stringbuffer in a "real" program.
(don't run it on a HUGE file, it slurps all of stdin)

On an old box, 100,000 iterations of 50+ appends takes about 2-3 seconds -vs-
around 13 seconds for the realloc + strcat approach.

I'm surprised my books don't talk about stringbuffers (Or, perhaps
I wasn't looking in the right places?) Seems like a really common operation.

Am I missing something obvious?

In practice, are linked lists better at dealing with memory fragmentation,
or is realloc best? (how do you know when there is fragmentation?)

Jamie
--
http://www.geniegate.com Custom web programming
gu******@lnubb.pbz (rot13) User Management Solutions
May 27 '06 #1
5 2728
Jamie wrote:
Hello Newsgroup:

I'm not a C programmer, though I've dabbled on and off through the years. (wish
I could justify doing more in C/C++ because it is enjoyable, just highly time
consuming compared to other languages)

Mostly when I use 'C' it's just to fork a process set-ID or send a signal to
another process, etc... real basic stuff. I'm not seasoned in it and still
find the syntax of structs and arrays confusing.

Anyhow, I was surprised my books don't discuss the area of basic, mundane
string/append buffers.

Classic example being something that accumulates text (like an expat
handler in perl might need to do)

I wrote a linked list / string buffer as a pet project, mainly because
I would like to get better at C, in case I ever have the opportunity
to use C.

It's linked list is in "blocks", tracks the TAIL of the list rather then the
head to make appends faster, and it doesn't need to call strlen() on the whole
string each time an append is required (it only needs strlen() to get the length
of the new part)

For review/manglement/flames and entanglements:

http://geniegate.com/other/slbuf/

These compile on a UNIX machine. Using the 'time' command, under *ideal*
conditions, (larger strings with many many appends and a decent sized block
size) a combination linked list with a stringbuffer APPEARS to run about 10
times faster than realloc + strcat. (but, in less then ideal conditions, it can
end up running slower, low block sizes really slow it down)

The "sample program" is a utility that slurps stdin into a stringbuffer translating
[<>&"] into their respective HTML entities and spits out a crude HTML document
of the text. Not exactly a useful program (would of course be best to spit it
out in chunks) but it does illustrate using the stringbuffer in a "real" program.
(don't run it on a HUGE file, it slurps all of stdin)

On an old box, 100,000 iterations of 50+ appends takes about 2-3 seconds -vs-
around 13 seconds for the realloc + strcat approach.

I'm surprised my books don't talk about stringbuffers (Or, perhaps
I wasn't looking in the right places?) Seems like a really common operation.

Am I missing something obvious?

In practice, are linked lists better at dealing with memory fragmentation,
or is realloc best? (how do you know when there is fragmentation?)

I would start by reading the thread "Critic of the Given Code", started
yesterday. People in the know are always going back and forth about
realloc around here. frank
May 27 '06 #2
no****@geniegate.com (Jamie) wrote:

# size) a combination linked list with a stringbuffer APPEARS to run about 10
# times faster than realloc + strcat. (but, in less then ideal conditions, it can

Na•ve use of strcat can increase runtime by an order of magnitude. Also you
want to reallocate the string but a constant factor instead of making it
just long enough each time. Something like

int m = 0,n = 0; char *s = 0;
char *appendstring(char *t) {
int l = strlen(t);
if (n+l+1>==m) {m = 2*(n+l+1); s = realloc(s,m);
strcpy(s+n,t); n += l;
return s;
}

--
SM Ryan http://www.rawbw.com/~wyrmwif/
Title does not dictate behaviour.
May 28 '06 #3
In <12*************@corp.supernews.com>,
SM Ryan <wy*****@tango-sierra-oscar-foxtrot-tango.fake.org> mentions:
# size) a combination linked list with a stringbuffer APPEARS to run about 10
# times faster than realloc + strcat. (but, in less then ideal conditions, it can

Na•ve use of strcat can increase runtime by an order of magnitude. Also you
want to reallocate the string but a constant factor instead of making it
just long enough each time. Something like

int m = 0,n = 0; char *s = 0;
char *appendstring(char *t) {
int l = strlen(t);
if (n+l+1>==m) {m = 2*(n+l+1); s = realloc(s,m);
strcpy(s+n,t); n += l;
return s;
}


Hmm... I like your way better. :-) Totally eliminates the linked list.

Only thing is, it appears to need global variables.

Wonder how a combo would work, something like:

StringBuf *sb = mythical_create_sbuf(size_t bsz, const char *initial_value);
mythical_append(sb,"String to append");

Where sb is a pointer to a struct with the state information, bsz is a
setting used for realloc and initial size.

Also, your approach provides ready access to the string itself, no need for
a "pack" operation or a separate copy. Very nice!

Next time I poke around with it, I think I'll try yours and check to see which
are faster.

I do wonder which is better for memory fragmentation. Is it best to malloc
several small(er) chunks and string together or realloc. Your approach takes
less space. (in both data size AND code size)
Jamie
--
http://www.geniegate.com Custom web programming
gu******@lnubb.pbz (rot13) User Management Solutions
May 28 '06 #4
no****@geniegate.com (Jamie) wrote:

# Only thing is, it appears to need global variables.
#
# Wonder how a combo would work, something like:
#
# StringBuf *sb = mythical_create_sbuf(size_t bsz, const char *initial_value);
# mythical_append(sb,"String to append");

It's been implemented that way by many different people.

# I do wonder which is better for memory fragmentation. Is it best to malloc
# several small(er) chunks and string together or realloc. Your approach takes
# less space. (in both data size AND code size)

The one problem is that it can overallocate space by a very large number
and end up wasting lots of space. This is less of problem when you have
virtual memory around, but on smaller systems it could blow out the heap
for pretty modest strings.

The point of doubling the size each time is you get exponential increase
in size (and cost of copying the string on realloc), but the number of
reallocations drops off logarithmically. The overall effect is linear
time and space, and a linear waste of space. Chained string segments
is also linear, but then you have the reassembly problem for libraries
that expect the string as a contiguous array.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
No pleasure, no rapture, no exquisite sin greater than central air.
May 28 '06 #5
"Jamie" <no****@geniegate.com> wrote

Also, your approach provides ready access to the string itself, no need
for
a "pack" operation or a separate copy. Very nice!
I do wonder which is better for memory fragmentation. Is it best to malloc
several small(er) chunks and string together or realloc. Your approach
takes
less space. (in both data size AND code size)

If you've got to worry about memory usage, use the small chunk approach.
If you are running on a modern PC and the string is less than a million
characters, or a short novel, then the realloc() approach with a single
block will be fine.
--
Buy my book 12 Common Atheist Arguments (refuted)
$1.25 download or $7.20 paper, available www.lulu.com/bgy1mm

May 28 '06 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: manders2k | last post by:
Hi all -- I'm contemplating the idea of writing a simple emacs-like editor in python (for fun and the experience of doing so). In reading through Craig Finseth's "The Craft of Text Editing": ...
9
by: AMT2K5 | last post by:
Hello, how would I go about breaking up a string that is returned by a function. After I do that, I will strcpy that data to a class data member . I have the following functions void...
1
by: Daniel | last post by:
Is there any way for System.IO.StreamWriter Write method to write out part of the string to file. such as if the machine is shut down half way through? or does the file not actually exist until the...
8
by: Duncan Winn | last post by:
I am new to VC++7. I am using a method GetPrivateProfileString that requires an LPTSTR. I have defined this as a: char * data_name; I am then trying to convert this to an LPOLESTR and I...
9
by: rsine | last post by:
I have developed a program that sends a command through the serial port to our business system and then reads from the buffer looking for a number. Everything worked great on my WinXP system, but...
4
by: Scott Lemen | last post by:
Hi, Some Win APIs expect a structure with a fixed length string. How is it defined in VB .Net 2003? When I try to use the FixedLengthString class I get an "Array bounds cannot appear in type...
5
by: psimakov | last post by:
I just read a great article by Dave Johnson on comparative performance of XML and JSON parsing. It is a very important for anyone doing AJAX. But the parsing is not the only place where CPU can...
30
by: nano2k | last post by:
Hi I need an efficient method to convert a string object to it's byte equivalent. I know there are LOTS of methods, but they lack in efficiency. All methods allocate new memory to create the byte...
232
by: robert maas, see http://tinyurl.com/uh3t | last post by:
I'm working on examples of programming in several languages, all (except PHP) running under CGI so that I can show both the source files and the actually running of the examples online. The first...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.