By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
457,887 Members | 1,246 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 457,887 IT Pros & Developers. It's quick & easy.

A string collection abstract data type

P: n/a
Abstract:

Continuing the discussion about abstract data types, in this
discussion group, a string collection data type is presented,
patterned after the collection in C# and similar languages (Java).
It stores character strings, and resizes itself to accommodate
new strings when needed.

Interface:
----------

The data structure uses a table of function pointers to provide an
extensible basis for many functions without cluttering the
user's workspace with too many names.

This architecture is not only extensible at the API level, but
it allows "subclassing". If the user is unsatisfied with some
of the functions of the API, he/she can:

o Replace one or more of the function pointers in the table
with a function of his/her own.

o Store somewhere the old function pointer, and replace it with
a function of his own that after doing some work calls the
original function pointer, optionally doing some work after the
original function returns.

This type of extensibility can only be achieved with function pointers.
The basic problem with many of the proposals presented here is that
they present too much names that can conflict with user names.

This hiding of names is useful in another way, since it allows the API
to use completely generic names like "Add" or similar without any
ambiguity.

Since those generic names can be used in *other* abstract data types
that use the same type of structure, the user code can be made truly
general, and it is easy to change from a string collection to a list
without too much trouble.

The objective would be to make a standard way of naming things within
a container, so that user code is shielded from change when passing from
one container to another.

The name of the table of functions member is "lpVtbl" following
an old function naming convention under the windows OS and in
the COM system. It can be changed of course.

The interface is described in the header file <strcollection.h>
---------------------------------------------------------------------
#include <string.h>
// Forward declaration of the string collection type
typedef struct _StringCollection StringCollection;
typedef struct {
// Returns the number of elements stored
int (*GetCount)(StringCollection *SC);

// Is this collection read only?
int (*IsReadOnly)(StringCollection *SC);

// Sets this collection read-only or unsets the read-only flag
int (*SetReadOnly)(StringCollection *SC,int flag);

// Adds one element at the end. Given string is copied
int (*Add)(StringCollection *SC,char *newval);

// Adds a NULL terminated table of strings
int (*AddRange)(StringCollection *SC,char **newvalues);

// Clears all data and frees the memory
int (*Clear)(StringCollection *SC);

//Case sensitive search of a character string in the data
int (*Contains)(StringCollection *SC,char *str);

// Copies all strings into a NULL terminated vector
char **(*CopyTo)(StringCollection *SC);

//Returns the index of the given string or -1 if not found
int (*IndexOf)(StringCollection *SC,char *SearchedString);

// Inserts a string at the position zero.
int (*Insert)(StringCollection *SC,char *);

// Inserts a string at the given position
int (*InsertAt)(StringCollection *SC,int idx,char *newval);

// Returns the string at the given position
char *(*IndexAt)(StringCollection *SC,int idx);

// Removes the given string if found
int (*Remove)(StringCollection *SC,char *);

//Removes the string at the indicated position
int (*RemoveAt)(StringCollection *SC,int idx);

// Frees the memory used by the collection
int (*Finalize)(StringCollection *SC);

// Returns the current capacity of the collection
int (*GetCapacity)(StringCollection *SC);

// Sets the capacity if there are no items in the collection
int (*SetCapacity)(StringCollection *SC,int newCapacity);

// Calls the given function for all strings.
// "Arg" is a user supplied argument (that can be NULL)
// which is passed to the function to call
void (*Apply)(StringCollection *SC,int (*Applyfn)(char *,void *
arg),void *arg);

// Calls the given function for each string and saves
// all results in an integer vector
int *(*Map)(StringCollection *SC,int (*Applyfn)(char *));

// Pushes a string, using the collection as a stack
int (*Push)(StringCollection *SC,char *str);

// Pops the last string off the collection
char * (*Pop)(StringCollection *SC);

// Replaces the character string at the given position
// with a new one
char *(*ReplaceAt)(StringCollection *SC,int idx,char *newval);

} StringCollectionFunctions;

// Definition of the String Collection type
struct _StringCollection {
StringCollectionFunctions *lpVtbl; // The table of functions
size_t count; /* in element size units */
char **contents; /* The contents of the collection */
size_t capacity; /* in element_size units */
unsigned int flags; // Read-only or other flags
};
// This is the only exported function from this module
StringCollection * newStringCollection(int startsize);

------------------------------------------end of stringcollection.h

Implementation
--------------

I haven't had the time to comment this more in depth. I hope that
this will be done in the ensuing discussion.

------------------------------------------stringcollection.c
#include "strcollection.h"
// Forward definitions
static int GetCount( struct _StringCollection *SC);
static int IsReadOnly( struct _StringCollection *SC);
static int SetReadOnly( struct _StringCollection *SC,int newval);
static int Add( struct _StringCollection *SC,char *newval);
static int AddRange( struct _StringCollection *SC,char **newvalues);
static int Clear( struct _StringCollection *SC);
static int Contains( struct _StringCollection *SC,char *str);
static char **CopyTo( struct _StringCollection *SC);
static int IndexOf( struct _StringCollection *SC,char *SearchedString);
static int Insert( struct _StringCollection *SC,char *);
static int InsertAt( struct _StringCollection *SC,int idx,char *newval);
static char *IndexAt( struct _StringCollection *SC,int idx);
static int Remove( struct _StringCollection *SC,char *);
static int RemoveAt( struct _StringCollection *SC,int idx);
static int Finalize( struct _StringCollection *SC);
static int GetCapacity( struct _StringCollection *SC);
static int SetCapacity( struct _StringCollection *SC,int newCapacity);
static void Apply(struct _StringCollection *SC,int(*Applyfn)(char *,void
*),void *arg);
static int *Map(struct _StringCollection *SC,int (*Applyfn)(char *));
static int Push(struct _StringCollection *SC,char *str);
static char *Pop(struct _StringCollection *SC);
static char *ReplaceAt(struct _StringCollection *SC,int idx,char *newval);
static StringCollectionFunctions lpVtableSC = {
GetCount, IsReadOnly, SetReadOnly, Add, AddRange,
Clear, Contains, CopyTo, IndexOf, Insert,
InsertAt, IndexAt, Remove, RemoveAt,
Finalize, GetCapacity, SetCapacity, Apply,
Map, Push, Pop, ReplaceAt,
};

static char *DuplicateString(char *str)
{
char *result;
if (str == NULL)
return NULL;
result = MALLOC(strlen(str)+1);
if (result == NULL)
return NULL;
strcpy(result,str);
return result;
}

#define SC_READONLY 1
#define CHUNKSIZE 20

StringCollection * newStringCollection(int startsize)
{
StringCollection *result = MALLOC(sizeof(StringCollection));
if (result == NULL)
return NULL;
result->count = 0;
if (startsize == 0)
startsize = DEFAULT_START_SIZE;
result->contents = MALLOC(startsize*sizeof(char *));
if (result->contents == NULL) {
FREE(result);
return NULL;
}
memset(result->contents,0,sizeof(char *)*startsize);
result->capacity = startsize;
result->count = 0;
result->flags = 0;
result->lpVtbl = &lpVtableSC;
return result;
}

static int GetCount(struct _StringCollection *SC)
{
return SC->count;
}
static int IsReadOnly(struct _StringCollection *SC)
{
return SC->flags * SC_READONLY ? 1 : 0;
}
static int SetReadOnly(struct _StringCollection *SC,int newval)
{
int oldval = SC->flags * SC_READONLY ? 1 : 0;
if (newval)
SC->flags |= SC_READONLY;
else
SC->flags &= ~SC_READONLY;
return oldval;
}

static int Resize(struct _StringCollection *SC)
{
int newcapacity = SC->capacity + CHUNKSIZE;
char **oldcontents = SC->contents;
SC->contents = MALLOC(newcapacity*sizeof(char *));
if (SC->contents == NULL) {
SC->contents = oldcontents;
return 0;
}
memset(SC->contents,0,sizeof(char *)*newcapacity);
memcpy(SC->contents,oldcontents,SC->count*sizeof(char *));
SC->capacity = newcapacity;
return 1;
}

static int Add(struct _StringCollection *SC,char *newval)
{
if (SC->flags & SC_READONLY)
return -1;
if (SC->count >= SC->capacity) {
if (!Resize(SC))
return 0;
}

if (newval) {
SC->contents[SC->count] = DuplicateString(newval);
if (SC->contents[SC->count] == NULL) {
return 0;
}
}
else
SC->contents[SC->count] = NULL;
SC->count++;
return SC->count;
}

static int AddRange(struct _StringCollection * SC,char **data)
{
int i = 0;
if (SC->flags & SC_READONLY)
return 0;
while (data[i] != NULL) {
int r = Add(SC,data[i]);
if (r <= 0)
return r;
i++;
}
return SC->count;
}

static int Clear(struct _StringCollection * SC)
{
int oldval = SC->count,i;
if (SC->flags & SC_READONLY)
return 0;
for (i=0; i<SC->count;i++) {
FREE(SC->contents[i]);
SC->contents[i] = NULL;
}
SC->count = 0;
return oldval;
}

static int Contains(struct _StringCollection * SC,char *str)
{
int c,i;
if (str == NULL)
return 0;
c = *str;
for (i=0; i<SC->count;i++) {
if (c == SC->contents[i][0] && !strcmp(SC->contents[i],str))
return 1;
}
return 0;
}

static char **CopyTo(struct _StringCollection *SC)
{
char **result = MALLOC((1+SC->count)*sizeof(char *));
int i;
if (result == NULL)
return NULL;
for (i=0; i<SC->count;i++) {
result[i] = DuplicateString(SC->contents[i]);
}
result[i] = NULL;
return result;
}

#ifdef __LCC__
/* The lcc-win compiler allows operator overloading, and allows using
this abstract data type with the more natural [ ] operators */
char * __declspec(naked) operator[](StringCollection *SC, int idx)
{
}
#endif
static int IndexOf(struct _StringCollection *SC,char *str)
{
int i;

for (i=0; i<SC->count;i++) {
if (!strcmp(SC->contents[i],str)) {
return i;
}
}
return -1;
}
static char *IndexAt(struct _StringCollection *SC,int idx)
{
if (idx >=SC->count || idx < 0)
return NULL;
return SC->contents[idx];
}

static int InsertAt(struct _StringCollection *SC,int idx,char *newval)
{
if (SC->flags & SC_READONLY)
return 0;
if (SC->count >= (SC->capacity-1)) {
if (!Resize(SC))
return 0;
}
if (idx < 0 || idx SC->count)
return -1;
if (idx == 0) {
if (SC->count 0)
memmove(SC->contents+1,SC->contents,SC->count*sizeof(char *));
SC->contents[0] = newval;
}
else if (idx == SC->count) {
SC->contents[idx] = newval;
}
else if (idx < SC->count) {

memmove(SC->contents+idx+1,SC->contents+idx,(SC->count-idx+1)*sizeof(char
*));
SC->contents[idx] = newval;
}
SC->count++;
return SC->count;
}

static int Insert(struct _StringCollection *SC,char *newval)
{
if (SC->flags & SC_READONLY)
return 0;
return InsertAt(SC,0,newval);
}

static int RemoveAt(struct _StringCollection *SC,int idx)
{
if (idx >= SC->count || idx < 0)
return -1;
if (SC->count == 0)
return -2;
FREE(SC->contents[idx]);
memmove(SC->contents+idx,SC->contents+idx+1,(SC->count-idx)*sizeof(char
*));
SC->contents[SC->count-1]=NULL;
SC->count--;
return SC->count;
}

static int Remove(struct _StringCollection *SC,char *str)
{
int idx = IndexOf(SC,str);
if (idx < 0)
return idx;
return RemoveAt(SC,idx);
}

static int Push(struct _StringCollection *SC,char *str)
{
char *r;
if (SC->flags&SC_READONLY)
return 0;
if (SC->count >= SC->capacity) {
if (!Resize(SC))
return 0;
}
r = DuplicateString(str);
if (r == NULL)
return 0;
SC->contents[SC->count++] = r;
return SC->count;
}

static char * Pop(struct _StringCollection *SC)
{
char *result;
if ((SC->flags&SC_READONLY) || SC->count == 0)
return NULL;
SC->count--;
result = SC->contents[SC->count];
SC->contents[SC->count] = NULL;
return result;
}

static int Finalize(struct _StringCollection *SC)
{
int result = SC->count,i;
for (i=0; i<SC->count;i++) {
FREE(SC->contents[i]);
}
FREE(SC->contents);
FREE(SC);
return result;
}

static int GetCapacity(struct _StringCollection *SC)
{
return SC->capacity;
}

static int SetCapacity(struct _StringCollection *SC,int newCapacity)
{
if (SC->count != 0)
return 0;
FREE(SC->contents);
SC->contents = MALLOC(newCapacity*sizeof(char *));
memset(SC->contents,0,sizeof(char *)*newCapacity);
SC->capacity = newCapacity;
return 1;
}

static void Apply(struct _StringCollection *SC,int (*Applyfn)(char
*,void *),void *arg)
{
int i;
for (i=0; i<SC->count;i++) {
Applyfn(SC->contents[i],arg);
}
}

static int *Map(struct _StringCollection *SC,int (*Applyfn)(char *))
{
int *result = MALLOC(SC->count*sizeof(int)),i;

if (result == NULL)
return NULL;
for (i=0; i<SC->count;i++) {
result[i] = Applyfn(SC->contents[i]);
}
return result;
}

#ifdef __LCC__
/* The lcc-win compiler allows operator overloading, and allows using
this abstract data type with the more natural [ ]= operators */
char * __declspec(naked) operator[]=(StringCollection *SC,int idx,char
*newval)
{
}

static char *ReplaceAt(StringCollection *SC,int idx,char *newval)
{
if (SC->flags & SC_READONLY)
return NULL;
if (idx < 0 || idx SC->count)
return NULL;
FREE(SC->contents[idx]);
SC->contents[idx] = newval;
return newval;
}
#ifdef TEST
#include <stdio.h>
static void PrintStringCollection(StringCollection *SC)
{
int i;
printf("Count %d, Capacity %d\n",SC->count,SC->capacity);
for (i=0; i<SC->count;i++) {
printf("%s\n",SC->lpVtbl->IndexAt(SC,i));
}
printf("\n");
}
int main(void)
{
StringCollection *SC = newStringCollection(10);
char *p;
SC->lpVtbl->Add(SC,"Martin");
SC->lpVtbl->Insert(SC,"Jakob");
printf("Count should be 2, is %d\n",SC->lpVtbl->GetCount(SC));
PrintStringCollection(SC);
SC->lpVtbl->InsertAt(SC,1,"Position 1");
SC->lpVtbl->InsertAt(SC,2,"Position 2");
PrintStringCollection(SC);
SC->lpVtbl->Remove(SC,"Jakob");
PrintStringCollection(SC);
SC->lpVtbl->Push(SC,"pushed");
PrintStringCollection(SC);
SC->lpVtbl->Pop(SC);
PrintStringCollection(SC);
p = SC->lpVtbl->IndexAt(SC,1);
printf("Item position 1:%s\n",p);
PrintStringCollection(SC);
}
#endif

--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Oct 28 '07 #1
Share this Question
Share on Google+
3 Replies


P: n/a
Charlie Gordon wrote:
"jacob navia" <ja***@nospam.orga écrit dans le message de news:
47***********************@news.orange.fr...

<original post snipped>

Thank you for your contribution Jacob.
The code was flushed left probably because of tabs, I reformated it and
fixed a number of small and not so small issues. See below:

Problems fixed:

added API remarks
fixed missing #endif
fixed indentation and spacing
used typedef instead of struct tag in implementation
added const on unmodified string parameters
#include <stddef.hinstead of <string.h>
fixed some bogus bit tests: SC->flags * SC_READONLY ?
added definition for DEFAULT_START_SIZE, allowed it to be 0
removed redundant initialization of result->count
used safer MALLOC idiom (à la c.l.c)
fixed off by one error in ReplaceAt
fixed memory leak in Resize
fixed crash on NULL strings in Contains and IndexOf
used same semantics and method in IndexOf and Contains and shared code
fixed two off by one errors in InsertAt
simplified InsertAt
removed test for condition never reached in RemoveAt
fixed off by one bug in RemoveAt
make SetCapacity work for a non empty collection
added test for MALLOC failure in SetCapacity
duplicate string in Insert, InsertAt, ReplaceAt; handle NULL and failure
allow ReplaceAt with an index of SC->count
used int instead of size_t for count and size for consistency with API.
improved PrintStringCollection and test main output
added missing return 0 in main
Thanks a lot!

The original version of this code is written for
my compiler system and distributed with the source code
in the lcc-win distribution.

I eliminated the lcc-win specific parts in this code.
When doing this, I may have introduced some of the bugs
above. Specifically the replacing of references with pointers
intoruced the bug replacing the "&" in an AND operation
with a multiplication sign!

When I comment about it then, I will refer to this version.

[snip code]

--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Oct 29 '07 #2

P: n/a
On Sun, 28 Oct 2007 20:52:11 +0100, jacob navia wrote:
>Abstract:
Continuing the discussion about abstract data types, in this
discussion group, a string collection data type is presented,
patterned after the collection in C# and similar languages (Java).
It stores character strings, and resizes itself to accommodate
new strings when needed.
A collection for strings certainly can be useful in many cases. I have
two major objection to the current design:

1. The function pointers are unnecessary. I can hardly imagine anyone
wants to override/replace some of the container functions. OTOH, the
function pointers make usage clumsy. One has to write:
SC->lpVtbl->Remove(SC,"Jakob");
instead of just:
Remove(SC,"Jakob");

Actually, I don't see why the vtable is exposed to the user at all.
Even if you insist on function pointers you can implement something
like:

static
int Remove (StringCollection *SC, const char *str) {
return SC->lpVtbl->Remove(SC, str);
}

2. Memory management:
Strings in the container are managed by the container (via
Finalize()). But in some cases the returned strings must be freed by
the caller. A clear and unambiguous strategy would prevent memory
leaks.
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch
Oct 29 '07 #3

P: n/a
Roland Pibinger wrote:
On Sun, 28 Oct 2007 20:52:11 +0100, jacob navia wrote:
>Abstract:
Continuing the discussion about abstract data types, in this
discussion group, a string collection data type is presented,
patterned after the collection in C# and similar languages (Java).
It stores character strings, and resizes itself to accommodate
new strings when needed.

A collection for strings certainly can be useful in many cases. I have
two major objection to the current design:

1. The function pointers are unnecessary. I can hardly imagine anyone
wants to override/replace some of the container functions. OTOH, the
function pointers make usage clumsy. One has to write:
SC->lpVtbl->Remove(SC,"Jakob");
instead of just:
Remove(SC,"Jakob");
This is certainly possible woth some macros.

However, the key idea of the design is precisely to give several
containers the SAME interface, so that if you decide to
replace a string collection with a list, you can still have

SC->lpVtbl->Remove(SC,"Jacob");

without changing a single line from your user code besides a few
declarations.

An added advantage is that it could very well exist some "Remove"
function in a LOT of user code, so that we would be forced to
use SOME kind of identifier prefix like

CLC_LIST_Remove

or similar anyway.

Those macros are so trivial to write that they could be proposed but
not be a forced part of the interface.
Actually, I don't see why the vtable is exposed to the user at all.
Even if you insist on function pointers you can implement something
like:

static
int Remove (StringCollection *SC, const char *str) {
return SC->lpVtbl->Remove(SC, str);
}

The advantages of exposing the vtable is to give the user the freedom to
change to behavior of the containers as he/she wishes WITHOUT changing
a single line in the user code.

For instance if you want to store only unique strings in the table,
and keep it sorted at all times, you just change the table of
functions and add the new functionality WITHOUT any change
to all your code at all!!

Besides, this gives the user the possibility of changing the behavior
of only ONE instance or changing the behavior of ALL instances
without any limitations with a VERY fast pointer assignment operation!

>
2. Memory management:
Strings in the container are managed by the container (via
Finalize()). But in some cases the returned strings must be freed by
the caller. A clear and unambiguous strategy would prevent memory
leaks.

This is true, but I do not see any easy way out. When I return a string
should I allocate a new copy and free the one in the container?

But maybe there is a better solution. You have a valid point.
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Oct 29 '07 #4

This discussion thread is closed

Replies have been disabled for this discussion.