Splitting a string and putting it into an array

Kai Jaensch

Hello,

I am a newbie and I have to solve this problem as fast as I can, but
so far I haven't had much success.
Can anybody help me (and understand my English :-))?

I have a .txt file in which the data is structured like this:
Project-Nr. ID name lastname
33 9 Lars Lundel
33 12 Emil Korla
34 19 Lara Keuler
33 13 Thorsten Lammert

The data has to be read row by row.
Every row has to be split (the delimiter is TAB) and stored in a
two-dimensional array.

The background is that in the next step I have to search that array for
an ID and then use the part of the array that holds the data of the row
belonging to that ID.
The data is used later in the program.

I tried different approaches, but without success.
Here is the code I've written so far, which just opens the file and
shows the data:

FILE *importFile;
char row[100] = {0};
char test_array[100];
int anzahl=0;
importFile = fopen("datei.txt","r");
if(importFile == NULL) printf("Datei geschlossen.\n");
else printf("Datei offen.\n");

while(fgets(row, sizeof row, importFile) != NULL)
{
puts(row);
anzahl++;
}
if(EOF) printf("%d\n", anzahl);
fclose(importFile);
I hope somebody can help me.
Thanks a lot.

Kai Jaensch

Nov 14 '05 #1

lallous

Hello,

Since you're posting in a C++ group, I recommend using file streams
instead of C file I/O.
Also use std::map to build the table that associates the project
number/ID with a user name.

--
Elias

Nov 14 '05 #2

Sean Kenwrick

"lallous" <la*****@lgwm.org> wrote in message
news:bu************@ID-161723.news.uni-berlin.de...

Hello,

Since you're posting in C++, it is recommended that you use: file streams
instead of C file I/O.
Also use std::map to create a 2D array that will associate the project
number/Id with a user name

--
Elias

Which part of his code is C++?

Nov 14 '05 #3

Sean Kenwrick

"Sean Kenwrick" <sk*******@hotmail.com> wrote in message
news:bu**********@sparta.btinternet.com...

Which part of his code is C++?

Sorry, I see now that you replied that way because he cross-posted to
comp.lang.c++.

Nov 14 '05 #4

Dario

Sean Kenwrick wrote:

Which part of his code is C++?

Every part that can be successfully compiled by a C++ compiler.

- Dario

Nov 14 '05 #5

Al Bowers

If your requirement is to put the line data in a two-dimensional array,
then change the variable row to such an array, like:
char row[25][200];

Also, you have a logic error if the file fails to open: the code goes
on to read from it anyway.

With these errors corrected, the code should look something like
this:

#include <stdio.h>

#define MAXLN 25 /* max lines in the file */
#define MAXLNLEN 200 /* max line length in the file */

int main(void)
{
    FILE *importFile;
    char row[MAXLN][MAXLNLEN];
    int i, anzahl = 0; /* anzahl: number of lines read */

    importFile = fopen("datei.txt", "r");
    if (importFile == NULL)
        printf("Datei geschlossen.\n"); /* "file closed": open failed */
    else
    {
        printf("Datei offen.\n"); /* "file open" */
        for ( ; anzahl < MAXLN && fgets(row[anzahl],
                sizeof row[anzahl], importFile) != NULL; anzahl++) ;
        fclose(importFile);
    }
    /* print results; fputs avoids using file data as a format string */
    for (i = 0; i < anzahl; i++)
        fputs(row[i], stdout);
    putchar('\n');
    return 0;
}
--
Al Bowers
Tampa, Fl USA
mailto: xa******@myrapidsys.com (remove the x to send email)
http://www.geocities.com/abowers822/

Nov 14 '05 #6

Chris Theis

"Dario" <da***@despammed.com> wrote in message
news:bu**********@grillo.cs.interbusiness.it...

Sean Kenwrick wrote:
Which part of his code is C++?

Every part that can be successfully compiled by a C++ compiler.

- Dario

Well, that's a bold statement :-)

Chris

Nov 14 '05 #7

Lyn Powell

"Kai Jaensch" <ka*********@gmx.de> wrote

I have a .txt-file in which the data is structured in that way:
Project-Nr. ID name lastname
33 9 Lars Lundel
33 12 Emil Korla
34 19 Lara Keuler
33 13 Thorsten Lammert

These data have to be read out row by row.
Every row has to be splitted (delimiter is TAB) and has to be saved in
an two-dimensional array.

The background is that in the next step i have to search for an ID in
that array and after that this part of an array is to be used, that has
the content of data (of the row) which belongs to the ID.
The data are used later in the program.

There are a few ways you can do this, but because you don't have to do any
numeric conversions, strtok would be one simple way:

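for (line = 0; fgets(buff, sizeof buff, fp) != NULL; line++) {
    /* tokenize buff, not the counter; "\n" also strips the newline */
    char *t = strtok(buff, "\t\n");
    for (i = 0; t != NULL && i < 4; i++) { /* four columns per row */
        strcpy(array[line][i], t);
        t = strtok(NULL, "\t\n"); /* fetch the next field */
    }
}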
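
For reference, here is a more complete sketch of the strtok approach. It rests on some assumptions: at most MAX_ROWS rows, four columns per row, and fields shorter than FIELD_LEN. The names split_row, find_row_by_id, MAX_ROWS, MAX_FIELDS and FIELD_LEN are made up for this illustration and do not come from the original post.

```c
#include <assert.h>
#include <string.h>

#define MAX_ROWS   25   /* assumed upper bound on data rows */
#define MAX_FIELDS 4    /* project number, ID, first name, last name */
#define FIELD_LEN  32   /* assumed maximum field length, including '\0' */

/* Split one tab-separated line into fields[0..MAX_FIELDS-1].
   Modifies line. Returns the number of fields stored. */
int split_row(char *line, char fields[MAX_FIELDS][FIELD_LEN])
{
    int n = 0;
    char *t;

    assert(line != NULL);
    for (t = strtok(line, "\t\n"); t != NULL && n < MAX_FIELDS;
         t = strtok(NULL, "\t\n")) {
        strncpy(fields[n], t, FIELD_LEN - 1);
        fields[n][FIELD_LEN - 1] = '\0';   /* strncpy may not terminate */
        n++;
    }
    return n;
}

/* Return the index of the first row whose ID column equals id, or -1. */
int find_row_by_id(char table[][MAX_FIELDS][FIELD_LEN], int rows,
                   const char *id)
{
    int r;

    for (r = 0; r < rows; r++)
        if (strcmp(table[r][1], id) == 0)
            return r;
    return -1;
}
```

Calling split_row(buff, table[line]) for each line read with fgets fills the table; find_row_by_id(table, rows, "12") then gives the index of the row whose ID column is "12", or -1 if there is none. The IDs are compared as strings here because nothing is converted to a number.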

Of course, a better way, IMO, would be to use an array of structures:

struct record {
    int project;
    int ID;
    char *name;
};

....

for (line = 0; fgets(buff, sizeof buff, fp) != NULL; line++) {
    char *endp, *buffp = buff; /* start parsing at the line read */
    array[line].project = (int)strtol(buffp, &endp, 0);
    buffp = endp;
    array[line].ID = (int)strtol(buffp, &endp, 0);
    /* You didn't specify how many tabs are allowed */
    while (*endp == '\t')
        endp++;
    endp[strcspn(endp, "\n")] = '\0'; /* drop the newline fgets keeps */
    array[line].name = malloc(strlen(endp) + 1);
    if (array[line].name == NULL)
        break; /* out of memory */
    strcpy(array[line].name, endp);
}
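
For the structure-based variant, sscanf can do the splitting and the numeric conversion in one call. The following is only a sketch: struct person, parse_person and the 31-character name limit are assumptions made for this example, and it relies on names containing no blanks.

```c
#include <assert.h>
#include <stdio.h>

struct person {
    int project;
    int id;
    char first[32];
    char last[32];
};

/* Parse "project<TAB>id<TAB>first<TAB>last".
   Returns 1 if all four fields were converted, 0 otherwise. */
int parse_person(const char *line, struct person *p)
{
    assert(line != NULL && p != NULL);
    /* %31s caps each name at 31 characters plus '\0'; white space in
       the format matches any run of blanks, tabs or newlines */
    return sscanf(line, "%d %d %31s %31s",
                  &p->project, &p->id, p->first, p->last) == 4;
}
```

A header line such as "Project-Nr. ID name lastname" makes the first %d fail, so parse_person returns 0 for it and the caller can simply skip that line.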

Nov 14 '05 #8

Mark McIntyre

On Wed, 14 Jan 2004 11:15:50 +0100, in comp.lang.c , Dario
<da***@despammed.com> wrote:

Sean Kenwrick wrote:
Which part of his code is C++?

Every part that can be successfully compiled by a C++ compiler.

That's so obviously wrong, it scarcely merits comment.
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>

Nov 14 '05 #9

Sidney Cadot

Mark McIntyre wrote:

On Wed, 14 Jan 2004 11:15:50 +0100, in comp.lang.c , Dario
<da***@despammed.com> wrote:

Sean Kenwrick wrote:

Which part of his code is C++?

Every part that can be successfully compiled by a C++ compiler.

Thats so obviously wrong, it scarcely merits comment.

How's that?

Assume a (correct) C++ compiler. Consider any text. Assume that this
text, when fed to the aforementioned C++ compiler, leads to "successful
compilation". Why is it wrong to conclude that the text is C++ code?
Could you provide a counter-example, or explain where my reasoning fails?

Best regards,

Sidney

Nov 14 '05 #10

Adam Fineman

#include <vector>
#include <string>
#include <iterator>
#include <iostream>
#include <algorithm>

using namespace std;

class row_t
{
    int project_num;
    int id;
    string first_name;
    string last_name;

public:
    friend istream& operator>>(istream& is, row_t& r);
    friend ostream& operator<<(ostream& os, const row_t& r);
};

istream&
operator>>(istream& is, row_t& r)
{
    is >> r.project_num >> r.id >> r.first_name >> r.last_name;

    return is;
}

ostream&
operator<<(ostream& os, const row_t& r)
{
    os << r.project_num << '\t'
       << r.id << '\t'
       << r.first_name << '\t'
       << r.last_name;

    return os;
}

int
main()
{
    vector<row_t> array;

    copy(istream_iterator<row_t>(cin), istream_iterator<row_t>(),
         back_inserter(array));

    copy(array.begin(), array.end(),
         ostream_iterator<row_t>(cout, "\n"));

    return 0;
}
--
Reverse domain name to reply.

Nov 14 '05 #11

Jack Klein

On Thu, 15 Jan 2004 00:20:08 +0100, Sidney Cadot <si****@jigsaw.nl>
wrote in comp.lang.c:

Mark McIntyre wrote:
On Wed, 14 Jan 2004 11:15:50 +0100, in comp.lang.c , Dario
<da***@despammed.com> wrote:

Sean Kenwrick wrote:
Which part of his code is C++?

Every part that can be successfully compiled by a C++ compiler.

Thats so obviously wrong, it scarcely merits comment.

How's that?

Assume a (correct) C++ compiler. Consider any text. Assume that this
text, when fed to the aforementioned C++ compiler, leads to "succesful
compilation". Why is it wrong to conclude that the text is C++ code?
Could you provide a counter-example, or explain where my reasoning fails?

Best regards,

Sidney

The key is the phrase "successful compilation", which means no
required diagnostics produced by the compiler and an executable
output. This does not guarantee "equivalent output" when executed.

Even something so simple as:

#include <stdio.h>

int main(void)
{
    printf("size of character literal '!' is %d bytes\n",
           (int)sizeof('!'));
    return 0;
}

Will compile successfully on any conforming C or C++ compiler. The
output will be different on most, but not all, implementations, even
if compiled alternately as C and C++ with an implementation that
supports both languages.
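
The same difference can be checked at run time instead of being read off the printed number. This is only a sketch, and the two function names are invented here. In C a character constant has type int, while in C++ it has type char, so sizeof('!') equals sizeof(int) in one language and 1 in the other.

```c
#include <assert.h>
#include <stddef.h>

/* Size of a character constant as seen by whichever language
   this file is compiled as. */
size_t char_literal_size(void)
{
    return sizeof('!');
}

/* Assert the rule for the language in use. */
void check_literal_size(void)
{
#ifdef __cplusplus
    assert(sizeof('!') == sizeof(char));   /* C++: type char */
#else
    assert(sizeof('!') == sizeof(int));    /* C: type int */
#endif
}
```

On an implementation where sizeof(int) is 1 the two languages agree, which is why the output differs on most but not all implementations.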

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Nov 14 '05 #12

Sidney Cadot

Jack Klein wrote:

On Thu, 15 Jan 2004 00:20:08 +0100, Sidney Cadot <si****@jigsaw.nl>
wrote in comp.lang.c:

Mark McIntyre wrote:

On Wed, 14 Jan 2004 11:15:50 +0100, in comp.lang.c , Dario
<da***@despammed.com> wrote:

Sean Kenwrick wrote:

>Which part of his code is C++?

Every part that can be successfully compiled by a C++ compiler.

Thats so obviously wrong, it scarcely merits comment.
How's that?

Assume a (correct) C++ compiler. Consider any text. Assume that this
text, when fed to the aforementioned C++ compiler, leads to "succesful
compilation". Why is it wrong to conclude that the text is C++ code?
Could you provide a counter-example, or explain where my reasoning fails?

Best regards,

Sidney

The key is the phrase "successful compilation", which means no
required diagnostics produced by the compiler and an executable
output. This does not guarantee "equivalent output" when executed.

This is the first mention of C/C++ equivalence here. That was not the
issue as far as I can tell.
Even something so simple as:

#include <stdio.h>

int main(void)
{
printf("size of character literal '!' is %d bytes\n",
(int)sizeof('!'));
return 0;
}

Will compile successfully on any conforming C or C++ compiler. The
output will be different on most, but not all, implementations, even
if compiled alternately as C and C++ with an implementation that
supports both languages.

All true. I'm just saying that your example, apart from being a valid C
program, is also a valid C++ program. The fact that they do different
things is immaterial to the discussion.

Best regards,

Sidney

Nov 14 '05 #13

Daniel Haude

["Followup-To:" header set to comp.lang.c.]
On 15 Jan 2004 03:58:59 -0800,
Paul Hsieh <qe*@pobox.com> wrote
in Msg. <79**************************@posting.google.com >

The C library has a totally worthless complement of string parsing
functions.

As far as I'm aware, it has only one function that deserves the name
"parsing function", which is sscanf().

Anyway, it's simple to roll your own. For delimiter-separated text data I
always use this function:

#include <ctype.h>
#include <stdlib.h>

char **dh_splitstring(char *line, int maxfields, int sep)
{
    char *a, *e, *s;
    int i;
    char **arr;

    if (NULL == (arr = malloc((maxfields+1) * sizeof *arr))) return NULL;

    e = line;
    for (i = 0; i < maxfields; i++) {
        /* skip leading WS; the cast keeps isspace() defined for
           negative char values */
        for (s = e; *s && isspace((unsigned char)*s) && *s != sep; s++) ;
        if (!*s) break;
        for (a = s; *a && *a != sep; a++) ;
        e = *a == sep ? a+1 : a;
        /* chop off trailing WS without stepping back past s */
        while (a > s && isspace((unsigned char)a[-1])) a--;
        if (i < maxfields-1) *a = 0;
        arr[i] = s;
    }
    for ( ; i < maxfields+1; i++) {
        arr[i] = NULL;
    }
    return arr;
}
line: The string to be parsed
maxfields: max number of fields
sep: field separator char (i.e., '\t')
Returns: A newly-allocated, NULL-terminated array of pointers to strings
with maxfields+1 elements. All fields have leading and trailing whitespace
removed.

Warning: Modifies line. If this is not what you want, pass a copy.

The returned pointers point into different positions of 'line', so 'line'
must not be freed or modified as long as you want to use the result.

Multiple separator tokens are not lumped together, but result in empty
tokens ("\0"). This makes using tab-separated data files dangerous because
there may be different numbers of tabs between columns. I always use
semicolons.
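
To make the difference concrete, here is a small sketch that counts fields both ways. The function names are invented for this example. strtok treats a run of separators as one, while the second function ends a field at every single separator, which is the behaviour described above.

```c
#include <assert.h>
#include <string.h>

/* Count fields the way strtok sees them: runs of separators collapse.
   Modifies line. */
int count_strtok_fields(char *line, const char *sep)
{
    int n = 0;
    char *t;

    assert(line != NULL && sep != NULL);
    for (t = strtok(line, sep); t != NULL; t = strtok(NULL, sep))
        n++;
    return n;
}

/* Count fields when every separator ends a field, so that
   "a<TAB><TAB>b" has three fields, the middle one empty. */
int count_all_fields(const char *line, const char *sep)
{
    int n = 1;

    assert(line != NULL && sep != NULL);
    for (line += strcspn(line, sep); *line != '\0';
         line += strcspn(line, sep)) {
        line++;   /* step over exactly one separator */
        n++;
    }
    return n;
}
```

For the input "33", two tabs, "Lars", count_all_fields reports three fields and count_strtok_fields reports two, so a row with a doubled tab would shift columns under one scheme and silently lose the empty column under the other.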

--Daniel

--
"With me is nothing wrong! And with you?" (from r.a.m.p)

Nov 14 '05 #14

Default User

Paul Hsieh wrote:

The C library has a totally worthless complement of string parsing
functions.

Oh yes. It's completely impossible to create any application using the
string-parsing utilities in C.

Unless you are a competent programmer.

You know, there are *other* things besides strtok() available. It's
really not that hard to use strchr() and your own state machine for
parsing if strtok() doesn't fit your needs.
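
As a rough sketch of what that can look like, the function below picks one field out of a line using only strchr, without modifying the line. get_field and its parameters are hypothetical names made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Copy field number `index` (0-based) of a sep-delimited line into out.
   Returns 1 if the field exists, 0 if the line has too few fields.
   The line itself is left untouched. */
int get_field(const char *line, int sep, int index,
              char *out, size_t outsize)
{
    const char *end;
    size_t len;

    assert(line != NULL && out != NULL && outsize > 0);
    while (index-- > 0) {
        line = strchr(line, sep);
        if (line == NULL)
            return 0;        /* fewer fields than requested */
        line++;              /* skip the separator itself */
    }
    end = strchr(line, sep);
    /* the last field runs up to the newline or the end of the string */
    len = end ? (size_t)(end - line) : strcspn(line, "\n");
    if (len >= outsize)
        len = outsize - 1;   /* truncate rather than overflow */
    memcpy(out, line, len);
    out[len] = '\0';
    return 1;
}
```

Because every separator is honoured, an empty column comes back as an empty string instead of being skipped the way strtok would skip it.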

Don't get me wrong, std::string has a lot of nice features, but to
poo-poo the C string capabilities betrays either a lack of experience
or a lack of skill.

Brian Rodenborn

Nov 14 '05 #15

Peter Nilsson

Sidney Cadot <si****@jigsaw.nl> wrote in message news:<bu**********@news.tudelft.nl>...

Mark McIntyre wrote:
On Wed, 14 Jan 2004 11:15:50 +0100, in comp.lang.c , Dario
<da***@despammed.com> wrote:

Sean Kenwrick wrote:
Which part of his code is C++?

Every part that can be successfully compiled by a C++ compiler.

Thats so obviously wrong, it scarcely merits comment.

How's that?

Assume a (correct) C++ compiler. Consider any text. Assume that this
text, when fed to the aforementioned C++ compiler, leads to "succesful
compilation". Why is it wrong to conclude that the text is C++ code?

Nothing, so long as you're willing to accept binary streams of random
noise as valid C++ code.

I don't know about C++, but the only C construct where a translation
must fail is the #error directive. [And Dan Pop has argued that this
is still compilable since, in such circumstances, it can be considered
'unsuccessful' translation.]

--
Peter

Nov 14 '05 #16

Jeremy Yallop

Peter Nilsson wrote:

I don't know about C++, but the only C construct where a translation
must fail is the #error directive.

In C89 an #error directive causes the issuance of a diagnostic
message; it isn't required to cause translation failure. In fact, a
conforming implementation isn't even generally /allowed/ to stop
translation on encountering #error, although there are a couple of
loopholes that can be used to justify doing so. There is at least one
implementation where an #error directive doesn't cause translation to
fail.

In C99 the implementation is not allowed to ("successfully") translate
programs containing active #error directives.
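
For readers who have not met the directive, a minimal sketch of an "active" #error follows. The message text and the function bits_per_byte are made up for this example. If the condition held, a C99 compiler would have to reject the file, while a C89 compiler would only have to issue a diagnostic.

```c
#include <assert.h>
#include <limits.h>

/* The directive is active only when the #if condition is true. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit bytes"
#endif

int bits_per_byte(void)
{
    /* on a C99 compiler this cannot fire: any other value
       was already rejected during translation */
    assert(CHAR_BIT == 8);
    return CHAR_BIT;
}
```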

Jeremy.

Nov 14 '05 #17

Sidney Cadot

Peter Nilsson wrote:

Sidney Cadot <si****@jigsaw.nl> wrote in message news:<bu**********@news.tudelft.nl>...
Mark McIntyre wrote:

On Wed, 14 Jan 2004 11:15:50 +0100, in comp.lang.c , Dario
<da***@despammed.com> wrote:

Sean Kenwrick wrote:

>Which part of his code is C++?

Every part that can be successfully compiled by a C++ compiler.

Thats so obviously wrong, it scarcely merits comment.
How's that?

Assume a (correct) C++ compiler. Consider any text. Assume that this
text, when fed to the aforementioned C++ compiler, leads to "succesful
compilation". Why is it wrong to conclude that the text is C++ code?

Nothing, so long as you're willing to accept binary streams of random
noise as valid C++ code.

Hang on, I've lost you there... Here's a bunch of random characters:

!&^fgfctq9786h0%(*&%h2owyp[1238n

Are you saying that a correct C++ compiler may compile this successfully?
Same question for a correct C compiler?

I don't know about C++, but the only C construct where a translation
must fail is the #error directive. [And Dan Pop has argued that this
is still compilable since, in such circumstances, it can be considered
'unsuccessful' translation.]

Ok, under the assumption that feeding

!&^fgfctq9786h0%(*&%h2owyp[1238n

...to a correct C compiler does not have to fail, what /does/ the
standard have to say on what can be expected of a compiler on such input?

Best regards,

Sidney

Nov 14 '05 #18

Jeremy Yallop

Sidney Cadot wrote:

Hang on, I've lost you there... Here's a bunch of random characters:

!&^fgfctq9786h0%(*&%h2owyp[1238n

Are you saying that a correct C++ compiler may compile this succesfully?
Same question for a correct C compiler?

Yes (for C, at least). Also, if it is "accepted" (whatever that may
mean) by a conforming implementation then it's a conforming program.

Ok, under the assumption that feeding

!&^fgfctq9786h0%(*&%h2owyp[1238n

...to a correct C compiler does not have to fail, what /does/ the
standard have to say on what can be expected of a compiler on such input?

The compiler must issue a diagnostic about the syntax error.

Jeremy.

Nov 14 '05 #19

CBFalconer

Sidney Cadot wrote:

.... snip ...
Hang on, I've lost you there... Here's a bunch of random characters:

!&^fgfctq9786h0%(*&%h2owyp[1238n

Are you saying that a correct C++ compiler may compile this
succesfully? Same question for a correct C compiler?

Offhand the !&^ and &% sequences may be a problem, but with
patience and a few macros I believe we could translate and parse
the rest :-)

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #20

Paul Hsieh

Daniel Haude <ha***@physnet.uni-hamburg.de> wrote:

["Followup-To:" header set to comp.lang.c.]
On 15 Jan 2004 03:58:59 -0800,
Paul Hsieh <qe*@pobox.com> wrote
in Msg. <79**************************@posting.google.com >
The C library has a totally worthless complement of string parsing
functions.
As far as I'm aware of it only has one function that deserves the name
"parsing function", which is sscanf().

An interesting point of view -- considering that sscanf is probably
the least generic of all the parsing mechanisms. Actually strtok
*would* be a good function, if only it were more like strtok_r as is
implemented in Linux.
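
To show what the extra state argument buys, here is a sketch that tokenizes lines and, inside each line, fields. It assumes a POSIX system, since strtok_r is not part of ISO C, and count_fields is a name invented for this example. With plain strtok the inner loop would destroy the outer loop's position.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r is POSIX, not ISO C */
#include <assert.h>
#include <string.h>

/* Count all tab-separated fields across all lines of text.
   Modifies text. */
int count_fields(char *text)
{
    char *line, *field, *line_state, *field_state;
    int n = 0;

    assert(text != NULL);
    for (line = strtok_r(text, "\n", &line_state); line != NULL;
         line = strtok_r(NULL, "\n", &line_state)) {
        /* the inner tokenizer keeps its own state, so the outer
           one is not disturbed */
        for (field = strtok_r(line, "\t", &field_state); field != NULL;
             field = strtok_r(NULL, "\t", &field_state))
            n++;
    }
    return n;
}
```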
Anyway, it's simple to roll your own.

And even simpler to get it wrong, or create an inadequate solution as
we shall see.

[...] For delimiter-separated text data I always use this function:

char **dh_splitstring(char *line, int maxfields, int sep)
{
char *a, *e, *s;
int i;
char **arr;

if (NULL == (arr = malloc((maxfields+1) * sizeof *arr))) return NULL;

e = line;
for (i = 0; i < maxfields; i++) {
/* skip leading WS */
for (s = e; *s && isspace(*s) && *s != sep; s++) ;
if (!*s) break;
for (a = s; *a && *a != sep; a++) ;
e = *a == sep ? a+1 : a;
/* chop off trailing WS */
for (a--; isspace(*a) && a > s; a--) ;
if (i < maxfields-1) *++a = 0;
arr[i] = s;
}
for ( ; i < maxfields+1; i++) {
arr[i] = NULL;
}
return arr;
}
A reasonable effort, but it betrays the limited scope that is very
typical of C programmers.

1) This code is complicated. You have several loops and state
machines, and I have difficulty seeing how to verify that this code is
correct. In the middle there you've hidden a nice "a > s" comparison
... I didn't know you could compare pointers like that in a portable
way.

2) The code only shows the inner loop -- the original request asks for
filling a two dimensional array.

3) It encodes the classic C worthlessness of requiring that you
specify the size of your containers (array) up front, before you know
the size of the data you require. The original poster also posted to
C++ -- using an STL vector to hold the result is probably the right
answer.
line: The string to be parsed
With or without the '\n' on the end? And do you compensate for a
potential '\r' in there? Are you doing the typical C thing of pushing
issues and complexities upward?
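
One common way to take that question off the table before splitting is to cut the line at the first '\r' or '\n'. This is a sketch only, and chomp is a name made up here; it assumes neither character can occur inside a field.

```c
#include <assert.h>
#include <string.h>

/* Cut line at the first '\r' or '\n', so "text\n" and "text\r\n"
   both become "text". Returns line for convenience. */
char *chomp(char *line)
{
    assert(line != NULL);
    /* strcspn returns the length of the leading part that contains
       neither character, which is the index of the cut */
    line[strcspn(line, "\r\n")] = '\0';
    return line;
}
```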
maxfields: max number of fields

It's also the minimum malloc size. BTW, what if you decide to pass a
negative value for maxfields?

sep: field separator char (i.e., '\t')

Tabs but not spaces? Tabs and spaces are chosen because they are easy
for humans to read and edit. That means that if you use them as
separators, you use them interchangeably and in arbitrary numbers, with
the possibility that one or the other might not be present. A quick
look at your algorithm suggests that it will fail if one pair of
entries is separated only by spaces.
Returns: A newly-allocated, NULL-terminated array of pointers to strings
with maxfields+1 elements. All fields have leading and trailing whitespace
removed.

Warning: Modifies line. If this is not what you want, pass a copy.
Which makes it only barely better than strtok ...
The returned pointers point into different positions of 'line', so 'line'
must not be freed or modified as long as you want to use the result.
Ok ... this is another classic case of pushing complexity upwards.
Multiple separator tokens are not lumped together, but result in empty
tokens ("\0"). This makes using tab-separated data files dangerous because
there may be different numbers of tabs between columns. I always use
semicolons.

Ok, I'm not sure the OP was asking you to change the policy they
have decided upon. Using arbitrary space or tab delimiters allows for
easy human editability, which might be a fairly high concern. For
humans it's easy to miss a ";" and it's also easy to confuse one with a
":".

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #21

Paul Hsieh

Default User <fi********@boeing.com.invalid> wrote:

Paul Hsieh wrote:
The C library has a totally worthless complement of string parsing
functions.
Oh yes. It's completely impossible to create any application using the
string-parsing utilities in C.

I never said that. However, attempting to do so ends up contributing
to the coffers of Security Focus, CERT, Reasoning, and other late/post
production software failure analysis companies.
Unless you are a competent programmer.
With a masochistic bent, of course. For this particular task it's
actually best to leave the entire C string library behind completely
-- it just doesn't offer anything useful that can't be done better
some other way.
You know, there are *other* things besides strtok() available. It's
really not that hard to use strchr() and your own state machine for
parsing if strtok() doesn't fit your needs.
Actually, strcspn(), and strspn() would probably be the better choices
if you really wanted to push it. But the point is that if you are
forced to implement your own state machines to hand hold the parsing
anyway, then you might as well roll your own right down to the raw
characters. A big for-switch statement with a few counters and flags
-- it will be just as readable.

Compare this to my Bstrlib based solution -- can you find a state
machine of any kind in there? If anywhere, it would be hidden away in
the thoroughly tested Bstrlib functions and is implemented in a
generically usable manner.
Don't get wrong, std::string has a lot of nice features, but to poo-poo
the C string capabilities portrays either a lack of experience, or lack
of skill.

As demonstrated by your impressive proposal to solve the OP's problem.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #22

Peter Nilsson

"Jeremy Yallop" <je****@jdyallop.freeserve.co.uk> wrote in message
news:sl*******************@hehe.cl.cam.ac.uk...

Peter Nilsson wrote:
I don't know about C++, but the only C construct where a translation
must fail is the #error directive.

In C89 an #error directive causes the issuance of a diagnostic
message; it isn't required to cause translation failure.

Ah yes, I see that in my copy of the C89 draft. Can I ask, what did C95 have
to say on the subject of #error?

--
Peter

Nov 14 '05 #23

pete

Paul Hsieh wrote:

Default User <fi********@boeing.com.invalid> wrote:
Paul Hsieh wrote:
The C library has a totally worthless complement of string parsing
functions.

Oh yes. It's completely impossible to
create any application using the
string-parsing utilities in C.

I never said that. However, attempting to do so ends up contributing
to the coffers of Security Focus, CERT, Reasoning, and other late/post
production software failure analysis companies.
Unless you are a competent programmer.

With a masochistic bent, of course. For this particular task its
actually best to completely leave the entire C string library behind
-- it just doesn't offer anything useful that can't be done better
using some other way.
You know, there are *other* things besides strtok() available. It's
really not that hard to use strchr() and your own state machine for
parsing if strtok() doesn't fit your needs.

Actually, strcspn(), and strspn() would probably be the better choices
if you really wanted to push it.

There's
my head.

There's a beat in my head.

/* BEGIN str_tok_r.c */

#include <stdio.h>

char *str_tok_r (char *, const char *, char **);
size_t str_spn (const char *, const char *);
size_t str_cspn (const char *, const char *);
char *str_chr (const char *, int);
char *str_cpy(char *, const char *);
char *squeeze (char *, const char *);

#define STRING "\tThere's\n a\r beat in \r\tmy head. \n"

int main(void)
{
    const char *const original = STRING;
    char s1[sizeof STRING];

    puts(original);
    puts(squeeze(str_cpy(s1, original), "\n\r\t"));
    return 0;
}

char *str_tok_r(char *s1, const char *s2, char **p1)
{
    if (s1 != NULL) {
        *p1 = s1;
    }
    s1 = *p1 + str_spn(*p1, s2);
    if (*s1 == '\0') {
        return NULL;
    }
    *p1 = s1 + str_cspn(s1, s2);
    if (**p1 != '\0') {
        *(*p1)++ = '\0';
    }
    return s1;
}

size_t str_spn(const char *s1, const char *s2)
{
    const char *const p1 = s1;

    while (*s1 != '\0' && str_chr(s2, *s1) != NULL) {
        ++s1;
    }
    return s1 - p1;
}

size_t str_cspn(const char *s1, const char *s2)
{
    const char *const p1 = s1;

    while (str_chr(s2, *s1) == NULL) {
        ++s1;
    }
    return s1 - p1;
}

char *str_chr(const char *s, int c)
{
    while (*s != (char)c) {
        if (!*s) {
            return NULL;
        }
        ++s;
    }
    return (char *)s;
}

char *str_cpy(char *s1, const char *s2)
{
    char *const p1 = s1;

    do {
        *s1 = *s2++;
    } while (*s1++ != '\0');
    return p1;
}

char *squeeze(char *s1, const char *s2)
{
    char *p3;
    char const *p2;
    char *const p1 = s1;

    p2 = str_tok_r(s1, s2, &p3);
    while (p2 != NULL) {
        while (*p2 != '\0') {
            *s1++ = *p2++;
        }
        p2 = str_tok_r(NULL, s2, &p3);
    }
    *s1 = '\0';
    return p1;
}

/* END str_tok_r.c */

--
pete

Nov 14 '05 #24

Daniel Haude

["Followup-To:" header set to comp.lang.c.]
On 16 Jan 2004 02:50:32 -0800,
Paul Hsieh <qe*@pobox.com> wrote
in Msg. <79*************************@posting.google.com>

And even simpler to get it wrong, or create an inadequate solution as
we shall see.
I've never claimed that my solution was adequate in the sense of solving
the OP's problem 100%.
1) This code is complicated.
It's a mere 23 lines. You want to see complicated code?
You have several loops,
Just two, non-nested, and only one of them non-trivial.
and state machines,
zero state machines
and I have difficulty knowing how to know that this code is
correct.
Like with any third-party function you've got to either trust the
documentation or write your own.
In the middle there you've hidden a nice "a > s" comparison
... I didn't know you could compare pointers like that in a portable
way.
You can as long as they point into the same array.
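
To illustrate that rule, here is a minimal sketch of a right-trim loop that relies on such a comparison. The helper name trim_right is hypothetical and this is not the function from the earlier post; it only assumes the standard <ctype.h> and <string.h> calls.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: strip trailing whitespace from s in place.
   s and end both point into the same array (end may sit on the
   terminating '\0'), so the relational test end > s is well defined. */
char *trim_right(char *s)
{
    char *end;

    assert(s != NULL);
    end = s + strlen(s);              /* points at the terminating '\0' */
    while (end > s && isspace((unsigned char)end[-1])) {
        --end;                        /* step back over one whitespace char */
    }
    *end = '\0';                      /* cut the string after the last kept char */
    return s;
}
```

Comparing two pointers that do not belong to the same array object with < or > is what the standard leaves undefined; here both are derived from s, so the loop is portable.
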
2) The code only shows the inner loop -- the original request asks for
filling a two dimensional array.
Again, I never claimed to've been trying to solve the OP's problem. I was
merely giving an example of how to parse a string in C.
3) It encodes the classic C worthlessness of requiring that you
specify the size of your containers (array) up front, before you know
the size of the data you require.
Easily fixed by a reallocing mechanism that I didn't bother with. The
function I gave is specifically geared to parsing csv tables where the
number of columns is usually known.
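
For what it's worth, a reallocing mechanism of that kind might look roughly like the sketch below. The struct fields type and the fields_push function are made-up names for illustration, and the doubling policy is just one common assumption, not something taken from the posted function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of pointers to the fields of one line. */
struct fields {
    char **item;       /* array of field pointers, or NULL when empty */
    size_t count;      /* number of pointers stored */
    size_t capacity;   /* number of pointers the array can hold */
};

/* Append p, doubling the capacity when the array is full.
   Returns 0 on success, -1 if realloc fails (the list is left unchanged). */
int fields_push(struct fields *f, char *p)
{
    assert(f != NULL);
    if (f->count == f->capacity) {
        size_t ncap = f->capacity ? f->capacity * 2 : 4;
        char **tmp = realloc(f->item, ncap * sizeof *tmp);

        if (tmp == NULL) {
            return -1;                /* old block is still valid */
        }
        f->item = tmp;
        f->capacity = ncap;
    }
    f->item[f->count++] = p;
    return 0;
}
```

The list starts out as {NULL, 0, 0}, since realloc with a null pointer behaves like malloc, and the caller releases it with free(f.item) once the line has been processed.
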
The original poster also posted to
C++ -- using an STL vector to hold the result is probably the right
answer.
In C++ it would be.

line: The string to be parsed

With or without the '\n' on the end? And do you compensate for a
potential '\r' in there?

The "doc" (my comments) states that leading and trailing whitespace gets
chopped off all tokens.
Are you doing the typical C thing of pushing
issues and complexities upward?
No, but you're doing the typical thing of not reading the documentation.
It's also the minimum malloc size. BTW, what if you decide to pass a
negative value for maxfields?
UB, obviously. Trivially fixed with a single line. A bug.

sep: field separator char (i.e., '\t')

tabs but not spaces? Tabs/spaces are chosen because of their human
readability/editability features.

Yes, but it's difficult to parse tables that contain empty cells or
elements with whitespace in them.
I mean, if you use them as
separators you use them interchangeably, in arbitrary numbers, also
with the possibility that one or the other might not be present. A
quick look at your algorithm makes it look like it will fail if one
pair of entries is only separated by spaces.
You're right: My algorithm gets tripped up when anything that's
isspace() is used as a separator.

Warning: Modifies line. If this is not what you want, pass a copy.

Which makes it only barely better than strtok ...

It's re-entrant, and in the usual file-parsing situation (reading csv
data) each line is typically only used once.

I specifically didn't want to allocate additional memory for the field's
contents in the function because it's 1) an unnecessary waste of memory
and performance most of the time, and 2) it's easily provided by 2 extra
standard function calls outside my routine.
Ok ... this is another classic case of pushing complexity upwards.
It's better to push complexity upwards than implementing it downstairs
where it, although unneeded in most cases, may lead to resource and
performance penalties. Especially when the "complexity" involves nothing
but one call to strdup() and another one to free().
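
A minimal sketch of that caller-side copy, assuming a hypothetical helper dup_line in place of strdup() (which has historically been a POSIX function rather than part of ISO C):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup(): returns a malloc'ed copy of s,
   or NULL if the allocation fails. The caller owns the copy and must
   free() it. */
char *dup_line(const char *s)
{
    size_t n;
    char *copy;

    assert(s != NULL);
    n = strlen(s) + 1;                /* include the terminating '\0' */
    copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}
```

A caller that needs the original line intact would pass the result of dup_line(row) to the destructive parser and call free() on it once the fields are no longer needed; callers that use each line only once skip both steps.
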
Ok, I'm not sure the OP was asking that you change the policy they
have decided upon.

Hey, I have better things to do than writing code for other people. All I
did was give a simple example of how to do efficient string parsing with a
few lines in C.

--Daniel

--
"With me is nothing wrong! And with you?" (from r.a.m.p)

Nov 14 '05 #25

Default User

Paul Hsieh wrote:

Default User <fi********@boeing.com.invalid> wrote:
Paul Hsieh wrote:
The C library has a totally worthless complement of string parsing
functions.
Oh yes. It's completely impossible to create any application using the
string-parsing utilities in C.

I never said that.

Yes, you did; you said the string parsing functions were worthless. You
can't use worthless things to create things of worth. I would buy
"tricky" or "able to blow your foot off", but not "worthless".
However, attempting to do so ends up contributing
to the coffers of Security Focus, CERT, Reasoning, and other late/post
production software failure analysis companies.
Nonsense. You are saying that any attempt to use them, no matter the
dedication and skill level of the programmer, will fail. That is
manifestly untrue. While there have been products with problems, there
are products without problems as well.

Unless you are a competent programmer.

With a masochistic bent, of course.

If you are working in C, then the choices are limited. While using these
functions takes some time to learn, once you become comfortable it's not
particularly onerous.
For this particular task it's
actually best to completely leave the entire C string library behind
-- it just doesn't offer anything useful that can't be done better
using some other way.

That's your opinion, not one that I share. I've been using these
functions for a long time, and find them to be useful. For C++
programming, I would (and do) use std::string.

You know, there are *other* things besides strtok() available. It's
really not that hard to use strchr() and your own state machine for
parsing if strtok() doesn't fit your needs.

Actually, strcspn(), and strspn() would probably be the better choices
if you really wanted to push it. But the point is that if you are
forced to implement your own state machines to hand hold the parsing
anyway, then you might as well roll your own right down to the raw
characters. A big for-switch statement with a few counters and flags
-- it will be just as readable.

No, go the other way. Encapsulate this and make your own version of
strtok() that is safe. Then you have that in your personal library. Or
find one of the many ones already available.
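
As a rough idea of what such an encapsulated replacement could look like, here is a small sketch. The name next_field and its cursor-based interface are assumptions made for illustration, not an existing library function; it is built on the standard strcspn().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reentrant tokenizer. *cursor is the caller's position in
   the line; no hidden static state is kept, unlike strtok(). Adjacent
   separators yield empty fields instead of being skipped. The line is
   modified in place: each separator found is overwritten with '\0'.
   Returns the next field, or NULL once the line is exhausted. */
char *next_field(char **cursor, const char *seps)
{
    char *start;
    char *end;

    assert(cursor != NULL && seps != NULL);
    start = *cursor;
    if (start == NULL) {
        return NULL;                  /* last field was already returned */
    }
    end = start + strcspn(start, seps);
    if (*end != '\0') {
        *end = '\0';                  /* terminate this field */
        *cursor = end + 1;            /* resume after the separator */
    } else {
        *cursor = NULL;               /* this was the final field */
    }
    return start;
}
```

The caller sets a char * to the start of the line and calls next_field(&cur, "\t") in a loop until it returns NULL. Because the state lives in the caller's variable, two lines can be tokenized at the same time, and an empty cell in a tab-separated table comes back as an empty string.
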

Don't get me wrong, std::string has a lot of nice features, but to poo-poo
the C string capabilities betrays either a lack of experience, or lack
of skill.

As demonstrated by your impressive proposal to solve the OP's problem.

I made no attempt to solve said problem. The "problem" I was addressing
was your post.

Brian Rodenborn

Nov 14 '05 #26

Jerry Coffin

In article <bu************@ID-189932.news.uni-berlin.de>,
ka*********@gmx.de says...

Hello,

i am an newbie and i have to to solve this problem as fast as i can. But
at this time i don´t have a lot of success.
Can anybody help me (and understand my english :-))?

I have a .txt-file in which the data is structured in that way:
Project-Nr. ID name lastname
33 9 Lars Lundel
33 12 Emil Korla
34 19 Lara Keuler
33 13 Thorsten Lammert

These data have to be read out row by row.
Every row has to be splitted (delimiter is TAB) and has to be saved in

Under the circumstances, I would NOT use a 2D array -- instead, I'd use
a struct (or in C++ a class). I'd then create an array of those structs
(or in C++, a map or perhaps a set).

I'm not going to post code since you've cross-posted to c.l.c and
c.l.c++, and any code that's well-written for one will be off-topic in
the other.
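
For readers following along in C only, the struct-based layout could be sketched as below. The type and function names (struct member, parse_member, find_by_id) and the 31-character field limit are assumptions for illustration, and the sscanf format assumes that names contain no embedded whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record for one row of the file. */
struct member {
    int project;
    int id;
    char name[32];
    char lastname[32];
};

/* Parse one whitespace-separated row into *m.
   Returns 1 if all four fields were read, 0 otherwise
   (for example on the header line). */
int parse_member(const char *row, struct member *m)
{
    assert(row != NULL && m != NULL);
    return sscanf(row, "%d %d %31s %31s",
                  &m->project, &m->id, m->name, m->lastname) == 4;
}

/* Linear search for an ID; returns NULL if it is not present. */
const struct member *find_by_id(const struct member *list, size_t n, int id)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (list[i].id == id) {
            return &list[i];
        }
    }
    return NULL;
}
```

Each line read with fgets() would be passed to parse_member() and stored in the next free element of an array of struct member; the later lookup then returns a pointer to the whole record rather than a row index into a 2D array of strings.
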

--
Later,
Jerry.

The universe is a figment of its own imagination.

Nov 14 '05 #27
