sscanf parsing doubt

P: n/a
hi All,
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
.....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
I understand it is not finding '\0' to scan (%s) strings.
but then I cannot use %c also .
I think i can use like "%64c%*[,]%64c" .
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?

Thanks In Advance,
Simone Mehta.

--
live life Queen Size.
Nov 14 '05 #1
12 Replies


P: n/a
Hi,
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
I understand it is not finding '\0' to scan (%s) strings.
Nope. It gives you junk because %s spans from white space to
white space. Commas are not white spaces, so s1 gets it all.

Check the return value of scanf(), this tells you how many
input items you actually could read.

Use the scanset: For example, you can scan for "%[^, \t]"
which stops at the first comma, blank or tabulator.

but then I cannot use %c also .
I think i can use like "%64c%*[,]%64c" .


No. The c conversion specifier will not give you strings
but character arrays which can be nasty to handle.
Apart from that, the problem of the comma being gobbled
by %64c still persists.
Apart from that, using a field width for reading in the
strings to be stored in s1 through s5 is a Good Idea.
If a string before the last item was too long, the return value
of scanf will tell you. For the last item, look up
Pop's Device here in the newsgroup to see how to get
rid of the rest of the line.
Cheers
Michael
#include <stdio.h>
#include <stdlib.h>

#define MAXITEMLEN 32

#define STRINGIZE(s) # s
#define XSTR(s) STRINGIZE(s)

#define DONTSCAN ", \t"
#define ITEMFORMAT "[^" DONTSCAN "]"
#define MAXITEMFORMAT XSTR(MAXITEMLEN) ITEMFORMAT

#define ONEITEM "%" MAXITEMFORMAT
#define SEP "%*[" DONTSCAN "]"

int main (void)
{
    char foo[128] = "hello,world, bye ,\tbye\t,world";
    /* +1: a field width of MAXITEMLEN needs room for the '\0' as well */
    char s0[MAXITEMLEN+1], s1[MAXITEMLEN+1], s2[MAXITEMLEN+1];
    char s3[MAXITEMLEN+1], s4[MAXITEMLEN+1];
    int rv;

    rv = sscanf(foo, " " ONEITEM SEP ONEITEM SEP ONEITEM SEP
                ONEITEM SEP ONEITEM, s0, s1, s2, s3, s4);

    /* The cases fall through on purpose: print every item we got */
    switch (rv) {
    case 5:
        fprintf(stdout,"s4: %s\n",s4);
    case 4:
        fprintf(stdout,"s3: %s\n",s3);
    case 3:
        fprintf(stdout,"s2: %s\n",s2);
    case 2:
        fprintf(stdout,"s1: %s\n",s1);
    case 1:
        fprintf(stdout,"s0: %s\n",s0);
    default:
        if (rv != 5) {
            fprintf(stderr, "Did not get all items!\n");
            exit(EXIT_FAILURE);
        }
    }
    return 0;
}

Nov 14 '05 #2

P: n/a
Simone Mehta wrote:

hi All,
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
I understand it is not finding '\0' to scan (%s) strings.
but then I cannot use %c also .
I think i can use like "%64c%*[,]%64c" .
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?


I think the simplest way is to read whole lines from the file
into strings, and then to process the strings in memory.

/* BEGIN output from new.c */

helloworldbyebyeworld

/* END output from new.c */

/* BEGIN new.c */

#include <stdio.h>
#include <string.h>

int main(void)
{
    char foo[128] = "hello,world,bye,bye,world";
    char *pointer;

    for (pointer = foo; *pointer != '\0'; ++pointer) {
        if (*pointer == ',') {
            memmove(pointer, pointer + 1, strlen(pointer));
        }
    }
    puts("\n/* BEGIN output from new.c */\n");
    puts(foo);
    puts("\n/* END output from new.c */");
    return 0;
}

/* END new.c */
--
pete
Nov 14 '05 #3

P: n/a
Hi pete,
it seems to me that you misunderstood the OP's question:
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
extract strings from it.
^^^^^^^^^^^^^^^^^^^^^^^^
Note: The OP is doing things line by line.
He wants to set s1 through s5.
[snip! code <snippet> and questions to that]
I think the simplest way is to read whole lines from the file
into strings, and then to process the strings in memory.


Which is what the OP does, if I understood him/her correctly.

/* BEGIN output from new.c */

helloworldbyebyeworld

/* END output from new.c */
/* BEGIN new.c */

#include <stdio.h>
#include <string.h>

int main(void)
{
    char foo[128] = "hello,world,bye,bye,world";
    char *pointer;

    for (pointer = foo; *pointer != '\0'; ++pointer) {
        if (*pointer == ',') {
            memmove(pointer, pointer + 1, strlen(pointer));
        }
    }
    puts("\n/* BEGIN output from new.c */\n");
    puts(foo);
    puts("\n/* END output from new.c */");
    return 0;
}

/* END new.c */
I would suggest the following modification:
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define MAXNUMENTRIES 5

int main(void)
{
    char foo[128] = "hello,world,bye,bye,world";
    char *pointer, *s[MAXNUMENTRIES+1];
    size_t i=0;

    s[i++] = foo;
    for (pointer = foo; *pointer != '\0'; ++pointer) {
        if (*pointer == ',') {
            *pointer = '\0';
            s[i++] = pointer+1;
        }
    }
    assert(i<=MAXNUMENTRIES);
    s[i] = NULL; /* Signify end of valid entries */

    puts("\n/* BEGIN output from new.c */\n");
    for (i=0; s[i] != NULL; i++)
        puts(s[i]);
    puts("\n/* END output from new.c */");
    return 0;
}


I did not test it, though; just wanted to make clear
how to do it :-)
Cheers
Michael

Nov 14 '05 #4

P: n/a
Hi pete,Michael,
thanks for the useful replies.
Michael Mair <ma********************@ians.uni-stuttgart.de>

it seems to me that you misunderstood the OP's question:
you are right Michael I want to scan line by line.

I would suggest the following modification:
> #include <stdio.h>
> #include <string.h>

#include <assert.h>

#define MAXNUMENTRIES 5

I am able to get the same result using your program, Michael,
but the reason I need to go for sscanf is that
CSV files have strings with quotes also,
like "hello",world,"foo",FSM,"comp,lang,c".
This being the case, I will have to maintain a small FSM when it
comes to quotes,
which can make things difficult.
So I wanted to train sscanf to identify strings with or without
quotes.
But sscanf seems to have a really bad man page, or maybe I am not able to
understand much from it.
I would in the above case be interested in
s1=hello
s2=world
s3=foo
s4=FSM
s5=comp,lang,c

If anyone has sscanf URLs/bookmarks that explain a little more, that would
be a great help. Google has helped me a lot, but not much on this one
though...

TIA,
Simone Mehta
Nov 14 '05 #5

P: n/a
Hi Simone,
I would suggest the following modification:
[Modified code, original code from pete]
I am able to get the same using your program michael.
but need to go for sscanf is that .
csv files have strings with quotes also.
like "hello",world,"foo",FSM,"comp,lang,c"
so this being the case. I will have to maintain a small FSM when it
comes to quote which can make things difficult.
So i wanted to train sscanf to identify quotes or strings without
them.
Hmmm, considering that, I would advise you to abandon sscanf
as a solution for the whole line -- you just cannot get that
in readable code. So, sscanf essentially will give you more
of a headache than it gains in (seeming) shortness and
conciseness.
but sscanf seems to have a real bad man page or maybe I am not able to
understand much from it.
.....
any sscanf URLs/bookmarks any one has, explaining a little more would
be a great help. google has helped me a lot but not much on this one
though...
Well, it is not very good, but the man pages at dinkumware.com
( http://www.dinkumware.com/refxc.html ) about formatted I/O may
help you a little bit more. Apart from that: Many people are
requesting scanf-format help around here, so maybe a google-search
through comp.lang.c archives can give you a better understanding
of what is happening.

I would in the above case be interested in
s1=hello
s2=world
s3=foo
s4=FSM
s5=comp,lang,c


If you know _beforehand_ in which places to expect quotation marks,
you can easily adjust the format in my example.
Otherwise, I would just go through the string in the way pete
has shown. If you encounter a '\"' as first character after
a comma (and zero or more white spaces), just search for '\"'
instead of a terminating ',' and after finding it, throw away
everything up to the next ','...
Cheers
Michael

Nov 14 '05 #6

P: n/a
"Simone Mehta" <si******@indiatimes.com> wrote in message
news:49*************************@posting.google.com...
hi All,
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
I understand it is not finding '\0' to scan (%s) strings.
but then I cannot use %c also .
I think i can use like "%64c%*[,]%64c" .
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?

Thanks In Advance,
Simone Mehta.


You could use
sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^,]", s1, s2, s3, s4, s5);
where s1,s2,s3,s4,s5 all point to string buffers;

You could also try this:

char foo[128] = "hello,world,bye,bye,world";
char* sep = ",";
char* str;
int n;
for (n=0, str=strtok(foo,sep); n++, str!=NULL; str=strtok(NULL,sep))
    printf("%d: %s\n", n, str);

which gives me the output:
1: hello
2: world
3: bye
4: bye
5: world

Note that strtok will replace the commas in foo with null characters ('\0').
Also, avoid strtok in multi-threaded applications since it uses static data
to preserve context.

Dag
Nov 14 '05 #7

P: n/a
In <49*************************@posting.google.com> si******@indiatimes.com (Simone Mehta) writes:
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
What else can you expect from your brain dead sscanf call?
I understand it is not finding '\0' to scan (%s) strings.
You appear to be completely clueless about how %s works.
but then I cannot use %c also .
%c is useful only when you know in advance how many characters you want
to read. And it doesn't store its output as a properly terminated string.
I think i can use like "%64c%*[,]%64c" .
%64c is hardly any better than %s. I'd say it's actually worse...
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?


Nope. Which is to be expected, since you have obviously not bothered to
*carefully* read the specification of the sscanf function. The first rule
of programming: if you don't know what you're doing, don't do it at all.

A %s directive starts by skipping white space (if any) and then it
consumes everything until a white space character or the null character
terminating the input string are encountered. Your string has no white
space characters, so the first %s will store the whole string in s1.
So, %s is useless for your purpose. The right solution is:

rc = sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^\n]", s1, s2, s3, s4, s5);

The last conversion specification can be %s if your fields cannot contain
white space. No need for %*[,] unless you want to skip multiple commas,
which doesn't make much sense (no point in skipping multiple commas if
you don't know their exact position inside the input string).

Always check the value of rc, instead of blindly assuming that all 5
fields were properly extracted from the input string.

Trivia quiz: why did I use %[^\n] for the last conversion?

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #8

P: n/a
In article <news:49**************************@posting.google.com>
Simone Mehta <si******@indiatimes.com> wrote:
csv files have strings with quotes also.
like "hello",world,"foo",FSM,"comp,lang,c"
so this being the case. I will have to maintain a small FSM when it
comes to quote
which can make things difficult.
So i wanted to train sscanf to identify quotes or strings without
them. ...


The scanf engine is less powerful than regular expressions, and
in this case, is not powerful enough to do what you want.

Note that even regular expressions -- which *can* match quotes,
at least in some RE systems -- cannot handle more-general parsing
tasks, such as matching parentheses. But clearly the scanf engine,
which does only literal matches without alternation, is not enough
by itself to handle both quoted and unquoted strings. The closest
you can get is a sort of "manual alternation" scheme:

    while (there is more to scan) {
        if (this item begins with a double quote) {
            run scanf engine on RE-subset "[^"]+", e.g.:

                ret = sscanf(&buf[offset], "\"%79[^\"]%c%n",
                    dequoted_string, &doublequote_char, &more_offset);
                if (ret != 2) ... handle error ...

            now doublequote_char is " and more_offset says how many
            characters were scanned.  Note that this assumes the
            dequoted_string[] array has size 80 or more (%79 above).
        } else {
            run scanf engine on RE-subset [^,]+
        }
    }

This is still not good enough for "real" CSV files, which allow
quoting the quote marks (in various ways).

I recommend writing a real (but ad-hoc) lexer (or finding one, e.g.,
via google search, and adapting it if needed).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #9

P: n/a

"Dan Pop" <Da*****@cern.ch> wrote in message
news:cj***********@sunnews.cern.ch...
In <49*************************@posting.google.com> si******@indiatimes.com (Simone Mehta) writes:
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
What else can you expect from your brain dead sscanf call?
I understand it is not finding '\0' to scan (%s) strings.


You appear to be completely clueless about how %s works.
but then I cannot use %c also .


%c is useful only when you know in advance how many characters you want
to read. And it doesn't store its output as a properly terminated string.
I think i can use like "%64c%*[,]%64c" .


%64c is hardly any better than %s. I'd say it's actually worse...
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?


Nope. Which is to be expected, since you have obviously not bothered to
*carefully* read the specification of the sscanf function. The first rule
of programming: if you don't know what you're doing, don't do it at all.

A %s directive starts by skipping white space (if any) and then it
consumes everything until a white space character or the null character
terminating the input string are encountered. Your string has no white
space characters, so the first %s will store the whole string in s1.
So, %s is useless for your purpose. The right solution is:

rc = sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^\n]", s1, s2, s3, s4, s5);

The last conversion specification can be %s if your fields cannot contain
white space. No need for %*[,] unless you want to skip multiple commas,
which doesn't make much sense (no point in skipping multiple commas if
you don't know their exact position inside the input string).

Always check the value of rc, instead of blindly assuming that all 5
fields were properly extracted from the input string.

Trivia quiz: why did I use %[^\n] for the last conversion?

Does it serve any purpose? sscanf would terminate anyway when it
encounters the '\0', which is present in the OP's code.
Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union

Nov 14 '05 #10

P: n/a
In <1096348626.492309@sj-nntpcache-3> "Ravi Uday" <ra******@gmail.com> writes:

"Dan Pop" <Da*****@cern.ch> wrote in message
news:cj***********@sunnews.cern.ch...

Trivia quiz: why did I use %[^\n] for the last conversion?

Does it serve any purpose? sscanf would terminate anyway when it
encounters the '\0', which is present in the OP's code.


Try broadening your horizon, beyond the artificial example of the OP.
In real programs, where do such strings come from?

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #11

P: n/a

sscanf( str, "%s%*c%s%*c%s%*c%s%*c%s", would suffice, or have you tried
strtok()?
-
Mooni
-----------------------------------------------------------------------
Posted via http://www.codecomments.co
-----------------------------------------------------------------------

Nov 14 '05 #12

P: n/a
Da*****@cern.ch (Dan Pop) wrote in message news:<cj***********@sunnews.cern.ch>...
In <49*************************@posting.google.com> si******@indiatimes.com (Simone Mehta) writes:
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
What else can you expect from your brain dead sscanf call?
I understand it is not finding '\0' to scan (%s) strings.


You appear to be completely clueless about how %s works.

It appears you are in complete control of the situation; then pray give
the right answer and stop bullying the OP.
but then I cannot use %c also .
%c is useful only when you know in advance how many characters you want
to read. And it doesn't store its output as a properly terminated string.
I think i can use like "%64c%*[,]%64c" .


%64c is hardly any better than %s. I'd say it's actually worse...
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?


Nope. Which is to be expected, since you have obviously not bothered to
*carefully* read the specification of the sscanf function. The first rule
of programming: if you don't know what you're doing, don't do it at all.

The OP has some confusion; that's why she has turned to the list.
Don't scare her. I am sure she must have tried the circumflex with
little success.
A %s directive starts by skipping white space (if any) and then it
consumes everything until a white space character or the null character
terminating the input string are encountered. Your string has no white
space characters, so the first %s will store the whole string in s1.
So, %s is useless for your purpose. The right solution is:

rc = sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^\n]", s1, s2, s3, s4, s5);

The last conversion specification can be %s if your fields cannot contain
white space. No need for %*[,] unless you want to skip multiple commas,
which doesn't make much sense (no point in skipping multiple commas if
you don't know their exact position inside the input string).

Always check the value of rc, instead of blindly assuming that all 5
fields were properly extracted from the input string.

Please stop thinking people will paste complete code here. Some code
is always left out for clarity.

Trivia quiz: why did I use %[^\n] for the last conversion?

Dan


Your signature says you are looking for a job...
Such arrogance from you can only lead to the search getting prolonged.
Nov 14 '05 #13

This discussion thread is closed
