sscanf parsing doubt

hi All,
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
.....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
I understand it is not finding '\0' to scan (%s) strings.
but then I cannot use %c also .
I think i can use like "%64c%*[,]%64c" .
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?

Thanks In Advance,
Simone Mehta.

--
live life Queen Size.
Nov 14 '05 #1
Hi,
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
I understand it is not finding '\0' to scan (%s) strings.
Nope. It gives you junk because %s spans from white space to
white space. Commas are not white spaces, so s1 gets it all.

Check the return value of scanf(), this tells you how many
input items you actually could read.

Use the scanset: For example, you can scan for "%[^, \t]"
which stops at the first comma, blank or tabulator.

but then I cannot use %c also .
I think i can use like "%64c%*[,]%64c" .


No. The c conversion specifier will not give you strings
but unterminated character arrays, which can be nasty to handle.
Apart from that, the problem of the comma being gobbled
by %64c still persists.
That said, using a field width when reading the
strings to be stored in s1 through s5 is a Good Idea.
If a string before the last item is too long, the return value
of sscanf will tell you. For the last item, look up
Pop's Device here in the newsgroup to see how to get
rid of the rest of the line.
Cheers
Michael
#include <stdio.h>
#include <stdlib.h>

#define MAXITEMLEN 32   /* maximum number of characters per item */

#define STRINGIZE(s) # s
#define XSTR(s) STRINGIZE(s)

#define DONTSCAN ", \t"
#define ITEMFORMAT "[^" DONTSCAN "]"
#define MAXITEMFORMAT XSTR(MAXITEMLEN) ITEMFORMAT

#define ONEITEM "%" MAXITEMFORMAT   /* expands to "%32[^, \t]" */
#define SEP "%*[" DONTSCAN "]"      /* expands to "%*[, \t]" */

int main (void)
{
    char foo[128] = "hello,world, bye ,\tbye\t,world";
    /* one extra byte each: the field width does not count the '\0' */
    char s0[MAXITEMLEN+1], s1[MAXITEMLEN+1], s2[MAXITEMLEN+1];
    char s3[MAXITEMLEN+1], s4[MAXITEMLEN+1];
    int rv;

    rv = sscanf(foo, " " ONEITEM SEP ONEITEM SEP ONEITEM SEP
                ONEITEM SEP ONEITEM, s0, s1, s2, s3, s4);

    /* deliberate fall-through: print every item that was read */
    switch (rv) {
    case 5:
        fprintf(stdout,"s4: %s\n",s4);
    case 4:
        fprintf(stdout,"s3: %s\n",s3);
    case 3:
        fprintf(stdout,"s2: %s\n",s2);
    case 2:
        fprintf(stdout,"s1: %s\n",s1);
    case 1:
        fprintf(stdout,"s0: %s\n",s0);
    default:
        if (rv != 5) {
            fprintf(stderr, "Did not get all items!\n");
            exit(EXIT_FAILURE);
        }
    }
    return 0;
}

Nov 14 '05 #2
Simone Mehta wrote:

hi All,
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
I understand it is not finding '\0' to scan (%s) strings.
but then I cannot use %c also .
I think i can use like "%64c%*[,]%64c" .
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?


I think the simplest way is to read whole lines from the file
into strings, and then to process the strings in memory.

/* BEGIN output from new.c */

helloworldbyebyeworld

/* END output from new.c */

/* BEGIN new.c */

#include <stdio.h>
#include <string.h>

int main(void)
{
    char foo[128] = "hello,world,bye,bye,world";
    char *pointer;

    for (pointer = foo; *pointer != '\0'; ++pointer) {
        if (*pointer == ',') {
            memmove(pointer, pointer + 1, strlen(pointer));
        }
    }
    puts("\n/* BEGIN output from new.c */\n");
    puts(foo);
    puts("\n/* END output from new.c */");
    return 0;
}

/* END new.c */
--
pete
Nov 14 '05 #3
Hi pete,
it seems to me that you misunderstood the OP's question:
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
extract strings from it.
^^^^^^^^^^^^^^^^^^^^^^^^
Note: The OP is doing things line by line.
He wants to set s1 through s5.
[snip! code <snippet> and questions to that]
I think the simplest way is to read whole lines from the file
into strings, and then to process the strings in memory.


Which is what the OP does, if I understood him/her correctly.

/* BEGIN output from new.c */

helloworldbyebyeworld

/* END output from new.c */
/* BEGIN new.c */

#include <stdio.h>
#include <string.h>

int main(void)
{
    char foo[128] = "hello,world,bye,bye,world";
    char *pointer;

    for (pointer = foo; *pointer != '\0'; ++pointer) {
        if (*pointer == ',') {
            memmove(pointer, pointer + 1, strlen(pointer));
        }
    }
    puts("\n/* BEGIN output from new.c */\n");
    puts(foo);
    puts("\n/* END output from new.c */");
    return 0;
}

/* END new.c */
I would suggest the following modification:
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define MAXNUMENTRIES 5

int main(void)
{
    char foo[128] = "hello,world,bye,bye,world";
    char *pointer, *s[MAXNUMENTRIES+1];
    size_t i=0;

    s[i++] = foo;
    for (pointer = foo; *pointer != '\0'; ++pointer) {
        if (*pointer == ',') {
            *pointer = '\0';
            assert(i<MAXNUMENTRIES); /* check before storing, not after */
            s[i++] = pointer+1;
        }
    }
    s[i] = NULL; /* Signify end of valid entries */

    puts("\n/* BEGIN output from new.c */\n");
    for (i=0; s[i] != NULL; i++)
        puts(s[i]);
    puts("\n/* END output from new.c */");
    return 0;
}


I did not test it, though; just wanted to make clear
how to do it :-)
Cheers
Michael

Nov 14 '05 #4
Hi pete,Michael,
thanks for the useful replies.
Michael Mair <ma********************@ians.uni-stuttgart.de>

it seems to me that you misunderstood the OP's question:
you are right Michael I want to scan line by line.

I would suggest the following modification:
> #include <stdio.h>
> #include <string.h>

#include <assert.h>

#define MAXNUMENTRIES 5

I am able to get the same using your program, Michael,
but the reason I need to go for sscanf is that
csv files have strings with quotes also.
like "hello",world,"foo",FSM,"comp,lang,c"
so this being the case. I will have to maintain a small FSM when it
comes to quote
which can make things difficult.
So i wanted to train sscanf to identify quotes or strings without
them.
but sscanf seems to have a real bad man page or maybe I am not able to
understand much from it.
I would in the above case be interested in
s1=hello
s2=world
s3=foo
s4=FSM
s5=comp,lang,c

any sscanf URLs/bookmarks any one has, explaining a little more would
be a great help. google has helped me a lot but not much on this one
though...

TIA,
Simone Mehta
Nov 14 '05 #5
Hi Simone,
I would suggest the following modification:
[Modified code, original code from pete]
I am able to get the same using your program michael.
but need to go for sscanf is that .
csv files have strings with quotes also.
like "hello",world,"foo",FSM,"comp,lang,c"
so this being the case. I will have to maintain a small FSM when it
comes to quote which can make things difficult.
So i wanted to train sscanf to identify quotes or strings without
them.
Hmmm, considering that, I would advise you to abandon sscanf
as a solution for the whole line -- you just cannot get that
in readable code. So, sscanf essentially will give you more
of a headache than it gains in (seeming) shortness and
conciseness.
but sscanf seems to have a real bad man page or maybe I am not able to
understand much from it.
.....
any sscanf URLs/bookmarks any one has, explaining a little more would
be a great help. google has helped me a lot but not much on this one
though...
Well, it is not very good, but the man pages at dinkumware.com
( http://www.dinkumware.com/refxc.html ) about formatted I/O may
help you a little bit more. Apart from that: Many people are
requesting scanf-format help around here, so maybe a google-search
through comp.lang.c archives can give you a better understanding
of what is happening.

I would in the above case be interested in
s1=hello
s2=world
s3=foo
s4=FSM
s5=comp,lang,c


If you know _beforehand_ in which places to expect quotation marks,
you can easily adjust the format in my example.
Otherwise, I would just go through the string in the way pete
has shown. If you encounter a '\"' as first character after
a comma (and zero or more white spaces), just search for '\"'
instead of a terminating ',' and after finding it, throw away
everything up to the next ','...
Cheers
Michael

Nov 14 '05 #6
"Simone Mehta" <si******@indiatimes.com> wrote in message
news:49*************************@posting.google.com...
hi All,
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
I understand it is not finding '\0' to scan (%s) strings.
but then I cannot use %c also .
I think i can use like "%64c%*[,]%64c" .
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?

Thanks In Advance,
Simone Mehta.


You could use
sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^,]", s1, s2, s3, s4, s5);
where s1,s2,s3,s4,s5 all point to string buffers;

You could also try this:

char foo[128] = "hello,world,bye,bye,world";
char* sep = ",";
char* str;
int n;
for (n=0, str=strtok(foo,sep); n++, str!=NULL; str=strtok(NULL,sep))
printf("%d: %s\n", n, str);

which gives me the output:
1: hello
2: world
3: bye
4: bye
5: world

Note that strtok replaces the commas in foo with null characters ('\0'). Also, avoid
strtok in multi-threaded applications since it uses static data to preserve
context.

Dag
Nov 14 '05 #7
In <49*************************@posting.google.com> si******@indiatimes.com (Simone Mehta) writes:
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
What else can you expect from your brain dead sscanf call?
I understand it is not finding '\0' to scan (%s) strings.
You appear to be completely clueless about how %s works.
but then I cannot use %c also .
%c is useful only when you know in advance how many characters you want
to read. And it doesn't store its output as a properly terminated string.
I think i can use like "%64c%*[,]%64c" .
%64c is hardly any better than %s. I'd say it's actually worse...
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?


Nope. Which is to be expected, since you have obviously not bothered to
*carefully* read the specification of the sscanf function. The first rule
of programming: if you don't know what you're doing, don't do it at all.

A %s directive starts by skipping white space (if any) and then it
consumes everything until a white space character or the null character
terminating the input string are encountered. Your string has no white
space characters, so the first %s will store the whole string in s1.
So, %s is useless for your purpose. The right solution is:

rc = sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^\n]", s1, s2, s3, s4, s5);

The last conversion specification can be %s if your fields cannot contain
white space. No need for %*[,] unless you want to skip multiple commas,
which doesn't make much sense (no point in skipping multiple commas if
you don't know their exact position inside the input string).

Always check the value of rc, instead of blindly assuming that all 5
fields were properly extracted from the input string.

Trivia quiz: why did I use %[^\n] for the last conversion?

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #8
In article <news:49**************************@posting.google.com>
Simone Mehta <si******@indiatimes.com> wrote:
csv files have strings with quotes also.
like "hello",world,"foo",FSM,"comp,lang,c"
so this being the case. I will have to maintain a small FSM when it
comes to quote
which can make things difficult.
So i wanted to train sscanf to identify quotes or strings without
them. ...


The scanf engine is less powerful than regular expressions, and
in this case, is not powerful enough to do what you want.

Note that even regular expressions -- which *can* match quotes,
at least in some RE systems -- cannot handle more-general parsing
tasks, such as matching parentheses. But clearly the scanf engine,
which does only literal matches without alternation, is not enough
by itself to handle both quoted and unquoted strings. The closest
you can get is a sort of "manual alternation" scheme:

    while (there is more to scan) {
        if (this item begins with a double quote) {
            run scanf engine on RE-subset "[^"]+", e.g.:

                ret = sscanf(&buf[offset], "\"%79[^\"]%c%n",
                             dequoted_string, &doublequote_char, &more_offset);
                if (ret != 2) ... handle error ...

            now doublequote_char is " and more_offset says how many
            characters were scanned. Note that this assumes the
            dequoted_string[] array has size 80 or more (%79 above).
        } else {
            run scanf engine on RE-subset [^,]+
        }
    }

This is still not good enough for "real" CSV files, which allow
quoting the quote marks (in various ways).

I recommend writing a real (but ad-hoc) lexer (or finding one, e.g.,
via google search, and adapting it if needed).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #9

"Dan Pop" <Da*****@cern.ch> wrote in message
news:cj***********@sunnews.cern.ch...
In <49*************************@posting.google.com> si******@indiatimes.com (Simone Mehta) writes:
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
What else can you expect from your brain dead sscanf call?
I understand it is not finding '\0' to scan (%s) strings.


You appear to be completely clueless about how %s works.
but then I cannot use %c also .


%c is useful only when you know in advance how many characters you want
to read. And it doesn't store its output as a properly terminated string.
I think i can use like "%64c%*[,]%64c" .


%64c is hardly any better than %s. I'd say it's actually worse...
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?


Nope. Which is to be expected, since you have obviously not bothered to
*carefully* read the specification of the sscanf function. The first rule
of programming: if you don't know what you're doing, don't do it at all.

A %s directive starts by skipping white space (if any) and then it
consumes everything until a white space character or the null character
terminating the input string are encountered. Your string has no white
space characters, so the first %s will store the whole string in s1.
So, %s is useless for your purpose. The right solution is:

rc = sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^\n]", s1, s2, s3, s4, s5);

The last conversion specification can be %s if your fields cannot contain
white space. No need for %*[,] unless you want to skip multiple commas,
which doesn't make much sense (no point in skipping multiple commas if
you don't know their exact position inside the input string).

Always check the value of rc, instead of blindly assuming that all 5
fields were properly extracted from the input string.

Trivia quiz: why did I use %[^\n] for the last conversion?

Does it serve any purpose ? Because sscanf would terminate anyways if it
encounters '\0' which in the OP
code is present.
Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union

Nov 14 '05 #10
In <1096348626.492309@sj-nntpcache-3> "Ravi Uday" <ra******@gmail.com> writes:

"Dan Pop" <Da*****@cern.ch> wrote in message
news:cj***********@sunnews.cern.ch...

Trivia quiz: why did I use %[^\n] for the last conversion?

Does it serve any purpose ? Because sscanf would terminate anyways if it
encounters '\0' which in the OP
code is present.


Try broadening your horizon, beyond the artificial example of the OP.
In real programs, where do such strings come from?

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
Nov 14 '05 #11

sscanf( str, "%s%*c%s%*c%s%*c%s%*c%s", would suffice, or have you tried
strtok()?
-
Mooni

Nov 14 '05 #12
Da*****@cern.ch (Dan Pop) wrote in message news:<cj***********@sunnews.cern.ch>...
In <49*************************@posting.google.com> si******@indiatimes.com (Simone Mehta) writes:
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk .
What else can you expect from your brain dead sscanf call?
I understand it is not finding '\0' to scan (%s) strings.


You appear to be completely clueless about how %s works.

It appears you are in complete control of the situation then pray give
the right answer , stop bullying around the OP.
but then I cannot use %c also .
%c is useful only when you know in advance how many characters you want
to read. And it doesn't store its output as a properly terminated string.
I think i can use like "%64c%*[,]%64c" .


%64c is hardly any better than %s. I'd say it's actually worse...
Please enlighten me as to the algo to be used here . Am i doing it the
right way ?


Nope. Which is to be expected, since you have obviously not bothered to
*carefully* read the specification of the sscanf function. The first rule
of programming: if you don't know what you're doing, don't do it at all.

The OP has some confusion; that's why she has turned to the list.
Don't scare her. I am sure she must have tried the circumflex with
little success.
A %s directive starts by skipping white space (if any) and then it
consumes everything until a white space character or the null character
terminating the input string are encountered. Your string has no white
space characters, so the first %s will store the whole string in s1.
So, %s is useless for your purpose. The right solution is:

rc = sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^\n]", s1, s2, s3, s4, s5);

The last conversion specification can be %s if your fields cannot contain
white space. No need for %*[,] unless you want to skip multiple commas,
which doesn't make much sense (no point in skipping multiple commas if
you don't know their exact position inside the input string).

Always check the value of rc, instead of blindly assuming that all 5
fields were properly extracted from the input string.

Please stop thinking people will paste complete code here. Some code
is always left out for clarity.

Trivia quiz: why did I use %[^\n] for the last conversion?

Dan


Your signature says you're looking for a job...
Such arrogance from you can only lead to the search getting prolonged.
Nov 14 '05 #13
