Hi all,
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
extract strings from it.
<snippet>
char foo[128]="hello,world,bye,bye,world";
.....
sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
<snippet/>
This is giving me junk.
I understand it is not finding '\0' to scan (%s) strings,
but then I cannot use %c either.
I think I can use something like "%64c%*[,]%64c".
Please enlighten me as to the algorithm to be used here. Am I doing it the
right way?
Thanks In Advance,
Simone Mehta.
--
live life Queen Size.
Hi,
Simone Mehta wrote:
<snippet> char foo[128]="hello,world,bye,bye,world"; .... sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5); <snippet/>
This is giving me junk . I understand it is not finding '\0' to scan (%s) strings.
Nope. It gives you junk because %s spans from white space to
white space. Commas are not white spaces, so s1 gets it all.
Check the return value of sscanf(); it tells you how many
input items were actually read and assigned.
Use a scanset instead: for example, you can scan with "%[^, \t]",
which stops at the first comma, blank or tab.
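To make the difference concrete, here is a minimal sketch that runs both formats on the same line and hands back the sscanf() return value. The helper names, the 32-byte buffers and the %31 field widths are assumptions made for this illustration only; they are not part of the original post.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the original "%s"-based format, with a field width
 * added so that the 32-byte buffers cannot overflow. */
int scan_with_percent_s(const char *line, char out[5][32])
{
    assert(line != NULL);
    /* %31s stops only at white space, so the commas end up inside out[0]. */
    return sscanf(line, "%31s%*[,]%31s%*[,]%31s%*[,]%31s%*[,]%31s",
                  out[0], out[1], out[2], out[3], out[4]);
}

/* Hypothetical helper: the scanset-based format suggested above. */
int scan_with_scanset(const char *line, char out[5][32])
{
    assert(line != NULL);
    /* %31[^, \t] stops at the first comma, blank or tab;
     * %*[, \t] skips the separators without storing them. */
    return sscanf(line,
                  " %31[^, \t]%*[, \t]%31[^, \t]%*[, \t]%31[^, \t]"
                  "%*[, \t]%31[^, \t]%*[, \t]%31[^, \t]",
                  out[0], out[1], out[2], out[3], out[4]);
}
```

Called on "hello,world,bye,bye,world", the first helper should report a single converted item (the whole line in out[0]), while the second should report five.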
but then I cannot use %c also . I think i can use like "%64c%*[,]%64c" .
No. The c conversion specifier will not give you strings
but character arrays which can be nasty to handle.
Apart from that, the problem of the comma being gobbled
by %64c still persists.
Apart from that, using a field width for reading in the
strings to be stored in s1 through s5 is a Good Idea.
If a string before the last item was too long, the return value
of sscanf will tell you. For the last item, look up
Pop's Device here in the newsgroup to see how to get
rid of the rest of the line.
Cheers
Michael
#include <stdio.h>
#include <stdlib.h>

#define MAXITEMLEN 32              /* maximum characters per item */
#define STRINGIZE(s) # s
#define XSTR(s) STRINGIZE(s)       /* expands MAXITEMLEN before stringizing */
#define DONTSCAN ", \t"
#define ITEMFORMAT "[^" DONTSCAN "]"
#define MAXITEMFORMAT XSTR(MAXITEMLEN) ITEMFORMAT
#define ONEITEM "%" MAXITEMFORMAT  /* "%32[^, \t]" */
#define SEP "%*[" DONTSCAN "]"     /* "%*[, \t]": skip separators */

int main (void)
{
    char foo[128] = "hello,world, bye ,\tbye\t,world";
    /* +1: %32[...] stores up to 32 characters plus the terminating '\0' */
    char s0[MAXITEMLEN+1], s1[MAXITEMLEN+1], s2[MAXITEMLEN+1];
    char s3[MAXITEMLEN+1], s4[MAXITEMLEN+1];
    int rv;

    rv = sscanf(foo, " " ONEITEM SEP ONEITEM SEP ONEITEM SEP
                ONEITEM SEP ONEITEM, s0, s1, s2, s3, s4);
    switch (rv) {   /* the cases fall through on purpose */
        case 5:
            fprintf(stdout,"s4: %s\n",s4);
        case 4:
            fprintf(stdout,"s3: %s\n",s3);
        case 3:
            fprintf(stdout,"s2: %s\n",s2);
        case 2:
            fprintf(stdout,"s1: %s\n",s1);
        case 1:
            fprintf(stdout,"s0: %s\n",s0);
        default:
            if (rv != 5) {
                fprintf(stderr, "Did not get all items!\n");
                exit(EXIT_FAILURE);
            }
    }
    return 0;
}
Simone Mehta wrote:
hi All, I am parsing a CSV file. I want to read every row into a char array of reasonable size and then extract strings from it.
<snippet> char foo[128]="hello,world,bye,bye,world"; .... sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5); <snippet/>
This is giving me junk . I understand it is not finding '\0' to scan (%s) strings. but then I cannot use %c also . I think i can use like "%64c%*[,]%64c" .
Please enlighten me as to the algo to be used here . Am i doing it the right way ?
I think the simplest way is to read whole lines from the file
into strings, and then to process the strings in memory.
/* BEGIN output from new.c */
helloworldbyebyeworld
/* END output from new.c */
/* BEGIN new.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char foo[128] = "hello,world,bye,bye,world";
    char *pointer;

    for (pointer = foo; *pointer != '\0'; ++pointer) {
        if (*pointer == ',') {
            memmove(pointer, pointer + 1, strlen(pointer));
        }
    }
    puts("\n/* BEGIN output from new.c */\n");
    puts(foo);
    puts("\n/* END output from new.c */");
    return 0;
}
/* END new.c */
--
pete
Hi pete,
it seems to me that you misunderstood the OP's question:
I am parsing a CSV file.
I want to read every row into a char array of reasonable size and then
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
extract strings from it.
^^^^^^^^^^^^^^^^^^^^^^^^
Note: The OP is doing things line by line.
He wants to set s1 through s5.
[snip! code <snippet> and questions to that]
I think the simplest way is to read whole lines from the file into strings, and then to process the strings in memory.
Which is what the OP does, if I understood him/her correctly.
I would suggest the following modification of your new.c:
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define MAXNUMENTRIES 5

int main(void)
{
    char foo[128] = "hello,world,bye,bye,world";
    char *pointer, *s[MAXNUMENTRIES+1];
    size_t i = 0;

    s[i++] = foo;
    for (pointer = foo; *pointer != '\0'; ++pointer) {
        if (*pointer == ',') {
            assert(i < MAXNUMENTRIES); /* check before storing, not after */
            *pointer = '\0';           /* terminate the current entry */
            s[i++] = pointer + 1;      /* next entry starts after the comma */
        }
    }
    s[i] = NULL; /* Signify end of valid entries */
    puts("\n/* BEGIN output from new.c */\n");
    for (i = 0; s[i] != NULL; i++)
        puts(s[i]);
    puts("\n/* END output from new.c */");
    return 0;
}
I did not test it, though; just wanted to make clear
how to do it :-)
Cheers
Michael
Hi pete, Michael,
thanks for the useful replies.
Michael Mair <ma********************@ians.uni-stuttgart.de> wrote:
it seems to me that you misunderstood the OP's question:
You are right, Michael: I want to scan line by line.
I would suggest the following modification:
#include <stdio.h> #include <string.h> #include <assert.h>
#define MAXNUMENTRIES 5
I am able to get the same result using your program, Michael,
but the reason I need to go for sscanf is this:
CSV files also have strings with quotes,
like "hello",world,"foo",FSM,"comp,lang,c"
That being the case, I will have to maintain a small FSM when it
comes to quotes,
which can make things difficult.
So I wanted to train sscanf to identify strings with or without
quotes.
But sscanf seems to have a really bad man page, or maybe I am not able to
understand much from it.
In the above case I would be interested in
s1=hello
s2=world
s3=foo
s4=FSM
s5=comp,lang,c
If anyone has sscanf URLs/bookmarks that explain a little more, that would
be a great help. Google has helped me a lot, but not much on this one
though...
TIA,
Simone Mehta
Hi Simone,
I would suggest the following modification:
[Modified code, original code from pete]
I am able to get the same using your program michael. but need to go for sscanf is that . csv files have strings with quotes also. like "hello",world,"foo",FSM,"comp,lang,c"
so this being the case. I will have to maintain a small FSM when it comes to quote which can make things difficult. So i wanted to train sscanf to identify quotes or strings without them.
Hmmm, considering that, I would advise you to abandon sscanf
as a solution for the whole line -- you just cannot get that
in readable code. So, sscanf essentially will give you more
of a headache than it gains in (seeming) shortness and
conciseness.
but sscanf seems to have a real bad man page or maybe I am not able to understand much from it.
..... any sscanf URLs/bookmarks any one has, explaining a little more would be a great help. google has helped me a lot but not much on this one though...
Well, it is not very good, but the man pages at dinkumware.com
( http://www.dinkumware.com/refxc.html ) about formatted I/O may
help you a little bit more. Apart from that: Many people are
requesting scanf-format help around here, so maybe a google-search
through comp.lang.c archives can give you a better understanding
of what is happening.
I would in the above case be interested in s1=hello s2=world s3=foo s4=FSM s5=comp,lang,c
If you know _beforehand_ in which places to expect quotation marks,
you can easily adjust the format in my example.
Otherwise, I would just go through the string in the way pete
has shown. If you encounter a '\"' as the first character after
a comma (and zero or more white spaces), just search for the closing '\"'
instead of a terminating ',' and, after finding it, throw away
everything up to the next ','...
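As a rough sketch of that manual approach (not tested against real-world CSV, and not handling quote marks escaped inside a quoted field), the walk through the line could look like this. The function name split_csv_line and its parameters are made up for the illustration; the line is modified in place, as in pete's program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place into at most `max_fields`
 * fields. Returns the number of fields stored, or 0 if a quoted field
 * has no closing quote. */
size_t split_csv_line(char *line, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    assert(line != NULL && fields != NULL);
    while (n < max_fields) {
        p += strspn(p, " \t");          /* zero or more white spaces */
        if (*p == '"') {
            /* Quoted field: search for the closing quote, not for ','. */
            char *end = strchr(p + 1, '"');
            if (end == NULL)
                return 0;               /* unterminated quote */
            fields[n++] = p + 1;
            *end = '\0';
            p = end + 1;
            p += strcspn(p, ",");       /* throw away up to the next ',' */
        } else {
            /* Unquoted field: it ends at the next ',' or at end of line. */
            fields[n++] = p;
            p += strcspn(p, ",");
        }
        if (*p == '\0')
            break;                      /* no more fields */
        *p++ = '\0';                    /* overwrite the comma */
    }
    return n;
}
```

With the line "hello",world,"foo",FSM,"comp,lang,c" and room for five fields, this should yield hello, world, foo, FSM and comp,lang,c. A trailing newline left by fgets() would end up in the last unquoted field, so it is best removed before calling the function.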
Cheers
Michael
"Simone Mehta" <si******@indiatimes.com> wrote in message
news:49*************************@posting.google.com...
hi All, I am parsing a CSV file. I want to read every row into a char array of reasonable size and then extract strings from it.
<snippet> char foo[128]="hello,world,bye,bye,world"; .... sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5); <snippet/>
This is giving me junk . I understand it is not finding '\0' to scan (%s) strings. but then I cannot use %c also . I think i can use like "%64c%*[,]%64c" .
Please enlighten me as to the algo to be used here . Am i doing it the right way ?
Thanks In Advance, Simone Mehta.
You could use
sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^,]", s1, s2, s3, s4, s5);
where s1,s2,s3,s4,s5 all point to string buffers;
You could also try this:
char foo[128] = "hello,world,bye,bye,world";
const char *sep = ",";
char *str;
int n;
for (n = 1, str = strtok(foo, sep); str != NULL; n++, str = strtok(NULL, sep))
    printf("%d: %s\n", n, str);
which gives me the output:
1: hello
2: world
3: bye
4: bye
5: world
Note that strtok will replace the commas in foo with null characters ('\0').
Also, avoid strtok in multi-threaded applications, since it uses static data
to preserve context.
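One more caveat that matters for CSV: strtok treats a run of separators as a single separator, so an empty field such as the middle one in "a,,c" silently disappears. A minimal sketch of an alternative based on strcspn, which keeps empty fields, follows. The name split_keep_empty is hypothetical, and like strtok the function writes '\0' over the commas.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on commas, keeping empty
 * fields. Returns the number of fields stored (at most max_fields). */
size_t split_keep_empty(char *line, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    assert(line != NULL && fields != NULL);
    while (n < max_fields) {
        size_t len = strcspn(p, ",");   /* length of the current field */
        fields[n++] = p;
        if (p[len] == '\0')
            break;                      /* that was the last field */
        p[len] = '\0';                  /* overwrite the comma */
        p += len + 1;                   /* continue after the comma */
    }
    return n;
}
```

For "a,,c" this should report three fields, the second one being the empty string, whereas the strtok loop would print only two items.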
Dag
In <49*************************@posting.google.com> si******@indiatimes.com (Simone Mehta) writes:
I am parsing a CSV file. I want to read every row into a char array of reasonable size and then extract strings from it.
<snippet> char foo[128]="hello,world,bye,bye,world"; .... sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5); <snippet/>
This is giving me junk .
What else can you expect from your brain dead sscanf call?
I understand it is not finding '\0' to scan (%s) strings.
You appear to be completely clueless about how %s works.
but then I cannot use %c also .
%c is useful only when you know in advance how many characters you want
to read. And it doesn't store its output as a properly terminated string.
I think i can use like "%64c%*[,]%64c" .
%64c is hardly any better than %s. I'd say it's actually worse...
Please enlighten me as to the algo to be used here . Am i doing it the right way ?
Nope. Which is to be expected, since you have obviously not bothered to
*carefully* read the specification of the sscanf function. The first rule
of programming: if you don't know what you're doing, don't do it at all.
A %s directive starts by skipping white space (if any) and then it
consumes everything until a white space character or the null character
terminating the input string is encountered. Your string has no white
space characters, so the first %s will store the whole string in s1.
So, %s is useless for your purpose. The right solution is:
rc = sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^\n]", s1, s2, s3, s4, s5);
The last conversion specification can be %s if your fields cannot contain
white space. No need for %*[,] unless you want to skip multiple commas,
which doesn't make much sense (no point in skipping multiple commas if
you don't know their exact position inside the input string).
Always check the value of rc, instead of blindly assuming that all 5
fields were properly extracted from the input string.
Trivia quiz: why did I use %[^\n] for the last conversion?
Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Currently looking for a job in the European Union
In article <news:49**************************@posting.google.com>
Simone Mehta <si******@indiatimes.com> wrote:
csv files have strings with quotes also. like "hello",world,"foo",FSM,"comp,lang,c"
so this being the case. I will have to maintain a small FSM when it comes to quote which can make things difficult.
So i wanted to train sscanf to identify quotes or strings without them. ...
The scanf engine is less powerful than regular expressions, and
in this case, is not powerful enough to do what you want.
Note that even regular expressions -- which *can* match quotes,
at least in some RE systems -- cannot handle more-general parsing
tasks, such as matching parentheses. But clearly the scanf engine,
which does only literal matches without alternation, is not enough
by itself to handle both quoted and unquoted strings. The closest
you can get is a sort of "manual alternation" scheme:
    while (there is more to scan) {
        if (this item begins with a double quote) {
            run scanf engine on RE-subset "[^"]+", e.g.:
                ret = sscanf(&buf[offset], "\"%79[^\"]%c%n",
                    dequoted_string, &doublequote_char, &more_offset);
                if (ret != 2) ... handle error ...
            now doublequote_char is " and more_offset says how many
            characters were scanned. Note that this assumes the
            dequoted_string[] array has size 80 or more (%79 above).
        } else {
            run scanf engine on RE-subset [^,]+
        }
    }
This is still not good enough for "real" CSV files, which allow
quoting the quote marks (in various ways).
I recommend writing a real (but ad-hoc) lexer (or finding one, e.g.,
via google search, and adapting it if needed).
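For reference, the "manual alternation" outline above might be fleshed out roughly as follows. This is only a sketch under assumptions: the function name scan_items, the ITEM_LEN constant, the -1 error code and the handling of the separating comma are all made up here, and, as noted, neither quoted quote marks nor empty fields are handled (a scanset cannot match an empty item).

```c
#include <assert.h>
#include <stdio.h>

#define ITEM_LEN 80   /* buffer size matching the %79 field widths below */

/* Hypothetical helper: store up to `max_items` items from `buf`.
 * Returns the number of items stored, or -1 on input it cannot handle. */
int scan_items(const char *buf, char items[][ITEM_LEN], int max_items)
{
    int count = 0;
    int offset = 0;

    assert(buf != NULL && items != NULL);
    while (buf[offset] != '\0' && count < max_items) {
        int more_offset = 0;

        if (buf[offset] == '"') {
            /* Quoted item: skip the opening quote, read up to the closing
             * quote, then read the closing quote itself with %c. */
            char doublequote_char;
            int ret = sscanf(&buf[offset], "\"%79[^\"]%c%n",
                             items[count], &doublequote_char, &more_offset);
            if (ret != 2 || doublequote_char != '"')
                return -1;  /* empty, unterminated or overlong item */
        } else {
            /* Unquoted item: everything up to the next comma. */
            int ret = sscanf(&buf[offset], "%79[^,]%n",
                             items[count], &more_offset);
            if (ret != 1)
                return -1;  /* empty item */
        }
        count++;
        offset += more_offset;  /* %n reported how many chars were scanned */
        if (buf[offset] == ',')
            offset++;           /* step over the separator */
        else if (buf[offset] != '\0')
            return -1;          /* junk after a quoted or overlong item */
    }
    return count;
}
```

On "hello",world,"foo",FSM,"comp,lang,c" this should store five items, the last being comp,lang,c.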
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
"Dan Pop" <Da*****@cern.ch> wrote in message
news:cj***********@sunnews.cern.ch...
[snip]
Trivia quiz: why did I use %[^\n] for the last conversion?
Does it serve any purpose? sscanf would terminate anyway when it
encounters the '\0', which is present in the OP's code.
In <1096348626.492309@sj-nntpcache-3> "Ravi Uday" <ra******@gmail.com> writes:
"Dan Pop" <Da*****@cern.ch> wrote in message news:cj***********@sunnews.cern.ch...
Trivia quiz: why did I use %[^\n] for the last conversion?
Does it serve any purpose ? Because sscanf would terminate anyways if it encounters '\0' which in the OP code is present.
Try broadening your horizon, beyond the artificial example of the OP.
In real programs, where do such strings come from?
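One plausible reading of the hint (a guess, not something spelled out in the thread): in real programs such lines usually come from fgets(), which keeps the trailing '\n', and %[^\n] for the last field stops in front of it while still allowing blanks inside the field. The sketch below assumes 64-byte field buffers and a 512-byte line buffer; the names parse_five_fields, read_record and FIELD_LEN are invented for the example, and field widths are added to the format so the buffers cannot overflow.

```c
#include <assert.h>
#include <stdio.h>

#define FIELD_LEN 64   /* assumed size of each field buffer, '\0' included */

/* Hypothetical helper: extract five comma-separated fields from one line.
 * Returns the sscanf() result, which is 5 when every field was stored. */
int parse_five_fields(const char *line, char f[5][FIELD_LEN])
{
    assert(line != NULL);
    /* %63[^\n] for the last field: it may contain blanks, but the
     * newline left by fgets() is not stored. */
    return sscanf(line, "%63[^,],%63[^,],%63[^,],%63[^,],%63[^\n]",
                  f[0], f[1], f[2], f[3], f[4]);
}

/* Hypothetical helper: read one line from `fp` and parse it.
 * Returns EOF at end of file or on a read error. */
int read_record(FILE *fp, char f[5][FIELD_LEN])
{
    char line[512];

    assert(fp != NULL);
    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;
    return parse_five_fields(line, f);
}
```

A caller would loop on read_record() and treat any result other than 5 as a malformed line.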
Dan
sscanf( str, "%s%*c%s%*c%s%*c%s%*c%s", would suffice or have you tried
strtok()?
-
Mooni
Da*****@cern.ch (Dan Pop) wrote in message news:<cj***********@sunnews.cern.ch>...
In <49*************************@posting.google.com> si******@indiatimes.com (Simone Mehta) writes:
I understand it is not finding '\0' to scan (%s) strings.
You appear to be completely clueless about how %s works.
It appears you are in complete control of the situation; then pray give
the right answer, and stop bullying the OP.
[snip]
Nope. Which is to be expected, since you have obviously not bothered to *carefully* read the specification of the sscanf function. The first rule of programming: if you don't know what you're doing, don't do it at all.
The OP has some confusion; that's why she has turned to the list.
Don't scare her. I am sure she must have tried the circumflex with
little success.
[snip]
Always check the value of rc, instead of blindly assuming that all 5 fields were properly extracted from the input string.
Please stop thinking people will paste complete code here. Some code
is always left out for clarity.
[snip]
Dan
Your signature says you're looking for a job...
Such arrogance from you can only lead to the search getting prolonged...