How to Parse a CSV formatted text file

Hi all,
I have a text file which has data in CSV format.
"empno","phonenumber","wardnumber"
12345,2234353,1000202
12326,2243653,1000098
I am a beginner at C/C++ programming.
I don't know how to tokenize the comma-separated values. I used the
strtok function, reading line by line with fgets, but it gives some
weird behavior. It does not strip out the "" fully. Does anybody have
sample code for this that I could use as a reference?

Ram Laxman

Nov 14 '05
ra********@india.com (Ram Laxman) wrote:
# Hi all,
# I have a text file which has data in CSV format.
# "empno","phonenumber","wardnumber"
# 12345,2234353,1000202
# 12326,2243653,1000098
# I am a beginner at C/C++ programming.
# I don't know how to tokenize the comma-separated values. I used the
# strtok function, reading line by line with fgets, but it gives some
# weird behavior. It does not strip out the "" fully. Does anybody have
# sample code for this that I could use as a reference?

CSV of this kind is probably a type 3 (regular) language, so you can
probably parse it with a finite state machine. If you're just
beginning, that can be an intimidating
bit of jargon, but FSMs are actually easy to understand, and if you
want to be a programmer, you have to understand them. They pop up all
over the place.
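
To make that concrete, here is a minimal sketch of a hand-written FSM in plain C, before any macro abstraction. It is not from the original post: the function name csv_split_fsm, the enum csv_state, and the choice to split one line in place are all assumptions made for illustration. It has two states, outside quotes and inside quotes, and a comma only ends a field in the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the thread: split one CSV line in place.
   Quote characters are dropped, and a comma ends a field only when it is
   seen outside quotes. Returns the number of fields stored in fields[]. */
enum csv_state { UNQUOTED, QUOTED };

static size_t csv_split_fsm(char *line, char *fields[], size_t max_fields)
{
    enum csv_state state = UNQUOTED;
    char *in = line;    /* read position */
    char *out = line;   /* write position, never ahead of the read position */
    size_t count = 0;

    assert(line != NULL && fields != NULL);
    if (max_fields == 0)
        return 0;
    fields[count++] = out;

    for (; *in != '\0'; ++in) {
        switch (state) {
        case UNQUOTED:
            if (*in == '"') {
                state = QUOTED;              /* opening quote: not copied */
            } else if (*in == ',') {
                *out++ = '\0';               /* end of the current field */
                if (count == max_fields)
                    return count;
                fields[count++] = out;
            } else if (*in != '\n' && *in != '\r') {
                *out++ = *in;                /* ordinary character */
            }
            break;
        case QUOTED:
            if (*in == '"')
                state = UNQUOTED;            /* closing quote: not copied */
            else
                *out++ = *in;                /* commas are literal here */
            break;
        }
    }
    *out = '\0';
    return count;
}
```

Called on the header line from the question, this should yield the three fields empno, phonenumber and wardnumber with the quotes removed. It makes no attempt to handle a quote character embedded in a quoted field.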

You can use #defines to abstract the FSM with something like

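#include <stdio.h>   /* FILE, fgetc, EOF */
#include <stdlib.h>  /* realloc, free */

#define FSM(name) static int name(FILE *file) {int ch=0,m=0,n=0; char *s=0;
#define endFSM return -1;}

// Each state reads one character from the file passed to the parser.
#define state(name) name: ch = fgetc(file); e_##name: switch (ch) {
#define endstate } return -1;

#define is(character) case character:
#define any default:
#define next(name) ;goto name;
#define emove(name) ;goto e_##name;
#define final(name,value) name: e_##name: free(s); return value;

// shift appends ch to s and keeps s NUL-terminated; realloc failure is not checked.
#define shift ;if (n+1>=m) {m = 2*(n+1); s = realloc(s,m);} s[n++] = ch; s[n] = 0;
// discard forgets s without freeing it: the got_* callback now owns the string
// (s is a null pointer if the field was empty).
#define discard ;m = n = 0; s = 0;
#define dispose ;free(s) discard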

static void got_empno(char *s);
static void got_phonenumber(char *s);
static void got_wardnumber(char *s);
static void got_csventry(void);

FSM(csv_parser)
state(empno)
is('"') next(quoted_empno)
is(EOF) next(at_end)
is(',') got_empno(s) discard next(phonenumber)
any shift next(empno)
endstate
// A quoted state keeps collecting characters until the closing quote.
state(quoted_empno)
is('"') next(empno)
is(EOF) next(at_end_in_string)
any shift next(quoted_empno)
endstate
state(phonenumber)
is('"') next(quoted_phonenumber)
is(EOF) next(at_end_in_entry)
is(',') got_phonenumber(s) discard next(wardnumber)
any shift next(phonenumber)
endstate
state(quoted_phonenumber)
is('"') next(phonenumber)
is(EOF) next(at_end_in_string)
any shift next(quoted_phonenumber)
endstate
state(wardnumber)
is('"') next(quoted_wardnumber)
is(EOF)
got_wardnumber(s); got_csventry() discard
next(at_end)
is('\n')
got_wardnumber(s); got_csventry() discard
next(empno)
is(',') got_wardnumber(s) discard next(unexpected_field)
any shift next(wardnumber)
endstate
state(quoted_wardnumber)
is('"') next(wardnumber)
is(EOF) next(at_end_in_string)
any shift next(quoted_wardnumber)
endstate
final(at_end,0)
final(at_end_in_string,1)
final(unexpected_field,2)
final(at_end_in_entry,3)
endFSM

....
int rc = csv_parser(stdin);
// calls
// got_empno(empno-string)
// got_phonenumber(phonenumber-string)
// got_wardnumber(wardnumber-string)
// got_csventry()
// for each entry
switch (rc) {
case -1: fputs("parser failure\n",stderr); break;
case 1: fputs("end of file in a string\n",stderr); break;
case 2: fputs("too many fields\n",stderr); break;
case 3: fputs("end of file in the middle of an entry\n",stderr); break;
}
....

--
Derk Gwen http://derkgwen.250free.com/html/index.html
I have no idea what you just said.
I get that alot.
Nov 14 '05 #11
On Sat, 07 Feb 2004 18:38:10 GMT, "Mike Wahler"
<mk******@mkwahler.net> wrote in comp.lang.c:

"Ram Laxman" <ra********@india.com> wrote in message
news:24**************************@posting.google.com...
Hi all,
I have a text file which has data in CSV format.
"empno","phonenumber","wardnumber"
12345,2234353,1000202
12326,2243653,1000098
I am a beginner at C/C++ programming.
I don't know how to tokenize the comma-separated values. I used the
strtok function, reading line by line with fgets, but it gives some
weird behavior. It does not strip out the "" fully. Does anybody have
sample code for this that I could use as a reference?

Ram Laxman


#include <cstdlib>
#include <fstream>
#include <ios>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>


[snip]

Mike, please do NOT post C++ code to messages crossposted to
comp.lang.c.

Thanks

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 14 '05 #12
"Jack Klein" <ja*******@spamcop.net> wrote in message
news:n7********************************@4ax.com...

[snip]

Mike, please do NOT post C++ code to messages crossposted to
comp.lang.c.


Oops, sorry, wasn't paying attention. Thanks for the heads-up.

-Mike
Nov 14 '05 #13
ra********@india.com (Ram Laxman) wrote in
news:24**************************@posting.google.com:
Hi all,
I have a text file which has data in CSV format.
"empno","phonenumber","wardnumber"
12345,2234353,1000202
12326,2243653,1000098
I am a beginner at C/C++ programming.
I don't know how to tokenize the comma-separated values. I used the
strtok function, reading line by line with fgets, but it gives some
weird behavior. It does not strip out the "" fully. Does anybody have
sample code for this that I could use as a reference?


Check out the amazing Spirit framework.
It's available from Boost libraries: http://www.boost.org
Nov 14 '05 #14
On 7 Feb 2004 09:39:14 -0800, in comp.lang.c , ra********@india.com
(Ram Laxman) wrote:
Hi all,
I have a text file which has data in CSV format.
"empno","phonenumber","wardnumber"
12345,2234353,1000202
12326,2243653,1000098
I am a beginner at C/C++ programming.
I don't know how to tokenize the comma-separated values. I used the
strtok function, reading line by line with fgets, but it gives some
weird behavior.


yes, you need to handle that sort of stuff yourself. Personally I'd
use strtok on this sort of data, since embedded commas should not
exist. Consider the 1st line a special case.
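
A minimal sketch of that strtok approach, assuming the data never contains embedded commas or empty fields. The helper names strip_quotes and split_simple are hypothetical, not from the thread. Note that strtok treats a run of delimiters as one, so an empty field such as the middle one in 1,,3 is silently skipped; the sketch does not try to work around that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove one pair of surrounding double quotes in place. */
static char *strip_quotes(char *field)
{
    size_t len = strlen(field);

    if (len >= 2 && field[0] == '"' && field[len - 1] == '"') {
        field[len - 1] = '\0';
        return field + 1;
    }
    return field;
}

/* Hypothetical helper: split a line (as read by fgets) on commas with
   strtok and strip the quotes from each token. The line is modified.
   Returns the number of fields stored in fields[]. */
static size_t split_simple(char *line, char *fields[], size_t max_fields)
{
    size_t count = 0;
    char *tok;

    assert(line != NULL && fields != NULL);
    for (tok = strtok(line, ",\r\n");
         tok != NULL && count < max_fields;
         tok = strtok(NULL, ",\r\n"))
        fields[count++] = strip_quotes(tok);
    return count;
}
```

The quoted header line can go through the same function as the data rows; treating it as a special case then only means not converting its fields to numbers.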

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>
----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
Nov 14 '05 #15
Mark McIntyre wrote:

On 7 Feb 2004 09:39:14 -0800, in comp.lang.c , ra********@india.com
(Ram Laxman) wrote:
Hi all,
I have a text file which has data in CSV format.
"empno","phonenumber","wardnumber"
12345,2234353,1000202
12326,2243653,1000098
I am a beginner at C/C++ programming.
I don't know how to tokenize the comma-separated values. I used the
strtok function, reading line by line with fgets, but it gives some
weird behavior.


yes, you need to handle that sort of stuff yourself. Personally I'd
use strtok on this sort of data, since embedded commas should not
exist. Consider the 1st line a special case.

I don't know of a 'Standard' defining .csv, but this is normal output
from Visual FoxPro:

first,last
"Mac "The Knife" Peter","Boswell, Jr."

But strangely, Excel reads it back wrong. Go figure.
"Failure is not an option. With M$ it is bundled with every package."

The format started with dBASE, I think, and goes something like this.

Fields are alphanumerics separated by commas. Fields of type 'Character'
are further delimited with '"' so that they can contain commas and '"'
itself. The rules are something like this:

The first field begins with the first character on the line.
Fields end at a naked ',' comma or '\n' newline.
Delimited fields begin with '"' and end with '"' followed by a comma or newline.
The delimiters are not a literal part of the field. Any comma or '"'
within the delimiters is a literal.
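
The rules above can be sketched roughly as follows. This is one reading of them, not FoxPro's actual implementation: the function name split_lenient is hypothetical, and treating the end of the string like a newline is an assumption. A delimited field only ends at a '"' that is immediately followed by a comma, a newline, or the end of the string, so quotes elsewhere in the field stay literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split one line in place following the rules above.
   Returns the number of fields stored in fields[]. */
static size_t split_lenient(char *line, char *fields[], size_t max_fields)
{
    char *p = line;
    size_t count = 0;

    assert(line != NULL && fields != NULL);
    while (count < max_fields) {
        char *end;

        if (*p == '"') {
            /* Delimited field: scan for a '"' followed by ',' or a line end. */
            char *start = ++p;
            for (end = p; *end != '\0'; ++end)
                if (*end == '"' && (end[1] == ',' || end[1] == '\n' ||
                                    end[1] == '\r' || end[1] == '\0'))
                    break;
            fields[count++] = start;
            if (*end == '\0')       /* no closing delimiter: keep the rest */
                break;
            *end++ = '\0';          /* overwrite the closing delimiter */
        } else {
            /* Naked field: ends at a comma or newline. */
            end = p + strcspn(p, ",\r\n");
            fields[count++] = p;
        }
        if (*end != ',') {          /* newline or end of string: line done */
            *end = '\0';
            break;
        }
        *end = '\0';
        p = end + 1;
    }
    return count;
}
```

On the FoxPro line above this should give the two fields Mac "The Knife" Peter and Boswell, Jr. A field whose own text contains a '"' directly followed by a comma is ambiguous under these rules and would be split early.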

--
Joe Wright http://www.jw-wright.com
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Nov 14 '05 #16
On Sun, 08 Feb 2004 14:43:05 GMT, in comp.lang.c , Joe Wright
<jo********@earthlink.net> wrote:
Mark McIntyre wrote:


yes, you need to handle that sort of stuff yourself. Personally I'd
use strtok on this sort of data, since embedded commas should not
exist. Consider the 1st line a special case.

I don't know of a 'Standard' defining .csv but this is normal output
from Visual FoxPro..


snip example w/ embedded commas.

Interesting, but the OP's data was employee numbers, phone numbers and
ward numbers. I find Occam's Razor to be efficient in such cases.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>
Nov 14 '05 #17
On Sun, 08 Feb 2004 14:43:05 GMT in comp.lang.c++, Joe Wright
<jo********@earthlink.net> was alleged to have written:
I don't know of a 'Standard' defining .csv but this is normal output
from Visual FoxPro..

first,last
"Mac "The Knife" Peter","Boswell, Jr."

But strangely, Excel reads it back wrong.


Excel expects quotes within the field to be doubled. In fact, I would
go so far as to say FoxPro is wrong.
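
A small sketch of the doubled-quote convention, assuming a field is handled as a NUL-terminated string. The names csv_quote and csv_unquote are hypothetical helpers, not part of any library. csv_quote wraps a field in quotes and doubles every embedded '"'; csv_unquote reverses that in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write src into dst as a quoted field, doubling
   every embedded '"'. Returns the length written, or 0 if dst is too small. */
static size_t csv_quote(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size < 3)                   /* room for "" and the terminator */
        return 0;
    dst[n++] = '"';
    for (; *src != '\0'; ++src) {
        size_t need = (*src == '"') ? 2 : 1;
        if (n + need + 2 > dst_size)    /* keep room for closing '"' and NUL */
            return 0;
        if (*src == '"')
            dst[n++] = '"';             /* first half of a doubled quote */
        dst[n++] = *src;
    }
    dst[n++] = '"';
    dst[n] = '\0';
    return n;
}

/* Hypothetical helper: undo csv_quote in place. A field that does not
   start with '"' is returned unchanged. */
static char *csv_unquote(char *field)
{
    char *in = field;
    char *out = field;

    assert(field != NULL);
    if (*in != '"')
        return field;
    ++in;                               /* skip the opening quote */
    while (*in != '\0') {
        if (*in == '"') {
            if (in[1] != '"')
                break;                  /* a single quote closes the field */
            ++in;                       /* skip one half of a doubled quote */
        }
        *out++ = *in++;
    }
    *out = '\0';
    return field;
}
```

With this convention the FoxPro example would be written as "Mac ""The Knife"" Peter", which is the form Excel is described as expecting.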

More lenient parsing would recognize a quote not followed by a comma or
newline as contained within the field. This creates some ambiguities,
since quoted fields can also contain newline.

There is no standard, but see http://www.wotsit.org/download.asp?f=csv

Nov 14 '05 #18
> I have a text file which have data in CSV format.

What *IS* CSV format? The following "definition by example"
isn't very complete.
"empno","phonenumber","wardnumber"
12345,2234353,1000202
12326,2243653,1000098


Your examples do not handle the corner cases where a string
contains commas, quotes, and/or newlines. If your definition
introduces an "escape" character, also worry about strings
consisting of several of those characters. Also, can single
quotes be used in place of double quotes? Can a single quote
match a double quote or vice versa?

Also it isn't explained what isn't a valid CSV format. How
about these:

,,,,,,,,,,,,,,, ,,,,,
,"""""""""""""" """"""""""" "",
,"""""""""""""" """"""""""" """,
,"""""""""""""" """"""""""""""" ,
"\\\\\\\\\\\\\\ \\\"
"\\\\\\\\\\\\\\ \\\\"
"\\\\\\\\\\\\\\ \\\\\"
""""""""""""""" """"""""
""""""""""""""" """""""""
""""""""""""""" """"""""""
""""""""""""""" """""""""""

Gordon L. Burditt
Nov 14 '05 #19
Joe Wright <jo********@earthlink.net> wrote:
I don't know of a 'Standard' defining .csv but this is normal output
from Visual FoxPro..

first,last
"Mac "The Knife" Peter","Boswell, Jr."

But strangely, Excel reads it back wrong. Go figure.
"Failure is not an option. With M$ it is bundled with every package."


So, you are saying this is not at all a homework assignment but rather a
request from a Microsoft engineer asking for correct code dealing with
their files?
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>
Nov 14 '05 #20
