
csv file import problem

Hello

When reading a csv file with the C file I/O functions, there is a problem if a
field contains an embedded carriage return. I can check for apostrophe
pairs, but the processing does seem a bit of a struggle. Has anyone got a robust
algorithm or functions to reliably get a whole row of text from a csv
file even if some fields contain embedded carriage returns?

Angus
Sep 18 '06 #1
11 Replies


> Using the c file io functions if reading a csv file there is a problem if a
> field contains an embedded carriage return. So I can check for apostrophe
> pairs but processing does seem a bit of a struggle. Has anyone got a robust
> algorithm or functions to use to reliably get a whole row of text from a csv
> file even if some fields contain embedded carriage returns?

I'm not sure what symbol you mean by apostrophe or what it
has to do with carriage returns or csv (whatever that is) files.
I suspect that even if I did know all that I still wouldn't be able
to offer suggestions because you're not giving enough details
about what you're trying to achieve.

So you have some "csv" file on your platform where lines are
not terminated by (only) a carriage return. This file contains
lines which contain one or more CRs. Do you want to put the
line into some array or something more elaborate than that ?

Sep 18 '06 #2

Rules for a csv file are that each 'field' is separated by a comma (,) and
the end of each 'record' is terminated by a carriage return.

If a field contains a carriage return then you enclose the field in " and
" - quote marks. (Sorry apostrophe was wrong term to use).

So if I have this record:

1,1 The Avenue, New York, 345666,"hello in this field we have a carriage
return

more text here", 0233-444-7777,blue

If I use

char *fgets(char *s, size_t n, FILE *stream);

Then fgets stops at the carriage return between the two "s and I only get
half of the record.

I can obviously check whether there is an odd number of "s in the string and,
if so, read on to the next carriage return - but it does seem tricky getting it
just right. I wondered if anyone had come across this sort of thing before and
how they got round it?

"Spiros Bousbouras" <sp****@gmail.com> wrote in message
news:11*********************@h48g2000cwc.googlegroups.com...
>> Using the c file io functions if reading a csv file there is a problem if a
>> field contains an embedded carriage return. So I can check for apostrophe
>> pairs but processing does seem a bit of a struggle. Has anyone got a robust
>> algorithm or functions to use to reliably get a whole row of text from a csv
>> file even if some fields contain embedded carriage returns?
>
> I'm not sure what symbol you mean by apostrophe or what it
> has to do with carriage returns or csv (whatever that is) files.
> I suspect that even if I did know all that I still wouldn't be able
> to offer suggestions because you're not giving enough details
> about what you're trying to achieve.
>
> So you have some "csv" file on your platform where lines are
> not terminated by (only) a carriage return. This file contains
> lines which contain one or more CRs. Do you want to put the
> line into some array or something more elaborate than that ?

Sep 18 '06 #3

Someone wrote:
> Rules for a csv file are that each 'field' is separated by a comma (,) and
> the end of each 'record' is terminated by a carriage return.
>
> If a field contains a carriage return then you enclose the field in " and
> " - quote marks. (Sorry apostrophe was wrong term to use).
>
> So if I have this record:
>
> 1,1 The Avenue, New York, 345666,"hello in this field we have a carriage
> return
>
> more text here", 0233-444-7777,blue
>
> If I use
>
> char *fgets(char *s, size_t n, FILE *stream);
>
> Then fgets gets to carriage return between two " "'s and I only get half of
> the record.
>
> I can obviously check if odd number of "'s in string and if so get to next
> carriage return - but it does seem tricky getting it just right. I wondered
> if anyone had come across this sort of thing before and how they got round
> it?

This might be useful to you:
<ftp://ftp.iiug.org/pub/informix/pub/unl_utils-1.0.tgz>

It has these functions:

char_esc *fgets_csv(char_esc *s, size_t n, FILE *stream);
int fputs_csv(const char_esc *s, FILE *stream);

int count_cols(const char_esc *s);
size_t get_col(const char_esc *s, int colno, char *col, size_t n);

--
rh
Sep 18 '06 #4

Someone wrote:
> Rules for a csv file are that each 'field' is separated by a comma (,) and
> the end of each 'record' is terminated by a carriage return.
>
> If a field contains a carriage return then you enclose the field in " and
> " - quote marks. (Sorry apostrophe was wrong term to use).
>
> So if I have this record:
>
> 1,1 The Avenue, New York, 345666,"hello in this field we have a carriage
> return
>
> more text here", 0233-444-7777,blue
>
> If I use
>
> char *fgets(char *s, size_t n, FILE *stream);
>
> Then fgets gets to carriage return between two " "'s and I only get half of
> the record.
>
> I can obviously check if odd number of "'s in string and if so get to next
> carriage return - but it does seem tricky getting it just right. I wondered
> if anyone had come across this sort of thing before and how they got round
> it?

If quotes themselves cannot be escaped then it's not tricky
at all and what you're suggesting can be implemented in a
few lines of code. Try it !
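
A minimal sketch of that approach in C, assuming quotes are only ever escaped by doubling them (""), which keeps the quote count even so a simple toggle still works. The name read_csv_record is hypothetical and not taken from any library mentioned in this thread; it keeps calling fgets and appending until a physical line ends while no quoted field is open.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one logical CSV record from `in`, joining
 * physical lines while a quoted field is still open.
 * Returns a malloc'd string (caller frees), or NULL at end of input
 * or if memory runs out. */
char *read_csv_record(FILE *in)
{
    char chunk[256];
    char *rec = NULL;
    size_t len = 0;
    int in_quotes = 0;

    assert(in != NULL);
    while (fgets(chunk, sizeof chunk, in) != NULL) {
        size_t n = strlen(chunk);
        char *tmp = realloc(rec, len + n + 1);
        if (tmp == NULL) {
            free(rec);
            return NULL;
        }
        rec = tmp;
        memcpy(rec + len, chunk, n + 1);   /* copy including the '\0' */
        len += n;

        /* Every quote toggles the state; a doubled "" toggles twice,
         * so it leaves the state unchanged. */
        for (size_t i = 0; i < n; i++)
            if (chunk[i] == '"')
                in_quotes = !in_quotes;

        /* The record is complete when a line ends outside any quotes. */
        if (n > 0 && chunk[n - 1] == '\n' && !in_quotes)
            break;
    }
    return rec;
}
```

Call it in a loop until it returns NULL, freeing each record after use. If the file escapes quotes some other way (for example with a backslash), the toggle above would miscount and the scan would need to know about that escape.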

Sep 18 '06 #5

Someone wrote:
>
> Rules for a csv file are that each 'field' is separated by a comma
> (,) and the end of each 'record' is terminated by a carriage return.

Don't top-post. Your answer belongs after, or intermixed with, the
snipped material you quote. Note the snipping, which removes
everything not germane to your reply.

Many here automatically ignore top-posted articles.

--
"The most amazing achievement of the computer software industry
is its continuing cancellation of the steady and staggering
gains made by the computer hardware industry..." - Petroski
--
Posted via a free Usenet account from http://www.teranews.com

Sep 18 '06 #6

Someone wrote:
> Using the c file io functions if reading a csv file there is a problem if a
> field contains an embedded carriage return. So I can check for apostrophe
> pairs but processing does seem a bit of a struggle. Has anyone got a robust
> algorithm or functions to use to reliably get a whole row of text from a csv
> file even if some fields contain embedded carriage returns?

Yes: http://www.pobox.com/~qed/bcsv.zip

I have not been able to find a truly "elegant" way of doing it, but I
believe that's just a side effect of the difficulty intrinsic in the
task required. The implementation given *is* robust however, and it
should have fairly good performance as well.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Sep 18 '06 #7


Someone wrote:
> Hello
>
> Using the c file io functions if reading a csv file there is a problem if a
> field contains an embedded carriage return. So I can check for apostrophe
> pairs but processing does seem a bit of a struggle. Has anyone got a robust
> algorithm or functions to use to reliably get a whole row of text from a csv
> file even if some fields contain embedded carriage returns?
>
> Angus

I'd try thinking a bit outside the "C" box. One line of Perl will fix
this problem. Something very close to:
c:\> perl -e 's/(".*?)\n(.*?")/$1$2/gs'

Note: may need tweaking, and definitely off-topic.

More on topic, something like a loop:

read a character.
if it's a quote, Q = 1 - Q;
if it's an end-of-line and Q is 1, throw it away; else write it out.
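
The loop above could be sketched in C roughly as follows. The function name drop_quoted_newlines is made up for illustration, and the sketch assumes quotes are either plain delimiters or doubled ("") escapes, so that flipping Q on every quote stays correct. It also discards a '\r' inside quotes, which is an extra assumption to cope with CR-LF files.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out`, throwing away any line-break
 * character that appears inside a quoted field.
 * Returns the number of characters thrown away. */
long drop_quoted_newlines(FILE *in, FILE *out)
{
    int c;
    int q = 0;          /* the "Q" flag: 1 while inside quotes */
    long dropped = 0;

    assert(in != NULL && out != NULL);
    while ((c = fgetc(in)) != EOF) {
        if (c == '"')
            q = 1 - q;
        if (q && (c == '\n' || c == '\r')) {
            dropped++;  /* end-of-line inside quotes: throw it away */
            continue;
        }
        fputc(c, out);  /* everything else, quotes included, is kept */
    }
    return dropped;
}
```

After this pass every record sits on one physical line, so plain fgets can be used on the output. Writing a space instead of dropping the character may be preferable if the words on either side should not run together.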

Sep 18 '06 #8

"Ancient_Hacker" <gr**@comcast.net> wrote in message
>
> I'd try thinking a bit outside the "C" box. One line of Perl will fix
> this problem. Something very close to:
> c:\> perl -e 's/(".*?)\n(.*?")/$1$2/gs'

What is this?
I know I frequently condemn "compileable gibberish". But that takes the
biscuit. And it isn't even joke code.
>
> Note: may need tweaking, and definitely off-topic.

--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.

Sep 18 '06 #9

"Someone" <no****@gmail.com> wrote in message
> Hello
>
> Using the c file io functions if reading a csv file there is a problem if a
> field contains an embedded carriage return. So I can check for apostrophe
> pairs but processing does seem a bit of a struggle. Has anyone got a robust
> algorithm or functions to use to reliably get a whole row of text from a csv
> file even if some fields contain embedded carriage returns?

There's a csv format loader on my website, hidden in the "fuzzy logic trees"
files.
The quoting and escape rules are a nuisance, especially since embedded
newlines are allowed.
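
To illustrate why those rules are fiddly, here is a rough C sketch of pulling one field out of a complete record. It is not the loader mentioned above; next_csv_field is a hypothetical name, and the sketch assumes the common convention that a quote inside a quoted field is written as two quotes ("").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies the field starting at *p into `out`
 * (capacity `cap`, silently truncated if too small) and advances *p
 * past the field. Returns 1 if a comma followed, meaning another
 * field comes next, and 0 if this was the last field of the record. */
int next_csv_field(const char **p, char *out, size_t cap)
{
    const char *s;
    size_t len = 0;
    int more = 0;

    assert(p != NULL && *p != NULL && out != NULL && cap > 0);
    s = *p;
    if (*s == '"') {                    /* quoted field */
        s++;
        while (*s != '\0') {
            if (*s == '"') {
                if (s[1] == '"') {
                    s++;                /* "" stands for one literal quote */
                } else {
                    s++;                /* closing quote */
                    break;
                }
            }
            if (len + 1 < cap)
                out[len++] = *s;        /* commas and newlines are data here */
            s++;
        }
    }
    /* Unquoted field, or stray text after a closing quote. */
    while (*s != '\0' && *s != ',' && *s != '\n' && *s != '\r') {
        if (len + 1 < cap)
            out[len++] = *s;
        s++;
    }
    if (*s == ',') {
        s++;
        more = 1;
    }
    out[len] = '\0';
    *p = s;
    return more;
}
```

A caller would set a pointer to the start of a record and call the function in a do/while loop until it returns 0. The return value rather than the pointer position signals the end, so that a trailing empty field after a final comma is still reported.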

--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.
Sep 18 '06 #10


Malcolm wrote:
> "Ancient_Hacker" <gr**@comcast.net> wrote in message
>
>> I'd try thinking a bit outside the "C" box. One line of Perl will fix
>> this problem. Something very close to:
>> c:\> perl -e 's/(".*?)\n(.*?")/$1$2/gs'
> What is this?
> I know I frequently condemn "compileable gibberish". But that takes the
> biscuit. And it isn't even joke code.

It's very simple: "-e" means run the code given on the command line.
(To apply it to the whole input and print the result you would also
need "-p", plus slurp mode such as "-0777".)

"s" means "substitute"

"/" is a delimiter

(".*?) means find a double quote followed by the minimum number of
characters. The parens mean save this substring away as variable $1.
\n is what we're looking for inside the quotes.

(.*?") means look for any number of characters (as few as possible) that
follow the \n up to the next double quote. The parens mean stash this
away as $2.

The next / separates the search from the replace string. We want to
replace the left mess with everything but the \n, so we say replace
with $1$2.
Then there's a closing / as the final delimiter.

Then there are the modifiers "gs": "g" means do it globally, "s" means
treat the input as one single string, even the newlines. Normally "."
doesn't match a newline.

I know, a visual mess. But often the boss doesn't care how you do it,
just so it gets done in the next five minutes.

Sep 18 '06 #11

"Someone" <no****@gmail.com> writes:
> Using the c file io functions if reading a csv file there is a problem if a
> field contains an embedded carriage return. So I can check for apostrophe
> pairs but processing does seem a bit of a struggle. Has anyone got a robust
> algorithm or functions to use to reliably get a whole row of text from a csv
> file even if some fields contain embedded carriage returns?

The code at [1] should handle embedded newlines. It might not quite
meet your requirements though, since it was designed to parse UNIX-style
DSV input rather than the (badly-designed) MS-style CSV which seems to
be so common. It's in the public domain though, so feel free to modify
it.

[1] http://tpa.org.za/RalphMoritz/Code#libdsv_sm

--
Ralph Moritz
Quantum Solutions Ph: +27 315 629 557
GPG Public Key: http://ralphm.info/me@ralphm.info.gpg
Sep 20 '06 #12
