csv file import problem

Someone

Hello

Using the C file I/O functions to read a CSV file, there is a problem if a
field contains an embedded carriage return. I can check for apostrophe
pairs, but the processing does seem a bit of a struggle. Has anyone got a
robust algorithm or functions to reliably get a whole row of text from a CSV
file even if some fields contain embedded carriage returns?

Angus

Sep 18 '06 #1


Spiros Bousbouras

Using the C file I/O functions to read a CSV file, there is a problem if a
field contains an embedded carriage return. I can check for apostrophe
pairs, but the processing does seem a bit of a struggle. Has anyone got a
robust algorithm or functions to reliably get a whole row of text from a CSV
file even if some fields contain embedded carriage returns?

I'm not sure what symbol you mean by apostrophe or what it
has to do with carriage returns or csv (whatever that is) files.
I suspect that even if I did know all that I still wouldn't be able
to offer suggestions because you're not giving enough details
about what you're trying to achieve.

So you have some "csv" file on your platform where lines are
not terminated by (only) a carriage return. This file contains
lines which contain one or more CRs. Do you want to put the
line into some array or something more elaborate than that ?

Sep 18 '06 #2

Someone

The rules for a CSV file are that each 'field' is separated by a comma (,) and
each 'record' is terminated by a carriage return.

If a field contains a carriage return, then you enclose the field in double
quote marks ("). (Sorry, apostrophe was the wrong term to use.)

So if I have this record:

1,1 The Avenue, New York, 345666,"hello in this field we have a carriage
return

more text here", 0233-444-7777,blue

If I use

char *fgets(char *s, int n, FILE *stream);

then fgets stops at the carriage return between the two quote marks and I only
get the first half of the record.

I can obviously check whether there is an odd number of "s in the string and,
if so, read on to the next carriage return, but it does seem tricky getting it
just right. I wondered if anyone had come across this sort of thing before and
how they got round it?

Sep 18 '06 #3

Richard Harnden

Someone wrote:

The rules for a CSV file are that each 'field' is separated by a comma (,) and
each 'record' is terminated by a carriage return.

If a field contains a carriage return, then you enclose the field in double
quote marks ("). (Sorry, apostrophe was the wrong term to use.)

So if I have this record:

1,1 The Avenue, New York, 345666,"hello in this field we have a carriage
return

more text here", 0233-444-7777,blue

If I use

char *fgets(char *s, int n, FILE *stream);

then fgets stops at the carriage return between the two quote marks and I only
get the first half of the record.

I can obviously check whether there is an odd number of "s in the string and,
if so, read on to the next carriage return, but it does seem tricky getting it
just right. I wondered if anyone had come across this sort of thing before and
how they got round it?

This might be useful to you:
<ftp://ftp.iiug.org/pub/informix/pub/unl_utils-1.0.tgz>

It has these functions:

char_esc *fgets_csv(char_esc *s, size_t n, FILE *stream);
int fputs_csv(const char_esc *s, FILE *stream);

int count_cols(const char_esc *s);
size_t get_col(const char_esc *s, int colno, char *col, size_t n);

--
rh

Sep 18 '06 #4

Spiros Bousbouras

Someone wrote:

The rules for a CSV file are that each 'field' is separated by a comma (,) and
each 'record' is terminated by a carriage return.

If a field contains a carriage return, then you enclose the field in double
quote marks ("). (Sorry, apostrophe was the wrong term to use.)

So if I have this record:

1,1 The Avenue, New York, 345666,"hello in this field we have a carriage
return

more text here", 0233-444-7777,blue

If I use

char *fgets(char *s, int n, FILE *stream);

then fgets stops at the carriage return between the two quote marks and I only
get the first half of the record.

I can obviously check whether there is an odd number of "s in the string and,
if so, read on to the next carriage return, but it does seem tricky getting it
just right. I wondered if anyone had come across this sort of thing before and
how they got round it?

If quotes themselves cannot be escaped then it's not tricky
at all and what you're suggesting can be implemented in a
few lines of code. Try it !
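
For what it's worth, a minimal sketch of that idea in C: keep appending lines with fgets while the number of quote marks seen so far is odd. The name read_csv_record is made up for illustration and is not a standard or library function. It assumes a fixed-size caller buffer, and that a quote inside a quoted field, if allowed at all, is written doubled (""), which leaves the odd/even count intact.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from any library): reads one CSV record into buf,
 * joining physical lines while a quoted field is still open.
 * Returns buf on success; NULL at end of file, on an unterminated quote,
 * or when the record does not fit in cap bytes. */
char *read_csv_record(char *buf, size_t cap, FILE *fp)
{
    size_t len = 0;      /* characters stored so far */
    int in_quotes = 0;   /* 1 while an odd number of quotes has been seen */

    assert(buf != NULL && fp != NULL && cap > 1 && cap <= INT_MAX);

    /* Each fgets call appends one physical line (or part of one). */
    while (cap - len >= 2 && fgets(buf + len, (int)(cap - len), fp) != NULL) {
        /* Toggle the state for every quote in the piece just read. */
        for (const char *p = buf + len; *p != '\0'; p++)
            if (*p == '"')
                in_quotes = !in_quotes;
        len += strlen(buf + len);
        /* A newline outside quotes ends the record. */
        if (!in_quotes && len > 0 && buf[len - 1] == '\n')
            return buf;
    }
    if (cap - len < 2)
        return NULL;     /* buffer full before the record ended */
    /* End of file: accept a last record that has no trailing newline. */
    return (len > 0 && !in_quotes) ? buf : NULL;
}
```

Called in a loop, each successful call leaves one whole record in buf, embedded line breaks included. A NULL return means end of file, an unterminated quote, or a record too long for the buffer.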

Sep 18 '06 #5

CBFalconer

Someone wrote:

The rules for a CSV file are that each 'field' is separated by a comma
(,) and each 'record' is terminated by a carriage return.

Don't top-post. Your answer belongs after, or intermixed with, the
snipped material you quote. Note the snipping, which removes
everything not germane to your reply.

Many here automatically ignore top-posted articles.

--
"The most amazing achievement of the computer software industry
is its continuing cancellation of the steady and staggering
gains made by the computer hardware industry..." - Petroski

Sep 18 '06 #6

websnarf

Someone wrote:

Using the C file I/O functions to read a CSV file, there is a problem if a
field contains an embedded carriage return. I can check for apostrophe
pairs, but the processing does seem a bit of a struggle. Has anyone got a
robust algorithm or functions to reliably get a whole row of text from a CSV
file even if some fields contain embedded carriage returns?

Yes: http://www.pobox.com/~qed/bcsv.zip

I have not been able to find a truly "elegant" way of doing it, but I
believe that's just a side effect of the difficulty intrinsic to the
task. The implementation given *is* robust, however, and it
should have fairly good performance as well.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Sep 18 '06 #7

Ancient_Hacker

Someone wrote:

Hello

Using the C file I/O functions to read a CSV file, there is a problem if a
field contains an embedded carriage return. I can check for apostrophe
pairs, but the processing does seem a bit of a struggle. Has anyone got a
robust algorithm or functions to reliably get a whole row of text from a CSV
file even if some fields contain embedded carriage returns?

Angus

I'd try thinking a bit outside the "C" box. One line of Perl will fix
this problem. Something very close to:
c:\ perl -e 's/(".*?)\n(.*?")/$1$2/gs'

Note: may need tweaking, and definitely off-topic.

More on topic, something like a loop:

start with Q = 0.
read a character.
if it's a quote, Q = 1 - Q;
if it's an end-of-line and Q is 1, throw it away; otherwise write the
character out.
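
The loop above might look like this in C. This is only a sketch: strip_quoted_newlines is a hypothetical name, it treats both \n and \r as end-of-line characters, and it deletes them outright as described, so the text on either side is joined without a space.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies in to out, dropping line breaks that occur
 * inside double-quoted fields. Returns the number of characters dropped. */
long strip_quoted_newlines(FILE *in, FILE *out)
{
    int c;
    int q = 0;          /* the Q flag: 1 while inside a quoted field */
    long dropped = 0;

    assert(in != NULL && out != NULL);

    while ((c = fgetc(in)) != EOF) {
        if (c == '"')
            q = 1 - q;                      /* toggle on every quote */
        if (q && (c == '\n' || c == '\r')) {
            dropped++;                      /* end-of-line inside quotes */
            continue;                       /* throw it away */
        }
        fputc(c, out);                      /* everything else is kept */
    }
    return dropped;
}
```

After this pass every record sits on one physical line, so plain fgets can read it. If the line break itself matters, writing a space or some marker in place of the dropped character would be a small change.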

Sep 18 '06 #8

Malcolm

"Ancient_Hacker" <gr**@comcast.net> wrote in message

I'd try thinking a bit outside the "C" box. One line of Perl will fix
this problem. Something very close to:
c:\ perl -e 's/(".*?)\n(.*?")/$1$2/gs'

What is this?
I know I frequently condemn "compilable gibberish". But that takes the
biscuit. And it isn't even joke code.

Note: may need tweaking, and definitely off-topic.

--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.

Sep 18 '06 #9

Malcolm

"Someone" <no****@gmail.com> wrote in message

Hello

Using the C file I/O functions to read a CSV file, there is a problem if a
field contains an embedded carriage return. I can check for apostrophe
pairs, but the processing does seem a bit of a struggle. Has anyone got a
robust algorithm or functions to reliably get a whole row of text from a CSV
file even if some fields contain embedded carriage returns?

There's a csv format loader on my website, hidden in the "fuzzy logic trees"
files.
The quoting and escape rules are a nuisance, especially since embedded
newlines are allowed.
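
To give an idea of what those rules involve, here is a rough sketch (not the loader mentioned above) that splits one complete record into fields in place. split_csv_record is a hypothetical helper, and it assumes the common convention that a quote inside a quoted field is written as two quotes; real files may follow other rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: splits one complete record in place. Stores up to
 * max field pointers in fields[] and returns the number of fields found.
 * Quoted fields lose their surrounding quotes, "" becomes a single quote,
 * and embedded commas and line breaks are kept as field text. */
size_t split_csv_record(char *rec, char *fields[], size_t max)
{
    size_t n = 0;
    char *src = rec;

    assert(rec != NULL && fields != NULL);

    while (n < max) {
        char *dst = src;            /* unescaped text is written back here */
        char last;

        fields[n++] = dst;
        if (*src == '"') {          /* quoted field */
            src++;
            while (*src != '\0') {
                if (*src == '"') {
                    if (src[1] == '"') {
                        src++;      /* "" stands for one literal quote */
                    } else {
                        src++;      /* closing quote */
                        break;
                    }
                }
                *dst++ = *src++;
            }
        }
        /* Unquoted text runs up to the next comma or the end of the record. */
        while (*src != '\0' && *src != ',' && *src != '\n' && *src != '\r')
            *dst++ = *src++;
        last = *src;                /* remember the delimiter before */
        *dst = '\0';                /* terminating the field over it */
        if (last != ',')
            break;                  /* end of record */
        src++;                      /* step past the comma */
    }
    return n;
}
```

The record has to be complete before it is split, embedded newlines and all, which is the part the original question is about.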

--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.

Sep 18 '06 #10

Ancient_Hacker

Malcolm wrote:

"Ancient_Hacker" <gr**@comcast.net> wrote in message

I'd try thinking a bit outside the "C" box. One line of Perl will fix
this problem. Something very close to:
c:\ perl -e 's/(".*?)\n(.*?")/$1$2/gs'

What is this?
I know I frequently condemn "compilable gibberish". But that takes the
biscuit. And it isn't even joke code.

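it's very simple: "-e" means the next argument is the code to run. (To
run it over the input file list you also need "-p", plus "-0777" to read
each file in one go so the pattern can see across lines.)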

"s" means "substitute"

"/" is a delimiter

(".*?) means find a double quote followed by the minimum number of
characters. The parens mean save this substring away as variable $1.
\n is what we're looking for inside the quotes.

(.*?") means look for any number of characters (as few as possible) that
follow the \n up to the next double quote. The parens mean stash this
away as $2.

The next / separates the search from the replace string. We want to
replace the left mess with everything but the \n, so we say replace
with $1$2.
Then there's a closing / as the final delimiter.

Then there are the modifiers "gs": "g" means do it globally, "s" means
treat the input as one single string, newlines included. Normally "."
doesn't match a newline.

I know, a visual mess. But often the boss doesn't care how you do it,
just so it gets done in the next five minutes.

Sep 18 '06 #11

Ralph Moritz

"Someone" <no****@gmail.com> writes:

Using the C file I/O functions to read a CSV file, there is a problem if a
field contains an embedded carriage return. I can check for apostrophe
pairs, but the processing does seem a bit of a struggle. Has anyone got a
robust algorithm or functions to reliably get a whole row of text from a CSV
file even if some fields contain embedded carriage returns?

The code at [1] should handle embedded newlines. It might not quite
meet your requirements though, since it was designed to parse UNIX-style
DSV input rather than the (badly-designed) MS-style CSV which seems to
be so common. It's in the public domain though, so feel free to modify
it.

[1] http://tpa.org.za/RalphMoritz/Code#libdsv_sm

--
Ralph Moritz
Quantum Solutions Ph: +27 315 629 557
GPG Public Key: http://ralphm.info/me@ralphm.info.gpg

Sep 20 '06 #12
