csv file import problem

Hello

Using the C file I/O functions to read a csv file, there is a problem if a
field contains an embedded carriage return. I can check for apostrophe
pairs, but the processing does seem a bit of a struggle. Has anyone got a robust
algorithm or functions to use to reliably get a whole row of text from a csv
file even if some fields contain embedded carriage returns?

Angus
Sep 18 '06 #1
I'm not sure what symbol you mean by apostrophe or what it
has to do with carriage returns or csv (whatever that is) files.
I suspect that even if I did know all that I still wouldn't be able
to offer suggestions because you're not giving enough details
about what you're trying to achieve.

So you have some "csv" file on your platform where lines are
not terminated by (only) a carriage return. This file contains
lines which contain one or more CRs. Do you want to put the
line into some array or something more elaborate than that?

Sep 18 '06 #2
Rules for a csv file are that each 'field' is separated by a comma (,) and
the end of each 'record' is terminated by a carriage return.

If a field contains a carriage return, then you enclose the field in double
quote marks ("). (Sorry, apostrophe was the wrong term to use.)

So if I have this record:

1,1 The Avenue, New York, 345666,"hello in this field we have a carriage
return

more text here", 0233-444-7777,blue

If I use

char *fgets(char *s, int n, FILE *stream);

Then fgets stops at the carriage return between the two quote marks and I
only get part of the record.

I can obviously check whether there is an odd number of "s in the string and,
if so, read on to the next carriage return, but it does seem tricky getting it
just right. I wondered if anyone had come across this sort of thing before and
how they got round it?
Sep 18 '06 #3
This might be useful to you:
<ftp://ftp.iiug.org/pub/informix/pub/unl_utils-1.0.tgz>

It has these functions:

char_esc *fgets_csv(char_esc *s, size_t n, FILE *stream);
int fputs_csv(const char_esc *s, FILE *stream);

int count_cols(const char_esc *s);
size_t get_col(const char_esc *s, int colno, char *col, size_t n);

--
rh
Sep 18 '06 #4
Someone wrote:
> I can obviously check if odd number of "'s in string and if so get to next
> carriage return - but it does seem tricky getting it just right.

If quotes themselves cannot be escaped, then it's not tricky
at all, and what you're suggesting can be implemented in a
few lines of code. Try it!

Sep 18 '06 #5
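The quote-counting approach Angus describes, which Spiros says takes only a few lines, can be sketched in C as below. This is a minimal sketch, not code from any library linked in this thread: the function name `read_csv_record`, the 256-byte read chunk and the caller-owned, `realloc`-grown buffer are illustrative assumptions. It keeps appending physical lines read with `fgets` until the number of `"` characters seen is even, i.e. until no quoted field is still open.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one logical csv record, which may span several
 * physical lines, into *buf (grown with realloc as needed; start with
 * *buf == NULL and *cap == 0, and free *buf when done).
 * Returns the record length, or -1 at end of file or if realloc fails.
 * The newline that ends the record, if any, is kept in the buffer. */
long read_csv_record(FILE *fp, char **buf, size_t *cap)
{
    char chunk[256];
    size_t len = 0;
    int quotes_odd = 0;              /* 1 while inside a quoted field */

    assert(fp != NULL && buf != NULL && cap != NULL);

    while (fgets(chunk, (int)sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);

        /* Make room for the new chunk plus the terminating '\0'. */
        if (len + n + 1 > *cap) {
            size_t newcap = (*cap == 0) ? 512 : *cap;
            char *p;
            while (newcap < len + n + 1)
                newcap *= 2;
            p = realloc(*buf, newcap);
            if (p == NULL)
                return -1;
            *buf = p;
            *cap = newcap;
        }
        memcpy(*buf + len, chunk, n + 1);
        len += n;

        /* Track quote parity; a doubled quote ("") flips it twice. */
        for (size_t i = 0; i < n; i++)
            if (chunk[i] == '"')
                quotes_odd = !quotes_odd;

        /* fgets may return only part of a long line, so the record ends
         * only at a real '\n' that lies outside any quoted field. */
        if (n > 0 && chunk[n - 1] == '\n' && !quotes_odd)
            return (long)len;
    }
    /* End of file: hand back a final record that had no newline. */
    return len > 0 ? (long)len : -1;
}
```

With the common doubled-quote convention (`""` for a literal quote inside a quoted field), each escaped quote adds two `"` characters, so the parity test is unaffected. Spiros's caveat matters for other escape schemes: a backslash-escaped quote (`\"`) is a single `"` character and would flip the parity. The record returned still contains the raw quoting; splitting it into fields is a separate step.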
Someone wrote:
> Rules for a csv file are that each 'field' is separated by a comma
> (,) and the end of each 'record' is terminated by a carriage return.

Don't top-post. Your answer belongs after, or intermixed with, the
snipped material you quote. Note the snipping, which removes
everything not germane to your reply.

Many here automatically ignore top-posted articles.

--
"The most amazing achievement of the computer software industry
is its continuing cancellation of the steady and staggering
gains made by the computer hardware industry..." - Petroski

Sep 18 '06 #6
Someone wrote:
> Has anyone got a robust algorithm or functions to use to reliably get a
> whole row of text from a csv file even if some fields contain embedded
> carriage returns?

Yes: http://www.pobox.com/~qed/bcsv.zip

I have not been able to find a truly "elegant" way of doing it, but I
believe that's just a side effect of the difficulty intrinsic in the
task. The implementation given *is* robust, however, and it
should have fairly good performance as well.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Sep 18 '06 #7

Someone wrote:
> Has anyone got a robust algorithm or functions to use to reliably get a
> whole row of text from a csv file even if some fields contain embedded
> carriage returns?

I'd try thinking a bit outside the "C" box. One line of Perl will fix
this problem. Something very close to:
c:\ perl -e 's/(".?)\n(.?")/$1$2/gs'

Note: may need tweaking, and definitely off-topic.

More on topic, something like a loop:

read a character.
if it's a quote, Q = 1 - Q;
if it's an end-of-line and Q is "1", throw it away; else write it out.

Sep 18 '06 #8
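Ancient_Hacker's quote-toggling loop translates almost directly into C. The sketch below is an illustration rather than anyone's posted code; the function name `strip_quoted_newlines` is hypothetical. It copies one stream to another with `getc`/`putc` and drops every `'\n'` read while the `Q` flag is 1.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies in to out character by character, dropping
 * any '\n' that occurs inside a double-quoted field (the "Q" loop).
 * Returns the number of newlines dropped. */
long strip_quoted_newlines(FILE *in, FILE *out)
{
    int c;
    int q = 0;              /* the Q flag: 1 while inside quotes */
    long dropped = 0;

    assert(in != NULL && out != NULL);

    while ((c = getc(in)) != EOF) {
        if (c == '"')
            q = 1 - q;      /* toggle, exactly as Q = 1 - Q */
        if (c == '\n' && q == 1) {
            dropped++;      /* end-of-line inside quotes: throw it away */
            continue;
        }
        putc(c, out);       /* everything else is written out */
    }
    return dropped;
}
```

Dropping the newline joins the two halves of the field directly ("carriage" and "return" in Angus's example would become "carriagereturn"); writing a space instead may be preferable, depending on how the data is used. The sketch checks only `'\n'`: a text-mode stream on a CR LF system normally delivers line endings as a single `'\n'`, but a file read in binary mode, or a Windows-style file read where no translation happens, would also leave `'\r'` characters that it does not handle.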
"Ancient_Hacker" <gr**@comcast.net> wrote in message
> I'd try thinking a bit outside the "C" box. One line of Perl will fix
> this problem. Something very close to:
> c:\ perl -e 's/(".?)\n(.?")/$1$2/gs'

What is this?
I know I frequently condemn "compileable gibberish". But that takes the
biscuit. And it isn't even joke code.

> Note: may need tweaking, and definitely off-topic.
--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.

Sep 18 '06 #9
"Someone" <no****@gmail.com> wrote in message
> Has anyone got a robust algorithm or functions to use to reliably get a
> whole row of text from a csv file even if some fields contain embedded
> carriage returns?

There's a csv format loader on my website, hidden in the "fuzzy logic trees"
files.
The quoting and escape rules are a nuisance, especially since embedded
newlines are allowed.

--
www.personal.leeds.ac.uk/~bgy1mm
freeware games to download.
Sep 18 '06 #10
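Reading a complete record is only half the job; the quoting and escape rules Malcolm mentions still have to be applied when the record is split into fields. The sketch below assumes the convention described in RFC 4180: a field may be wrapped in double quotes, and a literal quote inside such a field is written twice (`""`). The function name `csv_next_field` and its destructive, in-place interface (it overwrites the record buffer) are hypothetical choices for illustration, not the interface of any loader mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: splits off the next field of a complete csv record.
 * *cursor points into a writable record buffer. The field is unquoted in
 * place and NUL-terminated; *cursor then points just past the separating
 * comma, or is set to NULL after the last field. Returns NULL once *cursor
 * is NULL. An unquoted '\n' or '\r' ends the record. */
char *csv_next_field(char **cursor)
{
    char *src, *dst, *field;
    int in_quotes = 0;

    assert(cursor != NULL);
    if (*cursor == NULL)
        return NULL;

    /* dst never gets ahead of src, so unquoting in place is safe. */
    src = dst = field = *cursor;
    for (;;) {
        char c = *src;

        if (c == '\0' && in_quotes)
            break;                             /* unterminated quote */
        if (in_quotes) {
            if (c == '"' && src[1] == '"') {   /* "" is a literal quote */
                *dst++ = '"';
                src += 2;
            } else if (c == '"') {             /* closing quote */
                in_quotes = 0;
                src++;
            } else {                           /* any char, newlines too */
                *dst++ = c;
                src++;
            }
        } else if (c == '"') {                 /* opening quote */
            in_quotes = 1;
            src++;
        } else if (c == ',') {                 /* end of this field */
            *dst = '\0';
            *cursor = src + 1;
            return field;
        } else if (c == '\0' || c == '\n' || c == '\r') {
            break;                             /* end of the record */
        } else {
            *dst++ = c;
            src++;
        }
    }
    *dst = '\0';
    *cursor = NULL;
    return field;
}
```

Each call returns the next field and advances the cursor. For example, with `char rec[] = "1,\"say \"\"hi\"\"\",last\n";` and `char *cur = rec;`, three calls to `csv_next_field(&cur)` yield `1`, `say "hi"` and `last`, after which `cur` is NULL and a further call returns NULL.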

Malcolm wrote:
> "Ancient_Hacker" <gr**@comcast.net> wrote in message
>
>> I'd try thinking a bit outside the "C" box. One line of Perl will fix
>> this problem. Something very close to:
>> c:\ perl -e 's/(".?)\n(.?")/$1$2/gs'
>
> What is this?
> I know I frequently condemn "compileable gibberish". But that takes the
> biscuit. And it isn't even joke code.

It's very simple: "-e" means run this code over every line of the
input file list.

"s" means "substitute"

"/" is a delimiter

(".?) means find a double quote followed by the minimum number of
characters. The parens mean save this substring away as variable $1.
\n is what we're looking for inside the quotes.

(.?") means look for any number of characters (as few as possible) that
follow the \n, up to the next double quote. The parens mean stash this
away as $2.

The next / separates the search from the replace string. We want to
replace the left mess with everything but the \n, so we say replace
with $1$2.
Then there's a closing / as the final delimiter.

Then there are the modifiers "gs": "g" means do it globally, and "s" means
treat the input as one single string, newlines included. Normally "."
doesn't match a newline.

I know, a visual mess. But often the boss doesn't care how you do it,
just so it gets done in the next five minutes.

Sep 18 '06 #11
"Someone" <no****@gmail.com> writes:
> Has anyone got a robust algorithm or functions to use to reliably get a
> whole row of text from a csv file even if some fields contain embedded
> carriage returns?

The code at [1] should handle embedded newlines. It might not quite
meet your requirements though, since it was designed to parse UNIX-style
DSV input rather than the (badly-designed) MS-style CSV which seems to
be so common. It's in the public domain though, so feel free to modify
it.

[1] http://tpa.org.za/RalphMoritz/Code#libdsv_sm

--
Ralph Moritz
Quantum Solutions Ph: +27 315 629 557
GPG Public Key: http://ralphm.info/me@ralphm.info.gpg
Sep 20 '06 #12
