Splitting text files?

MM
Hi

I have never written any C programs before, but it seems that I need to do
so now. Hope some of you out there can spend a few minutes and help me by
writing a simple example of something fairly similar to what I need. I
really think it is a simple matter if you know C programming, but to me it
is not easy at all. An example from some "professional" C programmer will
probably give me all I need to complete it into exactly what I need.

Basically I need it to, in a specific way, split large text files containing
experimental data (stored in a known "form", see example below) into some
smaller files. The smaller files I will later use MATLAB to handle.
Theoretically I could use MATLAB to do it all (split the data file as well),
but when I tried this it took WAY too long (not workable, since I will
use this in another system). MATLAB is not really optimized to read/write
large text files (if the files are not structured in some ways...). And yes,
I need to do it all in C (not C++, VB, Fortran, Perl...).

Below is an example of the structure of the type of text file I will need to
split. Suppose the file name of this file is "simdata.txt". Opening this file
for reading is probably one of the first things to do.

First there are some header lines. The header ends when the word "\Data:"
(without quotes) is found. All header lines are to be saved into a new file,
say "header.dat".

When "\Data:" has been identified, the first word "Time" is to be
identified. Probably it follows on the next row (after "\Data:"), but one
cannot be absolutely sure of this. Though, "Time" can be assumed to be the
first word in the row. So, when the word "Time" is identified, then starts
(including that row!) the first data block. This block ends when the next
block is identified in a similar way. Each data block is to be saved as an
individual file, say "data1.dat", "data2.dat", and "data3.dat". We can
assume there are three blocks.

Hope this information is sufficient and that someone can help me with this.
I really need it, and cannot do much more without it.

Best regards,

MM

########################################
########### Example of file to split ###########
########################################

header line 1
header line 2
header line 3
.......
.......
.......
header line (last one)
\Data:
Time parameter2 parameter3 parameter4 ...
....... This is data block 1
....... This is data block 1
....... This is data block 1
....... This is data block 1
....... This is data block 1
....... This is data block 1
....... This is data block 1
....... This is data block 1
Time parameter5 parameter6 parameter7 ...
....... This is data block 2
....... This is data block 2
....... This is data block 2
....... This is data block 2
....... This is data block 2
....... This is data block 2
....... This is data block 2
....... This is data block 2
Time parameter8 parameter9 parameter10 ...
....... This is data block 3
....... This is data block 3
....... This is data block 3
....... This is data block 3
....... This is data block 3
....... This is data block 3
....... This is data block 3
....... This is data block 3

########################################
############# End of example #############
########################################

Nov 13 '05 #1
On Tue, 8 Jul 2003 15:55:05 +0200, MM wrote:
"Tom St Denis" <to********@iahu.ca> wrote in message
news:_F*******************@news04.bloor.is.net.cable.rogers.com...
MM wrote:
Hi


How is summer school going?

Fail much?

Tom

So, what's wrong with you? Tired of your tedious job? I'm not, which
is why I take on (for me) challenging tasks in my job.


Like asking a newsgroup to solve your problem?

Oh, and top posting is severely frowned upon.

--
main(int c,char*k,char*s){c>0?main(0,"adceoX$_k6][^hn","-7\
0#05&'40$.6'+).3+1%30"),puts(""):*s?c=!c?-*s:(putchar(45),c
),putchar(main(c,k+=*s-c*-1,s+1)):(s=0);return!s?10:10+*k;}
Nov 13 '05 #2
MM wrote:
Hi

I have never written any C programs before, but it seems that I need to do
so now. Hope some of you out there can spend a few minutes and help me by
writing a simple example of something fairly similar to what I need. I
really think it is a simple matter if you know C programming, but to me it
is not easy at all. An example from some "professional" C programmer will
probably give me all I need to complete it into exactly what I need.

Basically I need it to, in a specific way, split large text files containing
experimental data (stored in a known "form", see example below) into some
smaller files. The smaller files I will later use MATLAB to handle.
Theoretically I could use MATLAB to do it all (split the data file as well),
but when I tried this it took WAY too long (not workable, since I will
use this in another system). MATLAB is not really optimized to read/write
large text files (if the files are not structured in some ways...). And yes,
I need to do it all in C (not C++, VB, Fortran, Perl...).

Don't pay too much attention to Tom StDenis, he has a pretty wide mouth.

As others have pointed out, bottom-posting is the rule in c.l.c, and so
is not doing people's work for them. On the other hand, here's a handful
of advice:

- it might be presumptuous to take on a C project without having a
few basic notions of the language. If you are as serious as you claim
about your job and taking on challenging tasks, do get Kernighan &
Ritchie 2nd ed. to learn about the language. I would even think that
when you are through with the book, you should be well able to solve your
little problem by yourself.
- nonetheless, if you want to skip on the concepts part and start
fighting with your little program, you should definitely explore the
functions fopen, fgets, strcmp, fputs, fclose. Have a look at, say, the
ggets library, if only to get an idea of the common issues involved with
I/O in C.
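
As a rough sketch of how those functions fit together (not a full solution), the header part of the task might look like the following. The function name copy_header and the 1024-byte line buffer are assumptions made for illustration; only the "\Data:" marker comes from the original post. It uses strncmp rather than strcmp because fgets keeps the trailing newline in the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Copy lines from in_path to out_path until a line starting with
   "\Data:" is read. Returns the number of lines copied, or -1 if
   either file could not be opened. Assumes every line fits in buf. */
long copy_header(const char *in_path, const char *out_path)
{
    FILE *in, *out;
    char buf[1024];
    long n = 0;

    in = fopen(in_path, "r");
    if (in == NULL)
        return -1;
    out = fopen(out_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (strncmp(buf, "\\Data:", 6) == 0)
            break;              /* header ends here */
        fputs(buf, out);
        n++;
    }
    fclose(out);
    fclose(in);
    return n;
}
```

A call such as copy_header("simdata.txt", "header.dat") would then write the header lines and stop at the marker; the data blocks still have to be handled separately.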

--
Bertrand Mollinier Toublet
"Reality exists" - Richard Heathfield, 1 July 2003

Nov 13 '05 #3
MM wrote:

The following is untested...

[snip - split this]

#include <stdio.h>
#include <string.h>

int
main(void)
{
    FILE *fp;
    char fname[4+2+4+1]; /* dataNN.txt */
    char buf[256]; /* max line length is 255 characters */
    int i = 0;

    /* find start of data segment */
    while(fgets(buf, sizeof buf, stdio) != 0){
        if(strcmp("\\Data:", buf) == 0)
            break;
    }

    while(fgets(buf, sizeof buf, stdio) != 0){
        /* lines starting with '#' are skipped as comments */
        /* blank lines are also skipped */
        if(buf[0] == '#' || buf[0] == '\n')
            continue;

        /* write each block to a separate file */
        if(strncmp("Time", buf, 4) == 0){
            if(i > 0)
                fclose(fp);
            sprintf(fname, "data%02d.txt", ++i);
            if((fp=fopen(fname, "w")) == 0){
                perror(fname);
                exit(EXIT_FAILURE);
            }
        }
        fputs(buf, fp);
    }
    fclose(fp);
    return 0;
}
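
One caveat with the block loop above: if a non-blank line arrives before the first "Time" line, or if no "Time" line is found at all, fp is used before it has been opened. A hedged variant of the same loop, wrapped in a hypothetical function split_blocks and using snprintf (C99) instead of sprintf, could guard against that. It assumes the stream is already positioned after the "\Data:" line and that every line fits in the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Write each block that starts with a "Time" line to its own file:
   data01.txt, data02.txt, ... Lines read before the first "Time"
   line are dropped rather than written to an unopened stream.
   Returns the number of blocks written, or -1 on an output error. */
int split_blocks(FILE *in)
{
    FILE *out = NULL;           /* no block file open yet */
    char name[32];
    char buf[1024];
    int count = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (strncmp(buf, "Time", 4) == 0) {
            if (out != NULL)
                fclose(out);    /* finish the previous block */
            snprintf(name, sizeof name, "data%02d.txt", ++count);
            out = fopen(name, "w");
            if (out == NULL) {
                perror(name);
                return -1;
            }
        }
        if (out != NULL)
            fputs(buf, out);
    }
    if (out != NULL)
        fclose(out);
    return count;
}
```

Returning the block count instead of calling exit leaves the decision about error handling to the caller.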

HTH,

/david

--
Andre, a simple peasant, had only one thing on his mind as he crept
along the East wall: 'Andre, creep... Andre, creep... Andre, creep.'
-- unknown
Nov 13 '05 #4
MM
Ok, I get it. But, the alternative for me would be to say "Now, I cannot do
this - it will have to wait until after summer". Of course there are people
in my company that could help me with this, but since it is summer and
pretty much everyone is on holidays, then I have to try to find other ways
to solve the problems I encounter. I thought one way was to ask people who
really know C programming. Maybe I was wrong... But I still hope that there
ARE people who can understand what I need and are willing to help me.

"Pieter Droogendijk" <gi*@binky.homeunix.org> wrote in message
news:20030708162131.438187fd.gi*@binky.homeunix.org...
Like asking a newsgroup to solve your problem?

Oh, and top posting is severely frowned upon.

--
main(int c,char*k,char*s){c>0?main(0,"adceoX$_k6][^hn","-7\
0#05&'40$.6'+).3+1%30"),puts(""):*s?c=!c?-*s:(putchar(45),c
),putchar(main(c,k+=*s-c*-1,s+1)):(s=0);return!s?10:10+*k;}

Nov 13 '05 #5
MM
Many thanks to both David for his code (I will have a look at it and see if
I can get it all to work) and Bertrand (yes, I will get to learn much more
of C, starting right away) for his advice.

If I had had a lot of time I would not have asked the NG for all this.
Instead I would have begun trying to write the program all from the
beginning myself, and only asking the NG for specific parts. But I really
don't have the time now.

By the way, what is "bottom-posting"?

MM
Nov 13 '05 #6
This is top-posting (my reply above yours), frowned upon in c.l.c.

MM wrote:
Many thanks to both David for his code (I will have a look at it and see if
I can get it all to work) and Bertrand (yes, I will get to learn much more
of C, starting right away) for his advice.

If I had had a lot of time I would not have asked the NG for all this.
Instead I would have begun trying to write the program all from the
beginning myself, and only asking the NG for specific parts. But I really
don't have the time now.

By the way, what is "bottom-posting"?

This is bottom-posting (my reply below yours), de facto standard in c.l.c.
--
Bertrand Mollinier Toublet
"Reality exists" - Richard Heathfield, 1 July 2003

Nov 13 '05 #7
Evil top-posted text.

On Tue, 8 Jul 2003 17:04:56 +0200, MM wrote:
Many thanks to both David for his code (I will have a look at it and
see if I can get it all to work) and Bertrand (yes, I will get to
learn much more of C, starting right away) for his advice.
Good Non-top-posted text.
If I had had a lot of time I would not have asked the NG for all
this. Instead I would have begun trying to write the program all from
the beginning myself, and only asking the NG for specific parts. But I
really don't have the time now.

By the way, what is "bottom-posting"?

MM


Bottom posting (as in opposite of top-posting) is replying to a post
where your own comments appear BELOW some amount of quoted text. like
this.

--
main(int c,char*k,char*s){c>0?main(0,"adceoX$_k6][^hn","-7\
0#05&'40$.6'+).3+1%30"),puts(""):*s?c=!c?-*s:(putchar(45),c
),putchar(main(c,k+=*s-c*-1,s+1)):(s=0);return!s?10:10+*k;}
Nov 13 '05 #8

MM <do*******@yahoo.se> wrote in message
news:WU***************@nntpserver.swip.net...
Ok, I get it. But, the alternative for me would be to say "Now, I cannot do this - it will have to wait until after summer". Of course there are people in my company that could help me with this, but since it is summer and
pretty much everyone is on holidays, then I have to try to find other ways
to solve the problems I encounter. I thought one way was to ask people who
really know C programming. Maybe I was wrong... But I still hope that there ARE people who can understand what I need and are willing to help me.


Again, please don't top post.

Then please note that most folks don't consider
'helping' and 'doing it for you' to be the same
thing.

Post the code of your best attempt, and then I
suspect you'll get plenty of assistance.

-Mike

Nov 13 '05 #9
David Rubin wrote:

MM wrote:

The following is untested...

[snip - split this]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
FILE *fp;
char fname[4+2+4+1]; /* dataNN.txt */
char buf[256]; /* max line length is 255 characters */
int i = 0;

/* find start of data segment */
while(fgets(buf, sizeof buf, stdio) != 0){
while(fgets(buf, sizeof buf, stdin) != 0){
if(strcmp("\\Data:", buf) == 0)
if(strncmp("\\Data:", buf, 6) == 0)
break;
}

while(fgets(buf, sizeof buf, stdio) != 0){


while(fgets(buf, sizeof buf, stdin) != 0){
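
Why the strcmp correction matters: fgets keeps the trailing newline, so the buffer holds "\Data:" followed by '\n' and a whole-string strcmp against "\Data:" never reports a match. Comparing only the first characters with strncmp avoids that. The small sketch below shows both ways around the problem; the helper names starts_with and chomp are made up for this illustration.

```c
#include <string.h>

/* Nonzero if line begins with marker; anything after the marker,
   such as the '\n' left in place by fgets, is ignored. */
int starts_with(const char *line, const char *marker)
{
    return strncmp(line, marker, strlen(marker)) == 0;
}

/* Cut the line at the first '\r' or '\n' so that a whole-line
   comparison with strcmp becomes possible. */
void chomp(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

Either starts_with(buf, "\\Data:") on the raw buffer, or chomp(buf) followed by strcmp, finds the marker line.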

/david

--
Andre, a simple peasant, had only one thing on his mind as he crept
along the East wall: 'Andre, creep... Andre, creep... Andre, creep.'
-- unknown
Nov 13 '05 #10
MM wrote:

Ok, I get it. But, the alternative for me would be to say "Now, I cannot do
this - it will have to wait until after summer". Of course there are people
in my company that could help me with this, but since it is summer and
pretty much everyone is on holidays, then I have to try to find other ways
to solve the problems I encounter. I thought one way was to ask people who
really know C programming. Maybe I was wrong... But I still hope that there
ARE people who can understand what I need and are willing to help me.

"Pieter Droogendijk" <gi*@binky.homeunix.org> wrote in message
news:20030708162131.438187fd.gi*@binky.homeunix.org...
Like asking a newsgroup to solve your problem?

Oh, and top posting is severely frowned upon.

No MM, I suppose you still don't get it. Not only did you top post over
the message asking you not to, you still expect someone here to do the
job for you. As you mention above, you only came here because you
couldn't get anyone in your company to do it for you until after summer.

This sounds like a job for "Consultant Dude" and that you get to pay
for.
--
Joe Wright mailto:jo********@earthlink.net
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Nov 13 '05 #11
MM
Ok, I've learned a lot, both from all the criticism given, and from the nice
code by David Rubin (many thanks again, David!).

I've looked at the code, understood it, and adjusted it a little (for
example to create the header file and to read data from an input file
instead of from "stdin") and now I have three questions:

1) How do I change the code so that I use "input arguments" to specify the
file names (the name of the input file and maybe also of the output files)?
For example, if I compile the code and the application then gets the
name "splitdata", then I want to be able to call my application with
something like this:
splitdata datafile.txt header.dat dblock.dat
The last two arguments are not very important to be able to specify, but it
would of course be nice.
In the code as I have it now, the name of the input file is specified in
line 13 with
char tname[] = "Example.txt";
So, I want to skip this "hard coded" name specification. Also the length of
the input file name is unknown.

2) I cannot figure out why in line 12 I have to specify the length of the
char array (is it such?) 'fname', since if I don't, then output data block
files later than number 9 will not be written correctly or not written at
all. Not very important for me, but I'm interested.

3) In line 70 I want to include the number of data blocks found, i.e. the
value of the counter 'i' after "NumDataBlocks=". How do I do this, "append"
a string with an integer?
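
The three questions can be approached roughly as sketched below. The helper names arg_or, block_name and write_count are hypothetical, and passing the block file stem without its extension is an assumption, not something taken from the code in this thread.

On 1): when main is declared as int main(int argc, char *argv[]), the words typed after the program name arrive as argv[1], argv[2] and so on, with argc counting them. Each argv[i] is already a complete string of whatever length, so no fixed-size array is needed for the input file name.

On 2): sprintf writes into memory the caller supplies and never enlarges it, so fname must hold the longest name plus the terminating '\0'. "dblockNN.dat" is 12 characters, hence 6+2+4+1 = 13 bytes. With a smaller array sprintf writes past the end, which is undefined behaviour. Note that %02d prints three digits from block 100 onwards, which would overflow 13 bytes as well; snprintf (C99) reports that case instead of overrunning.

On 3): fprintf does the conversion, so the integer can be written together with the text.

```c
#include <stdio.h>

/* Q1: use argv[idx] when that argument was given, else a default. */
const char *arg_or(int argc, char *argv[], int idx, const char *fallback)
{
    return (idx < argc) ? argv[idx] : fallback;
}

/* Q2: build "<stem>NN.dat" in dst, which has room for size bytes.
   Returns 0 on success, -1 if the name does not fit. */
int block_name(char *dst, size_t size, const char *stem, int n)
{
    int len = snprintf(dst, size, "%s%02d.dat", stem, n);
    return (len < 0 || (size_t)len >= size) ? -1 : 0;
}

/* Q3: let fprintf format the integer right after the text. */
int write_count(FILE *fh, int n)
{
    return (fprintf(fh, "NumDataBlocks=%d\n", n) < 0) ? -1 : 0;
}
```

Inside main this might be used as tname = arg_or(argc, argv, 1, "Example.txt") and hname = arg_or(argc, argv, 2, "header.dat"), with tname and hname declared as const char * instead of arrays, and line 70 would become a call like write_count(fh, i).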

Many thanks in advance!

MM

=================================================================
=== Code, including line numbers (the code without line numbers is included
below this one) ===
=================================================================

1: #include <stdio.h>
2: #include <stdlib.h>
3: #include <string.h>
4:
5: #define DATASTART "\\Data:"
6: #define BLOCKSTART "Time"
7:
8: int main()
9: {
10: FILE *fh, *fp, *fq;
11: char hname[] = "header.dat"; /* name of header file to write */
12: char fname[6+2+4+1]; /* dblockNN.dat */
13: char tname[] = "Example.txt"; /* name of input file to split */
14: char buf[1001]; /* max line length is 1000 characters */
15: int i = 0;
16:
17: /* open input file for reading */
18: if((fq=fopen(tname, "r")) == 0) {
19: perror(tname);
20: exit(EXIT_FAILURE);
21: }
22:
23: /* open header output file */
24: if((fh=fopen(hname, "w")) == 0) {
25: perror(hname);
26: exit(EXIT_FAILURE);
27: }
28:
29: /* print data to header file */
30: /* if start of data segment is found then close header file */
31: // while(fgets(buf, sizeof buf, stdin) != 0) {
32: while(fgets(buf, sizeof buf, fq) != 0) {
33: // if(strncmp("\\Data:", buf, 6) == 0) {
34: if(strncmp(DATASTART, buf, 6) == 0) {
35: fclose(fh);
36: break;
37: }
38: fputs(buf, fh);
39: }
40:
41: // while(fgets(buf, sizeof buf, stdin) != 0) {
42: while(fgets(buf, sizeof buf, fq) != 0) {
43: /* lines starting with '#' are skipped as comments */
44: /* blank lines are also skipped */
45: /*
46: if(buf[0] == '#' || buf[0] == '\n')
47: continue;
48: */
49:
50: /* write each block to a separate file */
51: // if(strncmp("Time", buf, 4) == 0) {
52: if(strncmp(BLOCKSTART, buf, 4) == 0) {
53:
54: if(i > 0)
55: fclose(fp);
56: sprintf(fname, "dblock%02d.dat", ++i);
57: if((fp=fopen(fname, "w")) == 0) {
58: perror(fname);
59: exit(EXIT_FAILURE);
60: }
61: }
62: fputs(buf, fp);
63: }
64: /* open header output file again */
65: if((fh=fopen(hname, "a")) == 0) {
66: perror(hname);
67: exit(EXIT_FAILURE);
68: }
69: /* print the number of data blocks found last in the header file */
70: fputs("NumDataBlocks=", fh);
71: fclose(fh);
72:
73: /* close the other files */
74: fclose(fp);
75: fclose(fq);
76: return 0;
77: }

=========================
=== Code without line numbers ===
=========================

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DATASTART "\\Data:"
#define BLOCKSTART "Time"

int main()
{
    FILE *fh, *fp, *fq;
    char hname[] = "header.dat"; /* name of header file to write */
    char fname[6+2+4+1]; /* dblockNN.dat */
    char tname[] = "Example.txt"; /* name of input file to split */
    char buf[1001]; /* max line length is 1000 characters */
    int i = 0;

    /* open input file for reading */
    if((fq=fopen(tname, "r")) == 0) {
        perror(tname);
        exit(EXIT_FAILURE);
    }

    /* open header output file */
    if((fh=fopen(hname, "w")) == 0) {
        perror(hname); /* report the header file name, not fname */
        exit(EXIT_FAILURE);
    }

    /* print data to header file */
    /* if start of data segment is found then close header file */
    // while(fgets(buf, sizeof buf, stdin) != 0) {
    while(fgets(buf, sizeof buf, fq) != 0) {
        // if(strncmp("\\Data:", buf, 6) == 0) {
        if(strncmp(DATASTART, buf, 6) == 0) {
            fclose(fh);
            break;
        }
        fputs(buf, fh);
    }

    // while(fgets(buf, sizeof buf, stdin) != 0) {
    while(fgets(buf, sizeof buf, fq) != 0) {
        /* lines starting with '#' are skipped as comments */
        /* blank lines are also skipped */
        /*
        if(buf[0] == '#' || buf[0] == '\n')
        continue;
        */

        /* write each block to a separate file */
        // if(strncmp("Time", buf, 4) == 0) {
        if(strncmp(BLOCKSTART, buf, 4) == 0) {

            if(i > 0)
                fclose(fp);
            sprintf(fname, "dblock%02d.dat", ++i);
            if((fp=fopen(fname, "w")) == 0) {
                perror(fname);
                exit(EXIT_FAILURE);
            }
        }
        fputs(buf, fp);
    }
    /* open header output file again */
    if((fh=fopen(hname, "a")) == 0) {
        perror(hname); /* report the header file name, not fname */
        exit(EXIT_FAILURE);
    }
    /* print the number of data blocks found last in the header file */
    fputs("NumDataBlocks=", fh);
    fclose(fh);

    /* close the other files */
    fclose(fp);
    fclose(fq);
    return 0;
}

Nov 13 '05 #12
