reading file backwards and parsing

I have some log files that I'm working with that look like this:

1000000000 3456 1234
1000000001 3456 1235
1000020002 3456 1223
1000203044 3456 986
etc.

I'm trying to read the file backwards and just look at the first
column. Here's what I've got so far:

in=fopen(fpath,"rb");
if (in!=NULL) {
fseek(in,0,SEEK_END);
back1line(in); /* function that goes back 1 line */

while (1) {
pos=ftell(in);
fgets(buffer,1024,in);
buffer[strlen(buffer)-1]=0;

printf("line=%s\n", buffer);

len=strlen(buffer);
if ((buffer[len-1] == '\n')||(buffer[len-1] == '\r'))
buffer[len-1] = '\0';
memset(cptrs,0,sizeof(cptrs));
i=0;
cptrs[i]=buffer;
while (cptrs[i] && i<3) {
++i;
cptrs[i]=strchr(cptrs[i-1],' ');
if (cptrs[i]==NULL) {
printf("we break in here\n");
break;
}
*cptrs[i]=0;
++cptrs[i];
}
lprintf(0,"got 0 = <%d>\n",atoi(cptrs[0]));
lprintf(0,"got 1 = <%d>\n",atoi(cptrs[1]));
lprintf(0,"got 2 = <%d>\n",atoi(cptrs[2]));
fseek(in,pos,0);
if (back1line(in)!=0)
return -1;
}
}
else
rc=1;
fclose(in);

This just prints out the elements of the last line of the file and I'm
not sure why. If I replace the while loop that splits on ' ' with a
while loop that uses strtok to tokenize the string, it works great.
But, it seems to me that the above should work. Then again, what do I
know?

TIA!

-matt
Nov 14 '05 #1
Matt DeFoor wrote:

I have some log files that I'm working with that look like this:

1000000000 3456 1234
1000000001 3456 1235
1000020002 3456 1223
1000203044 3456 986
etc.

I'm trying to read the file backwards and just look at the first
column. Here's what I've got so far:
[snipped]


One obstacle to diagnosing your difficulty is that we're
forced to make guesses about the portions of your code that
you didn't provide. You omitted all the declarations (the
one I'm most interested in is `cptrs'), and you omitted the
definitions of the back1line() and lprintf() functions.

Albert Einstein is supposed to have said "Things should
be made as simple as possible, but no simpler." In your zeal
for brevity I fear you've violated the second part of his
advice ...

Please post the shortest *complete* program that demonstrates
your problem. Otherwise, we're all sitting here debugging our
own suppositions about what you've omitted.

--
Er*********@sun.com
Nov 14 '05 #2
Eric Sosman <Er*********@sun.com> wrote in message news:<40***************@sun.com>...
One obstacle to diagnosing your difficulty is that we're
forced to make guesses about the portions of your code that
you didn't provide. You omitted all the declarations (the
one I'm most interested in is `cptrs'), and you omitted the
definitions of the back1line() and lprintf() functions.

Please post the shortest *complete* program that demonstrates
your problem. Otherwise, we're all sitting here debugging our
own suppositions about what you've omitted.


Sorry about that. As soon as I posted I realized that I had left my custom
printf function in as well as omitting a few others. Apologies. I have
another apology to make as well. This program is meant to be compiled
with Metrowerks CodeWarrior where it doesn't work properly (yet it
compiles). However, I've since tested and compiled on my rusty, yet
trusty, Linux box and it works there with a small modification by
typecasting line 62 as an int.

With that said, if anyone knows of a better/more efficient way to
accomplish what I'm trying to do, that'd be great. Essentially, I'm
trying to read the log file backwards and compare the first column
which is a timestamp.

Anyway, here it is. In all its horrid glory:

#include <stdlib.h>
#include <stdio.h>

int backtonewline (FILE *fp) {
char ch;
long pos;
int rc;

rc=fseek(fp, -1L, 1);
if (rc==-1)
return -1;

while (1) {
ch=fgetc(fp);
if (ch=='\n')
break;
rc=fseek(fp,-2L,1);
if (rc==-1)
return -1;
}
fseek(fp,-1L,1);
return 0;
}

int back1line (FILE *fp) {
if (backtonewline (fp) != 0) return -1;
if (backtonewline (fp) != 0) return -1;
fseek(fp,1L,1);
return 0;
}
int main () {
FILE *in;
int rc,len,i;
char buffer[100];
char *cptrs[3];
long pos;

len=strlen(buffer);
if ((buffer[len-1] == '\n')||(buffer[len-1] == '\r'))
buffer[len-1] = '\0';
in=fopen("log","rb");
if (in!=NULL) {
fseek(in,0,SEEK_END);
back1line(in);

while (1) {
pos=ftell(in);
fgets(buffer,100,in);
buffer[strlen(buffer)-1]=0;

len=strlen(buffer);
if ((buffer[len-1] == '\n')||(buffer[len-1] == '\r'))
buffer[len-1] = '\0';
printf("line=%s\n", buffer);
memset(cptrs,0,sizeof(cptrs));
i=0;
cptrs[i]=buffer;
while (cptrs[i] && i<3) {
++i;
(int *)cptrs[i]=strchr(cptrs[i-1],' '); /* typecast change to
work on Linux */
if (cptrs[i]==NULL) {
printf("we break in here\n");
break;
}
*cptrs[i]=0;
++cptrs[i];
}
printf("got 0 = <%d>\n",atoi(cptrs[0]));
printf("got 1 = <%d>\n",atoi(cptrs[1]));
printf("got 2 = <%d>\n",atoi(cptrs[2]));
fseek(in,pos,0);
if (back1line(in)!=0)
rc=1;
}
}
else
rc=1;
fclose(in);

}

Cheers,
Matt
Nov 14 '05 #3
On 12 Apr 2004 18:43:17 -0700, ma***@myrealbox.com (Matt DeFoor)
wrote:
Sorry about that. As soon as I posted I realized that I had left my custom
printf function in as well as omitting a few others. Apologies. I have
another apology to make as well. This program is meant to be compiled
with Metrowerks CodeWarrior where it doesn't work properly (yet it
compiles). However, I've since tested and compiled on my rusty, yet
trusty, Linux box and it works there with a small modification by
typecasting line 62 as an int.
While I believe you tested something, it was not this code. It
doesn't compile clean. Did you cut and paste or retype the code?

With that said, if anyone knows of a better/more efficient way to
accomplish what I'm trying to do, that'd be great. Essentially, I'm
trying to read the log file backwards and compare the first column
which is a timestamp.

Anyway, here it is. In all its horrid glory:

#include <stdlib.h>
#include <stdio.h>

int backtonewline (FILE *fp) {
char ch;
long pos;
int rc;

rc=fseek(fp, -1L, 1);
-1 would work just as well as -1L. But how do you know that SEEK_CUR
is 1 on all the systems you compile it on?
if (rc==-1)
fseek can return any non-zero value on error. Are you sure each of
your systems returns -1?
return -1;

while (1) {
ch=fgetc(fp);
fgetc returns an int, not a char. You need that to check for errors.
if (ch=='\n')
break;
rc=fseek(fp,-2L,1);
if (rc==-1)
return -1;
}
fseek(fp,-1L,1);
This positions you 1 character before the '\n'.
return 0;
}

int back1line (FILE *fp) {
if (backtonewline (fp) != 0) return -1;
If successful, this positions you one character before the '\n' before
the current line,
if (backtonewline (fp) != 0) return -1;
If successful, this positions you one character before the '\n' before
the previous line.

After processing line 2, this will fail because there is no '\n'
before line 1 and obviously no character before that.
fseek(fp,1L,1);
This positions you at the '\n' before the previous line.
return 0;
}
int main () {
FILE *in;
int rc,len,i;
char buffer[100];
char *cptrs[3];
long pos;

len=strlen(buffer);
You forgot to include string.h for strlen, memset, strchr, etc.

buffer is uninitialized. This invokes undefined behavior. I assume
this is out of sequence?
if ((buffer[len-1] == '\n')||(buffer[len-1] == '\r'))
buffer[len-1] = '\0';
in=fopen("log","rb");
if (in!=NULL) {
fseek(in,0,SEEK_END);
back1line(in);
If your file ends with a '\n' this will work. I believe that is
implementation defined. If there is no '\n', then you will skip the
last line and start with the one before it.

while (1) {
pos=ftell(in);
fgets(buffer,100,in);
Since back1line left you pointed at the '\n', you will only read in
that one character.
buffer[strlen(buffer)-1]=0;
This assumes the line was less than 99 characters. Are you sure?

len=strlen(buffer);
Did you really want to call strlen twice?
if ((buffer[len-1] == '\n')||(buffer[len-1] == '\r'))
The only possible '\n' was at the end of the string, just before the
'\0', and you removed it two statements earlier.
buffer[len-1] = '\0';
printf("line=%s\n", buffer);
memset(cptrs,0,sizeof(cptrs));
All bits 0 is not necessarily a valid value for a pointer. Why do you
bother since you initialize each cptrs as needed?

Undefined behavior in C89 because memset is assumed to return an int
which is not true.
i=0;
cptrs[i]=buffer;
while (cptrs[i] && i<3) {
++i;
(int *)cptrs[i]=strchr(cptrs[i-1],' '); /* typecast change to
work on Linux */
More undefined behavior or it would be if it wasn't for the syntax
error.

Tell us this was meant as a joke. The result of a cast is not a
modifiable l-value and therefore may not appear as the destination of
an assignment operator. Why did your note in the beginning say cast
to int when here you cast to int*?
if (cptrs[i]==NULL) {
printf("we break in here\n");
break;
}
*cptrs[i]=0;
++cptrs[i];
}
printf("got 0 = <%d>\n",atoi(cptrs[0]));
printf("got 1 = <%d>\n",atoi(cptrs[1]));
printf("got 2 = <%d>\n",atoi(cptrs[2]));
If you break out of the previous while loop because cptrs[i] is NULL,
then at least one of these calls to printf invokes undefined behavior.
fseek(in,pos,0);
Is 0 guaranteed to be SEEK_SET?
if (back1line(in)!=0)
rc=1;
}
}
else
rc=1;
fclose(in);

}


Please provide the real code.
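
In the meantime, here is a minimal sketch of how the points above could be addressed in one place: `int` for the result of fgetc(), the SEEK_ macros instead of bare numbers, a test for any non-zero return from fseek(), and no failure on the first line of the file. The name prev_line_start is made up for illustration and is not from the posted code. It assumes the file is opened in binary mode, so that offsets are plain byte counts.

```c
#include <stdio.h>

/* Return the offset at which the line preceding `pos` begins, or -1L if
 * `pos` is already 0 or a stream call fails. `pos` must be the offset of
 * a line start or of the end of the file (hypothetical helper). */
long prev_line_start(FILE *fp, long pos)
{
    int ch;                     /* int, not char: fgetc() may return EOF */

    if (pos <= 0)
        return -1L;
    pos--;                      /* step over the byte that ends the previous line */
    while (pos > 0) {
        if (fseek(fp, pos - 1, SEEK_SET) != 0)
            return -1L;         /* any non-zero result means failure */
        ch = fgetc(fp);
        if (ch == EOF)
            return -1L;
        if (ch == '\n')
            break;              /* the byte before pos is a newline */
        pos--;
    }
    return pos;                 /* 0 when the first line has been reached */
}
```

To walk a file backwards, start with the offset reported by ftell() after seeking to SEEK_END, call the function repeatedly, and fseek() to each returned offset before calling fgets(). Reading one byte per seek is slow on a large log, so treat this as a correctness sketch only.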
<<Remove the del for email>>
Nov 14 '05 #4
ma***@myrealbox.com (Matt DeFoor) wrote in message
Sorry about that. As soon as I posted I realized that I had left my custom
printf function in as well as omitting a few others. Apologies. I have
another apology to make as well. This program is meant to be compiled
with Metrowerks CodeWarrior where it doesn't work properly (yet it
compiles). However, I've since tested and compiled on my rusty, yet
trusty, Linux box and it works there with a small modification by
typecasting line 62 as an int.

<snipped>
Anyway, here it is. In all its horrid glory:

#include <stdlib.h>
#include <stdio.h>


Forgot to include string.h.

-matt
Nov 14 '05 #5
Matt DeFoor wrote:

Eric Sosman <Er*********@sun.com> wrote in message news:<40***************@sun.com>...
One obstacle to diagnosing your difficulty is that we're
forced to make guesses about the portions of your code that
you didn't provide. You omitted all the declarations (the
                                                      ^^^
one I'm most interested in is `cptrs'), and you omitted the
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

My crystal ball is obviously in good working order. Send
me a GIF of your palm and I'll read your future -- for a small
fee, calculated in `double' on the DeathStation 9000 ...
Please post the shortest *complete* program [...]
Anyway, here it is. In all its horrid glory:
[...]

int main () {
FILE *in;
int rc,len,i;
char buffer[100];
char *cptrs[3];
long pos;


So `cptrs' is an array of three pointers, called cptrs[0]
through cptrs[2]. Further along:
i=0;
cptrs[i]=buffer;
while (cptrs[i] && i<3) {
This test can succeed if `i' is equal to 2 ...
++i;
... in which case the next line changes `i' to 3 ...
(int *)cptrs[i]=strchr(cptrs[i-1],' '); /* typecast change to
work on Linux */
... and you're now trying to store something into cptrs[3],
which doesn't exist. (The cast is bogus; rather than quieting
a warning, it should have prevented the program from compiling
at all. I suspect you're operating the compiler in a non-
conforming mode; if it's gcc, try using the "-ansi -pedantic"
flags, along with "-W -Wall". The proper way to quiet the
warning would have been to #include <string.h>; without that
inclusion your use of strchr(), strlen(), and memset() is not
only suspect, but flat-out incorrect.)

Once you try to store something in an array element that
doesn't exist, all bets are off. One quite likely (but not
guaranteed) outcome is that some other variable that just
happens to reside next to cptrs[2] will get clobbered. It's
likely (although again, not guaranteed) that the next-door
neighbor will be one of `pos' or `buffer' -- and if you've
managed to dump garbage in either of those, there's a good
chance your program will misbehave.
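
A minimal sketch of a splitter that cannot run off the end of the array, for comparison. The name split_fields and its parameters are invented here, and it assumes the fields are separated by single spaces, as in the sample log lines.

```c
#include <string.h>

/* Split `line` in place on single spaces into at most `max` fields.
 * Stores a pointer to each field in fields[0..n-1] and returns n.
 * Never writes past fields[max - 1]; anything after the last stored
 * field is cut off (hypothetical helper). */
size_t split_fields(char *line, char *fields[], size_t max)
{
    size_t n = 0;
    char *p = line;

    while (n < max && p != NULL) {
        fields[n++] = p;
        p = strchr(p, ' ');
        if (p != NULL)
            *p++ = '\0';        /* terminate this field, advance to the next */
    }
    return n;
}
```

The caller should check the returned count before touching fields[1] or fields[2], since a line with fewer than three columns leaves those elements unset.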
With that said, if anyone knows of a better/more efficient way to
accomplish what I'm trying to do, that'd be great. Essentially, I'm
trying to read the log file backwards and compare the first column
which is a timestamp.


This really isn't enough of an explanation to support an
informed recommendation. If you're trying to process the entire
log backwards, you'll be much better off reading it in the
forward direction and rearranging the processing. If you're
only interested in the last N lines (for smallish N), you may
do well to make a guess about how long those lines are, fseek()
to a position shortly before the estimated start of the group,
read and store the entire tail end of the file, and then figure
out which lines are which.
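
The tail-reading idea might be sketched roughly as follows. Both function names are invented for illustration; the sketch assumes a binary-mode stream on which seeking to SEEK_END works (as the posted code already assumes) and lines ending in '\n'.

```c
#include <stdio.h>

/* Read up to cap - 1 bytes from the end of fp into buf and NUL-terminate
 * it. *start receives the file offset of buf[0]; if it is nonzero, the
 * first line in buf may be only a fragment. Returns the number of bytes
 * read, or 0 on error or an empty file (hypothetical helper). */
size_t read_tail(FILE *fp, char *buf, size_t cap, long *start)
{
    long size;
    size_t n;

    *start = 0L;
    if (cap == 0)
        return 0;
    buf[0] = '\0';
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0)
        return 0;
    if (size > (long)(cap - 1))
        *start = size - (long)(cap - 1);
    if (fseek(fp, *start, SEEK_SET) != 0)
        return 0;
    n = fread(buf, 1, cap - 1, fp);
    buf[n] = '\0';
    return n;
}

/* Remove the last line from buf[0..*len) and return a pointer to it,
 * NUL-terminated and without its newline, or NULL when nothing is left.
 * buf[*len] must be writable, which read_tail() guarantees. */
char *pop_last_line(char *buf, size_t *len)
{
    size_t end = *len, begin;

    if (end == 0)
        return NULL;
    if (buf[end - 1] == '\n')
        end--;                  /* drop the newline that ends the last line */
    buf[end] = '\0';
    begin = end;
    while (begin > 0 && buf[begin - 1] != '\n')
        begin--;
    *len = begin;
    return buf + begin;
}
```

Call read_tail() once, then pop_last_line() in a loop to visit the stored lines from newest to oldest. When *start is nonzero, the final line popped is probably a fragment and should be discarded or re-read with a larger buffer.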

--
Er*********@sun.com
Nov 14 '05 #6
Eric Sosman <Er*********@sun.com> wrote in message news:<40***************@sun.com>...
Matt DeFoor wrote:
With that said, if anyone knows of a better/more efficient way to
accomplish what I'm trying to do, that'd be great. Essentially, I'm
trying to read the log file backwards and compare the first column
which is a timestamp.


This really isn't enough of an explanation to support an
informed recommendation. If you're trying to process the entire
log backwards, you'll be much better off reading it in the
forward direction and rearranging the processing. If you're
only interested in the last N lines (for smallish N), you may
do well to make a guess about how long those lines are, fseek()
to a position shortly before the estimated start of the group,
read and store the entire tail end of the file, and then figure
out which lines are which.


I need to search through the whole file, line by line, to find the
most recent entry that matches a certain criterion (e.g. 3 months ago
from today). So, I thought that reading the file backwards would be
the fastest and best approach.

-matt
Nov 14 '05 #7
Barry Schwarz <sc******@deloz.net> wrote:
Matt DeFoor wrote:

[without including string.h]
memset(cptrs,0,sizeof(cptrs));


Undefined behavior in C89 because memset is assumed to return an int
which is not true.


Isn't this OK as long as the return value is not used?
Nov 14 '05 #8
ma***@myrealbox.com (Matt DeFoor) wrote in message news:<45**************************@posting.google.com>...
Eric Sosman <Er*********@sun.com> wrote in message news:<40***************@sun.com>...
Matt DeFoor wrote:
With that said, if anyone knows of a better/more efficient way to
accomplish what I'm trying to do, that'd be great. Essentially, I'm
trying to read the log file backwards and compare the first column
which is a timestamp.


This really isn't enough of an explanation to support an
informed recommendation. If you're trying to process the entire
log backwards, you'll be much better off reading it in the
forward direction and rearranging the processing. If you're
only interested in the last N lines (for smallish N), you may
do well to make a guess about how long those lines are, fseek()
to a position shortly before the estimated start of the group,
read and store the entire tail end of the file, and then figure
out which lines are which.


I need to search through the whole file, line by line, to find the
most recent entry that matches a certain criterion (e.g. 3 months ago
from today). So, I thought that reading the file backwards would be
the fastest and best approach.

-matt


If you have control over the program creating the log file the best
solution is probably:

1) Modify that application so it creates a separate log file for every
day/week/month (pick most appropriate) named by date.
2) Have your program search those log files in reverse date order, but
*forwards*
Nov 14 '05 #9
Matt DeFoor wrote:
Eric Sosman <Er*********@sun.com> wrote in message news:<40***************@sun.com>...
Matt DeFoor wrote:
With that said, if anyone knows of a better/more efficient way to
accomplish what I'm trying to do, that'd be great. Essentially, I'm
trying to read the log file backwards and compare the first column
which is a timestamp.


This really isn't enough of an explanation to support an
informed recommendation. If you're trying to process the entire
log backwards, you'll be much better off reading it in the
forward direction and rearranging the processing. If you're
only interested in the last N lines (for smallish N), you may
do well to make a guess about how long those lines are, fseek()
to a position shortly before the estimated start of the group,
read and store the entire tail end of the file, and then figure
out which lines are which.

I need to search through the whole file, line by line, to find the
most recent entry that matches a certain criterion (e.g. 3 months ago
from today). So, I thought that reading the file backwards would be
the fastest and best approach.

-matt

I haven't gone back through this thread so I don't know what advice
you already have. I'd do it something like this:

Let's say the file is a series of key=value pairs written
sequentially over time. You want to know the last value associated
with key. For sake of argument, we are looking for a key named PASS
which will occur several times in the file. We are interested to
know where the 'last' occurrence is.

Read the file line by line with fgets(), remembering (saving) the
address of the beginning of the line as returned by ftell().

long pass = 0;
long tell = 0;
char line[ENOUGH];
char *cp;

/* Open your file in text mode and prepare to read it to the end. */

while (fgets(line, sizeof line, fp) != NULL) {
if ((cp = strstr(line, "PASS=")) != NULL)
pass = tell; /* the beginning of this line */
tell = ftell(fp); /* the beginning of the next line */
}
fseek(fp, pass, SEEK_SET); /* last line with "PASS=" */
fgets(line, sizeof line, fp); /* read the line */
cp = strchr(line, '=') + 1; /* point to the value */

This is not a program. It's a hint.
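
For the timestamp log from the original question, the same hint might be fleshed out along these lines. This is only a sketch: the function name is invented, and it assumes the criterion is "first column is at or before some cutoff value", that every line fits in the buffer, and that the timestamps fit in a long.

```c
#include <stdio.h>
#include <stdlib.h>

/* Scan fp forwards and return the offset of the last line whose first
 * column, read as a decimal timestamp, is <= cutoff. Returns -1L if no
 * line qualifies (hypothetical helper). */
long last_line_at_or_before(FILE *fp, long cutoff)
{
    char line[256];
    long found = -1L;
    long here;

    rewind(fp);
    here = ftell(fp);
    while (here >= 0 && fgets(line, sizeof line, fp) != NULL) {
        char *end;
        long stamp = strtol(line, &end, 10);

        if (end != line && stamp <= cutoff)
            found = here;       /* offset of the start of this line */
        here = ftell(fp);       /* offset of the start of the next line */
    }
    return found;
}
```

A caller would fseek() to the returned offset with SEEK_SET and read the line with fgets(), as in the fragment above. If the timestamps are known to be in ascending order, the loop could stop at the first line past the cutoff.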
--
Joe Wright mailto:jo********@comcast.net
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Nov 14 '05 #10
Matt DeFoor wrote:
... snip ...
I need to search through the whole file, line by line, to find
the most recent entry that matches a certain criterion (e.g. 3
months ago from today). So, I thought that reading the file
backwards would be the fastest and best approach.


Why didn't you say so in the first place! Just read it in the
normal forward direction, and whenever the criterion is satisfied
note where you are, overwriting the previous note. When you hit
EOF the note will specify the last position. Pseudo code:

locn = NOWHERE;
where = STARTOFFILE;
while (fggets(&ln, f)) {
if (findin(ln, criterion)) locn = where;
where = currentposition(f);
free(ln);
}

assuming the use of ggets.zip available on my site.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #11
On 13 Apr 2004 13:22:34 -0700, ol*****@inspire.net.nz (Old Wolf)
wrote:
Barry Schwarz <sc******@deloz.net> wrote:
Matt DeFoor wrote:

[without including string.h]
memset(cptrs,0,sizeof(cptrs));


Undefined behavior in C89 because memset is assumed to return an int
which is not true.


Isn't this OK as long as the return value is not used?


Consider the situation where returned pointers are stored in one set
of registers and returned integers are stored in another set of
registers. Since memset will return a pointer it will update one of
the registers in the first set. The compiler, thinking that memset
returns an int, is allowed to assume that all the values it has
previously loaded in that set are still intact. Any subsequent code
that uses the changed register has got a problem because the value the
compiler thinks is there is not.
<<Remove the del for email>>
Nov 14 '05 #12
