Verify that line exists in a file

Hi,

Please could you tell me what the quickest way would be to verify that a
line of text exists in a file.

I would like to be able to do something like:

grep "^line of text$" filename.txt

and then check the exit code, or:

grep -c "^line of text$" filename.txt

and then check the count.

But I need to do it in a C program.

Many Thanks,
Jonny
Nov 14 '05 #1
In article <T2****************@newsfe3-win.ntli.net>,
Jonny <ww******@ntlworld.com> wrote:
:Please could you tell me what the quickest way would be to verify that a
:line of text exists in a file.

open the file, loop reading lines, for each line strcmp against the
target; if you get a match then set an appropriate status and break
early; close the file.

Watch out for whether the line read includes or excludes
the record separator; note that the answer might not be the same for the
last line in the file.
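A minimal sketch of the loop described above. The function name line_exists and its 1/0/-1 return convention are hypothetical. Lines are read with getc() into a buffer that grows as needed, so no line is ever split, and the '\n' record separator is never stored, so a last line without a trailing '\n' compares the same as any other. It assumes the stream was opened in text mode, so any CR-LF pairs have already become '\n'.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if some line of fp equals target
   exactly, 0 if none does, -1 on a read or allocation error.
   The '\n' separator is never stored, so an unterminated last line
   is compared just like the others. */
int line_exists(FILE *fp, const char *target)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);
    int c, found = 0;

    if (buf == NULL)
        return -1;

    for (;;) {
        c = getc(fp);
        if (c == '\n' || c == EOF) {
            /* End of a line: compare it, except for the empty
               "line" after a final '\n' (or an empty file). */
            if (c == '\n' || len > 0) {
                buf[len] = '\0';
                if (strcmp(buf, target) == 0) {
                    found = 1;      /* match: stop early */
                    break;
                }
            }
            len = 0;
            if (c == EOF)
                break;
            continue;
        }
        if (len + 1 >= cap) {       /* keep room for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return -1;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    free(buf);
    if (!found && ferror(fp))
        return -1;
    return found;
}
```

The caller opens the file with fopen(filename, "r") and closes it afterwards; reading stops at the first match.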
I should be slightly more honest: the above is not necessarily
the -quickest- way, if by 'quick' you mean that execution speed is
crucial. If execution speed is crucial, there are faster algorithms.

For example, you could read a bufferful of the file at a time into
memory and proceed through it. Instead of starting from the start
of the buffer and comparing forward, you could jump straight to the
position where the last character of a match would have to be, i.e.
the length of the string you are trying to match, minus one, further on.
If the character there does not match the -last- character in the string,
then you know already that that particular section you are looking at is
not a copy of the line; if it does match, then you proceed backwards
comparing a character at a time until you reach the beginning of the
string or find a mismatch.

If that first test did not match, you could scan forward until you
found the end-of-line and then start the backwards comparison from there.
There are optimizations that can be made to even that process, though:
the character that was in fact at the place you looked can give you
hints about how much further on to look. For example if the character
you found was an 'x', and you knew that 'x' was the 7th last character in
the string, then you could immediately advance by 7 characters and look
for the newline there, instead of testing each of the characters
in between for newline.
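A rough sketch of the in-memory idea, assuming the whole file (or a bufferful that ends on a line boundary) is already in memory. The name line_in_buffer is hypothetical. For each line start it first checks that a '\n' (or the end of the buffer) sits right after where a match would end, then compares backwards starting from the last character, and finally moves to the next line with memchr(). The character-based skipping described in the last paragraph is left out to keep it short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf[0..len) contains a line
   (lines separated by '\n') exactly equal to line, 0 otherwise. */
int line_in_buffer(const char *buf, size_t len, const char *line)
{
    size_t m = strlen(line);
    size_t s = 0;                       /* start of the current line */

    while (s < len) {
        /* A match needs m characters followed by '\n' or buffer end. */
        if (len - s >= m && (s + m == len || buf[s + m] == '\n')) {
            size_t j = m;
            /* Compare backwards, last character first. */
            while (j > 0 && buf[s + j - 1] == line[j - 1])
                j--;
            if (j == 0)
                return 1;
        }
        /* Advance to the start of the next line, if any. */
        const char *nl = memchr(buf + s, '\n', len - s);
        if (nl == NULL)
            break;
        s = (size_t)(nl - buf) + 1;
    }
    return 0;
}
```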
--
"There are three kinds of lies: lies, damn lies, and statistics."
-- not Twain, perhaps Disraeli, first quoted by Leonard Courtney
Nov 14 '05 #2
Jonny wrote:
> [ ... ]
> But I need to do it in a C program.
You want to have a function

int OccurrenceOfLine(FILE *file, const char *line)

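which reads lines with fgets() into a buffer of size strlen(line)+2
(room for the line, its '\n' and the terminating '\0'), strips the '\n'
from the end of the line (if there is none and the buffer is full, the
line was too long, i.e. it does not match and you can read characters
until you have reached the '\n' of this line; if there is none and the
buffer is not full, you have read a last line without a trailing '\n')
and uses strcmp() on the buffer and line.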
If you have a match, you can return 1. If you encounter an
error, you can return -1. Otherwise, you return 0.
If you want to _count_ the occurrences, then I suggest

int OccurrencesOfLine(FILE *file, const char *line, size_t *count)

where you return 0 on success and an error code !=0 on failure.
The number of matching lines is stored in *count.
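A sketch of the two functions as described, keeping the signatures from the post; the shared helper next_line_matches is hypothetical. The buffer is strlen(line)+2 bytes so that a matching line and its '\n' fit; a full buffer with no '\n' means the line is too long, and the rest of it is discarded with getc().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads the next line into buf and reports
   1 on match, 0 on no match, -1 at end of file, -2 on read error. */
static int next_line_matches(FILE *fp, const char *line,
                             char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return ferror(fp) ? -2 : -1;

    size_t n = strlen(buf);
    if (n > 0 && buf[n - 1] == '\n') {
        buf[n - 1] = '\0';              /* complete line: strip '\n' */
    } else if (n == size - 1) {
        /* Buffer full without '\n': too long to match; skip the rest. */
        int c;
        while ((c = getc(fp)) != '\n' && c != EOF)
            ;
        return 0;
    }
    /* Otherwise: last line of a file that does not end in '\n'. */
    return strcmp(buf, line) == 0;
}

/* Returns 1 if some line equals line, 0 if none does, -1 on error. */
int OccurrenceOfLine(FILE *file, const char *line)
{
    size_t size = strlen(line) + 2;     /* line + '\n' + '\0' */
    char *buf = malloc(size);
    int r;

    if (buf == NULL)
        return -1;
    while ((r = next_line_matches(file, line, buf, size)) == 0)
        ;
    free(buf);
    return r == 1 ? 1 : (r == -1 ? 0 : -1);
}

/* Stores the number of matching lines in *count;
   returns 0 on success and nonzero on failure. */
int OccurrencesOfLine(FILE *file, const char *line, size_t *count)
{
    size_t size = strlen(line) + 2;
    char *buf = malloc(size);
    int r;

    if (buf == NULL)
        return 1;
    *count = 0;
    while ((r = next_line_matches(file, line, buf, size)) >= 0)
        *count += (size_t)r;
    free(buf);
    return r == -2;                     /* nonzero only on read error */
}
```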

Note: Of course, you need <stdio.h> and <string.h> for both of the
functions I have suggested.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Nov 14 '05 #3
Jonny wrote:
> [ ... ]
> But I need to do it in a C program.


Knuth-Morris-Pratt or Boyer-Moore
http://www.cs.utexas.edu/users/moore...ing-searching/
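For this particular problem, one way to apply Knuth-Morris-Pratt (an assumption, not something the post spells out) is to search the character stream for the pattern '\n', line, '\n', feeding in an extra '\n' before the first character and after an unterminated last line. The file is read with getc(), so no line buffer is needed. The names kmp_line_exists and kmp_step are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Advances the KMP automaton by one character; returns 1 when the
   whole pattern has just been matched. */
static int kmp_step(const char *pat, const size_t *fail, size_t m,
                    size_t *q, char ch)
{
    while (*q > 0 && pat[*q] != ch)
        *q = fail[*q - 1];
    if (pat[*q] == ch)
        (*q)++;
    if (*q == m) {
        *q = fail[m - 1];
        return 1;
    }
    return 0;
}

/* Hypothetical: 1 if fp has a line equal to line, 0 if not, -1 on error. */
int kmp_line_exists(FILE *fp, const char *line)
{
    size_t m = strlen(line) + 2;             /* pattern: '\n' line '\n' */
    char *pat = malloc(m);
    size_t *fail = malloc(m * sizeof *fail);
    size_t i, k, q = 0;
    int c, prev = '\n', result = -1;

    if (pat == NULL || fail == NULL)
        goto done;

    pat[0] = '\n';
    memcpy(pat + 1, line, m - 2);
    pat[m - 1] = '\n';

    /* fail[i]: length of the longest proper prefix of pat[0..i]
       that is also a suffix of it (the KMP failure function). */
    fail[0] = 0;
    for (i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }

    result = 0;
    (void)kmp_step(pat, fail, m, &q, '\n');  /* virtual '\n' before line 1 */
    while ((c = getc(fp)) != EOF) {
        prev = c;
        if (kmp_step(pat, fail, m, &q, (char)c)) {
            result = 1;
            goto done;
        }
    }
    if (ferror(fp))
        result = -1;
    else if (prev != '\n' && kmp_step(pat, fail, m, &q, '\n'))
        result = 1;                          /* unterminated last line */
done:
    free(pat);
    free(fail);
    return result;
}
```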

gtoomey
Nov 14 '05 #4
Gregory Toomey wrote:

> [ ... ]
> Knuth-Morris-Pratt or Boyer-Moore
> http://www.cs.utexas.edu/users/moore...ing-searching/


With the proviso that you _rarely_ want to use Boyer-Moore as
originally defined -- nearly all practical use is of simplified
variants such as Boyer-Moore-Horspool (or Sunday's variant thereof). A
full-blown Boyer-Moore search is theoretically better in some sense,
but the savings rarely justify the initialization effort.

The development effort is considerable as well -- to the point that
incorrect implementations abound, even those published by the most
highly respected computer scientists (e.g. Knuth). In fact, for the
first 15 years after the algorithm was described, EVERY published
implementation seems to have contained at least one defect.

By contrast, Sunday's variant of B-M-H is easy to set up, much easier
to implement correctly (I'm among the few to have really screwed it up),
and will usually run faster to boot!
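A sketch of Sunday's variant (often called Quick Search), assuming the text is in memory with a known length; sunday_search is a hypothetical name. Unlike Horspool, the shift is looked up with the character just past the current window, so it can be as large as m+1. Like strstr() it finds a substring, so using it for whole lines still needs a check that a match is bounded by '\n' or the ends of the buffer.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: returns a pointer to the first occurrence of pat
   (length m) in text (length n), or NULL if there is none. */
const char *sunday_search(const char *text, size_t n,
                          const char *pat, size_t m)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, j;

    if (m == 0)
        return text;
    if (m > n)
        return NULL;

    /* A character absent from pat lets the window jump past it. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = m + 1;
    /* Later (rightmost) occurrences overwrite earlier ones. */
    for (i = 0; i < m; i++)
        shift[(unsigned char)pat[i]] = m - i;

    for (j = 0; j + m <= n; j += shift[(unsigned char)text[j + m]]) {
        if (memcmp(text + j, pat, m) == 0)
            return text + j;
        if (j + m == n)        /* no character after the window */
            break;
    }
    return NULL;
}
```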

--
Later,
Jerry.

The universe is a figment of its own imagination.

Nov 14 '05 #5
Thanks to everyone for replying.

I do need a function which is efficient, so I'll probably need to try an
implementation of one of the more advanced algorithms suggested.

I found the section of code below, which claims to be an implementation
of Boyer-Moore-Horspool.

I would be grateful if someone could briefly tell me if and how I could
use it.
char *search( pat, text, n )
char *pat, *text;
int n;

{ int i, j, k, m, skip[MAXCHAR];
m = strlen(pat);
if( m==0 ) return( text );
for( k=0; k<MAXCHAR; k++ ) skip[k] = m;
for( k=0; k<m-1; k++ ) skip[pat[k]] = m-k-1;

for( k=m-1; k < n; k += skip[text[k] & (MAXCHAR-1)] ) {
for( j=m-1, i=k; j>=0 && text[i] == pat[j]; j-- ) i--;
if( j == (-1) ) return( text+i+1 );
}
return( NULL );
}
Regards,
Jonny
Nov 14 '05 #6
Gregory Toomey wrote:
> Jonny wrote:
> > Please could you tell me what the quickest way would be to verify
> > that a line of text exists in a file.
> > [ ... ]
> > But I need to do it in a C program.
>
> Knuth-Morris-Pratt or Boyer-Moore
> http://www.cs.utexas.edu/users/moore...ing-searching/


For complete lines, combining fgets and strcmp is hard to beat.
Normally strcmp will fail on the first character or two, and fgets need
not read into a buffer significantly larger than the target string. The
only complications will be stripping the '\n' off the end of file
lines, if present, and flushing the input line through '\n' if not
present.

There is no need for more complex searches such as KMP or BM.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

Nov 14 '05 #7
Jonny wrote:
> Thanks to everyone for replying.
>
> I do need a function which is efficient, so I'll probably need to try an
> implementation of one of the more advanced algorithms suggested.
Which is _complete_ overkill.
Both search for a comparatively small pattern in a comparatively
large _unstructured_ text.
You, OTOH, have a pattern which is exactly as large as the text
without the '\n'. That means that setting up the skip vector
about triples the cost.

The only disadvantage of fgets() is that it does not return or
make available the number of characters read, so you may want to
consider using getc() and comparing yourself.
Implement the "read line -- compare it" variant and one advanced
algorithm and just use a profiler or clock() or whatever to
find out which is better.

A "better" algorithm is not necessarily the best algorithm for
a given task -- whenever you have additional information about
the problem at hand, using a method tailored to your degree of
information is best.

> I found the section of code below, which claims to be an implementation
> of Boyer-Moore-Horspool.
>
> I would be grateful if someone could briefly tell me if and how I could
> use it.
You could, with some restrictions.
Why did you not just paste it into your code _and_ try it?
We are not here to counsel you about algorithms or code off some
dubious site in the web but about _your_ C code -- which you still
fail to show us.

> char *search( pat, text, n )
> char *pat, *text;
> int n;
K&R style -- this is outdated by a good fifteen years.

char *search (const char *pat, const char *text, int n)

where you probably would change int n into size_t n and
would change the order of text and pat to fit the order
in standard library functions such as strstr().

> { int i, j, k, m, skip[MAXCHAR];
> m = strlen(pat);
> if( m==0 ) return( text );
> for( k=0; k<MAXCHAR; k++ ) skip[k] = m;
> for( k=0; k<m-1; k++ ) skip[pat[k]] = m-k-1;
>
> for( k=m-1; k < n; k += skip[text[k] & (MAXCHAR-1)] ) {
Here is a catch: the author assumes MAXCHAR to be a power of two (1<<s).
To repair this, make it skip[(unsigned char)text[k] % MAXCHAR]
(the cast keeps the index non-negative where plain char is signed)
and leave the micro-optimisations to the compiler.
> for( j=m-1, i=k; j>=0 && text[i] == pat[j]; j-- ) i--;
> if( j == (-1) ) return( text+i+1 );
> }
> return( NULL );
> }
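A sketch of the posted function with the suggestions above applied: an ANSI prototype, const parameters, size_t lengths, and text before pat as in strstr(). MAXCHAR is defined here as UCHAR_MAX + 1, which is an assumption since the posted code never defines it, and every table index goes through unsigned char, so neither the & mask nor the % is needed and characters above 127 cannot produce a negative index. It returns const char * to avoid casting away const.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define MAXCHAR (UCHAR_MAX + 1)   /* assumed: one entry per unsigned char */

/* Boyer-Moore-Horspool: returns a pointer to the first occurrence of
   pat within the first n characters of text, or NULL. */
const char *search(const char *text, size_t n, const char *pat)
{
    size_t skip[MAXCHAR];
    size_t m = strlen(pat);
    size_t i, k;

    if (m == 0)
        return text;
    if (m > n)
        return NULL;

    for (i = 0; i < MAXCHAR; i++)
        skip[i] = m;
    for (i = 0; i + 1 < m; i++)          /* all but the last character */
        skip[(unsigned char)pat[i]] = m - i - 1;

    /* k is the index in text aligned with the last character of pat. */
    for (k = m - 1; k < n; k += skip[(unsigned char)text[k]]) {
        size_t j = m;                    /* compare right to left */
        while (j > 0 && text[k - (m - j)] == pat[j - 1])
            j--;
        if (j == 0)
            return text + k - (m - 1);
    }
    return NULL;
}
```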


I have not closely looked at or tested the rest of the
code -- the latter is your job.
However, I think that you are trying to be lazy on the
thinking and programming part. This will not lead to the
most efficient solution.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Nov 14 '05 #8
Jonny <ww******@ntlworld.com> wrote:
# Hi,
#
# Please could you tell me what the quickest way would be to verify that a
# line of text exists in a file.

Set up an FSM and scan the file character by character. Then you don't have to worry
about line buffering.

If the line you're interested in has N characters, the state machine would have
N+4 states:
s=0      initial state
0<=s<N   s characters of a line have been read and match the given string.
         if the next input character is the sth string character
           then move to state s+1
         else if the next input character is a new line
           then move to state 0
         else if the next input is an EOF
           then move to state N+3
         else move to state N+1
s=N      all N characters match
         if the next input character is a new line or EOF
           then move to state N+2
         else move to state N+1
s=N+1    this line does not match the string
         if the next input character is a new line
           then move to state 0
         else if the next input is an EOF
           then move to state N+3
         else stay in state N+1
s=N+2    accepting final state: the line is in the file
s=N+3    rejecting final state: the line is not in the file

If you get the line out of argv, the only variables you need are the next input,
N, s, and argc and argv.

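If the file length is M and the sought line is randomly distributed in the file,
the expected running time is linear in M (on average a little over M/2 characters
are read before the line is found), and it is independent of N.
The storage cost is O(N): the N+4 states need only the single counter s,
plus the N characters of the target string.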
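A sketch of the state machine in C, keeping the post's state numbers in s; fsm_line_exists is a hypothetical name, and in a real program the target would come from argv. In this version a mismatch moves to the skip state N+1, a newline that arrives before all N characters have matched returns to state 0, and an EOF straight after a final '\n' does not count as an empty line. Read errors are treated like end of file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: returns 1 if some line of fp equals line, else 0.
   s < N: prefix matched so far; s == N: whole string matched;
   s == N+1: skipping a non-matching line.  Returning 1 or 0 plays
   the role of the final states N+2 and N+3. */
int fsm_line_exists(FILE *fp, const char *line)
{
    size_t n = strlen(line);
    size_t s = 0;
    int c;

    for (;;) {
        c = getc(fp);
        if (s < n) {
            if (c == (unsigned char)line[s])
                s++;                    /* prefix still matches */
            else if (c == '\n')
                s = 0;                  /* line too short: next line */
            else if (c == EOF)
                return 0;               /* state N+3 */
            else
                s = n + 1;              /* mismatch: skip rest of line */
        } else if (s == n) {
            if (c == '\n')
                return 1;               /* state N+2 */
            if (c == EOF)
                return n > 0;           /* no empty line after a final '\n' */
            s = n + 1;                  /* line longer than the target */
        } else {                        /* s == n + 1 */
            if (c == '\n')
                s = 0;
            else if (c == EOF)
                return 0;               /* state N+3 */
        }
    }
}
```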

--
SM Ryan http://www.rawbw.com/~wyrmwif/
The whole world's against us.
Nov 14 '05 #9
On Sat, 19 Feb 2005 21:18:43 GMT, in comp.lang.c , Jonny
<ww******@ntlworld.com> wrote:
> [ ... ]
> grep "^line of text$" filename.txt


1) Download the source code for grep. It is distributed as its own GNU package (GNU grep), not as part of textutils.

2) read the file line by line till you find a line that matches your
pattern.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
Nov 14 '05 #10
