No. of 'a' in a text file

/* I wrote the following program to calculate no. of 'a' in the file
c:/1.txt but it fails to give appropriate result. What is wrong with
it? */

#include"stdio.h"
int main(void)
{
FILE *f;
char ch;
long int a=0;
f=fopen("c:/1.txt","r");
while(ch=getc(f)!=EOF)
{
switch(ch)
{
case 'a': a++;break;
}
}
printf("No. of 'a' = %d\n",a);
fclose(f);
return 0;
}

Apr 21 '07 #1
ume$h said:
/* I wrote the following program to calculate no. of 'a' in the file
c:/1.txt but it fails to give appropriate result. What is wrong with
it? */

#include"stdio.h"
int main(void)
{
FILE *f;
char ch;
long int a=0;
f=fopen("c:/1.txt","r");
What happens if the fopen fails?
while(ch=getc(f)!=EOF)
Are you sure you meant to say this? Consider the precedences of = and !=
and check the return type of getc.
{
switch(ch)
{
case 'a': a++;break;
Do you really mean to break out of your loop at this point?
}
}
printf("No. of 'a' = %d\n",a);
Did you read the documentation for printf?
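
The precedence hint can be checked without a file. A minimal sketch, assuming two hypothetical helpers (the names are illustrative, not from the thread) that take the value getc would have returned:

```c
#include <stdio.h>

/* Hypothetical helpers; v stands for a value returned by getc(). */

/* Mirrors ch = getc(f) != EOF: the comparison is evaluated first,
   so ch receives 0 or 1 rather than the character. */
int assigned_without_parens(int v)
{
    int ch;
    ch = v != EOF;
    return ch;
}

/* Mirrors (ch = getc(f)) != EOF: the character is stored first,
   then compared against EOF. */
int assigned_with_parens(int v)
{
    int ch;
    int more = ((ch = v) != EOF);
    (void)more; /* the loop condition; unused in this sketch */
    return ch;
}
```

With 'a' as input the first helper yields 1, which would explain why case 'a' never matches in the posted loop.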

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 21 '07 #2
ume$h wrote:
>
/* I wrote the following program to calculate no. of 'a' in the file
c:/1.txt but it fails to give appropriate result. What is wrong with
it? */
A lot.

#include"stdio.h"
#include <stdio.h>
int main(void)
{
FILE *f;
char ch;
int ch;
/*
** getc returns int, EOF is type int.
*/

long int a=0;
f=fopen("c:/1.txt","r");
What if fopen returns NULL?
while(ch=getc(f)!=EOF)
ch is assigned a value of either 1 or 0,
depending on whether or not getc returns EOF.
{
switch(ch)
{
case 'a': a++;break;
}
You need a default case.
}
printf("No. of 'a' = %d\n",a);
fclose(f);
return 0;
}
/* BEGIN new.c */

#include <stdio.h>

int main(void)
{
FILE *f;
char *fn = "c:/1.txt";
int c;
long unsigned a = 0;

f = fopen(fn,"r");
if (f != NULL) {
while ((c = getc(f)) != EOF) {
a += c == 'a';
}
printf("No. of 'a' = %lu\n", a);
fclose(f);
} else {
printf("fopen problem with %s\n", fn);
}
return 0;
}

/* END new.c */

--
pete
Apr 21 '07 #3
// This one runs.

#include"stdio.h"
int main(void)
{
FILE *f;
long int a=0;
f=fopen("c:/1.txt","r");
while (getc(f)=='a')
a++;
printf("No. of 'a' = %d\n",a);
fclose(f);
return 0;
}

Apr 21 '07 #4
In article <fI*********************@bt.com>,
Richard Heathfield <rj*@see.sig.invalid> wrote:
>while(ch=getc(f)!=EOF)

Are you sure you meant to say this? Consider the precedences of = and !=
and check the return type of getc.
>{
switch(ch)
{
case 'a': a++;break;

Do you really mean to break out of your loop at this point?
What? That does not break out of the loop.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Apr 21 '07 #5
Umesh said:
// This one runs.

#include"stdio.h"
int main(void)
{
FILE *f;
long int a=0;
f=fopen("c:/1.txt","r");
while (getc(f)=='a')
a++;
printf("No. of 'a' = %d\n",a);
fclose(f);
return 0;
}
On my system, the output of this program is:

"Segmentation fault (core dumped)"

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 21 '07 #6
Richard Tobin said:
Richard Heathfield wrote:
>> switch(ch)
{
case 'a': a++;break;

Do you really mean to break out of your loop at this point?

What? That does not break out of the loop.
You're right, of course. A crit too far.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 21 '07 #7
pete <pf*****@mindspring.com> writes:
ume$h wrote:
[...]
>{
switch(ch)
{
case 'a': a++;break;
}

You need a default case.
[...]

What for?

The switch statement is equivalent to:

if (ch == 'a') {
a++;
}

which, of course, would be a better way to write it (unless the OP is
planning to expand it to handle other characters).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Apr 21 '07 #8
Umesh wrote:
>
// This one runs.

#include"stdio.h"
Should be
#include <stdio.h>
int main(void)
{
FILE *f;
long int a=0;
f=fopen("c:/1.txt","r");
What if fopen returns a null pointer?
while (getc(f)=='a')
a++;
This loop will tally the number of 'a' characters
at the beginning of the file
and will stop as soon as it reads any other character.
printf("No. of 'a' = %d\n",a);
Should be %ld because (a) is long.
fclose(f);
That function call is undefined if (f) is a null pointer.
return 0;
}
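
The difference between the two loop shapes can be sketched on an in-memory string instead of a file. The function names below are hypothetical and only serve the illustration:

```c
/* Same shape as while (getc(f) == 'a') a++;
   it stops at the first character that is not 'a'. */
long count_leading_a(const char *s)
{
    long n = 0;
    while (*s == 'a') {
        n++;
        s++;
    }
    return n;
}

/* Visits every character up to the terminator and tallies the matches. */
long count_all_a(const char *s)
{
    long n = 0;
    for (; *s != '\0'; s++)
        n += (*s == 'a');
    return n;
}
```

For the text "aabca" the first function reports 2 and the second reports 3; only the second answers the original question.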
--
pete
Apr 21 '07 #9
Keith Thompson wrote:
>
pete <pf*****@mindspring.com> writes:
ume$h wrote:
[...]
{
switch(ch)
{
case 'a': a++;break;
}
You need a default case.
[...]

What for?
Style?
I forgot that the default case is optional for switch statements.
The switch statement is equivalent to:

if (ch == 'a') {
a++;
}

which, of course, would be a better way to write it (unless the OP is
planning to expand it to handle other characters).
--
pete
Apr 21 '07 #10
Umesh <fr****************@gmail.com> writes:
// This one runs.

#include"stdio.h"
int main(void)
{
FILE *f;
long int a=0;
f=fopen("c:/1.txt","r");
while (getc(f)=='a')
a++;
printf("No. of 'a' = %d\n",a);
fclose(f);
return 0;
}
I'm sure it runs, but it doesn't work. It appears to count the number
of consecutive 'a' characters starting at the beginning of the file,
which I don't think is what you're trying to do.

It also ignores all the advice that's been given to you so far:

Use <stdio.h>, not "stdio.h"

Check the result of fopen().

The "%d" format expects an int; you're giving it a long int.
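
The format point can be sketched as follows. This is an illustration only: format_count is a hypothetical helper, and snprintf assumes a C99 library (with a C89 library, printf with the same "%ld" specifier makes the same point):

```c
#include <stdio.h>

/* Hypothetical helper: formats the tally with the specifier that
   matches its type. A long needs %ld; an unsigned long would need %lu. */
int format_count(char *buf, size_t size, long count)
{
    return snprintf(buf, size, "No. of 'a' = %ld\n", count);
}
```

Passing a long where "%d" is expected is undefined behaviour, even on systems where int and long happen to have the same size.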

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Apr 22 '07 #11
/*Calculate the no. of occurance of 'ab'. But is this one OK/
EFFICIENT?*/
#include<stdio.h>
int main(void)
{
FILE *f;
long int c,c1,ab=1;
f=fopen("c:/1.txt","r");
while ((c=getc(f))!=EOF && (c1=getc(f))!=EOF) {
if(c=='a' && c1=='b') ab++;}
fclose(f);
printf("No. of 'ab' = %ld\n",ab);
return 0;
}

Apr 22 '07 #12
Umesh said:
/*Calculate the no. of occurance of 'ab'. But is this one OK/
EFFICIENT?*/
#include<stdio.h>
int main(void)
{
FILE *f;
long int c,c1,ab=1;
f=fopen("c:/1.txt","r");
while ((c=getc(f))!=EOF && (c1=getc(f))!=EOF) {
if(c=='a' && c1=='b') ab++;}
fclose(f);
printf("No. of 'ab' = %ld\n",ab);
return 0;
}
I ran this, and got the following output:

Segmentation fault (core dumped)
But yes, the core dump did happen very quickly. No efficiency complaints
here.
--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 22 '07 #13

Richard Heathfield wrote:
I ran this, and got the following output:

Segmentation fault (core dumped)
But yes, the core dump did happen very quickly. No efficiency complaints
here.
The program is running in TC++ 4.5 and VC++ 6 compiler. What is wrong
with you?
I wanted to say that if I want to find no. of occurance of
'abcdefghijk' in a text file by modifying this program, it would be a
toilsome job. Is there any alternative? Thank you.

Apr 22 '07 #14
On Apr 21, 9:59 pm, Umesh <fraternitydispo...@gmail.com> wrote:
/*Calculate the no. of occurance of 'ab'. But is this one OK/
EFFICIENT?*/
#include<stdio.h>
int main(void)
{
FILE *f;
long int c,c1,ab=1;

Since getc returns int, I'd
use int for the 'c' and 'c1' variables.

f=fopen("c:/1.txt","r");

Others have said that you
need to check to see if fopen succeeds.

while ((c=getc(f))!=EOF && (c1=getc(f))!=EOF) {
if(c=='a' && c1=='b') ab++;}
fclose(f);
printf("No. of 'ab' = %ld\n",ab);
return 0;

}

And what happens if the file
you're reading contains the
string "bababa\n"? Ask yourself
if the output would be any different
for a file that contained the string
"ababab\n"?

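The two test strings can be tried without touching a file. A minimal sketch, with hypothetical function names, that compares reading two characters per step (as the posted loop does) against advancing one character at a time:

```c
#include <string.h>

/* Reads two characters per step, like the posted while loop. */
long count_ab_in_pairs(const char *s)
{
    size_t n = strlen(s);
    size_t i;
    long count = 0;
    for (i = 0; i + 1 < n; i += 2)
        if (s[i] == 'a' && s[i + 1] == 'b')
            count++;
    return count;
}

/* Advances one character at a time, so every adjacent pair is examined. */
long count_ab_sliding(const char *s)
{
    size_t n = strlen(s);
    size_t i;
    long count = 0;
    for (i = 0; i + 1 < n; i++)
        if (s[i] == 'a' && s[i + 1] == 'b')
            count++;
    return count;
}
```

For "ababab\n" both functions report 3, but for "bababa\n" the pairwise version reports 0 while the sliding version reports 2.
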
--
Hope this helps,
Steven
Apr 22 '07 #15
Umesh said:
>
Richard Heathfield wrote:
>I ran this, and got the following output:

Segmentation fault (core dumped)
But yes, the core dump did happen very quickly. No efficiency
complaints here.

The program is running in TC++ 4.5 and VC++ 6 compiler.
No, it isn't. The program doesn't run in the compiler. It runs as a
process on the computer. (One might reasonably describe a C program as
running "in" an interpreter if one happened to be using one, but you
aren't.)
What is wrong with you?
What a question. Here's a better question: what is wrong with your
program, that causes it to produce a segmentation fault on my system
instead of a graceful error message? Hint: fopen can fail.
I wanted to say that if I want to find no. of occurance of
'abcdefghijk' in a text file by modifying this program, it would be a
toilsome job. Is there any alternative? Thank you.
Step 1: make it readable (so that you can correct it).
Step 2: make it correct (so that it's worth speeding up).
Step 3: make it fast (if it isn't already fast enough).

Leaving aside readability issues (although I don't consider your program
to be very readable), you have at least three problems in your program
that stop it from being correct. Firstly, it makes an invalid
assumption that its resource acquisition request is bound to succeed.
Secondly, it starts its count from 1 rather than from 0. And thirdly,
it fails to count any 'a', 'b' pair that are an odd number of bytes
into the file.

I suggest you fix those three problems before worrying about
performance.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 22 '07 #16
Umesh wrote:
/*Calculate the no. of occurance of 'ab'. But is this one OK/
EFFICIENT?*/
#include<stdio.h>
int main(void)
{
FILE *f;
long int c,c1,ab=1;
f=fopen("c:/1.txt","r");
while ((c=getc(f))!=EOF && (c1=getc(f))!=EOF) {
if(c=='a' && c1=='b') ab++;}
fclose(f);
printf("No. of 'ab' = %ld\n",ab);
return 0;
}
Whitespace has plummeted in price over the past few years.

Making your program a little safer and more readable.

#include <stdio.h>

int main( int argc, char** argv )
{
if( argc > 1 )
{
FILE *f = fopen( argv[1],"r");

if( f )
{
int c,c1;
unsigned ab=0;

while ((c=getc(f))!=EOF && (c1=getc(f))!=EOF)
{
if(c=='a' && c1=='b') ab++;
}

fclose(f);
printf("No. of 'ab' = %ld\n",ab);
}
}
return 0;
}

Now try running it on itself and you will find your logic errors.

--
Ian Collins.
Apr 22 '07 #17
Ian Collins said:
Umesh wrote:
<snip>
>long int c,c1,ab=1;
<snip>
>printf("No. of 'ab' = %ld\n",ab);
<snip>
>
Making your program a little safer and more readable.
Laudable, but you have introduced at least one fresh bug, by making ab
into an unsigned int...

<snip>
unsigned ab=0;
<snip>
printf("No. of 'ab' = %ld\n",ab);
....without fixing the printf to match.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 22 '07 #18
Umesh wrote:
I wanted to say that if I want to find no. of occurance of
'abcdefghijk' in a text file by modifying this program, it would be a
toilsome job. Is there any alternative? Thank you.
/* BEGIN new.c */

#include <stdio.h>

int main(void)
{
int c;
FILE *f;
char *fn = "c:/1.txt";
long unsigned count = 0;
enum states {A = 0,B,C,D,E,F,G,H,I,J,K} waiting_for = A;

f = fopen(fn,"r");
if (f != NULL) {
while ((c = getc(f)) != EOF) {
switch (c) {
case 'a':
if (waiting_for++ != A) {
waiting_for = A;
}
break;
case 'b':
if (waiting_for++ != B) {
waiting_for = A;
}
break;
case 'c':
if (waiting_for++ != C) {
waiting_for = A;
}
break;
case 'd':
if (waiting_for++ != D) {
waiting_for = A;
}
break;
case 'e':
if (waiting_for++ != E) {
waiting_for = A;
}
break;
case 'f':
if (waiting_for++ != F) {
waiting_for = A;
}
break;
case 'g':
if (waiting_for++ != G) {
waiting_for = A;
}
break;
case 'h':
if (waiting_for++ != H) {
waiting_for = A;
}
break;
case 'i':
if (waiting_for++ != I) {
waiting_for = A;
}
break;
case 'j':
if (waiting_for++ != J) {
waiting_for = A;
}
break;
case 'k':
if (waiting_for == K) {
++count;
}
waiting_for = A;
break;
default:
waiting_for = A;
break;
}
}
printf("No. of 'abcdefghijk' = %lu\n", count);
fclose(f);
} else {
printf("fopen problem with %s\n", fn);
}
return 0;
}

/* END new.c */
--
pete
Apr 22 '07 #19
pete wrote:
>
Umesh wrote:
I wanted to say that if I want to find no. of occurance of
'abcdefghijk' in a text file by modifying this program,
it would be a toilsome job. Is there any alternative? Thank you.
/* BEGIN new.c */

#include <stdio.h>
#include <string.h>

#define STRING "abcdefghijk"

int main(void)
{
int c;
FILE *f;
char *letter;
char *fn = "c:/1.txt";
long unsigned count = 0;
enum states {A = 0,B,C,D,E,F,G,H,I,J,K} waiting_for = A;
const char* const string = STRING;

f = fopen(fn, "r");
if (f != NULL) {
while ((c = getc(f)) != EOF) {
switch (c) {
case 'k':
if (waiting_for == K) {
++count;
}
waiting_for = A;
break;
default:
letter = strchr(string, c);
if (letter == NULL
|| waiting_for++ != letter - string)
{
waiting_for = A;
}
break;
}
}
printf("No. of '%s' = %lu\n", string, count);
fclose(f);
} else {
printf("fopen problem with %s\n", fn);
}
return 0;
}

/* END new.c */
--
pete
Apr 22 '07 #20
On Apr 22, 8:16 am, Richard Heathfield <r...@see.sig.invalid> wrote:
Ian Collins said:
Umesh wrote:
<snip>
long int c,c1,ab=1;
<snip>
printf("No. of 'ab' = %ld\n",ab);
<snip>
Making your program a little safer and more readable.
Laudable, but you have introduced at least one fresh bug, by making ab
into an unsigned int...
<snip>
unsigned ab=0;
<snip>
printf("No. of 'ab' = %ld\n",ab);
...without fixing the printf to match.
It's worth pointing out that this was also cross-posted to
comp.lang.c++, and that C++ has better ways of handling this
problem: in C++ code, I'd use std::deque (for the general case,
anyway) and istream, for example, to maintain a sliding two
character window in the file. In C, I'd probably simulate the
use of deque to achieve the same thing---a two character queue
is pretty easy to program. Alternatively, a simple state
machine is an efficient solution in both languages.

In C, for this specific case, I'd probably write something like:

#include <stdio.h>
#include <stdlib.h>

int
main()
{
FILE* f = fopen( "somefile.txt", "r" ) ;
if ( f == NULL ) {
fprintf( stderr, "cannot open: %s\n", "somefile.txt" ) ;
exit( 2 ) ;
}
int prev = '\0' ;
int count = 0 ;
for ( int ch = getc( f ) ; ch != EOF ; ch = getc( f ) ) {
if ( prev == 'a' && ch == 'b' ) {
++ count ;
}
prev = ch ;
}
printf( "%d\n", count ) ;
return 0 ;
}

(I think that this is 100% C. At any rate, gcc -pedantic
-std=c99 -Wall compiles it without warnings.)

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 22 '07 #21
pete wrote:
>
pete wrote:

Umesh wrote:
I wanted to say that if I want to find no. of occurance of
'abcdefghijk' in a text file by modifying this program,
it would be a toilsome job. Is there any alternative? Thank you.
/* BEGIN new.c */

#include <stdio.h>
#include <string.h>

#define STRING "abcdef"

int main(void)
{
enum states {A = 0,B,C,D,E,F,G,H,I,J,K} waiting_for = A;
/*
** Just make sure that the enum states cover the STRING
** and you can change STRING to whatever you want
** without having to change any other lines in this program.
*/
int c;
FILE *fp;
char *letter;
char *fn = "c:/1.txt";
long unsigned count = 0;
const char* const string = STRING;

fp = fopen(fn, "r");
if (fp != NULL) {
while ((c = getc(fp)) != EOF) {
if (c == STRING[sizeof STRING - 2]) {
if (waiting_for == sizeof STRING - 2) {
++count;
}
waiting_for = A;
} else {
letter = strchr(string, c);
if (letter == NULL
|| waiting_for++ != letter - string)
{
waiting_for = A;
}
}
}
printf("No. of '%s' = %lu\n", string, count);
fclose(fp);
} else {
printf("fopen problem with %s\n", fn);
}
return 0;
}

/* END new.c */

--
pete
Apr 22 '07 #22
pete wrote:
>
pete wrote:

pete wrote:
>
Umesh wrote:
>
I wanted to say that if I want to find no. of occurance of
'abcdefghijk' in a text file by modifying this program,
it would be a toilsome job. Is there any alternative? Thank you.
#define STRING "abcdef"

int main(void)
{
enum states {A = 0,B,C,D,E,F,G,H,I,J,K} waiting_for = A;
/*
** Just make sure that the enum states cover the STRING
** and you can change STRING to whatever you want
** without having to change any other lines in this program.
*/
Actually, the following line of code
letter = strchr(string, c);
prevents the program from working correctly
with strings like "baby",
that have multiple occurrences of letters.
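
One way around the repeated-letter problem is to stop mapping characters to states and instead keep the last few characters in a small window. The sketch below is an illustration under stated assumptions, not a replacement for the programs above: count_pattern and MAX_PAT are hypothetical names, it scans a string rather than a file, and it counts overlapping matches.

```c
#include <string.h>

#define MAX_PAT 64 /* assumed upper bound on the pattern length */

/* Counts occurrences (overlaps included) of pat in text by keeping the
   last strlen(pat) characters in a window and comparing after each one. */
long count_pattern(const char *text, const char *pat)
{
    char window[MAX_PAT];
    size_t m = strlen(pat);
    size_t filled = 0;
    long count = 0;

    if (m == 0 || m > MAX_PAT)
        return 0;

    for (; *text != '\0'; text++) {
        if (filled < m) {
            window[filled++] = *text;
        } else {
            memmove(window, window + 1, m - 1); /* drop the oldest character */
            window[m - 1] = *text;
        }
        if (filled == m && memcmp(window, pat, m) == 0)
            count++;
    }
    return count;
}
```

In a file-based version each character would come from getc instead of the string. The cost is one memmove and one memcmp per input character, which is fine for short patterns but is not tuned for speed.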

--
pete
Apr 22 '07 #23
Maybe it should be like this:
while(ch=getc(f)!=EOF)-----------while(ch=getc(f)&&ch!=EOF)

{ //do something

Apr 22 '07 #24
Umesh wrote:
/*Calculate the no. of occurance of 'ab'. But is this one OK/
EFFICIENT?*/
#include<stdio.h>
int main(void)
{
FILE *f;
long int c,c1,ab=1;
f=fopen("c:/1.txt","r");
while ((c=getc(f))!=EOF && (c1=getc(f))!=EOF) {
if(c=='a' && c1=='b') ab++;}
fclose(f);
printf("No. of 'ab' = %ld\n",ab);
return 0;
}
That will not work, it will fail for sequences like 'xab'.

The code I attached below shows how you can create a state machine to
search for any sequence of characters. You'll need to adapt it to read
from a file (should be trivial). It's not designed for speed.


#include <map>
#include <vector>
#include <memory>
#include <cassert>

// ======== IteratorTranverser =======================================
/**
* IteratorTranverser is a template class that iterates through
* a pointer or iterator. The pointers passed to it must be valid
* for the life-time of this object.
*/

template <typename Itr, typename t_key_char>
struct IteratorTranverser
{

Itr m_from;
const Itr m_end;

IteratorTranverser( const Itr & i_from, const Itr & i_end )
: m_from( i_from ),
m_end( i_end )
{
}
bool GetChar( t_key_char & l_char )
{
if ( m_from != m_end )
{
l_char = * ( m_from ++ );
return true;
}

return false;
}

bool HasInput( bool i_wait )
{
return m_from != m_end;
}

};
// ======== CombiningTraverser ========================================
/**
*
*
*/

template <typename TraverserTypeFirst, typename TraverserTypeSecond, typename t_key_char>
struct CombiningTraverser
{

TraverserTypeFirst & m_first;
TraverserTypeSecond & m_second;
bool m_use_second;

CombiningTraverser(
TraverserTypeFirst & io_first,
TraverserTypeSecond & io_second
)
: m_first( io_first ),
m_second( io_second ),
m_use_second( false )
{
}

bool GetChar( t_key_char & l_char )
{
if ( ! m_use_second )
{
if ( m_first.GetChar( l_char ) )
{
return true;
}
m_use_second = true;
}

return m_second.GetChar( l_char );
}

bool HasInput( bool i_wait )
{
if ( ! m_use_second )
{
if ( m_first.HasInput( i_wait ) )
{
return true;
}
m_use_second = true;
}

return m_second.HasInput( i_wait );
}

};
/**
* SimpleScanner is a simple scanner generator
*/

template <typename t_key_char, typename t_result>
class SimpleScanner
{
/**
* DFA_State contains a list of transitions
*/

struct DFA_State
{
typedef std::map<t_key_char, DFA_State *> t_map_type;
typedef typename t_map_type::iterator t_iterator;
t_map_type m_transitions;

t_result m_terminal;
bool m_has_val;

DFA_State()
: m_terminal(),
m_has_val( false )
{
}

/**
* FindOrInsertTransition is used to construct the scanner
*/

DFA_State * FindOrInsertTransition( t_key_char i_char )
{
std::pair<t_iterator, bool> l_insert_result =
m_transitions.insert( typename t_map_type::value_type( i_char, 0 ) );

if ( ! l_insert_result.second )
{
return l_insert_result.first->second;
}

return l_insert_result.first->second = new DFA_State;
}
/**
* FindTransition is used to traverse the scanner
*/

DFA_State * FindTransition( t_key_char i_char )
{
t_iterator l_insert_result =
m_transitions.find( i_char );

if ( l_insert_result != m_transitions.end() )
{
return l_insert_result->second;
}

return 0;
}

};

struct DFA_Machine
{

DFA_State * m_initial_state;
DFA_State * m_current_state;
DFA_State * m_last_accept_state;
std::vector<t_key_char> m_str;

DFA_Machine( DFA_State * i_initial_state )
: m_initial_state( i_initial_state ),
m_current_state( i_initial_state ),
m_last_accept_state( 0 )
{
}

/**
* NextChar will traverse the state machine with the next
* character and return the terminal t_result if one exists.
* If i_char does not make a valid transition, o_valid
* is set to false.
*/
bool NextChar( t_key_char i_char )
{
m_str.push_back( i_char );
DFA_State * l_next_state = m_current_state->FindTransition( i_char );

if ( l_next_state )
{
m_current_state = l_next_state;

// If there is an accepting state then we
// can roll back the push-back buffer.
if ( l_next_state->m_has_val )
{
m_last_accept_state = l_next_state;
m_str.clear();

}

return true;
}

m_current_state = m_initial_state;
return false;
}

template <typename Traverser>
bool ScanStream( Traverser & io_traverser, t_result & o_result )
{
t_key_char l_char;

while ( io_traverser.GetChar( l_char ) )
{
bool i_valid;

i_valid = NextChar( l_char );

DFA_State * l_last_accept_state = m_last_accept_state;

// If there are no more transitions or the last
if ( ( ! i_valid ) || ( m_current_state->m_transitions.size() == 0 ) )
{
if ( l_last_accept_state )
{
m_last_accept_state = 0;
m_current_state = m_initial_state;
if ( l_last_accept_state->m_has_val )
{
o_result = l_last_accept_state->m_terminal;
return true;
}
}
return false;
}

// There are transitions ...
assert( m_current_state->m_transitions.size() != 0 );

// If there are transitions (true here) and this is an interactive
// scan (waiting for user input) then wait a little longer, if there
// are no accept states - wait forever (which means calling GetChar).

if ( l_last_accept_state )
{
if ( ! io_traverser.HasInput( true ) )
{
// there is no longer any pending input. We're done.
m_last_accept_state = 0;
m_current_state = m_initial_state;
o_result = l_last_accept_state->m_terminal;
return true;
}
}
}

return false;
}
template <typename TraverserType>
bool DoScan( TraverserType & io_traverser, t_result & o_result )
{
std::vector<t_key_char> l_str = std::vector<t_key_char>();
l_str.swap( m_str );

if ( l_str.size() != 0 )
{
IteratorTranverser< typename std::vector<t_key_char>::iterator, t_key_char > l_tvsr(
l_str.begin(),
l_str.end()
);

CombiningTraverser<
IteratorTranverser< typename std::vector<t_key_char>::iterator, t_key_char >,
TraverserType,
t_key_char
> l_combined( l_tvsr, io_traverser );
bool l_scanned = ScanStream( l_combined, o_result );

// may still have content locally - push that back into the
m_str.insert( m_str.end(), l_tvsr.m_from, l_tvsr.m_end );

return l_scanned;
}
else
{
return ScanStream( io_traverser, o_result );
}

return false;
}

bool HasInput( bool )
{
return m_str.size() != 0;
}

bool GetChar( t_key_char & l_char )
{
if ( m_str.size() != 0 )
{
l_char = m_str.front();
m_str.erase( m_str.begin() );
return true;
}
return false;
}

};

struct Scanner
{
DFA_State * m_initial_state;

Scanner()
: m_initial_state( new DFA_State )
{
}

DFA_Machine * NewMachine()
{
return new DFA_Machine( m_initial_state );
}

/**
* AddTerminal will add a terminal and will return the colliding
* terminal (if there is one)
*/

template <typename t_iterator>
bool AddTerminal(
int i_length,
t_iterator i_str,
const t_result & i_kd,
t_result & o_result
) {

DFA_State * l_curr_state = m_initial_state;

t_iterator l_str = i_str;

for ( int i = 0; i < i_length; ++ i )
{
DFA_State * l_next_state = l_curr_state->FindOrInsertTransition( * l_str );

++ l_str;

l_curr_state = l_next_state;
}

if ( l_curr_state->m_has_val )
{
// We have a collision !
o_result = l_curr_state->m_terminal;
return true;
}

l_curr_state->m_terminal = i_kd;
l_curr_state->m_has_val = true;

#if 0
// actually test the scanner to make sure that we decode what we expect
// to decode
std::auto_ptr<DFA_Machine> l_machine( NewMachine() );

IteratorTranverser< t_iterator > l_tvsr( i_str, i_str + i_length );

const t_result * l_kd2 = l_machine->ScanStream( l_tvsr );

// assert( l_kd2 == i_kd );

return 0;
#endif
return false;

}
};

Scanner m_scanner;
public:

struct Machine
{

DFA_Machine * m_machine;

Machine()
: m_machine( 0 )
{
}

~Machine()
{
if ( m_machine )
{
delete m_machine;
}
}

bool HasInput( bool )
{
if ( m_machine )
{
return m_machine->HasInput( false );
}
return false;
}

bool GetChar( t_key_char & l_char )
{
if ( m_machine )
{
return m_machine->GetChar( l_char );
}
return false;
}

private:

// no copies allowed
Machine( const Machine & );
Machine & operator=( const Machine & );

};

template <typename TraverserType>
bool Traverse( Machine & i_machine, TraverserType & io_traverser, t_result & o_kd )
{
DFA_Machine * l_machine = i_machine.m_machine;

if ( ! l_machine )
{
l_machine = i_machine.m_machine = m_scanner.NewMachine();
}

return l_machine->DoScan( io_traverser, o_kd );

}
bool AddTerminal(
int i_length,
const t_key_char * i_str,
const t_result & i_kd,
t_result & o_result
) {

return m_scanner.AddTerminal( i_length, i_str, i_kd, o_result );

}

bool AddTerminal(
const t_key_char * i_str,
const t_result & i_kd,
t_result & o_result
) {

return m_scanner.AddTerminal( std::strlen( i_str ), i_str, i_kd, o_result );

}

template < typename t_container >
bool AddTerminal(
const t_container i_str,
const t_result & i_kd,
t_result & o_result
) {

return m_scanner.AddTerminal( i_str.size(), i_str.begin(), i_kd, o_result );

}

}; // SimpleScanner


#include <string>
#include <iostream>
#include <ostream>
#include <istream>

class NoisyStr
{
public:
std::string m_value;

NoisyStr()
: m_value( "unassigned" )
{
}

NoisyStr( const std::string & i_value )
: m_value( i_value )
{
}

NoisyStr( const char * i_value )
: m_value( i_value )
{
}

NoisyStr( const NoisyStr & i_value )
: m_value( i_value.m_value )
{
std::cout << "Copied " << m_value << "\n";
}

NoisyStr & operator=( const NoisyStr & i_value )
{
std::cout << "Assigned " << m_value;
m_value = i_value.m_value;
std::cout << " to " << m_value << "\n";
return * this;
}

const char * c_str()
{
return m_value.c_str();
}
};

typedef std::string KeyType;

int main()
{
SimpleScanner< char, KeyType > l_scanner;

KeyType l_collision;

l_scanner.AddTerminal( "abcde", "ZZ", l_collision );
l_scanner.AddTerminal( "xyz", "YY", l_collision );
l_scanner.AddTerminal( "dx_", "DX", l_collision );

static const char l_test[] = "abcde_test_abcdx_xyz";

std::cout << "scanning " << l_test << std::endl;
IteratorTranverser< const char *, char > l_trav( l_test, l_test + sizeof( l_test ) -1 );

SimpleScanner< char, KeyType >::Machine machine;

KeyType l_result;

while (true )
{
if ( l_scanner.Traverse( machine, l_trav, l_result ) )
{
std::cout << l_result.c_str();
}
else
{
char l_ch;

if ( ! machine.GetChar( l_ch ) )
{
if ( ! l_trav.GetChar( l_ch ) )
{
break;
}
}

std::cout << l_ch;

}
}

std::cout << std::endl;

} // main

Apr 22 '07 #25
Gianni Mariani said:
Umesh wrote:
>/*Calculate the no. of occurance of 'ab'. But is this one OK/
EFFICIENT?*/
#include<stdio.h>
int main(void)
{
FILE *f;
long int c,c1,ab=1;
f=fopen("c:/1.txt","r");
while ((c=getc(f))!=EOF && (c1=getc(f))!=EOF) {
if(c=='a' && c1=='b') ab++;}
fclose(f);
printf("No. of 'ab' = %ld\n",ab);
return 0;
}

That will not work, it will fail for sequences like 'xab'.

The code I attached below shows how you can create a state machine to
search for any sequence of characters. You'll need to adapt it to
read
from a file (should be trivial). It's not designed for speed.
Nor for brevity - not at almost 600 lines.

Nor, alas, did gcc like it very much:

foo.c:1: map: No such file or directory
foo.c:2: vector: No such file or directory
foo.c:3: memory: No such file or directory
foo.c:4: cassert: No such file or directory
foo.c:250: unterminated character constant
foo.c:484: string: No such file or directory
foo.c:485: iostream: No such file or directory
foo.c:486: ostream: No such file or directory
foo.c:487: istream: No such file or directory
make: *** [foo.o] Error 1

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 22 '07 #26
Richard Heathfield wrote:
Gianni Mariani said:
>Umesh wrote:
>>/*Calculate the no. of occurance of 'ab'. But is this one OK/
EFFICIENT?*/
#include<stdio.h>
int main(void)
{
FILE *f;
long int c,c1,ab=1;
f=fopen("c:/1.txt","r");
while ((c=getc(f))!=EOF && (c1=getc(f))!=EOF) {
if(c=='a' && c1=='b') ab++;}
fclose(f);
printf("No. of 'ab' = %ld\n",ab);
return 0;
}
That will not work, it will fail for sequences like 'xab'.

The code I attached below shows how you can create a state machine to
search for any sequence of characters. You'll need to adapt it to
read
from a file (should be trivial). It's not designed for speed.

Nor for brevity - not at almost 600 lines.

Nor, alas, did gcc like it very much:
....

Try naming the file with a .cpp (C++ extension) and compiling it with a
C++ compiler.
Apr 22 '07 #27
Gianni Mariani said:
Richard Heathfield wrote:
>>
Nor, alas, did gcc like it very much:
...

Try naming the file with a .cpp (C++ extension) and compiling it with
a C++ compiler.
No, thank you. If I want C++, I know where to find it - but I don't
expect to find it in comp.lang.c.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 22 '07 #28
"James Kanze" <ja*********@gmail.com> wrote in message
news:11**********************@e65g2000hsc.googlegroups.com...
In C, for this specific case, I'd probably write something like:
[snip]
FILE* f = fopen( "somefile.txt", "r" ) ;
What's the point of putting f so far on the right?
Anyhow, I'd write FILE *f rather than FILE* f, or else when you
write FILE* f, g you'll be surprised by results.

[snip]
exit( 2 ) ;
The behaviour of this is implementation-defined. On the DS9K,
exit(2) makes the program securely erase the whole disk on exit.
The standard way to do that is exit(EXIT_FAILURE);
int prev = '\0' ;
You can use declarations after statements only in C99. No problem
if you have a C99-compliant compiler (gcc isn't). The same for
declarations within the guard of a for loop.
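
Both points can be sketched together. The helper below is hypothetical (its name and signature are assumptions for illustration): it keeps every declaration ahead of the first statement, so a C89 compiler accepts it, and it reports failure with EXIT_FAILURE rather than a bare 2:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: counts "ab" pairs in the file at path.
   Returns EXIT_SUCCESS and stores the tally in *out, or returns
   EXIT_FAILURE and leaves *out untouched if the file cannot be opened. */
int count_ab_in_file(const char *path, long *out)
{
    FILE *f;           /* C89 style: declarations precede statements */
    int prev = '\0';
    int ch;
    long count = 0;

    f = fopen(path, "r");
    if (f == NULL) {
        fprintf(stderr, "cannot open: %s\n", path);
        return EXIT_FAILURE;
    }
    while ((ch = getc(f)) != EOF) {
        if (prev == 'a' && ch == 'b')
            ++count;
        prev = ch;
    }
    fclose(f);
    *out = count;
    return EXIT_SUCCESS;
}
```

A main function could pass the returned status straight to exit or return it directly.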
Apr 22 '07 #29
"Bruce !C!+" <aa*******@163.com> wrote in message
news:11**********************@e65g2000hsc.googlegroups.com...
my be it will like this,
while(ch=getc(f)!=EOF)-----------while(ch=getc(f)&&ch!=EOF)

{ //do something
It'll stop if it hits a null character.
Try while ((ch = getc(f)) != EOF)
Apr 22 '07 #30
Army1987 wrote:
"James Kanze" <ja*********@gmail.com> wrote in message
news:11**********************@e65g2000hsc.googlegroups.com...
>In C, for this specific case, I'd probably write something like:
[snip]
> FILE* f = fopen( "somefile.txt", "r" ) ;
What's the point of putting f so far on the right?
Anyhow, I'd write FILE *f rather than FILE* f, or else when you
write FILE* f, g you'll be surprised by results.
Well I wouldn't write it that way. I'd write it:

FILE *f; /* or FILE* f; */
FILE *g;

Single line declarations are probably the best.
>
[snip]
> exit( 2 ) ;
The behaviour of this is implementation-defined. On the DS9K,
exit(2) makes the program securely erase the whole disk on exit.
The standard way to do that is exit(EXIT_FAILURE);
> int prev = '\0' ;
You can use declarations after statements only in C99. No problem
if you have a C99-compliant compiler (gcc isn't). The same for
declarations within the guard of a for loop.

Apr 22 '07 #31
* Army1987:
"James Kanze" <ja*********@gmail.com> wrote in message
news:11**********************@e65g2000hsc.googlegroups.com...
>In C, for this specific case, I'd probably write something like:
[snip]
> FILE* f = fopen( "somefile.txt", "r" ) ;
What's the point of putting f so far on the right?
Anyhow, I'd write FILE *f rather than FILE* f, or else when you
write FILE* f, g you'll be surprised by results.
The problem you encounter is that it's an ungood idea to have multiple
declarators in one declaration.

That's what you shouldn't be doing.

And when you're not doing that ungood thing, writing FILE* f makes sense.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Apr 22 '07 #32
Umesh wrote:
I wanted to say that if I want to find the number of occurrences of
'abcdefghijk' in a text file by modifying this program, it would be a
toilsome job. Is there any alternative? Thank you.
I've been using the C code below for a while. It allows me to search for
any pattern of bytes, not just strings (the pattern can contain NULs, in
other words, and can be used to find number images and other patterns of
bits). The approach is

    create a buffer to read blocks of the file into
    fill the start of the buffer with patlen bytes
    loop
        fill the rest of the buffer with buflen - patlen bytes
        for each byte position in the buffer
            if match( buffer byte position, pattern )
                return file position
        endfor
        /* if we're here, we didn't find a match */
        copy patlen bytes from the end of the buffer to the start
    endloop
    return no match found

The business with moving patlen bytes from the end to the start of the
buffer is to catch cases where the match straddles a block boundary (it
starts at the end of one buflen block of bytes and finishes in the next
one).

I'd be interested in comments about the portability of this code. It
uses several standard library functions that take or return size_t's,
but I've used long or int for the arguments and variables.
/*
======================================================================
search()

Search an open file for a byte pattern.

INPUTS
   fp       the file to search
   pos      start the search at this byte position in the file
   pat      the bytes to search for
   len      the length of pat, in bytes

RESULTS
   Returns the absolute file position at which the byte pattern was
   first found, or the start position otherwise.

The search actually begins at pos + 1, so that repeated searches move
forward in the file, rather than repeatedly finding a match at the
same position. The pattern must be smaller than buf[].
====================================================================== */

#include <stdio.h>
#include <string.h>

static long search( FILE *fp, long pos, char *pat, int len )
{
   static char buf[ 1024 ];
   long rlen, oldpos;
   int i, found = 0;

   if ( len <= 0 || len >= sizeof( buf ))
      return pos;
   oldpos = pos++;
   if ( fseek( fp, pos, SEEK_SET ))
      return oldpos;
   rlen = fread( buf, 1, len, fp );
   if ( rlen < len )
      return oldpos;

   while ( 1 ) {
      /* buf[ 0 .. len - 1 ] holds the tail of the previous block */
      rlen = fread( buf + len, 1, sizeof( buf ) - len, fp );
      /* only rlen + len bytes are valid, so the last start is i == rlen */
      for ( i = 0; i <= rlen; i++ )
         if ( !memcmp( pat, &buf[ i ], len )) {
            found = 1;
            break;
         }
      if ( found || ( rlen < ( sizeof( buf ) - len ))) break;
      memcpy( buf, buf + sizeof( buf ) - len, len );
      pos += sizeof( buf ) - len;
   }
   return found ? pos + i : oldpos;
}
- Ernie http://home.comcast.net/~erniew
Apr 22 '07 #33
"Alf P. Steinbach" <al***@start.no> writes:
* Army1987:
>"James Kanze" <ja*********@gmail.com> wrote in message
news:11**********************@e65g2000hsc.googlegroups.com...
>>In C, for this specific case, I'd probably write something like:
[snip]
>> FILE* f = fopen( "somefile.txt", "r" ) ;
What's the point of putting f so far on the right?
Anyhow, I'd write FILE *f rather than FILE* f, or else when you
write FILE* f, g you'll be surprised by results.

The problem you encounter is that it's an ungood idea to have multiple
declarators in one declaration.

That's what you shouldn't be doing.
Agreed, mostly. There are times when combining multiple declarations
in one line can make sense:

int x, y, z;
int dx, dy, dz;

or perhaps

int x, dx;
int y, dy;
int z, dz;

(though structs might be even better). But yes, in most cases it's
clearer to have one declaration per line -- and if you prefer to make
that a strict rule, I won't disagree too strongly.
And when you're not doing that ungood thing, writing FILE* f makes sense.
There I disagree. If you're going to have declarations of any
significant complexity, you need to understand why C declarations are
the way they are (declaration follows use). With that understanding,
"FILE *f" just makes more sense.

FILE *f; /* *f is a FILE */
int a[10]; /* a[10] is an int (or would be if it existed) */

But in simple cases like "FILE *f" / "FILE* f", it isn't all that big
a deal. Declaring it as "FILE* f" implies that f is a FILE* -- which
happens to be true in this case, but the principle doesn't extend to
other forms of declarations. As long as you can keep it straight,
I'll still *prefer* to keep the "*" next to the identifier, but I can
cope with other styles.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Apr 22 '07 #34
Richard Heathfield wrote:
Gianni Mariani said:
>Richard Heathfield wrote:
>>Nor, alas, did gcc like it very much:
...

Try naming the file with a .cpp (C++ extension) and compiling it with
a C++ compiler.

No, thank you. If I want C++, I know where to find it - but I don't
expect to find it in comp.lang.c.
Then stop posting to comp.lang.c++
Apr 22 '07 #35
Gianni Mariani said:
Richard Heathfield wrote:
>Gianni Mariani said:
>>Richard Heathfield wrote:
Nor, alas, did gcc like it very much:
...

Try naming the file with a .cpp (C++ extension) and compiling it
with a C++ compiler.

No, thank you. If I want C++, I know where to find it - but I don't
expect to find it in comp.lang.c.

Then stop posting to comp.lang.c++
I'm reading this in comp.lang.c, where you posted off-topic C++ code.
Please don't do that again. Thank you.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 22 '07 #36
Richard Heathfield wrote:
Gianni Mariani said:
Richard Heathfield wrote:
Gianni Mariani said:

Richard Heathfield wrote:
Nor, alas, did gcc like it very much:
...

Try naming the file with a .cpp (C++ extension) and compiling it
with a C++ compiler.

No, thank you. If I want C++, I know where to find it - but I don't
expect to find it in comp.lang.c.
Then stop posting to comp.lang.c++

I'm reading this in comp.lang.c, where you posted off-topic C++ code.
Please don't do that again. Thank you.
I'm sorry Richard, but you're just way off-base here. When messages are
cross-posted like that, you can't use your own group's topicality to
override the other's.

Brian

Apr 22 '07 #37
On Apr 22, 1:27 pm, Gianni Mariani <gi3nos...@mariani.ws> wrote:
The code I attached below shows how you can create a state machine to
search for any sequence of characters. You'll need to adapt it to read
from a file (should be trivial). It's not designed for speed.

#include <map>
#include <vector>
#include <memory>
#include <cassert>

// ======== IteratorTranverser =======================================
/**
 * IteratorTranverser is a template class that iterates through
 * a pointer or iterator. The pointers passed to it must be valid
 * for the life-time of this object.
 */

template <typename Itr, typename t_key_char>
[snip]

What on earth possessed you to send this to comp.lang.c? Do you
realize that C != C++?

Apr 22 '07 #38
Fr************@googlemail.com wrote:
>
What on earth possessed you to send this to comp.lang.c? Do you
realize that C != C++?
Do you realise the message has been cross-posted?

--
Ian Collins.
Apr 22 '07 #39
On Apr 22, 11:06 pm, Ian Collins <ian-n...@hotmail.com> wrote:
Francine.Ne...@googlemail.com wrote:
What on earth possessed you to send this to comp.lang.c? Do you
realize that C != C++?

Do you realise the message has been cross-posted?
I don't see how that makes it acceptable - he didn't even set
followups to comp.lang.c++.
--
Ian Collins.

Apr 22 '07 #40
Fr************@googlemail.com wrote:
On Apr 22, 11:06 pm, Ian Collins <ian-n...@hotmail.com> wrote:
>Francine.Ne...@googlemail.com wrote:
>>What on earth possessed you to send this to comp.lang.c? Do you
realize that C != C++?
Do you realise the message has been cross-posted?

I don't see how that makes it acceptable - he didn't even set
followups to comp.lang.c++.
If the OP wanted a C solution, then the poster should have posted to
comp.lang.c only. If you don't want C++ discussion in comp.lang.c then
ask the OP not to cross-post or something. Seeing as you're posting to
comp.lang.c++, it's hard to argue that there should be no C++ discussion...

<dig motive=jest>

Anyway, you should learn to program in a real language .... :-)

</dig>

Apr 22 '07 #41
In the following, "that ungood thing" refers to multiple
declarations in a single line:

Keith Thompson wrote:
"Alf P. Steinbach" <a...@start.no> writes:
And when you're not doing that ungood thing, writing FILE* f makes sense.

There I disagree. If you're going to have declarations of any
significant complexity, you need to understand why C declarations are
the way they are (declaration follows use). With that understanding,
"FILE *f" just makes more sense.

FILE *f; /* *f is a FILE */
int a[10]; /* a[10] is an int (or would be if it existed) */
When I look to the declaration section of a block to see
what type f is, I'm interested in seeing what type f is. I don't
want to see what *f is. In other words, I'd rather derive
information about *f from information about f rather
than the other way around.
But in simple cases like "FILE *f" / "FILE* f", it isn't all that big
a deal. Declaring it as "FILE* f" implies that f is a FILE* -- which
happens to be true in this case, but the principle doesn't extend to
other forms of declarations. As long as you can keep it straight,
I'll still *prefer* to keep the "*" next to the identifier, but I can
cope with other styles.
In working code, what percentage of declarations are not the
"simple case"? Declarations of function pointers obviously
require special treatment, as do
complex array types, but "T * x" is probably valid 95% of the time.
And the non-simple cases could perhaps be made clearer: eg

typedef int (*int_int_fcn)(int);

int_int_fcn * h;
int (**f)(int);
.....hmmm, perhaps this is a bad example...or perhaps you are,
as usual, correct. The declaration of h was supposed to be
clearer than the declaration of f, but I would argue that this is
not the case.
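For anyone who wants to check that the two declarations really do name the same type, here is a small sketch. The functions twice and call_both_ways are invented for the example; only the typedef and the two declarator forms come from the post above.

```c
#include <assert.h>

typedef int (*int_int_fcn)(int);    /* pointer to function: int -> int */

static int twice(int x) { return 2 * x; }

/* Call the same function through both spellings of the type. */
int call_both_ways(int x)
{
    int_int_fcn fp = twice;
    int_int_fcn *h = &fp;       /* pointer to pointer to function */
    int (**f)(int) = h;         /* same type, written out in full */

    assert(h == f);
    return (*h)(x) + (**f)(x);
}
```

The assignment of h to f compiles without a cast, which is what shows that the types are identical; which spelling reads better is the matter of taste under discussion.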

Apr 23 '07 #42
Fr************@googlemail.com wrote:

What on earth possessed you to send this to comp.lang.c? Do you
realize that C != C++?

See what I said to Richard. It's wildly unfair to expect other people
to anticipate what's topical regarding other groups in a cross-posted
message, or to refrain from giving replies that are perfectly correct
and topical for their group.

It's the fault of the OP, but that doesn't matter. It would have been
helpful for the clc++ people to cut clc out of their replies, but none
of the clc people did that either, so there's plenty of blame to go around.


Brian
Apr 23 '07 #43
Fr************@googlemail.com wrote:
On Apr 22, 11:06 pm, Ian Collins <ian-n...@hotmail.com> wrote:
Francine.Ne...@googlemail.com wrote:
What on earth possessed you to send this to comp.lang.c? Do you
realize that C != C++?
Do you realise the message has been cross-posted?

I don't see how that makes it acceptable - he didn't even set
followups to comp.lang.c++.
What would follow-ups have to do with it? Did you mean "adjust the
newsgroup list"?

Brian
Apr 23 '07 #44
Default User said:
Richard Heathfield wrote:
<snip>
>I'm reading this in comp.lang.c, where you posted off-topic C++ code.
Please don't do that again. Thank you.

I'm sorry Richard, but you're just way off-base here. When messages
are cross-posted like that, you can't use your own group's topicality
to override the other's.
Why not? He did.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 23 '07 #45
Richard Heathfield wrote:
Default User said:
Richard Heathfield wrote:
<snip>
I'm reading this in comp.lang.c, where you posted off-topic C++
code. Please don't do that again. Thank you.

I'm sorry Richard, but you're just way off-base here. When messages
are cross-posted like that, you can't use your own group's
topicality to override the other's.

Why not? He did.
He complained about your posts being off-topic? I must have missed that.


Brian
Apr 23 '07 #46
Default User said:
Richard Heathfield wrote:
>Default User said:
Richard Heathfield wrote:
<snip>
>I'm reading this in comp.lang.c, where you posted off-topic C++
code. Please don't do that again. Thank you.
>
I'm sorry Richard, but you're just way off-base here. When messages
are cross-posted like that, you can't use your own group's
topicality to override the other's.

Why not? He did.

He complained about your posts being off-topic?
No, he used the topicality of C++ in comp.lang.c++ to override
comp.lang.c's topic, which is C, not C++.
I must have missed that.
I must have missed the point where this discussion got very dull, but
I'm sure we have passed it by now.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 23 '07 #47
On Apr 23, 5:59 am, "Default User" <defaultuse...@yahoo.com> wrote:
Francine.Ne...@googlemail.com wrote:
I don't see how that makes it acceptable - he didn't even set
followups to comp.lang.c++.

What would follow-ups have to do with it? Did you mean "adjust the
newsgroup list"?
It would be reasonable to say (perhaps not in so many words): "The OP
started a discussion in both clc and clc++. His original post, and all
discussion so far, involved only C code, and shouldn't really have
been crossposted to clc++, for which I curse the OP (and other clc'ers
who replied without spotting that it had been crossposted and removing
clc++). However, as it happens I have an interesting approach to this
in C++. So in case anyone reading the thread in clc would be
interested, I'll crosspost the code, but set followups to clc++ so
that any resulting discussion goes to the right place".
Brian

Apr 23 '07 #48
On Apr 22, 8:00 pm, Ernie Wright <ern...@comcast.net> wrote:
Umesh wrote:
I wanted to say that if I want to find the number of occurrences of
'abcdefghijk' in a text file by modifying this program, it would be a
toilsome job. Is there any alternative? Thank you.
I've been using the C code below for a while. It allows me to search for
any pattern of bytes, not just strings (the pattern can contain NULs, in
other words, and can be used to find number images and other patterns of
bits). The approach is
    create a buffer to read blocks of the file into
    fill the start of the buffer with patlen bytes
    loop
        fill the rest of the buffer with buflen - patlen bytes
        for each byte position in the buffer
            if match( buffer byte position, pattern )
                return file position
        endfor
        /* if we're here, we didn't find a match */
        copy patlen bytes from the end of the buffer to the start
    endloop
    return no match found
The business with moving patlen bytes from the end to the start of the
buffer is to catch cases where the match straddles a block boundary (it
starts at the end of one buflen block of bytes and finishes in the next
one).
That brings back memories:-). I remember doing something
similar back in the 1980's. Using a buffer size of around 32KB,
but with a BM search, instead of trying to match starting at
each character; the results were over twice as fast as fgrep on
the machine I was using. (The strings we were searching were
typically between 8 and 16 characters in length, which meant
that using BM search really made a difference.)

If you need to be very, very fast, over large quantities of
data, it's still probably the best solution (although I'd
consider mmap as well). But it's not something I'd start out
with.

For most uses, maintaining a sliding window into the file,
either with std::deque, std::vector, or a circular queue (or
their equivalents under C) is a lot quicker and easier to
implement.
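A rough C sketch of the sliding-window idea, using a circular buffer of the last len characters. This is only one way to do it: the name count_pattern and the MAX_PAT limit are assumptions made for the example, and it counts overlapping matches, which may or may not be what is wanted.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PAT 256     /* arbitrary limit chosen for this sketch */

/* Count (possibly overlapping) occurrences of pat[0..len-1] in fp,
   reading one character at a time and keeping the last len characters
   in a circular buffer. Returns -1 if len is out of range. */
long count_pattern(FILE *fp, const char *pat, size_t len)
{
    unsigned char window[MAX_PAT];
    size_t filled = 0;  /* characters currently in the window */
    size_t head = 0;    /* index of the oldest character */
    long count = 0;
    int ch;

    assert(fp != NULL && pat != NULL);
    if (len == 0 || len > MAX_PAT)
        return -1;

    while ((ch = getc(fp)) != EOF) {
        if (filled < len) {
            window[(head + filled) % len] = (unsigned char)ch;
            filled++;
        } else {
            window[head] = (unsigned char)ch;   /* overwrite the oldest */
            head = (head + 1) % len;
        }
        if (filled == len) {
            size_t i;
            /* compare the window, oldest character first */
            for (i = 0; i < len; i++)
                if (window[(head + i) % len] != (unsigned char)pat[i])
                    break;
            if (i == len)
                count++;
        }
    }
    return count;
}
```

Because the window moves one character at a time, a match can never straddle a block boundary, so none of the buffer-copying bookkeeping is needed. The price is a full comparison at every position, which is why the block-buffered or BM approach wins on large inputs.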

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 23 '07 #49
On Apr 22, 5:58 pm, "Army1987" <please....@for.it> wrote:
"James Kanze" <james.ka...@gmail.com> wrote in message
news:11**********************@e65g2000hsc.googlegroups.com...
In C, for this specific case, I'd probably write something like:
[snip]
FILE* f = fopen( "somefile.txt", "r" ) ;
What's the point of putting f so far on the right?
To make declarations look different from expression statements,
and to allow easily finding what is being declared. Lining up
the symbols being declared has been part of the coding
guidelines in most of the places where I used C; for some
reason, it seems less popular in C++, although I find it even
more important there.
Anyhow, I'd write FILE *f rather than FILE* f, or else when you
write FILE* f, g you'll be surprised by results.
Every coding guideline I've ever seen has forbidden defining
more than one variable per statement. It's just bad programming
practice. Other than that, it's mostly a matter of taste. I
find it cleaner to separate the type from what is being defined.
[snip] exit( 2 ) ;
The behaviour of this is implementation-defined.
True. I guess I'm just too used to Unix (and it also works
under Windows). EXIT_FAILURE is more portable. (But it
provides less information.)
On the DS9K,
exit(2) makes the program securely erase the whole disk on exit.
The standard way to do that is exit(EXIT_FAILURE);
And the Unix way is exit(2).
int prev = '\0' ;
You can use declarations after statements only in C99. No
problem if you have a C99-compliant compiler (gcc isn't). The
same for declarations within the guard of a for loop.
I compiled the code with gcc, just to make sure:
gcc -std=c99 -pedantic -Wall
This is 2007, you know, not 1983 (when I first learned C).
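As a small illustration of the two C99 features in question, the sketch below (function name invented for the example) should compile cleanly with the gcc command line above, while a strict C90 compiler would reject both marked declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the first n elements of v. Needs C99 or later because of the
   two marked declarations. */
long sum_ints(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;                     /* declaration after a statement */
    for (size_t i = 0; i < n; i++)      /* declaration in the for loop */
        total += v[i];
    return total;
}
```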

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 23 '07 #50
