"Walter Dnes (delete the 'z' to get my real address)" wrote:[color=blue]
> <Thomas_MatthewsSpamBotsSuck@sbcglobal.net> wrote:
>
>> Since C has pointers, why do you need to use memmove() a lot?
>> Moving large blocks of memory is a waste of computer resources.
>
> Maybe I've chosen the wrong algorithm. I need to search for
> byte-arrays 255 bytes or less in a binary file. I am using the
> term "byte-arrays", *NOT STRINGS*, because they can contain '\0'
> as a valid 'character'. I was thinking something along the
> lines of...
>
> 1) given a byte-array-to-search-for
> 2) read in first 256 bytes of file into buffer
>
> Beginning of outer loop
> 3) read in next 64 kbytes of file into buffer, starting at
> byte 256
>
> Beginning of inner loop
> 4) use memchr() to find address of byte in buffer that
> matches first byte of byte-array-to-search-for
> 5) use memcmp() to check if entire byte-array-to-search-for
> is matched at that location
> 6) start search after the match, to see if any more
> matches, repeating until search hits end of buffer
> End of inner loop
>
> 7) move last 256 bytes of buffer to beginning of buffer
> End of outer loop
>
> Step 7 (outer loop) is the memory moving part. Until such time
> as disk-thrashing happens, the bigger the buffer, the better.
> If there's a better algorithm, please do tell, and point me to
> it. Text editors have probably invented that wheel already, but
> do they handle '\0' as a valid 'character'?[/color]
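For comparison, the quoted steps 1 through 7 could be sketched
roughly as below. This is only an illustration under assumptions:
the names buffered_count, CHUNK and MAXPAT are made up here, and
the carry-over is trimmed to len - 1 bytes rather than a fixed
256, which is all that is needed to catch a match straddling two
reads.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK  65536 /* bytes read per pass (assumed size) */
#define MAXPAT 255   /* longest byte-array supported */

/* Count occurrences of pat (len bytes, 1..MAXPAT) in f using
   memchr() + memcmp() over a sliding buffer.  Embedded '\0'
   bytes are fine because nothing here treats data as a string.
   Returns -1 for an unsupported length. */
static long buffered_count(const unsigned char *pat, size_t len, FILE *f)
{
    static unsigned char buf[MAXPAT + CHUNK];
    size_t have = 0; /* valid bytes currently in buf */
    size_t got;
    long found = 0;

    if (len == 0 || len > MAXPAT) return -1;
    while ((got = fread(buf + have, 1, CHUNK, f)) > 0) {
        unsigned char *p = buf;
        size_t left;

        have += got;
        left = have;
        /* only positions with at least len bytes after them qualify */
        while (left >= len
               && (p = memchr(p, pat[0], left - len + 1)) != NULL) {
            if (memcmp(p, pat, len) == 0) found++;
            p++;
            left = have - (size_t)(p - buf);
        }
        /* step 7: carry the last len - 1 bytes to the front */
        if (have >= len) {
            memmove(buf, buf + have - (len - 1), len - 1);
            have = len - 1;
        }
    }
    return found;
}
```

The memmove() here shifts at most 254 bytes per 64 KB read, so
its cost is small next to the file I/O itself.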
There is definitely a better algorithm, one that needs no buffer
at all. A modification of the following will do your job; you
can drop the part that dumps the string following the marker.
As written it takes the marker from the command line and measures
it with strlen(), so it cannot search for byte-arrays containing
'\0', but you can arrange to alter that.
/*
Leor Zolman wrote:
> On 25 Feb 2004 07:34:40 -0800, joan@ljungh.se (spike) wrote:
>
>> I'm trying to write a program that should read through a binary
>> file searching for the character sequence "\name\"
>>
>> Then it should read the characters following the "\name\"
>> sequence until a NULL character is encountered.
>>
>> But when my program runs it gets a SIGSEGV (Segmentation
>> violation) signal.
>>
>> What's wrong? And is there a better way than mine to solve
>> this task (most likely)
>
> I think so. Here's a version I just threw together:
*/
/* And here's another throw -- binfsrch.c by CBF */
/* Released to public domain. Attribution appreciated */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>
/* The difference between a binary and a text file, on read,
   is the conversion of end-of-line delimiters. What those
   delimiters are does not affect the action. In some cases
   the presence of 0x1a EOF markers (MS-DOS) does.

   This is a version of the Knuth-Morris-Pratt algorithm. The
   point of using it is to avoid any backtracking in file
   reading, and thus to avoid any use of buffer arrays.
*/
size_t chrcount; /* debuggery, count of input chars, zeroed */
/* --------------------- */
/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
   compared to the current char. Once the (lgh - 1)th char
   has been successfully compared, the id has been found.
   The array is formed by comparing id to itself. Note that
   next[lgh] is also written, so next needs lgh + 1 elements. */
void initnext(int *next, const char *id, int lgh)
{
    int i, j;

    assert(lgh > 0);
    next[0] = -1; i = 0; j = -1;
    while (i < lgh) {
        while ((j >= 0) && (id[i] != id[j])) j = next[j];
        i++; j++;
        next[i] = j;
    }
#if (0)
    for (i = 0; i < lgh; i++)
        printf("id[%d] = '%c' next[%d] = %d\n",
               i, id[i], i, next[i]);
#endif
} /* initnext */
/* --------------------- */
/* Reads f without rewinding until either EOF or *marker
   has been found. Returns EOF if not found. At exit the
   last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
    int j;  /* char position in marker to check */
    int ch; /* current char */

    assert(lgh > 0);
    j = 0;
    while ((j < lgh) && (EOF != (ch = getc(f)))) {
        chrcount++;
        /* getc yields unsigned char values, so compare marker
           bytes as unsigned char too (matters for bytes >= 0x80) */
        while ((j >= 0) && (ch != (unsigned char)marker[j])) j = next[j];
        j++;
    }
    return ch;
} /* kmpffind */
/* --------------------- */
/* Find marker in f, display following printing chars
   up to some non printing character or EOF */
int binfsrch(const char *marker, FILE *f)
{
    int *next;
    int lgh;
    int ch;
    int items; /* count of markers found */

    lgh = (int)strlen(marker);
    /* initnext() also stores next[lgh], hence lgh + 1 elements */
    if (!(next = malloc((lgh + 1) * sizeof *next))) {
        puts("No memory");
        exit(EXIT_FAILURE);
    }
    else {
        initnext(next, marker, lgh);
        items = 0;
        while (EOF != kmpffind(marker, lgh, next, f)) {
            /* found, take appropriate action */
            items++;
            printf("%d %s : \"", items, marker);
            while (isprint(ch = getc(f))) {
                chrcount++;
                putchar(ch);
            }
            puts("\"");
            if (EOF == ch) break;
            else chrcount++;
        }
        free(next);
        return items;
    }
} /* binfsrch */
/* --------------------- */
int main(int argc, char **argv)
{
    FILE *f;

    f = stdin;
    if (3 == argc) {
        if (!(f = fopen(argv[2], "rb"))) {
            printf("Can't open %s\n", argv[2]);
            exit(EXIT_FAILURE);
        }
        argc--;
    }
    if (2 != argc) {
        puts("Usage: binfsrch name [binaryfile]");
        puts(" (file defaults to stdin text mode)");
    }
    else if (binfsrch(argv[1], f)) {
        printf("\"%s\" : found\n", argv[1]);
    }
    else printf("\"%s\" : not found\n", argv[1]);
    printf("%lu chars\n", (unsigned long)chrcount);
    return 0;
} /* main binfsrch */
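
As for arranging to handle '\0' in the byte-array: one possible
modification, sketched under assumptions, is to pass the length
explicitly instead of calling strlen(), and to treat the pattern
as unsigned char throughout. The names bytes_initnext and
bytes_count are hypothetical and not part of binfsrch.c; this
version only counts matches (overlapping ones included) rather
than printing what follows them.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Build the KMP table for a byte pattern of length len.
   next must have room for len + 1 ints. */
static void bytes_initnext(int *next, const unsigned char *pat, int len)
{
    int i = 0, j = -1;

    assert(len > 0);
    next[0] = -1;
    while (i < len) {
        while (j >= 0 && pat[i] != pat[j]) j = next[j];
        i++; j++;
        next[i] = j;
    }
}

/* Count occurrences of pat (len bytes, any values including 0)
   in stream f, reading each byte once and never backing up.
   Returns -1 if the table cannot be allocated. */
static long bytes_count(const unsigned char *pat, int len, FILE *f)
{
    int *next = malloc((len + 1) * sizeof *next);
    long found = 0;
    int j = 0, ch;

    if (!next) return -1;
    bytes_initnext(next, pat, len);
    while (EOF != (ch = getc(f))) {
        while (j >= 0 && ch != pat[j]) j = next[j];
        j++;
        if (j == len) {   /* whole pattern matched */
            found++;
            j = next[j];  /* resume, allowing overlapping matches */
        }
    }
    free(next);
    return found;
}
```

The pattern bytes then have to come from somewhere other than a
plain argv string, for instance a hex-encoded argument or a
second file; that choice is left open here.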
--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?