By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,274 Members | 2,229 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,274 IT Pros & Developers. It's quick & easy.

Any search pattern method recommed for mmap memory

P: n/a
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?

Sep 21 '07 #1
Share this Question
Share on Google+
9 Replies


P: n/a
Owen Zhang wrote:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
Google for Boyer-Moore, I suspect...
Sep 21 '07 #2

P: n/a
Owen Zhang wrote:
>
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
You don't need the 'virtual memory'. Look the following over.

/*
Leor Zolman wrote:
On 25 Feb 2004 07:34:40 -0800, jo**@ljungh.se (spike) wrote:
>Im trying to write a program that should read through a binary
file searching for the character sequence "\name\"

Then it should read the characters following the "\name\"
sequence until a NULL character is encountered.

But when my program runs it gets a SIGSEGV (Segmentation
vioalation) signal.

Whats wrong? And is there a better way than mine to solve
this task (most likely)

I think so. Here's a version I just threw together:
*/

/* And heres another throw -- binfsrch.c by CBF */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

/* The difference between a binary and a text file, on read,
is the conversion of end-of-line delimiters. What those
delimiters are does not affect the action. In some cases
the presence of 0x1a EOF markers (MsDos) does.

This is a version of Knuth-Morris-Pratt algorithm. The
point of using this is to avoid any backtracking in file
reading, and thus avoiding any use of buffer arrays.
*/

size_t chrcount; /* debuggery, count of input chars, zeroed */

/* --------------------- */

/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
compared to the current char. Once the (lgh - 1)th char
has been successfully compared, the id has been found.
The array is formed by comparing id to itself. */
void initnext(int *next, const char *id, int lgh)
{
int i, j;

assert(lgh 0);
next[0] = -1; i = 0; j = -1;
while (i < lgh) {
while ((j >= 0) && (id[i] != id[j])) j = next[j];
i++; j++;
next[i] = j;
}
#ifdef DEBUG
for (i = 0; i <= lgh; i++)
printf("id[%d] = '%c' next[%d] = %d\n",
i, id[i], i, next[i]);
#endif
} /* initnext */

/* --------------------- */

/* reads f without rewinding until either EOF or *marker
has been found. Returns EOF if not found. At exit the
last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
int j; /* char position in marker to check */
int ch; /* current char */

assert(lgh 0);
j = 0;
while ((j < lgh) && (EOF != (ch = getc(f)))) {
chrcount++;
while ((j >= 0) && (ch != marker[j])) j = next[j];
j++;
}
return ch;
} /* kmpffind */

/* --------------------- */

/* Find marker in f, display following printing chars
up to some non printing character or EOF */
int binfsrch(const char *marker, FILE *f)
{
int *next;
int lgh;
int ch;
int items; /* count of markers found */

lgh = strlen(marker);
if (!(next = malloc(1 + lgh * sizeof *next))) {
puts("No memory");
exit(EXIT_FAILURE);
}
else {
initnext(next, marker, lgh);
items = 0;
while (EOF != kmpffind(marker, lgh, next, f)) {
/* found, take appropriate action */
items++;
printf("%d %s : \"", items, marker);
while (isprint(ch = getc(f))) {
chrcount++;
putchar(ch);
}
puts("\"");
if (EOF == ch) break;
else chrcount++;
}
free(next);
return items;
}
} /* binfsrch */

/* --------------------- */

int main(int argc, char **argv)
{
FILE *f;

f = stdin;
if (3 == argc) {
if (!(f = fopen(argv[2], "rb"))) {
printf("Can't open %s\n", argv[2]);
exit(EXIT_FAILURE);
}
argc--;
}
if (2 != argc) {
puts("Usage: binfsrch name [binaryfile]");
puts(" (file defaults to stdin text mode)");
}
else if (binfsrch(argv[1], f)) {
printf("\"%s\" : found\n", argv[1]);
}
else printf("\"%s\" : not found\n", argv[1]);
printf("%lu chars\n", (unsigned long)chrcount);
return 0;
} /* main binfsrch */

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
--
Posted via a free Usenet account from http://www.teranews.com

Sep 21 '07 #3

P: n/a
Owen Zhang <ow***************@gmail.comwrites:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
The mmap() function is not part of standard C. If it's relevant to
your question, you should ask in a system-specific newsgroup, most
likely comp.unix.programmer.

But I don't see how it's relevant. Is there some reason you think
searching a chunk of memory allocated by mmap is different from
searching any other chunk of memory?

Standard C provides some simple searching functions such as strstr().
If that doesn't suit your needs, then you probably have an algorithm
question; comp.programming is likely to be the best place to ask.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 21 '07 #4

P: n/a
Owen Zhang wrote:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?

The on-topic answer is: strstr().

mmap() specific considerations, should rather be posted to "c.u.programmer".

--
Tor <torust [at] online [dot] no>
Sep 21 '07 #5

P: n/a
In article <11*********************@57g2000hsv.googlegroups.c om>,
Owen Zhang <ow***************@gmail.comwrote:
>I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
I think about all you can say is that a method that access data
sequentially rather than randomly is likely to work better, because it
matches disk access better. That's assuming you don't have any kind
of indexing of course.

-- Richard

--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Sep 21 '07 #6

P: n/a
Tor Rustad <to********@hotmail.comwrites:
Owen Zhang wrote:
>I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?

The on-topic answer is: strstr().
[...]

Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 21 '07 #7

P: n/a
Keith Thompson wrote:
Tor Rustad <to********@hotmail.comwrites:
>Owen Zhang wrote:
>>I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
The on-topic answer is: strstr().
[...]

Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.
I don't follow.. why can't OP check for extra requirements after each
match by strstr()?
OTOH, files are typically not null terminated, but I didn't bother to
check if OP needed to address this issue when using mmap().
--
Tor <torust [at] online [dot] no>
Sep 21 '07 #8

P: n/a
Tor Rustad <to********@hotmail.comwrites:
Keith Thompson wrote:
>Tor Rustad <to********@hotmail.comwrites:
>>Owen Zhang wrote:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
The on-topic answer is: strstr().
[...]
Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.

I don't follow.. why can't OP check for extra requirements after each
match by strstr()?
Yes, he could do that, but it might not be as efficient as a more
specialized search. If the keyword is sufficiently short, for
example, there might be a lot of false positives. But again, we don't
know much about the OP's requirements.
OTOH, files are typically not null terminated, but I didn't bother to
check if OP needed to address this issue when using mmap().
I hadn't thought of that, though it shouldn't be to hard to address
it.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 21 '07 #9

P: n/a
Keith Thompson wrote:
Tor Rustad <to********@hotmail.comwrites:
>Keith Thompson wrote:
>>Tor Rustad <to********@hotmail.comwrites:
Owen Zhang wrote:
I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?
The on-topic answer is: strstr().
[...]
Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.
I don't follow.. why can't OP check for extra requirements after each
match by strstr()?

Yes, he could do that, but it might not be as efficient as a more
specialized search. If the keyword is sufficiently short, for
example, there might be a lot of false positives. But again, we don't
know much about the OP's requirements.
There "might" be a lot of false positives, particularly if Keith is
allowed to construct that input file! :)

OTOH, let say OP want to scan C source files for keywords, will there
normally be more matches for "int" than [ \t]?

If complex matching is required, OP should rather look into using a
regular expression library, or a lex tool. No reason to reinvent the
wheel for this.

>OTOH, files are typically not null terminated, but I didn't bother to
check if OP needed to address this issue when using mmap().

I hadn't thought of that, though it shouldn't be to hard to address
it.
I had the case in mind, where other programs access the file simultaneously.

--
Tor <torust [at] online [dot] no>
Sep 22 '07 #10

This discussion thread is closed

Replies have been disabled for this discussion.