Bytes | Software Development & Data Engineering Community

Any search pattern method recommended for mmap memory?

I have a file loaded into virtual memory space by mmap. I need to
search for some keyword inside the memory mapped by mmap. What is the
best and most efficient way to do it?

Sep 21 '07 #1
Owen Zhang wrote:
> I have a file loaded into virtual memory space by mmap. I need to
> search for some keyword inside the memory mapped by mmap. What is the
> best and most efficient way to do it?
Google for Boyer-Moore, I suspect...
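For illustration, here is a minimal sketch of the Horspool simplification of Boyer-Moore (only the bad-character skip table, not the full algorithm with its good-suffix rule), assuming the mapped region is available as a pointer plus a byte count. The name `bmh_search` is hypothetical. Because it takes explicit lengths, it does not depend on the mapping being NUL-terminated.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool search (hypothetical helper).
   Returns a pointer to the first occurrence of needle[0..nlen) in
   hay[0..hlen), or NULL if there is none.  Works on raw bytes with
   explicit lengths, so hay need not be NUL-terminated. */
const unsigned char *bmh_search(const unsigned char *hay, size_t hlen,
                                const unsigned char *needle, size_t nlen)
{
    size_t skip[UCHAR_MAX + 1];
    size_t i, pos;

    assert(needle != NULL && nlen > 0);
    if (nlen > hlen)
        return NULL;

    /* Bad-character table: if the byte under the window's last
       position is c, the window can slide right by skip[c]. */
    for (i = 0; i <= UCHAR_MAX; i++)
        skip[i] = nlen;
    for (i = 0; i + 1 < nlen; i++)
        skip[needle[i]] = nlen - 1 - i;

    for (pos = 0; pos <= hlen - nlen; pos += skip[hay[pos + nlen - 1]]) {
        if (memcmp(hay + pos, needle, nlen) == 0)
            return hay + pos;
    }
    return NULL;
}
```

With hypothetical `base` and `size` describing the mapping, a call would look like `bmh_search(base, size, (const unsigned char *)"keyword", 7)`. The skip table lets the search jump over bytes it never examines, which is why Boyer-Moore variants tend to do best with longer keywords.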
Sep 21 '07 #2
Owen Zhang wrote:
> I have a file loaded into virtual memory space by mmap. I need to
> search for some keyword inside the memory mapped by mmap. What is the
> best and most efficient way to do it?
You don't need the 'virtual memory'. Look over the following.

/*
Leor Zolman wrote:
> On 25 Feb 2004 07:34:40 -0800, jo**@ljungh.se (spike) wrote:
>
>> I'm trying to write a program that should read through a binary
>> file searching for the character sequence "\name\"
>>
>> Then it should read the characters following the "\name\"
>> sequence until a NULL character is encountered.
>>
>> But when my program runs it gets a SIGSEGV (Segmentation
>> violation) signal.
>>
>> What's wrong? And is there a better way than mine to solve
>> this task (most likely)?
>
> I think so. Here's a version I just threw together:
*/

/* And here's another throw -- binfsrch.c by CBF */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

/* The difference between a binary and a text file, on read,
   is the conversion of end-of-line delimiters. What those
   delimiters are does not affect the action. In some cases
   the presence of 0x1a EOF markers (MS-DOS) does.

   This is a version of the Knuth-Morris-Pratt algorithm. The
   point of using it is to avoid any backtracking in file
   reading, and thus to avoid any use of buffer arrays.
*/

size_t chrcount; /* debuggery, count of input chars, zeroed */

/* --------------------- */

/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
   compared to the current char. Once the (lgh - 1)th char
   has been successfully compared, the id has been found.
   The array is formed by comparing id to itself, and needs
   room for lgh + 1 entries (next[lgh] is written). */
void initnext(int *next, const char *id, int lgh)
{
   int i, j;

   assert(lgh > 0);
   next[0] = -1; i = 0; j = -1;
   while (i < lgh) {
      while ((j >= 0) && (id[i] != id[j])) j = next[j];
      i++; j++;
      next[i] = j;
   }
#ifdef DEBUG
   for (i = 0; i <= lgh; i++)
      printf("id[%d] = '%c' next[%d] = %d\n",
             i, id[i], i, next[i]);
#endif
} /* initnext */

/* --------------------- */

/* reads f without rewinding until either EOF or *marker
   has been found. Returns EOF if not found. At exit the
   last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
   int j;   /* char position in marker to check */
   int ch;  /* current char */

   assert(lgh > 0);
   j = 0;
   while ((j < lgh) && (EOF != (ch = getc(f)))) {
      chrcount++;
      /* getc() returns an unsigned char value, so compare as one */
      while ((j >= 0) && (ch != (unsigned char)marker[j])) j = next[j];
      j++;
   }
   return ch;
} /* kmpffind */

/* --------------------- */

/* Find marker in f, display following printing chars
   up to some non printing character or EOF */
int binfsrch(const char *marker, FILE *f)
{
   int *next;
   int lgh;
   int ch;
   int items;  /* count of markers found */

   lgh = strlen(marker);
   /* initnext() writes lgh + 1 entries */
   if (!(next = malloc((1 + lgh) * sizeof *next))) {
      puts("No memory");
      exit(EXIT_FAILURE);
   }
   else {
      initnext(next, marker, lgh);
      items = 0;
      while (EOF != kmpffind(marker, lgh, next, f)) {
         /* found, take appropriate action */
         items++;
         printf("%d %s : \"", items, marker);
         while (isprint(ch = getc(f))) {
            chrcount++;
            putchar(ch);
         }
         puts("\"");
         if (EOF == ch) break;
         else chrcount++;
      }
      free(next);
      return items;
   }
} /* binfsrch */

/* --------------------- */

int main(int argc, char **argv)
{
   FILE *f;

   f = stdin;
   if (3 == argc) {
      if (!(f = fopen(argv[2], "rb"))) {
         printf("Can't open %s\n", argv[2]);
         exit(EXIT_FAILURE);
      }
      argc--;
   }
   if (2 != argc) {
      puts("Usage: binfsrch name [binaryfile]");
      puts(" (file defaults to stdin text mode)");
   }
   else if (binfsrch(argv[1], f)) {
      printf("\"%s\" : found\n", argv[1]);
   }
   else printf("\"%s\" : not found\n", argv[1]);
   printf("%lu chars\n", (unsigned long)chrcount);
   return 0;
} /* main binfsrch */
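The program above reads through a FILE * with getc(), which is what lets it avoid buffering. When the file is already mapped, the same next table can drive a search over a pointer and a length instead. A minimal sketch, assuming the mapping is passed as `buf`/`len`; `kmp_init` and `kmp_count` are hypothetical names, and the table construction follows initnext() above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Build the KMP "next" table for id[0..lgh), in the same form as
   initnext(): next must have room for lgh + 1 entries. */
static void kmp_init(int *next, const unsigned char *id, int lgh)
{
    int i = 0, j = -1;

    assert(lgh > 0);
    next[0] = -1;
    while (i < lgh) {
        while (j >= 0 && id[i] != id[j])
            j = next[j];
        i++; j++;
        next[i] = j;
    }
}

/* Count non-overlapping occurrences of marker[0..lgh) in buf[0..len),
   e.g. an mmap'd file.  Returns -1 if the table cannot be allocated. */
long kmp_count(const unsigned char *buf, size_t len,
               const unsigned char *marker, int lgh)
{
    int *next = malloc((size_t)(lgh + 1) * sizeof *next);
    long found = 0;
    size_t k;
    int j = 0;

    if (!next)
        return -1;
    kmp_init(next, marker, lgh);
    for (k = 0; k < len; k++) {          /* each byte is read once */
        while (j >= 0 && buf[k] != marker[j])
            j = next[j];
        if (++j == lgh) {
            found++;
            j = 0;                       /* restart, as kmpffind() does */
        }
    }
    free(next);
    return found;
}
```

Like kmpffind(), it restarts at j = 0 after each hit, so overlapping occurrences are not counted, and it never needs a '\0' at the end of the buffer.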

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Sep 21 '07 #3
Owen Zhang <ow***************@gmail.com> writes:
> I have a file loaded into virtual memory space by mmap. I need to
> search for some keyword inside the memory mapped by mmap. What is the
> best and most efficient way to do it?
The mmap() function is not part of standard C. If it's relevant to
your question, you should ask in a system-specific newsgroup, most
likely comp.unix.programmer.

But I don't see how it's relevant. Is there some reason you think
searching a chunk of memory allocated by mmap is different from
searching any other chunk of memory?

Standard C provides some simple searching functions such as strstr().
If that doesn't suit your needs, then you probably have an algorithm
question; comp.programming is likely to be the best place to ask.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 21 '07 #4
Owen Zhang wrote:
> I have a file loaded into virtual memory space by mmap. I need to
> search for some keyword inside the memory mapped by mmap. What is the
> best and most efficient way to do it?

The on-topic answer is: strstr().

mmap()-specific considerations should rather be posted to "c.u.programmer".

--
Tor <torust [at] online [dot] no>
Sep 21 '07 #5
In article <11*********************@57g2000hsv.googlegroups.com>,
Owen Zhang <ow***************@gmail.com> wrote:
> I have a file loaded into virtual memory space by mmap. I need to
> search for some keyword inside the memory mapped by mmap. What is the
> best and most efficient way to do it?
I think about all you can say is that a method that accesses data
sequentially rather than randomly is likely to work better, because it
matches disk access better. That's assuming you don't have any kind
of indexing of course.
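On a POSIX system the sequential-access point can also be passed on to the kernel with posix_madvise(). A minimal sketch, assuming POSIX (mmap(), fstat() and posix_madvise() are not standard C) and a non-empty regular file; `mmap_find` is a hypothetical name. The advice is only a hint, and the scan is a single forward pass with memchr() and memcmp(), so no NUL terminator is needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Map the whole file read-only, hint sequential access, and return the
   offset of the first occurrence of key[0..klen), or -1 if it is not
   found or on error.  POSIX-specific; not standard C. */
long mmap_find(int fd, const char *key, size_t klen)
{
    struct stat st;
    const unsigned char *base, *p, *end;
    long result = -1;

    assert(klen > 0);
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
        return -1;                      /* an empty file cannot be mapped */
    base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    /* Advisory only: lets the kernel read ahead more aggressively. */
    (void)posix_madvise((void *)base, (size_t)st.st_size,
                        POSIX_MADV_SEQUENTIAL);

    end = base + st.st_size;
    /* One forward pass: memchr() finds candidate first bytes among the
       positions where a whole key still fits, memcmp() confirms. */
    for (p = base; (size_t)(end - p) >= klen; p++) {
        p = memchr(p, (unsigned char)key[0], (size_t)(end - p) - klen + 1);
        if (p == NULL)
            break;
        if (memcmp(p, key, klen) == 0) {
            result = (long)(p - base);
            break;
        }
    }
    munmap((void *)base, (size_t)st.st_size);
    return result;
}
```

Mapping the whole file at once keeps the code simple; for files larger than the address space a real program would map and scan in windows.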

-- Richard

--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Sep 21 '07 #6
Tor Rustad <to********@hotmail.com> writes:
> Owen Zhang wrote:
>> I have a file loaded into virtual memory space by mmap. I need to
>> search for some keyword inside the memory mapped by mmap. What is the
>> best and most efficient way to do it?
>
> The on-topic answer is: strstr().
[...]

Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.
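One way to make "keyword" precise is to accept a strstr() hit only when it is not adjacent to identifier characters. A minimal sketch, assuming C-identifier rules (letters, digits, '_') define the delimiters and that the text is NUL-terminated; `find_keyword` and `is_word_char` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a keyword must not touch a letter, digit or underscore. */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Return the first delimited occurrence of key in the NUL-terminated
   text, or NULL.  Each strstr() hit is checked at both boundaries. */
const char *find_keyword(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;

    assert(klen > 0);
    while ((p = strstr(p, key)) != NULL) {
        int left_ok = (p == text) || !is_word_char(p[-1]);
        int right_ok = !is_word_char(p[klen]);   /* p[klen] may be '\0' */
        if (left_ok && right_ok)
            return p;
        p++;                    /* false positive: keep looking */
    }
    return NULL;
}
```

For `"printf"` it returns NULL, while for `"print(x); sprint; int y;"` it skips both false positives and returns the standalone `int`.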

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 21 '07 #7
Keith Thompson wrote:
> Tor Rustad <to********@hotmail.com> writes:
>> Owen Zhang wrote:
>>> I have a file loaded into virtual memory space by mmap. I need to
>>> search for some keyword inside the memory mapped by mmap. What is the
>>> best and most efficient way to do it?
>> The on-topic answer is: strstr().
> [...]
>
> Sure, but strstr() simply searches for a specified substring, not
> necessarily for a "keyword" (which may imply it's delimited somehow).
> Without more information, we can't be sure whether strstr will do the
> job or not.
I don't follow. Why can't the OP check for extra requirements after each
match found by strstr()?
OTOH, files are typically not null terminated, but I didn't bother to
check whether the OP needed to address this issue when using mmap().
--
Tor <torust [at] online [dot] no>
Sep 21 '07 #8
Tor Rustad <to********@hotmail.com> writes:
> Keith Thompson wrote:
>> Tor Rustad <to********@hotmail.com> writes:
>>> Owen Zhang wrote:
>>>> I have a file loaded into virtual memory space by mmap. I need to
>>>> search for some keyword inside the memory mapped by mmap. What is the
>>>> best and most efficient way to do it?
>>> The on-topic answer is: strstr().
>> [...]
>> Sure, but strstr() simply searches for a specified substring, not
>> necessarily for a "keyword" (which may imply it's delimited somehow).
>> Without more information, we can't be sure whether strstr will do the
>> job or not.
>
> I don't follow. Why can't the OP check for extra requirements after each
> match found by strstr()?
Yes, he could do that, but it might not be as efficient as a more
specialized search. If the keyword is sufficiently short, for
example, there might be a lot of false positives. But again, we don't
know much about the OP's requirements.
> OTOH, files are typically not null terminated, but I didn't bother to
> check whether the OP needed to address this issue when using mmap().
I hadn't thought of that, though it shouldn't be too hard to address
it.
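If the OP wants to keep using strstr(), one simple, if memory-hungry, way to supply the missing terminator is to copy the mapped bytes into a buffer with one extra byte for '\0'. A minimal sketch; `terminated_copy` is a hypothetical name. strstr() will still stop at the first embedded '\0' in binary data, whereas a length-bounded search avoids both the copy and that limitation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of a mapped (unterminated) region into a new buffer
   with a trailing '\0', so the str*() functions can be applied to it.
   The caller frees the result.  Returns NULL if allocation fails. */
char *terminated_copy(const void *map, size_t len)
{
    char *buf;

    assert(map != NULL);
    buf = malloc(len + 1);
    if (buf != NULL) {
        memcpy(buf, map, len);
        buf[len] = '\0';        /* the terminator the file lacks */
    }
    return buf;
}
```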

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 21 '07 #9
Keith Thompson wrote:
> Tor Rustad <to********@hotmail.com> writes:
>> Keith Thompson wrote:
>>> Tor Rustad <to********@hotmail.com> writes:
>>>> Owen Zhang wrote:
>>>>> I have a file loaded into virtual memory space by mmap. I need to
>>>>> search for some keyword inside the memory mapped by mmap. What is the
>>>>> best and most efficient way to do it?
>>>> The on-topic answer is: strstr().
>>> [...]
>>> Sure, but strstr() simply searches for a specified substring, not
>>> necessarily for a "keyword" (which may imply it's delimited somehow).
>>> Without more information, we can't be sure whether strstr will do the
>>> job or not.
>> I don't follow. Why can't the OP check for extra requirements after each
>> match found by strstr()?
>
> Yes, he could do that, but it might not be as efficient as a more
> specialized search. If the keyword is sufficiently short, for
> example, there might be a lot of false positives. But again, we don't
> know much about the OP's requirements.
There "might" be a lot of false positives, particularly if Keith is
allowed to construct that input file! :)

OTOH, let's say the OP wants to scan C source files for keywords: will there
normally be more matches for "int" than [ \t]?

If complex matching is required, OP should rather look into using a
regular expression library, or a lex tool. No reason to reinvent the
wheel for this.

>> OTOH, files are typically not null terminated, but I didn't bother to
>> check whether the OP needed to address this issue when using mmap().
>
> I hadn't thought of that, though it shouldn't be too hard to address
> it.
I had in mind the case where other programs access the file simultaneously.

--
Tor <torust [at] online [dot] no>
Sep 22 '07 #10
