Any search pattern method recommended for mmap memory?

I have a file loaded into virtual memory space by mmap. I need to
search some key word inside the memory opened by mmap. What is the
best and efficient way to do?

Sep 21 '07 #1
Owen Zhang wrote:
> I have a file loaded into virtual memory space by mmap. I need to
> search some key word inside the memory opened by mmap. What is the
> best and efficient way to do?
Google for Boyer-Moore, I suspect...
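
For reference, a minimal sketch of the idea applied to a mapped region. It uses Boyer-Moore-Horspool, a simplified relative of Boyer-Moore, rather than the full algorithm, and it takes an explicit length because a mapped file need not end in a NUL byte. The name bmh_search and its signature are hypothetical, not something from this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: Boyer-Moore-Horspool search over a
   length-delimited buffer. Returns a pointer to the first match,
   or NULL if the needle does not occur. */
const unsigned char *bmh_search(const unsigned char *hay, size_t hay_len,
                                const unsigned char *needle, size_t needle_len)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    assert(needle_len > 0);
    if (needle_len > hay_len) return NULL;

    /* Default shift: skip the whole needle length. */
    for (i = 0; i <= UCHAR_MAX; i++) shift[i] = needle_len;
    /* Bytes that occur in the needle (except its last byte) shift less. */
    for (i = 0; i + 1 < needle_len; i++)
        shift[needle[i]] = needle_len - 1 - i;

    pos = 0;
    while (pos <= hay_len - needle_len) {
        /* Compare from the right end of the current window. */
        i = needle_len;
        while (i > 0 && hay[pos + i - 1] == needle[i - 1]) i--;
        if (i == 0) return hay + pos;
        /* Shift by the table entry for the window's last byte. */
        pos += shift[hay[pos + needle_len - 1]];
    }
    return NULL;
}
```

With a mapping at address base of size len, a call would look like bmh_search(base, len, (const unsigned char *)"key", 3). Whether this beats a plain scan depends on the needle length and the data.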
Sep 21 '07 #2
Owen Zhang wrote:
> I have a file loaded into virtual memory space by mmap. I need to
> search some key word inside the memory opened by mmap. What is the
> best and efficient way to do?
You don't need the 'virtual memory'. Look the following over.

/*
Leor Zolman wrote:
On 25 Feb 2004 07:34:40 -0800, jo**@ljungh.se (spike) wrote:
>Im trying to write a program that should read through a binary
file searching for the character sequence "\name\"

Then it should read the characters following the "\name\"
sequence until a NULL character is encountered.

But when my program runs it gets a SIGSEGV (Segmentation
violation) signal.

Whats wrong? And is there a better way than mine to solve
this task (most likely)

I think so. Here's a version I just threw together:
*/

/* And here's another throw -- binfsrch.c by CBF */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

/* The difference between a binary and a text file, on read,
is the conversion of end-of-line delimiters. What those
delimiters are does not affect the action. In some cases
the presence of 0x1a EOF markers (MsDos) does.

This is a version of Knuth-Morris-Pratt algorithm. The
point of using this is to avoid any backtracking in file
reading, and thus avoiding any use of buffer arrays.
*/

size_t chrcount; /* debuggery, count of input chars, zeroed */

/* --------------------- */

/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
compared to the current char. Once the (lgh - 1)th char
has been successfully compared, the id has been found.
The array is formed by comparing id to itself. */
void initnext(int *next, const char *id, int lgh)
{
    int i, j;

    assert(lgh > 0);
    next[0] = -1; i = 0; j = -1;
    while (i < lgh) {
        while ((j >= 0) && (id[i] != id[j])) j = next[j];
        i++; j++;
        next[i] = j;   /* note: writes next[lgh] on the last pass */
    }
#ifdef DEBUG
    for (i = 0; i <= lgh; i++)
        printf("id[%d] = '%c' next[%d] = %d\n",
               i, id[i], i, next[i]);
#endif
} /* initnext */

/* --------------------- */

/* reads f without rewinding until either EOF or *marker
has been found. Returns EOF if not found. At exit the
last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
    int j;  /* char position in marker to check */
    int ch; /* current char */

    assert(lgh > 0);
    j = 0;
    while ((j < lgh) && (EOF != (ch = getc(f)))) {
        chrcount++;
        while ((j >= 0) && (ch != marker[j])) j = next[j];
        j++;
    }
    return ch;
} /* kmpffind */

/* --------------------- */

/* Find marker in f, display following printing chars
up to some non printing character or EOF */
int binfsrch(const char *marker, FILE *f)
{
    int *next;
    int lgh;
    int ch;
    int items; /* count of markers found */

    lgh = strlen(marker);
    /* next needs lgh + 1 entries, because initnext stores next[lgh] */
    if (!(next = malloc((1 + lgh) * sizeof *next))) {
        puts("No memory");
        exit(EXIT_FAILURE);
    }
    else {
        initnext(next, marker, lgh);
        items = 0;
        while (EOF != kmpffind(marker, lgh, next, f)) {
            /* found, take appropriate action */
            items++;
            printf("%d %s : \"", items, marker);
            while (isprint(ch = getc(f))) {
                chrcount++;
                putchar(ch);
            }
            puts("\"");
            if (EOF == ch) break;
            else chrcount++;
        }
        free(next);
        return items;
    }
} /* binfsrch */

/* --------------------- */

int main(int argc, char **argv)
{
    FILE *f;

    f = stdin;
    if (3 == argc) {
        if (!(f = fopen(argv[2], "rb"))) {
            printf("Can't open %s\n", argv[2]);
            exit(EXIT_FAILURE);
        }
        argc--;
    }
    if (2 != argc) {
        puts("Usage: binfsrch name [binaryfile]");
        puts(" (file defaults to stdin text mode)");
    }
    else if (binfsrch(argv[1], f)) {
        printf("\"%s\" : found\n", argv[1]);
    }
    else printf("\"%s\" : not found\n", argv[1]);
    printf("%lu chars\n", (unsigned long)chrcount);
    return 0;
} /* main binfsrch */
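
Since the original question is about memory that is already mapped, the same Knuth-Morris-Pratt idea can be applied to a buffer instead of a FILE. The following is only a sketch along the lines of the code above; kmp_memfind, its parameters, and its return convention are hypothetical and were not part of the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical in-memory variant of the KMP search above.
   Returns the offset of the first match in buf, or (size_t)-1 if the
   marker is absent (or if the table could not be allocated). */
size_t kmp_memfind(const char *buf, size_t buflen,
                   const char *marker, size_t lgh)
{
    long *next;
    long j;
    size_t i, found = (size_t)-1;

    assert(lgh > 0);
    next = malloc((lgh + 1) * sizeof *next);   /* lgh + 1 entries */
    if (!next) return found;

    /* Build the table by comparing the marker to itself. */
    next[0] = -1; j = -1;
    for (i = 0; i < lgh; ) {
        while (j >= 0 && marker[i] != marker[j]) j = next[j];
        i++; j++;
        next[i] = j;
    }

    /* Scan the buffer once, never stepping backwards in buf. */
    j = 0;
    for (i = 0; i < buflen; i++) {
        while (j >= 0 && buf[i] != marker[j]) j = next[j];
        j++;
        if ((size_t)j == lgh) { found = i + 1 - lgh; break; }
    }
    free(next);
    return found;
}
```

Because the length is passed explicitly, embedded NUL bytes in the mapped data do not stop the scan.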

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
--
Posted via a free Usenet account from http://www.teranews.com

Sep 21 '07 #3
Owen Zhang <ow***************@gmail.com> writes:
> I have a file loaded into virtual memory space by mmap. I need to
> search some key word inside the memory opened by mmap. What is the
> best and efficient way to do?

The mmap() function is not part of standard C. If it's relevant to
your question, you should ask in a system-specific newsgroup, most
likely comp.unix.programmer.

But I don't see how it's relevant. Is there some reason you think
searching a chunk of memory allocated by mmap is different from
searching any other chunk of memory?

Standard C provides some simple searching functions such as strstr().
If that doesn't suit your needs, then you probably have an algorithm
question; comp.programming is likely to be the best place to ask.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 21 '07 #4
Owen Zhang wrote:
> I have a file loaded into virtual memory space by mmap. I need to
> search some key word inside the memory opened by mmap. What is the
> best and efficient way to do?

The on-topic answer is: strstr().

mmap()-specific considerations should rather be posted to "c.u.programmer".

--
Tor <torust [at] online [dot] no>
Sep 21 '07 #5
In article <11*********************@57g2000hsv.googlegroups.com>,
Owen Zhang <ow***************@gmail.com> wrote:
> I have a file loaded into virtual memory space by mmap. I need to
> search some key word inside the memory opened by mmap. What is the
> best and efficient way to do?

I think about all you can say is that a method that accesses data
sequentially rather than randomly is likely to work better, because it
matches disk access better. That's assuming you don't have any kind
of indexing, of course.
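
As an illustration of a purely sequential pass over a mapping, here is a rough POSIX sketch (so not standard C). It maps a file read-only, passes a sequential-access hint through posix_madvise(), and scans front to back with memchr(). The function name count_byte_mapped is hypothetical, and the hint is advisory only: whether it changes anything depends on the system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose the POSIX declarations */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: counts occurrences of byte c in the file at
   path using one front-to-back scan of a read-only mapping.
   Returns -1 on error. */
long count_byte_mapped(const char *path, int c)
{
    struct stat st;
    void *map;
    const unsigned char *p, *end;
    size_t len;
    long count = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  /* nothing to map */
    len = (size_t)st.st_size;

    map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return -1; }
    close(fd);                 /* the mapping remains valid after close */

    /* Advisory hint only; the return value is deliberately ignored. */
    (void)posix_madvise(map, len, POSIX_MADV_SEQUENTIAL);

    p = map;
    end = p + len;
    while ((p = memchr(p, c, (size_t)(end - p))) != NULL) {
        count++;
        p++;                   /* continue just past this hit */
    }
    munmap(map, len);
    return count;
}
```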

-- Richard

--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Sep 21 '07 #6
Tor Rustad <to********@hotmail.com> writes:
> Owen Zhang wrote:
>> I have a file loaded into virtual memory space by mmap. I need to
>> search some key word inside the memory opened by mmap. What is the
>> best and efficient way to do?
>
> The on-topic answer is: strstr().
[...]

Sure, but strstr() simply searches for a specified substring, not
necessarily for a "keyword" (which may imply it's delimited somehow).
Without more information, we can't be sure whether strstr will do the
job or not.
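
To make the distinction concrete, here is a small sketch of a delimited search. It assumes, purely for illustration, that a "keyword" must not be adjacent to letters, digits, or underscores; the OP never stated such a rule. The names find_keyword and is_word_byte are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: keywords are delimited by bytes that are not
   letters, digits, or '_'. */
static int is_word_byte(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Hypothetical helper: returns the offset of the first delimited
   occurrence of kw in buf, or (size_t)-1 if there is none. */
size_t find_keyword(const char *buf, size_t len, const char *kw)
{
    size_t klen = strlen(kw);
    size_t i;

    assert(klen > 0);
    if (klen > len) return (size_t)-1;
    for (i = 0; i + klen <= len; i++) {
        if (memcmp(buf + i, kw, klen) != 0) continue;
        /* Reject matches glued to a preceding word byte. */
        if (i > 0 && is_word_byte((unsigned char)buf[i - 1])) continue;
        /* Reject matches glued to a following word byte. */
        if (i + klen < len && is_word_byte((unsigned char)buf[i + klen]))
            continue;
        return i;
    }
    return (size_t)-1;
}
```

In "print(int_x); int y;" a plain substring search for "int" stops inside "print", while this version skips ahead to the standalone "int".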

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 21 '07 #7
Keith Thompson wrote:
> Tor Rustad <to********@hotmail.com> writes:
>> Owen Zhang wrote:
>>> I have a file loaded into virtual memory space by mmap. I need to
>>> search some key word inside the memory opened by mmap. What is the
>>> best and efficient way to do?
>> The on-topic answer is: strstr().
> [...]
>
> Sure, but strstr() simply searches for a specified substring, not
> necessarily for a "keyword" (which may imply it's delimited somehow).
> Without more information, we can't be sure whether strstr will do the
> job or not.

I don't follow.. why can't OP check for extra requirements after each
match by strstr()?
OTOH, files are typically not null terminated, but I didn't bother to
check if OP needed to address this issue when using mmap().
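
One way to sidestep the missing terminator is to search with an explicit length, using only memchr() and memcmp() from standard C. The sketch below is an assumption about what would suit the OP; mem_find is a hypothetical name. Some systems also provide a memmem() function for this purpose, but it is not part of standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length-bounded substring search.
   Returns a pointer to the first match, or NULL if there is none. */
const void *mem_find(const void *hay, size_t hay_len,
                     const void *needle, size_t needle_len)
{
    const unsigned char *p = hay;
    const unsigned char *n = needle;
    const unsigned char *last;

    assert(needle_len > 0);
    if (needle_len > hay_len) return NULL;
    last = p + (hay_len - needle_len);   /* last possible match start */

    while (p <= last) {
        /* Jump to the next candidate for the needle's first byte. */
        p = memchr(p, n[0], (size_t)(last - p) + 1);
        if (p == NULL) return NULL;
        if (memcmp(p, n, needle_len) == 0) return p;
        p++;
    }
    return NULL;
}
```

Unlike strstr(), this never reads past hay_len bytes and does not stop at a NUL byte inside the data.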
--
Tor <torust [at] online [dot] no>
Sep 21 '07 #8
Tor Rustad <to********@hotmail.com> writes:
> Keith Thompson wrote:
>> Tor Rustad <to********@hotmail.com> writes:
>>> Owen Zhang wrote:
>>>> I have a file loaded into virtual memory space by mmap. I need to
>>>> search some key word inside the memory opened by mmap. What is the
>>>> best and efficient way to do?
>>> The on-topic answer is: strstr().
>> [...]
>> Sure, but strstr() simply searches for a specified substring, not
>> necessarily for a "keyword" (which may imply it's delimited somehow).
>> Without more information, we can't be sure whether strstr will do the
>> job or not.
>
> I don't follow.. why can't OP check for extra requirements after each
> match by strstr()?

Yes, he could do that, but it might not be as efficient as a more
specialized search. If the keyword is sufficiently short, for
example, there might be a lot of false positives. But again, we don't
know much about the OP's requirements.

> OTOH, files are typically not null terminated, but I didn't bother to
> check if OP needed to address this issue when using mmap().

I hadn't thought of that, though it shouldn't be too hard to address
it.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 21 '07 #9
Keith Thompson wrote:
> Tor Rustad <to********@hotmail.com> writes:
>> Keith Thompson wrote:
>>> Tor Rustad <to********@hotmail.com> writes:
>>>> Owen Zhang wrote:
>>>>> I have a file loaded into virtual memory space by mmap. I need to
>>>>> search some key word inside the memory opened by mmap. What is the
>>>>> best and efficient way to do?
>>>> The on-topic answer is: strstr().
>>> [...]
>>> Sure, but strstr() simply searches for a specified substring, not
>>> necessarily for a "keyword" (which may imply it's delimited somehow).
>>> Without more information, we can't be sure whether strstr will do the
>>> job or not.
>> I don't follow.. why can't OP check for extra requirements after each
>> match by strstr()?
>
> Yes, he could do that, but it might not be as efficient as a more
> specialized search. If the keyword is sufficiently short, for
> example, there might be a lot of false positives. But again, we don't
> know much about the OP's requirements.

There "might" be a lot of false positives, particularly if Keith is
allowed to construct that input file! :)

OTOH, let's say OP wants to scan C source files for keywords, will there
normally be more matches for "int" than [ \t]?

If complex matching is required, OP should rather look into using a
regular expression library, or a lex tool. No reason to reinvent the
wheel for this.
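
As one possible illustration of the library route, POSIX (not standard C) offers regcomp() and regexec(). The sketch below checks for a delimited "int"; the pattern and the function name has_int_keyword are assumptions made for this example. Note that regexec() expects a NUL-terminated string, so a mapped file would need to be terminated or copied first.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text contains "int" as a
   delimited word, 0 if not, and -1 if the pattern fails to compile.
   text must be NUL-terminated. */
int has_int_keyword(const char *text)
{
    regex_t re;
    int rc;
    /* Extended regex: "int" not adjacent to identifier characters. */
    const char *pattern = "(^|[^A-Za-z0-9_])int([^A-Za-z0-9_]|$)";

    assert(text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    rc = regexec(&re, text, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0;
}
```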

>> OTOH, files are typically not null terminated, but I didn't bother to
>> check if OP needed to address this issue when using mmap().
>
> I hadn't thought of that, though it shouldn't be too hard to address
> it.

I had the case in mind, where other programs access the file simultaneously.

--
Tor <torust [at] online [dot] no>
Sep 22 '07 #10
