Built for speed - mmap, threads

I'm writing an application that decodes a file containing binary
records. Each record is a particular event type. Each record is
translated into ASCII and then written to a file. Each file contains
the same events. At the moment each record is processed one after the
other. It takes about 1m40s to process a large file containing 70,000
records. Would my application benefit from multiple threads and mmap?

If so what is the best way to manage the multiple output files? For
example there are 20 event types. When parsing the file I identify the
event type and build 20 lists. Then have 20 threads each working with
each event file.

How do I extract this into classes?

Feb 19 '06 #1
Michael wrote:
> I'm writing an application that decodes a file containing binary
> records. Each record is a particular event type. Each record is
> translated into ASCII and then written to a file. Each file contains
> the same events. At the moment each record is processed one after the
> other. It takes about 1m40s to process a large file containing 70,000
> records. Would my application benefit from multiple threads and mmap?

Well, that all depends on how many cores you have to run them on, and is
a bit OT here. You'll have better luck on comp.programming.threads, or
one specific to your platform.

> If so what is the best way to manage the multiple output files? For
> example there are 20 event types. When parsing the file I identify the
> event type and build 20 lists. Then have 20 threads each working with
> each event file.

Odds are that'll slow you down due to context switches, assuming you
have fewer than 20 cores.

> How do I extract this into classes?

Again, try comp.programming.threads, maybe with an example of how you
think it could be done.

--
Ian Collins.
Feb 19 '06 #2
OK, thanks, I will try threads. The target has 8 dual-core Sun SPARC IV CPUs.

Feb 19 '06 #3
"Michael" <ch******@evolving.com> wrote in message
news:11*********************@g14g2000cwa.googlegroups.com...
: I'm writing an application that decodes a file containing binary
: records. Each record is a particular event type. Each record is
: translated into ASCII and then written to a file. Each file contains
: the same events. At the moment each record is processed one after the
: other. It takes about 1m40s to process a large file containing 70,000
: records. Would my application benefit from multiple threads and mmap?

You don't say how much processing is being performed on the events,
or what is the actual size of the file/each record.

Using memory-mapping will typically help a lot if the performance
is I/O bound. It also often simplifies the reading/processing of
the data. So it is something I often do up front.
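
As a rough illustration of the memory-mapping suggestion, here is a minimal sketch that maps a whole file read-only. It assumes a POSIX system (mmap is not part of standard C++), and MappedFile is a made-up helper name, not something from the original application.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only view of a whole file, mapped with POSIX mmap().
// MappedFile is a hypothetical helper name, not part of any library.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) return;  // leave the object invalid
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size),
                             PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const unsigned char*>(p);
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool valid() const { return data_ != nullptr; }
    std::size_t size() const { return size_; }

    // Pointer to the bytes starting at an offset (e.g. one taken from an index).
    const unsigned char* at(std::size_t offset) const {
        assert(offset < size_);
        return data_ + offset;
    }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

With an index of record offsets already built, each record can then be read straight from `at(offset)` without a separate read call or a copy into a heap buffer. Whether that is actually faster still has to be measured.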

However, 100s for 70k records seems relatively long, so I would assume
that you are doing quite a lot of processing. It is likely that this
processing itself (its algorithms) could be improved quite a bit.
You should use a profiler and find out what is taking the most
time; you might find an obvious culprit.

: If so what is the best way to manage the multiple output files? For
: example there are 20 event types. When parsing the file I identify the
: event type and build 20 lists. Then have 20 threads each working with
: each event file.

Regarding the output: it might be good to prepare the output
in a memory buffer, and to write/flush it in large chunks.
But this all depends on your current memory usage, etc.

Using multiple threads will not automatically improve performance,
unless you carefully craft your design based on a thorough analysis.
Just creating one thread for each output file typically won't help.

: How do I extract this into classes?

What do you think you need classes for?
By the way, your question has nothing to do with the C++ language,
and therefore doesn't belong in this NG.
Try a platform-specific forum?
hth -Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Feb 19 '06 #4

Michael wrote:
> I'm writing an application that decodes a file containing binary
> records. Each record is a particular event type. Each record is
> translated into ASCII and then written to a file. Each file contains
> the same events. At the moment each record is processed one after the
> other. It takes about 1m40s to process a large file containing 70,000
> records. Would my application benefit from multiple threads and mmap?

The answer is a definite maybe. The threads question is highly hardware
dependent. Multiple threads are most effective on machines with
multiple processors. Otherwise, simply increasing the number of threads
does not increase a machine's processing power to a like degree. In fact,
because switching between threads entails some overhead, it is just as
possible to wind up with too many threads instead of too few when
guessing blindly for the optimal number.

Since you have not provided a detailed description of the
application's current memory use and I/O characteristics, it is
impossible to say whether mmap would help or not. And the first order
of business in any case has to be to profile the current app and find
out how it is spending those 100 seconds. If 90% of that time is in
parsing code, then no, mmap will be unlikely to help. If, on the other
hand, a large portion of that time is spent in disk I/O operations (as
is often the case), then yes, a few large read and write operations
(instead of many little ones) will do more to improve performance than
almost any other type of optimization. But without knowing the extent
to which the current application has optimized its behavior, it's
futile to estimate how much further its performance could be optimized.
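
A real profiler is the better tool, but as a first rough cut the split between parsing and I/O can be measured by hand. The sketch below is only an illustration: PhaseTimes, ScopedPhase and phaseShare are invented names, and it uses std::chrono from C++11, which is an assumption about the compiler available.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Accumulated wall-clock time per named phase.
// PhaseTimes, ScopedPhase and phaseShare are hypothetical helper names.
using PhaseTimes = std::map<std::string, std::chrono::steady_clock::duration>;

// Adds the lifetime of this object to the total of the named phase.
class ScopedPhase {
public:
    ScopedPhase(PhaseTimes& totals, const std::string& name)
        : totals_(totals), name_(name),
          start_(std::chrono::steady_clock::now()) {
        assert(!name.empty());
    }
    ~ScopedPhase() {
        totals_[name_] += std::chrono::steady_clock::now() - start_;
    }
    ScopedPhase(const ScopedPhase&) = delete;
    ScopedPhase& operator=(const ScopedPhase&) = delete;

private:
    PhaseTimes& totals_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Fraction of all measured time spent in one phase (0.0 if nothing recorded).
double phaseShare(const PhaseTimes& totals, const std::string& name) {
    std::chrono::steady_clock::duration sum{};
    for (const auto& entry : totals) sum += entry.second;
    auto it = totals.find(name);
    if (it == totals.end() || sum.count() == 0) return 0.0;
    return static_cast<double>(it->second.count()) /
           static_cast<double>(sum.count());
}
```

Wrapping the parse step and the write step of the record loop in `{ ScopedPhase p(times, "parse"); ... }` blocks gives a coarse answer to the "90% in parsing or mostly in disk I/O" question. It will not point at individual lines the way a profiler does.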

> If so what is the best way to manage the multiple output files? For
> example there are 20 event types. When parsing the file I identify the
> event type and build 20 lists. Then have 20 threads each working with
> each event file.

Unless the hardware has a lot of multiprocessing capability, 20 threads
sound like far too many. But only profiling and testing various
implementations will be able to find the optimal number of threads for
this app running on a particular hardware configuration.

As for the 20 event types, I would not do anything fancy. If the 20
possible types are fixed, then declaring an array of 20 file handles
and using an enum as an index into that array to find the
corresponding file handle should suffice. Just avoid "magic numbers"
like 20, and define const integral values in their place.
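
A minimal sketch of that layout follows. The three event names and the file naming scheme are invented for the example (the real application has 20 types); the point is only that the last enumerator doubles as the array size, so no literal count appears in the code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical event types; kEventTypeCount must stay last so that it
// always equals the number of real types.
enum EventType { kCallStart, kCallEnd, kSms, kEventTypeCount };

// One base name per event type, in enum order (names are made up).
const std::array<const char*, kEventTypeCount> kEventNames = {
    "call_start", "call_end", "sms"
};

// Builds the output file path for one event type.
std::string outputPath(const std::string& dir, EventType type) {
    assert(type < kEventTypeCount);
    return dir + "/" + kEventNames[type] + ".txt";
}

// Opens one output file per event type; the enum value is the array index.
std::array<std::ofstream, kEventTypeCount> openOutputs(const std::string& dir) {
    std::array<std::ofstream, kEventTypeCount> files;
    for (std::size_t i = 0; i < files.size(); ++i)
        files[i].open(outputPath(dir, static_cast<EventType>(i)));
    return files;
}
```

Writing a decoded record then becomes `files[type] << text;`, and adding a 21st event type means adding one enumerator and one name.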

> How do I extract this into classes?

I'm not sure that a program that performs a linear processing task
benefits a great deal from classes. Classes (and a class hierarchy)
work best as a dynamic model - often one driven by an ever-changing
series of events (often generated by the user's interaction with the
application). A program that opens a file, parses its contents, closes
the file and declares itself done is really conducting a series of
predictable sequential operations. And the only reason for wanting to
use classes here would be for maintainability (because I can't see that
performance issues would ever mandate implementing classes).

So the question to ask is whether classes would necessarily make the
code more maintainable. A well-designed and implemented class model
should, but a class model designed for its own sake would probably be
harder to maintain, because a class hierarchy of any kind almost always
increases the total complexity of a program (in other words, there is
more code). But because code in a well-designed hierarchy better
encapsulates its complexity, a programmer is able to work on the
program's logic in smaller "pieces" (thereby reducing the complexity
that the programmer has to deal with at any one time).

Lastly, maintainability is a separate issue from performance, and one
that should be addressed first. It wouldn't make sense to fine-tune the
app's performance if its code is going to be thrown out and replaced
with an object-oriented implementation in the final, shipping version.

So to recap: first, decide whether a class hierarchy would improve the
maintainability of the source code enough to justify the additional
work, and implement it if so. Second, profile the app to obtain a
precise accounting of the 100 seconds it spends processing records.
Next, use that profile information to target bottlenecks: remedy them
using standard optimization techniques (such as issuing fewer I/O
requests by increasing the size of each request, or, if parsing is the
bottleneck, using a table-driven parser for maximal speed). And lastly,
the most important point: it's simply never effective to try to speed
up a program without first learning why it is so slow.
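
For the table-driven option, one possible shape is a lookup table of decoder functions indexed by the event type byte. Everything about the record layout here (type in byte 0, a value in byte 1) and the decoder names is assumed purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record layout: byte 0 is the event type, byte 1 a payload value.
using Decoder = std::string (*)(const unsigned char* record);

std::string decodeStart(const unsigned char* r) {
    return "START " + std::to_string(r[1]);
}
std::string decodeStop(const unsigned char* r) {
    return "STOP " + std::to_string(r[1]);
}
std::string decodeUnknown(const unsigned char*) {
    return "UNKNOWN";
}

// Builds a 256-entry table so that every possible type byte maps to a decoder.
std::array<Decoder, 256> makeDecoderTable() {
    std::array<Decoder, 256> table;
    table.fill(&decodeUnknown);  // default for types without a decoder
    table[0x01] = &decodeStart;
    table[0x02] = &decodeStop;
    return table;
}

// Dispatches on the type byte with a single indexed lookup.
std::string decodeRecord(const std::array<Decoder, 256>& table,
                         const unsigned char* record) {
    assert(record != nullptr);
    return table[record[0]](record);
}
```

The table is built once at startup and then shared by every record. Whether it beats a plain switch statement depends on the compiler, so it is worth checking against the profile.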

Greg

Feb 19 '06 #5
Sorry, I was half thinking about how to write this.

I know where all the time is being spent as I timed each task as I was
developing. For each record I am setting a TCL array and then dumping
to file. I still need to add logic but I am concentrating on raw speed
at the moment end to end.

I already have the decoding part which goes through the file and
creates an index. It is 1 class. It takes about 2 seconds to create the
index on a 30M file - 70000 records. The index is public so I can
directly access this index to get an offset to different parts of the
file. The file is loaded into memory at startup but I will eventually
mmap it - once I work out how to and if it makes a difference to
performance.

I was thinking of creating another class which would be the decode
thread manager. This would decide how many threads were needed for a
particular file, create the threads and then balance the load on each
thread by deciding which records each thread would process. A thread
would store output data in a buffer which would then be copied and
flushed to file. Memory isn't a problem; I have 32GB to play around with.
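
A minimal sketch of that kind of manager, assuming C++11 std::thread is available (on many toolchains this needs -pthread). formatRecord is a placeholder for the real decoding step, and the even split by record index is just one possible load-balancing rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for decoding one record into its ASCII form.
std::string formatRecord(std::size_t recordIndex) {
    return "record " + std::to_string(recordIndex) + "\n";
}

// Splits [0, recordCount) into contiguous ranges, one per worker thread.
// Each worker fills its own buffer, so no locking is needed while decoding.
// Pass requestedThreads == 0 to use the hardware thread count.
std::vector<std::string> decodeInParallel(std::size_t recordCount,
                                          unsigned requestedThreads) {
    unsigned hw = std::thread::hardware_concurrency();  // may report 0
    std::size_t threads = requestedThreads ? requestedThreads : (hw ? hw : 1);
    threads = std::max<std::size_t>(1, std::min(threads, recordCount));
    assert(threads >= 1);

    std::vector<std::string> buffers(threads);
    std::vector<std::thread> workers;
    const std::size_t chunk = (recordCount + threads - 1) / threads;
    for (std::size_t t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(recordCount, t * chunk);
        const std::size_t end = std::min(recordCount, begin + chunk);
        workers.emplace_back([&buffers, t, begin, end] {
            std::string local;  // private to this worker
            for (std::size_t i = begin; i < end; ++i)
                local += formatRecord(i);
            buffers[t] = std::move(local);
        });
    }
    for (auto& w : workers) w.join();
    return buffers;  // concatenating buffers[0], buffers[1], ... keeps record order
}
```

Because the ranges are contiguous, writing the buffers out in index order reproduces the original record order. The thread count is a parameter so that different values can be timed on the target machine.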

Feb 19 '06 #6
Thanks Greg,

I'm using C++ because I haven't used C for ages and don't want to mess
around with memory management and pointers - core dumps. It's quicker
for me to write code to store things like configuration in vectors and
let them deal with cleaning up memory. I only have one 'new/delete' and
that is to create a large buffer to hold the contents of the file in
memory - this will eventually disappear once I get mmap working - in
cygwin/g++. I'm not that bothered about memory overhead of using
vectors as I've got 32GB to work with.

I did some rough profiling. Without writing to file, the processing
(parsing the file, setting internal TCL variables) maxes out the CPU.
With writing to disk, the CPU usage goes down to 35% (2-CPU Sparc III)
and there is I/O wait. So with threads and mmap I'm hoping to make
maximum use of the available hardware.

Michael

Feb 19 '06 #7
"Michael" <ch******@evolving.com> wrote in message
news:11**********************@g43g2000cwa.googlegroups.com...
: Sorry was half thinking about how to write this?
:
: I know where all the time is being spent as I timed each task as I was
: developing. For each record I am setting a TCL array and then dumping
: to file. I still need to add logic but I am concentrating on raw speed
: at the moment end to end.
I do not know what a TCL array is (TCL/TK, Think Class Library, or??).

: I already have the decoding part which goes through the file and
: creates an index. It is 1 class. It takes about 2 seconds to create the
: index on a 30M file - 70000 records. The index is public so I can
: directly access this index to get an offset to different parts of the
: file. The file is loaded into memory at startup but I will eventually
: mmap it - once I work out how to and if it makes a difference to
: performance.
You say you did time measurements, yet you only account for 2 sec
out of 100. Using a profiler will highlight the hot spots in your
program to a single line. Only this will allow you to identify,
for example, that you spend too much time in memory allocations,
or searches, and allow you to optimize your algorithms and data
structures.

: I was thinking of creating another class which would be the decode
: thread manager. This would decide how many threads were needed for a
: particular file, create the threads and then balance the load on each
: thread by deciding which records each thread would process. A thread
: would store output data in a buffer which would then be copied and
: flush to file. Memory isn't a problem I have 32GB to play around with.
Good in terms of caching file outputs.
Keep in mind, though, that memory accesses are nowadays what takes
the most time in all simple-to-moderately complex processing algos.
Avoiding reallocations, and using contiguous memory accesses, can
make a real difference.
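
To make the reallocation point concrete, here is a small sketch using std::vector<char> as the output buffer. The function names are invented, and the fixed record size is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends the bytes of one formatted record to a caller-owned buffer.
void appendRecord(std::vector<char>& out, const char* text, std::size_t length) {
    assert(text != nullptr || length == 0);
    out.insert(out.end(), text, text + length);
}

// Formats `count` copies of a fixed-size record into one contiguous buffer.
// Reserving up front means the loop itself never reallocates.
std::vector<char> formatAll(std::size_t count, const char* text,
                            std::size_t length) {
    std::vector<char> out;
    out.reserve(count * length);  // one allocation for the whole batch
    for (std::size_t i = 0; i < count; ++i)
        appendRecord(out, text, length);
    return out;
}
```

Without the reserve call the vector grows in steps and copies its contents each time it runs out of capacity. With 70,000 records that is extra memory traffic that a single up-front allocation avoids.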

Again, don't bother using threads until you have analyzed the
performance profile of your application.
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Feb 19 '06 #8
