Built for speed - mmap, threads

Michael

I'm writing an application that decodes a file containing binary
records. Each record is a particular event type. Each record is
translated into ASCII and then written to a file. Each file contains
the same events. At the moment each record is processed one after the
other. It takes about 1m40s to process a large file containing 70,000
records. Would my application benefit from multiple threads and mmap?

If so what is the best way to manage the multiple output files? For
example there are 20 event types. When parsing the file I identify the
event type and build 20 lists. Then have 20 threads each working with
each event file.

How do I extract this into classes?

Feb 19 '06 #1

Ian Collins

Michael wrote:

> I'm writing an application that decodes a file containing binary
> records. Each record is a particular event type. Each record is
> translated into ASCII and then written to a file. Each file contains
> the same events. At the moment each record is processed one after the
> other. It takes about 1m40s to process a large file containing 70,000
> records. Would my application benefit from multiple threads and mmap?

Well, that all depends on how many cores you have to run them on, and is
a bit OT here. You'll have better luck on comp.programming.threads, or
a group specific to your platform.

> If so what is the best way to manage the multiple output files? For
> example there are 20 event types. When parsing the file I identify the
> event type and build 20 lists. Then have 20 threads each working with
> each event file.

Odds are that'll slow you down due to context switches, assuming you
have fewer than 20 cores.

> How do I extract this into classes?

Again, try comp.programming.threads, maybe with an example of how you
think it could be done.

--
Ian Collins.

Feb 19 '06 #2

Michael

OK, thanks, will try threads. Target is 8 Sun SPARC IV dual-core CPUs.

Feb 19 '06 #3

Ivan Vecerina

"Michael" <ch******@evolving.com> wrote in message
news:11*********************@g14g2000cwa.googlegroups.com...
: I'm writing an application that decodes a file containing binary
: records. Each record is a particular event type. Each record is
: translated into ASCII and then written to a file. Each file contains
: the same events. At the moment each record is processed one after the
: other. It takes about 1m40s to process a large file containing 70,000
: records. Would my application benefit from multiple threads and mmap?

You don't say how much processing is being performed on the events,
or what is the actual size of the file/each record.

Using memory-mapping will typically help a lot if the performance
is I/O bound. It also often simplifies the reading/processing of
the data, so it is something I often do up front.
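
A minimal sketch of read-only memory mapping, assuming a POSIX platform (Solaris, Linux or Cygwin). MappedFile is a hypothetical helper name chosen for illustration; it is not taken from the original application:

```cpp
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only mapping of a whole file (POSIX only).
// MappedFile is a hypothetical helper, not part of any library.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd == -1) throw std::runtime_error("open failed: " + path);

        struct stat st;
        if (::fstat(fd, &st) == -1) {
            ::close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);

        if (size_ > 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = static_cast<const unsigned char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

With the file mapped, a record can be addressed as `file.data() + offset`, so an existing index of offsets keeps working without a separate read into a heap buffer.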

However, 100s for 70k records seems relatively long, so I would assume
that you are doing quite a lot of processing. It is likely that this
processing itself (its algorithms) could be improved quite a bit.
You should use a profiler and find out what is the most
time-consuming -- you might find an obvious culprit.

: If so what is the best way to manage the multiple output files? For
: example there are 20 event types. When parsing the file I identify the
: event type and build 20 lists. Then have 20 threads each working with
: each event file.

Regarding the output: it might be good to prepare the output
in a memory buffer, and to write/flush it in large chunks.
But this all depends on your current memory usage, etc.

Using multiple threads will not automatically improve performance,
unless you carefully craft your design based on a thorough analysis.
Just creating one thread for each output file typically won't help.

: How do I extract this into classes?

What do you think you need classes for?
By the way, your question has nothing to do with the C++ language,
and therefore doesn't belong in this NG.
Try a platform-specific forum?
hth -Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form

Feb 19 '06 #4

Greg

Michael wrote:

> I'm writing an application that decodes a file containing binary
> records. Each record is a particular event type. Each record is
> translated into ASCII and then written to a file. Each file contains
> the same events. At the moment each record is processed one after the
> other. It takes about 1m40s to process a large file containing 70,000
> records. Would my application benefit from multiple threads and mmap?

The answer is a definite maybe. The threads question is highly hardware
dependent. Multiple threads are most effective on machines with
multiple processors. Otherwise, simply increasing the number of threads
does not increase a machine's processing power to a like degree. In fact,
because switching between threads entails some overhead, it is just as
possible to wind up with too many threads as too few when
guessing blindly at the optimal number.

Since you have not provided a detailed description of the
application's current memory use and I/O characteristics, it is
impossible to say whether mmap would help or not. And the first order
of business in any case has to be to profile the current app and find
out how it is spending those 100 seconds. If 90% of that time is in
parsing code, then no, mmap will be unlikely to help. If, on the other
hand, a large portion of that time is spent in disk I/O operations (as
is often the case), then yes, a few large read and write operations
(instead of many little ones) will do more to improve performance than
almost any other type of optimization. But without knowing the extent
to which the current application has optimized its behavior, it's
futile to estimate how much further its performance could be optimized.

> If so what is the best way to manage the multiple output files? For
> example there are 20 event types. When parsing the file I identify the
> event type and build 20 lists. Then have 20 threads each working with
> each event file.

Unless the hardware has a lot of multiprocessing capability, 20 threads
sound like far too many. But only profiling and testing various
implementations will be able to find the optimal number of threads for
this app running on a particular hardware configuration.

As for the 20 event types, I would not do anything fancy. If the 20
possible types are fixed, then declaring an array of 20 file handles
and using an enum as an index into that array to find the
corresponding file handle should suffice. Just avoid "magic numbers"
like 20, and define const integral values in their place.
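
A small sketch of the enum-indexed array of file handles, assuming C++11. The three event names and the file-name scheme are hypothetical stand-ins for the real 20 types:

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical event types; the real application would list all 20 here.
// EventTypeCount is last, so it always equals the number of types.
enum EventType { EventLogin, EventLogout, EventError, EventTypeCount };

// One output stream per event type, indexed by the enum value.
class EventFiles {
public:
    // Opens one file per type, named <prefix><index>.txt (an assumed scheme).
    explicit EventFiles(const std::string& prefix) {
        for (std::size_t i = 0; i < files_.size(); ++i) {
            files_[i].open(prefix + std::to_string(i) + ".txt");
        }
    }

    std::ofstream& for_type(EventType t) { return files_[t]; }

private:
    std::array<std::ofstream, EventTypeCount> files_;
};
```

Adding a new event type then only means adding an enumerator before EventTypeCount; the array size and the loop bound follow automatically, with no literal 20 anywhere.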
> How do I extract this into classes?

I'm not sure that a program that performs a linear processing task
benefits a great deal from classes. Classes (and a class hierarchy)
work best as a dynamic model - often one driven by an ever-changing
series of events (often generated by the user's interaction with the
application). A program that opens a file, parses its contents, closes
the file and declares itself done is really conducting a series of
predictable sequential operations. And the only reason for wanting to
use classes here would be for maintainability (because I can't see that
performance issues would ever mandate implementing classes).

So the question to ask is whether classes would necessarily make the
code more maintainable. A well-designed and implemented class model
should, but a class model designed for its own sake would
probably be harder to maintain, because a class hierarchy of any kind
almost always increases the total complexity of a program (in other
words, there is more code). But because code in a well-designed
hierarchy better encapsulates its complexity, a programmer is able to
work on the program's logic in smaller "pieces" (thereby reducing the
complexity that the programmer has to deal with at any one time).

Lastly, maintainability is a separate issue from performance. And one
that should be addressed first. It wouldn't make sense to fine tune the
app's performance if its code is going to be thrown out and replaced
with an object-oriented implementation in the final, shipping version.

So to recap: first, decide whether a class hierarchy would improve the
maintainability of the source code to such an extent that it would
justify the additional work (and implement it, if the decision is
affirmative). Second, profile the app to obtain a precise accounting of
the 100 seconds it spends processing records. Next, use that profile
information to target bottlenecks: remedy them using standard
optimization techniques (such as using fewer I/O requests by increasing
the size of each request, or, if parsing is the bottleneck, using a
table-driven parser for maximal speed). And lastly, the most important
point: it's simply never effective to try to speed up a program without
first learning why it is so slow.
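
As a rough illustration of what "table-driven" can mean here, the sketch below dispatches on the record's type byte through an array of function pointers instead of a chain of comparisons. The two-byte record layout and the decoder names are invented for the example:

```cpp
#include <cstddef>
#include <string>

// Hypothetical record layout: byte 0 is the event type, byte 1 a payload value.
using Decoder = std::string (*)(const unsigned char* record);

std::string decode_start(const unsigned char* r) {
    return "START " + std::to_string(r[1]);
}

std::string decode_stop(const unsigned char* r) {
    return "STOP " + std::to_string(r[1]);
}

std::string decode_unknown(const unsigned char*) {
    return "UNKNOWN";
}

// The table: the event type is the index, the entry is the decoder to call.
const std::size_t kDecoderCount = 2;
const Decoder kDecoders[kDecoderCount] = { decode_start, decode_stop };

// Looks up the decoder for the record's type; unknown types get a fallback.
std::string decode(const unsigned char* record) {
    const std::size_t type = record[0];
    return type < kDecoderCount ? kDecoders[type](record)
                                : decode_unknown(record);
}
```

Whether this beats a plain switch is something the profiler has to confirm; the main gain is usually that adding an event type becomes a one-line change to the table.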

Greg

Feb 19 '06 #5

Michael

Sorry was half thinking about how to write this?

I know where all the time is being spent as I timed each task as I was
developing. For each record I am setting a TCL array and then dumping
to file. I still need to add logic but I am concentrating on raw speed
at the moment end to end.

I already have the decoding part which goes through the file and
creates an index. It is 1 class. It takes about 2 seconds to create the
index on a 30M file - 70000 records. The index is public so I can
directly access this index to get an offset to different parts of the
file. The file is loaded into memory at startup but I will eventually
mmap it - once I work out how to and if it makes a difference to
performance.

I was thinking of creating another class which would be the decode
thread manager. This would decide how many threads were needed for a
particular file, create the threads and then balance the load on each
thread by deciding which records each thread would process. A thread
would store output data in a buffer which would then be copied and
flushed to file. Memory isn't a problem; I have 32GB to play around with.
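
A sketch of that thread-manager idea, assuming C++11 std::thread (the original code would have used the platform's thread API). The function names are hypothetical, and decode_record is only a stand-in for the real per-record decoding:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Stand-in for the real per-record decoding (hypothetical).
std::string decode_record(std::size_t index) {
    return "record " + std::to_string(index) + "\n";
}

// Splits record indices [0, record_count) into contiguous slices, one per
// worker. Each worker appends only to its own buffer, so no locking is needed.
std::vector<std::string> decode_in_parallel(std::size_t record_count,
                                            unsigned worker_count) {
    if (worker_count == 0) worker_count = 1;
    std::vector<std::string> buffers(worker_count);
    std::vector<std::thread> workers;
    // Round up so that the last slice picks up any remainder.
    const std::size_t slice = (record_count + worker_count - 1) / worker_count;

    for (unsigned w = 0; w < worker_count; ++w) {
        const std::size_t begin = std::min(record_count, w * slice);
        const std::size_t end = std::min(record_count, begin + slice);
        workers.emplace_back([begin, end, w, &buffers] {
            for (std::size_t i = begin; i < end; ++i) {
                buffers[w] += decode_record(i);
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return buffers;  // writing them out in order matches sequential output
}
```

A starting point for worker_count could be `std::thread::hardware_concurrency()`, which may return 0 when the value is unknown; the best number still has to be found by measuring.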

Feb 19 '06 #6

Michael

Thanks Greg,

I'm using C++ because I haven't used C for ages and don't want to mess
around with memory management and pointers - core dumps. It's quicker
for me to write code to store things like configuration in vectors and
let them deal with cleaning up memory. I only have one 'new/delete' and
that is to create a large buffer to hold the contents of the file in
memory - this will eventually disappear once I get mmap working - in
cygwin/g++. I'm not that bothered about memory overhead of using
vectors as I've got 32GB to work with.

I did some rough profiling. Without writing to file, the processing
(parsing the file, setting internal TCL variables) maxes out the CPU.
With writing to disk, the CPU usage goes down to 35% (2 CPU Sparc III)
and there is I/O wait. So with threads and mmap I'm hoping to
make maximum use of the available hardware.

Michael

Feb 19 '06 #7

Ivan Vecerina

"Michael" <ch******@evolving.com> wrote in message
news:11**********************@g43g2000cwa.googlegroups.com...
: Sorry was half thinking about how to write this?
:
: I know where all the time is being spent as I timed each task as I was
: developing. For each record I am setting a TCL array and then dumping
: to file. I still need to add logic but I am concentrating on raw speed
: at the moment end to end.
I do not know what a TCL array is (TCL/TK, Think Class Library, or??).

: I already have the decoding part which goes through the file and
: creates an index. It is 1 class. It takes about 2 seconds to create the
: index on a 30M file - 70000 records. The index is public so I can
: directly access this index to get an offset to different parts of the
: file. The file is loaded into memory at startup but I will eventually
: mmap it - once I work out how to and if it makes a difference to
: performance.
You say you did time measurements, yet you only account for 2sec
out of 100. Using a profiler will highlight the hot spots in your
program to a single line. Only this will allow you to identify,
for example, that you spend too much time in memory allocations,
or searches, and allow you to optimize your algorithms and data
structures.

: I was thinking of creating another class which would be the decode
: thread manager. This would decide how many threads were needed for a
: particular file, create the threads and then balance the load on each
: thread by deciding which records each thread would process. A thread
: would store output data in a buffer which would then be copied and
: flushed to file. Memory isn't a problem; I have 32GB to play around with.
Good in terms of caching file outputs.
Keep in mind, though, that memory accesses are nowadays what takes
the most time in all simple-to-moderately complex processing algos.
Avoiding reallocations, and using contiguous memory accesses, can
make a real difference.
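
To illustrate the point about reallocations and contiguous access, here is a small sketch that builds an offset index in a single pre-sized std::vector. The fixed record size is an assumption made only to keep the example short; real records may well be variable-length:

```cpp
#include <cstddef>
#include <vector>

// Builds an index of record start offsets for fixed-size records.
// The fixed record size is an assumption made to keep the sketch short.
std::vector<std::size_t> build_index(std::size_t file_size,
                                     std::size_t record_size) {
    std::vector<std::size_t> offsets;
    if (record_size == 0) return offsets;

    // Reserve once up front so push_back never has to reallocate and copy.
    offsets.reserve(file_size / record_size);

    for (std::size_t pos = 0; pos + record_size <= file_size;
         pos += record_size) {
        offsets.push_back(pos);  // stored contiguously, cache-friendly to scan
    }
    return offsets;
}
```

The same pattern applies to output buffers: if the approximate size is known, reserving it once avoids repeated growth while records are appended.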

Again, don't bother using threads until you have analyzed the
performance profile of your application.
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form

Feb 19 '06 #8
