Increasing C++ throughput

SzH
I need to read very large text files and do some simple processing on
them. I'm trying to make this as fast as possible. Before spending a
lot of time with it and going through a series of futile attempts to
optimize this, I thought I'd post here and ask where to start.

Consider this very small program for outputting every 10th line:

------------ filt.cpp ----------------

#include <iostream>
#include <string>

using namespace std;

int main() {
    ios::sync_with_stdio(false);

    string line;
    unsigned long nr = 0;
    while (getline(cin, line))
        if (nr++ % 10 == 0)
            cout << line << '\n';
    return 0;
}

-----------------------------------

How can this be made faster? I know very little about C++ I/O. I
usually only do simple numerical stuff, and I think that to speed this
up, one needs to be familiar with how I/O works (internally) in the
C++ standard library.

First I found that ios::sync_with_stdio(false); really does help.
Then I noticed that compressing the data file with gzip and piping it
with zcat to this simple program speeds things up a *lot* (it's several
times faster). So I suppose that before compression was applied, the
speed of reading the uncompressed file was limited by the hard drive.

Now this is the time it takes to decompress the file (tt2.gz), and
throw away the result:
>timethis "zcat tt2.gz >NUL"
TimeThis : Command Line : zcat tt2.gz >NUL
TimeThis : Start Time : Mon Jan 21 19:10:11 2008

TimeThis : Command Line : zcat tt2.gz >NUL
TimeThis : Start Time : Mon Jan 21 19:10:11 2008
TimeThis : End Time : Mon Jan 21 19:10:26 2008
TimeThis : Elapsed Time : 00:00:14.750

This is filtering the decompressed data through the filt program from
above, and throwing away the result:
>timethis "zcat tt2.gz |filt >NUL"
TimeThis : Command Line : zcat tt2.gz |filt >NUL
TimeThis : Start Time : Mon Jan 21 18:51:16 2008

TimeThis : Command Line : zcat tt2.gz |filt >NUL
TimeThis : Start Time : Mon Jan 21 18:51:16 2008
TimeThis : End Time : Mon Jan 21 18:51:53 2008
TimeThis : Elapsed Time : 00:00:37.031

This is more than twice as slow. Could some knowledgeable people give
some hints on why simply reading the data line by line and
outputting every tenth line is more than twice as slow as decompressing
it?

Is it because of the memory allocations (happening in string)? Are the
C++ I/O routines the limiting factor? Can this be sped up?

The compression ratio of the data is about 1:10, so zcat is reading
approx. the same amount of data that filt is outputting.

Any insights will be most welcome!

Szabolcs

(P.S. I'm on WinXP, if this matters. The program was compiled with
mingw gcc 4.2.1 with the -O3 option.)
Jan 21 '08 #1
I am no C++ expert.
But the most obvious optimization coming to mind is right there in the
specs.
You are only printing one line every ten.
Why are you storing the lines you are not printing?
You could just scan the file (using buffered IO) for line endings, and
store only the tenth line.

Jan 21 '08 #2
SzH wrote:
> I need to read very large text files and do some simple processing on
> them. I'm trying to make this as fast as possible. Before spending a
> lot of time with it and going through a series of futile attempts to
> optimize this, I thought I'd post here and ask where to start.
If you want to optimise text file processing, consider platform specific
file mapping facilities over iostreams.

--
Ian Collins.
Jan 21 '08 #3
SzH wrote:
> How can this be made faster?
You might want to try using the C I/O functions. They may in some
cases be significantly faster.

(And no, I don't have good suggestions about how to easily print each
10th line using C I/O functions. It's complicated.)
Jan 22 '08 #4
