regarding threading

Hi,
I'm developing code that has to search a large
(biological) database for certain patterns. The file is
3.5 GB and the search pattern is a ten-letter
string. The database consists of paragraphs, and
the code I've developed searches the data
paragraph by paragraph
(using xreadlines).
But this takes an awful amount of time (about 7 minutes).
Is there any way to speed this up?
Is threading feasible, and what code would I
thread, since all I do is process the database and there
are no other concurrent tasks? Should I divide the
database into parts and search those parts
concurrently in multiple threads? Is this feasible? Or should I
be using some kind of multiprocessing, running the
parts (files) as different processes?
Please help.
Thanks

Jul 18 '05 #1
akash shetty wrote:
But this takes an awful amount of time (about 7 minutes).
Is there any way to speed this up?
Is threading feasible, and what code would I
thread, since all I do is process the database and there
are no other concurrent tasks? Should I divide the
database into parts and search those parts
concurrently in multiple threads? Is this feasible? Or should I
be using some kind of multiprocessing, running the
parts (files) as different processes?


Multiple threads or processes won't buy you anything unless you have a
multiprocessor machine. In fact, they'll slow things down, because context
switches (which are considerably slower between processes than between
threads) also take time.

Threads only buy you performance on a single-processor machine if you have to
deal with asynchronous events like network packets or user interaction.

For speeding up your search: if you search by brute force, you could try
something like a shift-and algorithm.
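
For readers who haven't met it, here is a rough Python sketch of the shift-and (bitap) idea. The function name `shift_and_search` is made up for illustration, and the data is assumed to be available as `bytes`. In pure Python this will most likely be slower than the built-in `bytes.find`, which runs in C, so treat it as a demonstration of the technique rather than a speed-up in itself.

```python
def shift_and_search(text, pattern):
    """Return the start offsets of every occurrence of pattern in text.

    Both arguments are bytes. Bit i of `state` is set when the last
    i + 1 bytes read so far equal pattern[:i + 1].
    """
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit i set wherever pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit that signals a complete match
    state = 0
    hits = []
    for pos, c in enumerate(text):
        # Extend every partial match by one byte and start a new one.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits
```

The appeal of the algorithm is that the inner step is a shift, an OR and an AND, which is cheap in C when the pattern fits in a machine word (a ten-letter pattern does).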

And it might help to use C and memory-map parts of the file, but I have to
admit that I have no experience in that field.

Diez
Jul 18 '05 #2
akash shetty:
The file is 3.5 GB and the search pattern is a ten-letter
string. The database consists of paragraphs, and
the code I've developed searches the data
paragraph by paragraph
(using xreadlines).
But this takes an awful amount of time (about 7 minutes).


How are you doing the search? Character by character, with
string.find, or with regular expressions? What's a "paragraph"?
Might memory mapping the file speed things up?
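
A minimal sketch of what a memory-mapped search could look like in Python, using the standard `mmap` module. The helper name `find_all_mmap` is hypothetical, and mapping a 3.5 GB file in one piece assumes a 64-bit Python; whether it actually beats line-by-line reading would have to be measured on the real data.

```python
import mmap


def find_all_mmap(path, pattern):
    """Yield the byte offset of every occurrence of pattern (bytes) in the file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file cannot be mapped.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(pattern)
            while pos != -1:
                yield pos
                # Resume one byte later so overlapping matches are found too.
                pos = mm.find(pattern, pos + 1)
```

The search itself runs inside `mmap.find`, so no per-line Python objects are created; the operating system pages the file in as needed.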

If you don't have a multiple processor machine,
using threads won't make a difference. How many
processors do you have on a machine?

Andrew
da***@dalkescientific.com
Jul 18 '05 #3
Actually, with Python even on a dual-processor machine,
multi-threading will get you NO speed increase. This is because even
though you have multiple threads, only ONE of them is running at a
time (whichever one holds the Global Interpreter Lock, or GIL). Python
switches between threads every so often (100 bytecodes is the default
if I remember correctly, but it can be changed).

The exception is if you write a C extension module... you can
explicitly release the GIL and reacquire it before returning to Python.
That allows another Python thread to run at the same time as your C
module.

Some Python extension modules implement this (I've been working with
Fredrik Lundh to get this into PIL), but most don't... it's a personal
gripe of mine, but I understand the necessity for the time being. The
GIL makes Python pretty "thread safe" even without locks on shared
objects, but in my opinion that should be up to the programmer to deal
with or die with by themselves.

Hopefully some day we'll get to a Python version that can internally
handle threads properly.
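
Since the GIL only serializes threads inside one interpreter, separate processes are one way to put several processors to work on the search. The sketch below is only an illustration under assumptions: the names `split_ranges`, `search_range` and `parallel_search` are invented, the pattern is a fixed byte string, and each worker memory-maps the file and searches its own byte range. Ranges overlap by `len(pattern) - 1` bytes so a match straddling a boundary is found exactly once.

```python
import mmap
import os
from multiprocessing import Pool


def split_ranges(size, parts, overlap):
    """Split [0, size) into about `parts` ranges, each extended by `overlap` bytes."""
    step = max(1, -(-size // parts))  # ceiling division
    return [(start, min(size, start + step + overlap))
            for start in range(0, size, step)]


def search_range(task):
    """Return offsets of matches lying entirely inside [start, end) of the file."""
    path, pattern, start, end = task
    hits = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(pattern, start, end)
        while pos != -1:
            hits.append(pos)
            pos = mm.find(pattern, pos + 1, end)
    return hits


def parallel_search(path, pattern, workers=4):
    """Search the file with one worker process per byte range."""
    size = os.path.getsize(path)
    tasks = [(path, pattern, s, e)
             for s, e in split_ranges(size, workers, len(pattern) - 1)]
    with Pool(workers) as pool:
        results = pool.map(search_range, tasks)
    return sorted(pos for hits in results for pos in hits)

# Call parallel_search() from under `if __name__ == "__main__":` so that
# worker processes can import this module safely on every platform.
```

If the job turns out to be limited by the disk rather than the CPU, extra processes may not help at all, so it is worth timing this against a single-process run.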

Kevin Cazabon.
Jul 18 '05 #4
Andrew Dalke:
If you don't have a multiple processor machine,
using threads won't make a difference. How many
processors do you have on a machine?


There may be some advantage in overlapping computation with I/O although
it would depend on the relative costs of the search and I/O. With a 3.5
Gigabyte file the problem may be I/O bound. In which case splitting the file
onto multiple disks and using 1 thread for each split may increase
performance.
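
One way to overlap I/O with computation, sketched under assumptions: a reader thread pulls fixed-size chunks from the file into a bounded queue while the main thread searches them. CPython releases the GIL during blocking file reads, so the two can genuinely overlap. The names `read_chunks` and `search_overlapped` and the 1 MB chunk size are illustrative choices, not anything from the posts above.

```python
import queue
import threading


def read_chunks(path, chunk_size, out):
    """Producer: read the file in chunks and put them on the queue."""
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                out.put(chunk)
    finally:
        out.put(None)  # sentinel: no more data, even if reading failed


def search_overlapped(path, pattern, chunk_size=1 << 20):
    """Consumer: search chunks as they arrive and return match offsets."""
    chunks = queue.Queue(maxsize=4)  # bounded, so the reader cannot run far ahead
    reader = threading.Thread(target=read_chunks,
                              args=(path, chunk_size, chunks), daemon=True)
    reader.start()
    keep = len(pattern) - 1  # bytes carried over to catch boundary matches
    hits, tail, base = [], b"", 0  # base = file offset of tail[0]
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        buf = tail + chunk
        pos = buf.find(pattern)
        while pos != -1:
            hits.append(base + pos)
            pos = buf.find(pattern, pos + 1)
        tail = buf[-keep:] if keep else b""
        base += len(buf) - len(tail)
    reader.join()
    return hits
```

Whether this gains anything depends, as said above, on the relative cost of the search and the reads; if the disk is the bottleneck the search thread will simply wait on the queue.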

Neil
Jul 18 '05 #5
Neil Hodgson:
In which case splitting the file
onto multiple disks and using 1 thread for each split may increase
performance.


But then so would disk striping, or a bigger cache, or .. hmm,
perhaps the data is on a networked filesystem and the slow
performance comes from the network? Hard to know without
more info from the OP.

Andrew
da***@dalkescientific.com
Jul 18 '05 #6
If your program is network bound, there might be some
performance gain to be had from using threads, even taking
the GIL into account.

I/O bound... I cannot say. It depends on how many I/O
operations you do per second, the disk cache, and whether
you use multiple disks. Too many factors.

But in your case, it does not look as if the program is
network bound. So threading may not help here, and in fact
might even slow things down owing to the GIL.

The best option for you might be to speed up your search.
If you are searching for patterns, use regexps rather than
hand-written string or character-by-character searching, since
that slows matters considerably. If you are doing just a
substring search, *don't* use regexps: I found that a simple
string search is faster in most cases.
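
To check this on your own data, a small timing harness along these lines may help. It is a sketch with invented helper names (`find_plain`, `find_regex`, `compare`); the numbers it returns depend entirely on the machine, the data and the Python version, so none are shown here.

```python
import re
import timeit


def find_plain(text, needle):
    """Offset of the first occurrence of a fixed string, or -1."""
    return text.find(needle)


def find_regex(text, needle):
    """Same result via the re module; re.escape makes the needle literal."""
    match = re.compile(re.escape(needle)).search(text)
    return match.start() if match else -1


def compare(text, needle, repeats=20):
    """Return (plain_seconds, regex_seconds) for `repeats` searches each."""
    plain = timeit.timeit(lambda: find_plain(text, needle), number=repeats)
    regex = timeit.timeit(lambda: find_regex(text, needle), number=repeats)
    return plain, regex
```

A regexp earns its keep only when the pattern has variation in it, for example a character class at one position; for a fixed ten-letter string the plain `find` is the simpler choice.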

Otherwise, think about indexing your data using LuPy or
some other indexer and searching the index. You can write
a small function that rebuilds the index whenever your
actual data changes. The rest of the time, i.e. in most normal
searches, use the index as a cache and search there.

Index searching is many times faster than searching with
strings or regexps, and a lot of research has gone into it.
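
To make the indexing idea concrete, here is a toy sketch that does not use LuPy at all: a plain dictionary mapping every ten-letter substring to the offsets where it occurs. The names `build_index` and `lookup` are hypothetical, and an in-memory dictionary like this would be far too large for a 3.5 GB file, so a real index would have to live on disk. It only shows why a lookup is cheap once the index exists.

```python
from collections import defaultdict


def build_index(text, k=10):
    """Map each substring of length k to the list of offsets where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def lookup(index, pattern):
    """Return the offsets recorded for pattern, or an empty list."""
    return index.get(pattern, [])
```

Building the index costs one full pass over the data, but after that each query is a single dictionary lookup instead of a scan.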

HTH.

-Anand
Jul 18 '05 #7
