Multiple threads reading from the same file... is it possible?

Hi there... I have a huge text file that needs to be processed. At the
moment, I'm loading it into memory in small chunks (x lines at a time)
and processing it that way. I'd like the process to be faster, so I'd
like to try creating multiple threads and having them load different
chunks of the file at the same time and process them asynchronously.

Is it possible to do something like that, and if so, what would be
needed to do so?
WATYF

Sep 20 '07 #1
First, you have to devise a way for the multiple threads to know which lines
to process. If you plan to delete the processed lines somehow, you would
have to open the file with an exclusive read/write lock so that only one
thread can get to it at a time. If you do it this way, then you obviously
must wrap the file open in a Try/Catch, because two or more threads can try
to open the file concurrently and only one can hold the lock at a time.

If you are not going to delete from the file, then you still have to devise
a way for each thread to know what to process. You could have a "global"
counter/pointer, locked with a mutex while a thread updates it, that tells
you where the next thread should start. However, if one or more threads
abort for some reason, the counter/pointer would not be updated to reflect
the unprocessed records.

If you simply want multiple threads to read the file concurrently (each one
keeping track of whether a record has already been processed by another
thread), they can do that by opening the file with read access and shared
read options.

HTH
Le*@KnowDotNet.com
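A minimal sketch of the mutex-protected counter idea, written in Python for illustration (in .NET the equivalent pieces would be a lock around the counter and a file opened with shared read access). Assumptions: the work is split into fixed-size byte ranges rather than line counts, a line belongs to the range in which it starts, and counting lines stands in for the real processing. ChunkCounter, process_range and count_lines_parallel are hypothetical names.

```python
import os
import threading


class ChunkCounter:
    """Shared "next chunk" counter; the lock makes read-and-increment atomic."""

    def __init__(self, file_size, chunk_size):
        self._next = 0
        self._lock = threading.Lock()
        self._file_size = file_size
        self._chunk_size = chunk_size

    def claim(self):
        """Return the next (start, end) byte range, or None when the file is done."""
        with self._lock:
            start = self._next * self._chunk_size
            if start >= self._file_size:
                return None
            self._next += 1
            return start, min(start + self._chunk_size, self._file_size)


def process_range(path, start, end):
    """Count the lines that start inside [start, end) (stand-in for real work)."""
    count = 0
    with open(path, "rb") as f:  # each thread opens its own read-only handle
        if start > 0:
            f.seek(start - 1)
            f.readline()  # move to the first line that starts at or after `start`
        while f.tell() < end:
            if not f.readline():
                break
            count += 1
    return count


def count_lines_parallel(path, chunk_size=1 << 20, workers=4):
    counter = ChunkCounter(os.path.getsize(path), chunk_size)
    totals = []
    totals_lock = threading.Lock()

    def worker():
        while (rng := counter.claim()) is not None:
            n = process_range(path, *rng)
            with totals_lock:
                totals.append(n)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals)
```

As noted above, if a thread dies after claim() returns, its range is simply never processed; a more robust version would record failed ranges so they can be retried.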

Sep 20 '07 #2
Why not read the file with a single thread and let other thread(s)
process the chunks? That should be easier for you to deal with.

Regards,

Trevor Benedict
MCSD
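A minimal sketch of the single-reader/multiple-worker pattern, in Python for illustration. Assumptions: the reading happens on the calling thread, chunks are lists of lines, a bounded queue.Queue keeps the reader from running far ahead of the workers, and summing line lengths stands in for the real processing. reader, worker and run are hypothetical names.

```python
import queue
import threading

SENTINEL = None  # tells a worker that no more chunks are coming


def reader(lines, chunk_size, work_q, n_workers):
    """Single reader: groups lines into chunks and hands them to the workers."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            work_q.put(chunk)  # blocks while the queue is full
            chunk = []
    if chunk:
        work_q.put(chunk)
    for _ in range(n_workers):
        work_q.put(SENTINEL)  # one stop signal per worker


def worker(work_q, results, results_lock):
    while True:
        chunk = work_q.get()
        if chunk is SENTINEL:
            break
        n = sum(len(line) for line in chunk)  # placeholder for the real processing
        with results_lock:
            results.append(n)


def run(lines, chunk_size=1000, n_workers=2):
    work_q = queue.Queue(maxsize=8)  # bounded, so the reader can't run far ahead
    results, results_lock = [], threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(work_q, results, results_lock))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    reader(lines, chunk_size, work_q, n_workers)  # reading stays on this one thread
    for w in workers:
        w.join()
    return sum(results)
```

With a real file you would pass the open file object, e.g. `with open("huge.txt") as f: total = run(f)` (the file name is hypothetical). Only one thread ever touches the file, so the disk sees a single sequential reader.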

Sep 20 '07 #3
I need the reading to be multi-threaded the most. Reading from the
file is what takes the longest. Actually processing the chunks of the
file takes much less time than loading them into memory.

I may have figured it out, though... although I'm not sure yet if it's
really doing what I expect it to be doing.

I used a combination of TextReaders/TextWriters that were created
using TextReader.Synchronized and also synchronized arraylists which
were created using ArrayList.Synchronized. One of the FileOptions for
the Streams is FileOptions.Asynchronous, so I used that when opening
the files for reading/writing.

WATYF
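A rough Python analogue of the approach described above, for illustration only: a lock-wrapped reader plays the role of TextReader.Synchronized, and a lock-guarded list plays the role of ArrayList.Synchronized. Note that the lock still serializes the reads themselves, so any gain comes from one thread processing a chunk while another reads the next. SynchronizedLineReader, process_all and the upper-casing placeholder are hypothetical.

```python
import itertools
import threading


class SynchronizedLineReader:
    """Wraps a text stream so only one thread reads from it at a time."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def read_chunk(self, n):
        """Return up to n lines; an empty list means the stream is exhausted."""
        with self._lock:
            return list(itertools.islice(self._stream, n))


def process_all(stream, n_threads=4, chunk_lines=500):
    reader = SynchronizedLineReader(stream)
    results = []
    results_lock = threading.Lock()  # stands in for ArrayList.Synchronized

    def work():
        while True:
            chunk = reader.read_chunk(chunk_lines)
            if not chunk:
                break
            processed = [line.upper() for line in chunk]  # placeholder processing
            with results_lock:
                results.extend(processed)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Results come back in chunk-completion order, not file order; if order matters, each chunk would need to carry its starting line number.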
Sep 20 '07 #4
If the huge text file is on one hard drive and if the CPU processing is
minimal (as you have said in a separate post), I don't see how multiple
threads will help. Multiple threads are not going to make the hard drive
spin any faster.

Bob

Sep 21 '07 #5
WATYF wrote:
> I need the reading to be multi-threaded the most. Reading from the
> file is what takes the longest. Actually processing the chunks of the
> file takes much less time than loading them into memory.

You're basically saying that the hard disk is the bottleneck. It's
supplying the data as fast as it can. Creating multiple threads reading
that file would only make it slower, because you'd be making the drive's
head jump all over the place to satisfy all the requests, and moving
that head takes time.

The only time you can probably save is the processing time, so you
could hand that off to another thread; the reading thread can then keep
reading and doesn't have to stop to process the data.

--
Rinze van Huizen
C-Services Holland b.v
Sep 21 '07 #6

Yes, it is possible: you can open a file in shared mode, so it should be
possible to read it from multiple apps/threads.
However, are you sure that file access is your bottleneck?

I once had the task of writing a data conversion program for a 7+ GB file.
At the start it took hours to complete; in the end it took only a few
minutes, and submitting the data to the database took the most time.

The change that gave me the biggest performance boost was using a
StringBuilder object instead of a temp string.
HTH

Michel
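In .NET, StringBuilder avoids copying the whole accumulated string on every concatenation. A minimal Python sketch of the same idea, assuming the records are already strings: collect the pieces in a list and join them once at the end. (CPython can sometimes optimize `+=` on strings in place, so the size of the gap varies; both function names are hypothetical.)

```python
def build_report_slow(records):
    out = ""
    for r in records:
        out += r + "\n"  # may copy the whole accumulated string each time
    return out


def build_report_fast(records):
    parts = []  # collect pieces, much like appending to a StringBuilder
    for r in records:
        parts.append(r)
        parts.append("\n")
    return "".join(parts)  # a single concatenation at the end
```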

Sep 21 '07 #7
> If the huge text file is on one hard drive and if the CPU processing is
> minimal (as you have said in a separate post), I don't see how multiple
> threads will help. Multiple threads are not going to make the hard drive
> spin any faster.
>
> Bob

You know... what you're saying makes perfect sense, but apparently it
doesn't work that way, for whatever reason... maybe someone smarter than
me can explain it. :o) Anyway... I figured out how to do it. I did a
write-up on it in case anyone else runs into the same scenario.

http://www.musicalnerdery.com/net-pr...e-threads.html

Using multiple threads on a ~2 GB file increased my performance
significantly. The total processing time went from 4.1 minutes to 3.0
minutes.

WATYF
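For reference, the arithmetic behind "significantly", using only the two timings above:

$$
\frac{4.1 - 3.0}{4.1} \approx 0.268 \;(\text{about a } 27\%\text{ reduction}), \qquad \frac{4.1}{3.0} \approx 1.37\times \text{ speedup}
$$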

Sep 21 '07 #8
On Sep 20, 12:41 pm, "Michel Posseth [MCP]" <msn...@posseth.com>
wrote:
> However, are you sure that file access is your bottleneck?
> [...]
> The change that gave me the biggest performance boost was using a
> StringBuilder object instead of a temp string.
I am sure that disk I/O was the main issue, but thanks for reminding
me about StringBuilders... they're next on my list to investigate for
ways to speed this thing up.

WATYF

Sep 21 '07 #9
Thanks for sharing your experience.

A while ago there was a performance problem known (to me at least) as
"missing a revolution". It happened when you read and processed
record 1 and then went to read record 2, but by then record 2 had
already passed the read head, so the program had to wait almost a whole
revolution of the disk to read the next record. BUT today, with hard drives
having a cache, I don't see how this would explain what you are seeing. I
hope someone who knows more about hard drive I/O than we do will explain
this to us.

Bob
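To put a rough number on "missing a revolution", assume (hypothetically) a 7,200 RPM drive. One full revolution then takes

$$
t_{\text{rev}} = \frac{60\ \text{s}}{7200} \approx 8.33\ \text{ms},
$$

so a million records that each missed a revolution would spend roughly $10^6 \times 8.33\ \text{ms} \approx 8{,}333\ \text{s}$ (about 2.3 hours) just waiting for the platter to come around again.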

Sep 21 '07 #10
This is an interesting read:
http://forums.storagereview.net/inde...showtopic=3200

Regards,

Trevor Benedict
Sep 21 '07 #11
Unless you are programming for multiple processors, I find it hard to
understand how threads will help you, since the critical path of your
application seems to be processing the whole file. I don't think threads
run at the same time; i.e., when one thread is executing, the others are
waiting for processor time. It used to be called time sharing on the old
IBM mainframes.

--
Dennis in Houston
Sep 21 '07 #12
> [...] BUT today, with hard drives having a cache, I don't see how this
> would explain what you are seeing. I hope someone who knows more about
> hard drive I/O than we do will explain this to us.
>
> Bob

I think I figured it out. I put an updated explanation at the very end
of the article.

http://www.musicalnerdery.com/net-pr...e-threads.html

WATYF

Sep 22 '07 #13
> Unless you are programming for multiple processors, I find it hard to
> understand how threads will help you [...]
>
> --
> Dennis in Houston

Yes, I'm programming for multiple processors. There is logic in the
code that only creates as many threads as there are processors on the
machine, so a dual core would use two threads, a quad core would use
four, etc. You're essentially correct, though... multiple threads
on a single CPU don't do any good in this case (so the code is set up
to create only one thread in that instance).

WATYF
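A minimal Python sketch of "one thread per processor, and just one on a single CPU". os.cpu_count() reports logical processors and may return None, so the sketch falls back to 1. Caveat: in CPython, CPU-bound Python code does not run in parallel across threads because of the GIL; ProcessPoolExecutor offers the same interface for that case. worker_count and process_chunks are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count():
    """One worker per logical processor; os.cpu_count() may return None, so fall back to 1."""
    return os.cpu_count() or 1


def process_chunks(chunks, fn):
    """Apply fn to every chunk using worker_count() threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=worker_count()) as pool:
        return list(pool.map(fn, chunks))
```

For example, `process_chunks([[1, 2], [3]], sum)` returns `[3, 3]`, whether it runs on one thread or eight.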

Sep 22 '07 #14
Dennis,

> It used to be called time sharing on the old IBM mainframes.

But those had multiple ports for disk access, or whatever they called
that in those days. IBM tried that on their PS systems; I think it was
called multi channel. As far as I know, no disk maker ever built a disk
for that channel. In my opinion it was one of the failures of the PS
system.

Cor
Sep 22 '07 #15
Michel,

Rinze has written almost the same thing, but I cannot resist.

I always answer this kind of question with: "Do you want to hear your
disk rumble?"

:-)

For the rest, see my answer to Dennis.

Cor
Sep 22 '07 #16

"WATYF" <WA****@gmail.com> wrote:
> I think I figured it out. I put an updated explanation at the very end
> of the article.

Sounds plausible to me. Thanks
Sep 22 '07 #17
