
1 file, multiple threads

If I have multiple threads reading from the same file, would that be a
problem?

If yes, how would I solve it?

Let's say I want to take it a step further and start writing to 1 file from
multiple threads, how would I solve that?
thanx,

Guyon
Jul 18 '05 #1
7 Replies


In article <41***********************@news.nl.uu.net>,
Guyon Morée <gumuz@NO_looze_SPAM.net> wrote:

> If I have multiple threads reading from the same file, would that be a
> problem?
>
> if yes, how would I solve it?
>
> Let's say I want to take it a step further and start writing to 1 file from
> multiple threads, how would I solve that?


Make a new thread just for dealing with the file and post messages to it
using Queue.
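
A minimal sketch of that suggestion, assuming Python 3 (where the module is
spelled `queue` rather than `Queue`). The names `file_writer`,
`demo_queue_writer` and `_STOP` are made up for illustration, and the
temporary file exists only so the example is self-contained. Only the writer
thread ever touches the file; every other thread just posts strings to the
queue.

```python
import os
import queue
import tempfile
import threading

_STOP = object()  # hypothetical sentinel telling the writer thread to finish


def file_writer(path, inbox):
    """The only thread that ever opens or writes to `path`."""
    with open(path, "a", encoding="utf-8") as fh:
        while True:
            item = inbox.get()      # blocks until a message arrives
            if item is _STOP:
                break
            fh.write(item)


def demo_queue_writer(n_workers=4, lines_each=50):
    """Run several producers against one writer; return the lines written."""
    inbox = queue.Queue()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.log")
        writer = threading.Thread(target=file_writer, args=(path, inbox))
        writer.start()

        def worker(worker_id):
            for i in range(lines_each):
                inbox.put(f"worker {worker_id} line {i}\n")

        workers = [threading.Thread(target=worker, args=(w,))
                   for w in range(n_workers)]
        for t in workers:
            t.start()
        for t in workers:
            t.join()

        inbox.put(_STOP)            # all producers are done
        writer.join()               # file is flushed and closed on exit

        with open(path, encoding="utf-8") as fh:
            return fh.read().splitlines()
```

Because each `put()` carries one complete line, lines from different workers
may interleave with each other but are never split in the middle.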
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

WiFi is the SCSI of the 21st Century -- there are fundamental technical
reasons for sacrificing a goat. (with no apologies to John Woods)
Jul 18 '05 #2

> If I have multiple threads reading from the same file, would that be a
> problem?

As long as each thread opens the file with 'r' or 'rb' access only, it is
not a problem. I believe you can even write to that file from one (but
only one) thread while reading the file in multiple other threads.
That probably isn't the most robust approach and may even cause some
conflicts, so you may want to check on it.

> Let's say I want to take it a step further and start writing to 1 file from
> multiple threads, how would I solve that?


If you use the built-in 'threading' module in Python, you can simply
acquire a mutex (implemented using Semaphore/Lock/RLock) before
writing and release it immediately afterward.

Example...

Your main thread would allocate a file lock:

mutex_writefile = threading.Lock()

Each thread writing to a file can do the following:

with mutex_writefile:               # acquired here, released on exit, even on error
    with open(fname, 'a') as fh:    # file is flushed and closed on exit
        fh.write(text)              # 'text' is the string to append

It is important that you release your locks or you will have a
deadlock on each thread waiting to write. See the threading module's
documentation for more details.
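
For what it's worth, here is a slightly fuller sketch of the same idea that
can be run as-is, assuming Python 3. The names `write_lock`, `append_line`
and `demo_locked_append` are invented for the example, and the temporary
file is only there to keep it self-contained.

```python
import os
import tempfile
import threading

write_lock = threading.Lock()  # one lock shared by every writing thread


def append_line(path, text):
    """Append one line; the lock serialises the open/write/close sequence."""
    with write_lock:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(text + "\n")


def demo_locked_append(n_threads=8, lines_each=25):
    """Have several threads append to one file; return the resulting lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.txt")

        def worker(tid):
            for i in range(lines_each):
                append_line(path, f"thread {tid} wrote line {i}")

        threads = [threading.Thread(target=worker, args=(t,))
                   for t in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        with open(path, encoding="utf-8") as fh:
            return fh.read().splitlines()
```

Using the lock as a context manager means it cannot be left held if the
write raises, which is the failure mode that leads to the deadlock described
above.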

You could also do the same with the thread module's lock object;
however, in my opinion you gain much more functionality with threading
with little increase in complexity.

-Jay
Jul 18 '05 #3

Jason wrote:
>> If I have multiple threads reading from the same file, would that be a
>> problem?
>
> As long as you open each file with 'r' or 'rb' access only, it is not
> a problem. I believe you can even write to that file from one (but
> only one) thread while reading the file in multiple other threads.


You *can*, but due to buffering issues, it's likely that the reader
threads will not see changes made by the writer thread properly, and may
have issues with separate disk-reads which nominally stop/start at the
same location not actually matching because the underlying disk file has
changed. (Note that disk reads, which go into a buffer, do not
necessarily correlate in any predictable way to calls to the read*()
family of functions.)
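
A small sketch of the buffering effect, assuming Python 3 with the default
buffering of `open()`. The function name `demo_buffering` is made up. A short
write normally sits in the writer's buffer until it is flushed, so a reader
that opens the file in between typically sees nothing yet.

```python
import os
import tempfile


def demo_buffering():
    """Return what a reader sees before and after the writer flushes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        writer = open(path, "w", encoding="utf-8")
        try:
            writer.write("hello")   # small write: usually stays in the buffer

            with open(path, encoding="utf-8") as reader:
                before_flush = reader.read()

            writer.flush()          # push the buffered data to the OS

            with open(path, encoding="utf-8") as reader:
                after_flush = reader.read()
        finally:
            writer.close()
    return before_flush, after_flush
```

With default settings the first value is usually an empty string and the
second is the full text, but exactly when a buffer is written out is an
implementation detail, which is the point being made above.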

In order to ensure consistent access to a mutable (i.e. not read-only)
file from multiple threads, it would be necessary to ensure that only
one thread was accessing the file at a given instant (i.e. use some form
of locking/synchronizing mechanism), and to be careful to flush all
buffers both before and after any file access. If the file can change
at all, then the only time that a given thread can make *any*
assumptions about the state of the file is during a single section in
which that thread has exclusive access to the file.

Much simpler to designate a single file-handler 'server', and have each
thread access the file only through the intermediary of this server
(which could be implemented as a separate thread itself). The server
then manages all of the file buffers, both in and out, and can ensure
that each access happens in a consistent way.

Jeff Shannon
Technician/Programmer
Credit International

Jul 18 '05 #4

Jeff Shannon wrote:
> Much simpler to designate a single file-handler 'server', and have each
> thread access the file only through the intermediary of this server
> (which could be implemented as a separate thread itself). The server
> then manages all of the file buffers, both in and out, and can ensure
> that each access happens in a consistent way.


Yes - Aahz's somewhat terse suggestion of using the threading module and a Queue
is the best way to achieve this.

Localising everything to do with parsing and formatting the file correctly also
becomes trivial in this case - the clients of the file handler thread can deal
with standard Python objects, leaving the file handler to cope with the file
format issues.

It can also be very fast, if the file is small enough that the whole thing can
be loaded into memory.
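
As a rough sketch of what such a file handler might look like, assuming
Python 3: the class `FileServer`, its methods, and the choice of one JSON
document per line are all assumptions made for the example, not something
taken from this thread. Clients pass plain dicts in and get plain dicts
back; only the handler thread knows about the file or its format. Error
handling is left out, so a failure inside the handler would leave the caller
waiting.

```python
import json
import os
import queue
import tempfile
import threading


class FileServer:
    """Hypothetical handler: one thread owns the file and its format."""

    def __init__(self, path):
        self._path = path
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve)
        self._thread.start()

    def _serve(self):
        while True:
            op, payload, reply = self._requests.get()
            if op == "stop":
                return
            if op == "append":
                with open(self._path, "a", encoding="utf-8") as fh:
                    fh.write(json.dumps(payload) + "\n")
                reply.put(None)
            elif op == "read_all":
                records = []
                if os.path.exists(self._path):
                    with open(self._path, encoding="utf-8") as fh:
                        records = [json.loads(line) for line in fh]
                reply.put(records)

    def _call(self, op, payload=None):
        reply = queue.Queue(maxsize=1)   # private channel for the answer
        self._requests.put((op, payload, reply))
        return reply.get()               # wait until the handler is done

    def append(self, record):
        self._call("append", record)

    def read_all(self):
        return self._call("read_all")

    def stop(self):
        self._requests.put(("stop", None, None))
        self._thread.join()


def demo_file_server(n_clients=3, records_each=5):
    """Several client threads append dicts; return everything read back."""
    with tempfile.TemporaryDirectory() as tmp:
        server = FileServer(os.path.join(tmp, "records.jsonl"))

        def client(cid):
            for i in range(records_each):
                server.append({"client": cid, "seq": i})

        threads = [threading.Thread(target=client, args=(c,))
                   for c in range(n_clients)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        records = server.read_all()
        server.stop()
    return records
```

Requests are handled strictly one at a time in queue order, so reads and
writes can never overlap, and the per-request reply queue is what lets a
client get a result back.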

Regards,
Nick.
Jul 18 '05 #5

"Guyon Morée" <gumuz@NO_looze_SPAM.net> wrote in message news:<41***********************@news.nl.uu.net>...
> If I have multiple threads reading from the same file, would that be a
> problem?
>
> if yes, how would I solve it?
>
> Let's say I want to take it a step further and start writing to 1 file from
> multiple threads, how would I solve that?
> thanx,
>
> Guyon


The short answer: yes, you will have problems.
The long answer: how it will hurt you depends on the OS, the sequence of
operations, and the kind of operations.

If you read the "next line" (or something like it) or read data from
random positions, putting lock/unlock around each seek/read will solve the
problem (and may defeat the multithreading gain).
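
A minimal sketch of lock/unlock around seek/read, assuming Python 3 and a
file of fixed-size records. `SharedReader`, `RECORD_SIZE` and
`demo_locked_reads` are hypothetical names for the example. Without the
lock, one thread's seek could land between another thread's seek and read
on the same file object.

```python
import os
import tempfile
import threading

RECORD_SIZE = 8  # assumed fixed record length, in bytes


class SharedReader:
    """One file object shared by many threads, guarded by a lock."""

    def __init__(self, path):
        self._fh = open(path, "rb")
        self._lock = threading.Lock()

    def read_record(self, index):
        with self._lock:                      # seek + read as one unit
            self._fh.seek(index * RECORD_SIZE)
            return self._fh.read(RECORD_SIZE)

    def close(self):
        self._fh.close()


def demo_locked_reads(n_records=100, n_threads=4):
    """Read every record from several threads; return {index: bytes}."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.bin")
        with open(path, "wb") as fh:
            for i in range(n_records):
                fh.write(b"%08d" % i)         # record i holds its own index

        reader = SharedReader(path)
        results = {}

        def worker(start):
            # Each thread handles a disjoint set of record indexes.
            for i in range(start, n_records, n_threads):
                results[i] = reader.read_record(i)

        threads = [threading.Thread(target=worker, args=(s,))
                   for s in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        reader.close()
    return results
```

All reads are serialised by the lock, which is why this can give back most
of what the threads were meant to gain.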

In any case, writes may not be seen by readers.

Remember that each open file has a single current position, shared by
reads and writes through that handle.
For writing from multiple threads you also need a lock/unlock guard.

You really would be better off using Queues in all threads.
Jul 18 '05 #6

Other folks have answered your questions, although some of the answers
may be open to interpretation.

I just want to throw in a warning. Multi-threaded programming can be a
little tricky. If you don't know what you are doing, or if you are not
careful, you can program bugs that manifest themselves very rarely. I've
seen them that cropped up only about once every two months on average in a
continuously running program. Reproducing the failure can be next to
impossible. Finding the bugs can be a nightmare.

You can even write a program that works correctly "by accident".
Later you introduce an apparently innocuous change that introduces a bug,
but it does not show up in testing. Over the course of days or months you
make more changes to the program. The bug you introduced way back when
finally manifests itself. You will assume the problem was caused by the
last change you made. Big trouble.

Look before you leap.
Jul 18 '05 #7

Sorry about the line wraps. I've gone over to the dark side and Outlook
Express. Still haven't quite got the hang of it.

Sorry about the HTML. Ditto.
Jul 18 '05 #8
