multiprocessing eats memory

I'm playing with the pyprocessing module and found that it eats lots of
memory. I've made a small test case to show it. I pass ~45 MB of data to
worker processes and then get it back slightly modified. At any time
there should be no more than two copies of the data in the main process
(one original and one result). I run it on an 8-core server, and top
shows that the main process uses ~220 MB and the worker processes use
90-150 MB. Isn't that too much?

A small test case is uploaded to pastebin: http://pastebin.ca/1210523
Sep 25 '08 #1
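
One possible contributor to the extra copies, sketched under assumptions: the pastebin test case is not reproduced here, so `payload` below is a hypothetical stand-in (about 1 MB rather than the ~45 MB in the post). Python's multiprocessing machinery pickles arguments before sending them to a worker, so during a hand-off the sender holds the original object plus its serialized bytes, and the receiver holds the bytes plus the rebuilt object. The sketch only shows those copies with `pickle` directly; it does not measure what `top` would report.

```python
import pickle
import sys

# Hypothetical stand-in for the data in the post (about 1 MB, not ~45 MB).
payload = b"x" * (1024 * 1024)

# Arguments handed to a worker are pickled first: a second full-size
# buffer that exists in the sending process next to the original.
wire_bytes = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# The receiving process rebuilds the object from those bytes: another
# full-size object, held next to the received buffer.
rebuilt = pickle.loads(wire_bytes)

print("original size:", sys.getsizeof(payload))
print("pickled size: ", sys.getsizeof(wire_bytes))
print("rebuilt object is a separate copy:", rebuilt is not payload)
```

The same thing happens again in the other direction when the modified data is returned, which may be part of why the observed figures exceed "original plus result".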
On Sep 25, 8:40 am, "Max Ivanov" <ivanov.ma...@gmail.com> wrote:
> At any time there should be no more than two copies of the data in the main process
> (one original and one result).

From the looks of it, you are storing lots of references to various
copies of your data via the async set.
Sep 26 '08 #2
On Sep 26, 04:20, Istvan Albert <istvan.alb...@gmail.com> wrote:
> On Sep 25, 8:40 am, "Max Ivanov" <ivanov.ma...@gmail.com> wrote:
> > At any time there should be no more than two copies of the data in the main process
> > (one original and one result).
>
> From the looks of it, you are storing lots of references to various
> copies of your data via the async set.

How can I avoid storing them? I need something to check whether a result
is ready and to retrieve it if it is. I can't see a way to
achieve the same thing without storing the asyncs set.
Sep 26 '08 #3
On Sep 26, 9:52 am, redbaron <ivanov.ma...@gmail.com> wrote:
> On Sep 26, 04:20, Istvan Albert <istvan.alb...@gmail.com> wrote:
> > On Sep 25, 8:40 am, "Max Ivanov" <ivanov.ma...@gmail.com> wrote:
> > > At any time there should be no more than two copies of the data in the main process
> > > (one original and one result).
> > From the looks of it, you are storing lots of references to various
> > copies of your data via the async set.
>
> How can I avoid storing them? I need something to check whether a result
> is ready and to retrieve it if it is. I can't see a way to
> achieve the same thing without storing the asyncs set.

You could give each worker process an ID and then have them put the ID
into a queue to signal to the main process when finished.

BTW, your test case modifies the asyncs set while iterating over it,
which is a bad idea.
Sep 26 '08 #4
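
The problem pointed out at the end of this post can be reproduced without any worker processes. A minimal sketch, where `pending` is a hypothetical stand-in for the `asyncs` set and plain integers stand in for the async result handles: removing items from a set while looping over that same set raises `RuntimeError`, whereas looping over a snapshot such as `list(pending)` leaves the set free to be modified.

```python
def drain_unsafely(pending):
    """Remove items while iterating the same set (the bug)."""
    for item in pending:
        pending.remove(item)  # changes the set's size mid-iteration


def drain_safely(pending):
    """Iterate over a snapshot so the set itself can be modified."""
    finished = []
    for item in list(pending):  # list() copies the current members
        pending.remove(item)
        finished.append(item)
    return finished


try:
    drain_unsafely({1, 2, 3})
    unsafe_error = None
except RuntimeError as exc:
    unsafe_error = exc

print("unsafe loop raised:", repr(unsafe_error))
print("safe loop drained:", sorted(drain_safely({1, 2, 3})))
```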
On Sep 26, 17:03, MRAB <goo...@mrabarnett.plus.com> wrote:
> On Sep 26, 9:52 am, redbaron <ivanov.ma...@gmail.com> wrote:
> > On Sep 26, 04:20, Istvan Albert <istvan.alb...@gmail.com> wrote:
> > > On Sep 25, 8:40 am, "Max Ivanov" <ivanov.ma...@gmail.com> wrote:
> > > > At any time there should be no more than two copies of the data in the main process
> > > > (one original and one result).
> > > From the looks of it, you are storing lots of references to various
> > > copies of your data via the async set.
> > How can I avoid storing them? I need something to check whether a result
> > is ready and to retrieve it if it is. I can't see a way to
> > achieve the same thing without storing the asyncs set.
>
> You could give each worker process an ID and then have them put the ID
> into a queue to signal to the main process when finished.

And how could I retrieve the result from a worker process without the async?

> BTW, your test case modifies the asyncs set while iterating over it,
> which is a bad idea.

My fault, there was list(asyncs) originally.
Sep 26 '08 #5
On Sep 26, 4:52 am, redbaron <ivanov.ma...@gmail.com> wrote:
> How can I avoid storing them? I need something to check whether a result
> is ready and to retrieve it if it is. I can't see a way to
> achieve the same thing without storing the asyncs set.

It all depends on what you are trying to do. The issue that you
originally brought up is that of memory consumption.

When processing data in parallel, memory use grows with the number of
datasets being processed at any given time. If you need to
reduce memory use, then you need to start fewer processes and use some
mechanism to distribute the work to them as they become free (see the
recommendation that uses Queues).
Sep 27 '08 #6
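
One way to read this advice, sketched with the standard-library `multiprocessing.Pool` as a stand-in for a hand-built Queue scheme: keep the number of worker processes small and consume each result as soon as it arrives instead of holding on to a set of result handles. `make_chunks` and `process_chunk` are hypothetical placeholders for the real data and computation, and the sizes are arbitrary.

```python
import multiprocessing


def make_chunks(count, size):
    """Yield chunks lazily so they are not all built up front."""
    for i in range(count):
        yield bytes([i % 256]) * size


def process_chunk(chunk):
    """Placeholder for the real work: return a small summary, not the data."""
    return len(chunk)


def run(workers=2, count=8, size=1024 * 1024):
    total = 0
    with multiprocessing.Pool(processes=workers) as pool:
        # imap_unordered hands back each result as soon as it is ready,
        # so no collection of result handles has to be kept around.
        for result in pool.imap_unordered(process_chunk,
                                          make_chunks(count, size)):
            total += result
    return total


if __name__ == "__main__":
    print("bytes processed:", run())
```

With `workers=2`, at most two chunks are being computed at once. Note that the pool may read somewhat ahead of the workers when pulling from the generator, so this limits memory use in practice but is not a strict bound on the number of chunks in flight.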
> When processing data in parallel, memory use grows with the number of
> datasets being processed at any given time.

Worker processes eat 2-4 times more than I pass to them.

> If you need to
> reduce memory use, then you need to start fewer processes and use some
> mechanism to distribute the work to them as they become free (see the
> recommendation that uses Queues).

I don't understand how I could use a Queue here. When a worker process
finishes computing, it puts its ID into the Queue, and in the main process I
retrieve that ID. How do I then retrieve the result from the worker
process?

Sep 27 '08 #7
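
A common answer to this last question is for the worker to put the result itself on the queue, tagged with its ID, rather than the ID alone. A minimal sketch using the standard-library `multiprocessing` module; `worker`, `transform`, and the task layout are hypothetical and not taken from the pastebin test case.

```python
import multiprocessing


def transform(data):
    """Placeholder for the real computation."""
    return data.upper()


def worker(tasks, results):
    """Take (task_id, data) pairs until a None sentinel arrives."""
    for task_id, data in iter(tasks.get, None):
        # Send the result back together with its ID.
        results.put((task_id, transform(data)))


def run(inputs, workers=2):
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(tasks, results))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for task_id, data in enumerate(inputs):
        tasks.put((task_id, data))
    for _ in procs:
        tasks.put(None)  # one sentinel per worker
    # Collect every result before joining, so no worker is left blocked
    # while flushing its results into the queue.
    collected = dict(results.get() for _ in inputs)
    for p in procs:
        p.join()
    return [collected[i] for i in range(len(inputs))]


if __name__ == "__main__":
    print(run(["alpha", "beta", "gamma"]))
```

The main process keeps no async handles here: it blocks on `results.get()` and receives each `(task_id, result)` pair as it is produced. This sketch queues all inputs up front for brevity; to keep memory down, tasks could instead be fed a few at a time as results come back.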
