
collect data using threads

I have a class, Collector, that spawns several threads to read from a serial
port. Collector.get_data() returns all the data they have read since the last
call. Can anyone tell me whether my implementation is correct?

class Collector(object):
    def __init__(self):
        self.data = []
        spawn_work_bees(callback=self.on_received)

    def on_received(self, a_piece_of_data):
        """This callback is executed in work bee threads!"""
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        self.data = []
        return x

I am not very sure about the get_data() method. Will it lose data if another
thread is appending to self.data at the same time?

Is there a more Pythonic or standard recipe for collecting data from threads?

--
Qiangning Hong

Jul 19 '05 #1
Qiangning Hong wrote:
A class Collector, it spawns several threads to read from serial port.
Collector.get_data() will get all the data they have read since last
call. Who can tell me whether my implementation correct?

[snip sample with a list]

I am not very sure about the get_data() method. Will it cause data lose
if there is a thread is appending data to self.data at the same time?


That will not work, and you will get data loss, as Jeremy points out.

Normally Python lists are safe, but your key problem (in this code) is
that you are rebinding self.data to a new list! If another thread calls
on_received() just after the line "x = self.data" executes, then the new
data will never be seen.

One option that would work safely** is to change get_data() to look like
this:

def get_data(self):
    count = len(self.data)       # snapshot the current length
    result = self.data[:count]   # copy exactly those items
    del self.data[:count]        # remove only the copied items
    return result

This does what yours was trying to do, but safely. Note that it doesn't
reassign self.data, but instead uses a single operation (del) to remove
all the already-copied elements at once. It's possible that after the
first or second line a call to on_received() will add data, but that data
simply won't be seen until the next call to get_data(), rather than
being lost.
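As a runnable sketch of the slice-and-delete idea (the class name `SliceCollector` is made up for illustration): the deletion has to target the first `count` elements, `del self.data[:count]`. Deleting `self.data[count:]` would throw away exactly the items that arrived after the length was taken, while leaving the already-returned items in place to be returned again. Like Peter's version, this assumes CPython executes each list operation as one indivisible step, and that only a single consumer thread calls get_data().

```python
class SliceCollector(object):
    """Slice-and-delete collector (illustrative name, not from the thread).

    Assumes worker threads only ever append and a single consumer thread
    calls get_data(). Relies on CPython running each list operation below
    as one indivisible step; other implementations may not guarantee this.
    """

    def __init__(self):
        self.data = []

    def on_received(self, a_piece_of_data):
        # Worker threads only add to the end of the list.
        self.data.append(a_piece_of_data)

    def get_data(self):
        count = len(self.data)       # how many items exist right now
        result = self.data[:count]   # copy exactly those items
        del self.data[:count]        # drop only what was copied; anything
                                     # appended meanwhile stays for next time
        return result
```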

** I'm showing you this to help you understand why your own approach was
wrong, not to give you code that you should use. The key problem with
even my approach is that it *assumes things about the implementation*.
Specifically, there are no guarantees in Python the Language (as opposed
to CPython, the implementation) about the thread-safety of working with
lists like this. In fact, in Jython (and possibly other Python
implementations) this would definitely have problems. Unless you are
certain your code will run only under CPython, and you're willing to put
comments in the code about potential thread safety issues, you should
probably just follow Jeremy's advice and use Queue. As a side benefit,
Queues are much easier to work with!
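A minimal sketch of the Queue approach, written for current Python 3, where the module is named `queue` (it was `Queue` in the Python of 2005). The class name `QueueCollector` is illustrative, and since the OP's `spawn_work_bees()` helper is not shown in the thread, starting the worker threads is left to the caller.

```python
import queue


class QueueCollector(object):
    """Queue-based collector (illustrative name, not from the thread)."""

    def __init__(self):
        self._queue = queue.Queue()

    def on_received(self, a_piece_of_data):
        # Called from worker threads; Queue.put() does its own locking.
        self._queue.put(a_piece_of_data)

    def get_data(self):
        # Drain whatever is queued right now without blocking. Items put
        # after the loop stops are left for the next call.
        result = []
        while True:
            try:
                result.append(self._queue.get_nowait())
            except queue.Empty:
                return result
```

get_data() still loops, but get_nowait() never blocks, and the loop ends as soon as the queue raises queue.Empty.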

-Peter
Jul 19 '05 #2
Peter Hansen wrote:
Qiangning Hong wrote:
A class Collector, it spawns several threads to read from serial port.
Collector.get_data() will get all the data they have read since last
call. Who can tell me whether my implementation correct?


[snip sample with a list]
I am not very sure about the get_data() method. Will it cause data lose
if there is a thread is appending data to self.data at the same time?

That will not work, and you will get data loss, as Jeremy points out.

Normally Python lists are safe, but your key problem (in this code) is
that you are rebinding self.data to a new list! If another thread calls
on_received() just after the line "x = self.data" executes, then the new
data will never be seen.


Can you explain why not? self.data is still bound to the same list as x. At least if the execution sequence is

    x = self.data
    self.data.append(a_piece_of_data)
    self.data = []

ISTM it should work.
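Kent's point can be checked by replaying that exact ordering in a single thread. This is a simulation of one interleaving, not real concurrency; `Collector` is the original class minus the hypothetical `spawn_work_bees()` call, and `replay_kents_interleaving()` is a made-up helper.

```python
class Collector(object):
    # The original post's class, without the hypothetical spawn_work_bees().
    def __init__(self):
        self.data = []

    def on_received(self, a_piece_of_data):
        self.data.append(a_piece_of_data)


def replay_kents_interleaving(collector, item):
    """Run the three statements in the order Kent describes."""
    x = collector.data            # consumer: x = self.data
    collector.on_received(item)   # producer: self.data.append(...)
    collector.data = []           # consumer: self.data = []
    return x                      # consumer: return x


c = Collector()
c.on_received("first")
returned = replay_kents_interleaving(c, "second")
print(returned)  # ['first', 'second']: the append landed in the returned list
print(c.data)    # []
```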

I'm not arguing in favor of the original code, I'm just trying to understand your specific failure mode.

Thanks,
Kent
Jul 19 '05 #3
Previously, on Jun 14, Jeremy Jones said:

# Kent Johnson wrote:
#
# > Peter Hansen wrote:
# >
# > > Qiangning Hong wrote:
# > >
# > >
# > > > A class Collector, it spawns several threads to read from serial port.
# > > > Collector.get_data() will get all the data they have read since last
# > > > call. Who can tell me whether my implementation correct?
# > > >
# > > [snip sample with a list]
# > >
# > >
# > > > I am not very sure about the get_data() method. Will it cause data lose
# > > > if there is a thread is appending data to self.data at the same time?
# > > >
# > > That will not work, and you will get data loss, as Jeremy points out.
# > >
# > > Normally Python lists are safe, but your key problem (in this code) is
# > > that you are rebinding self.data to a new list! If another thread calls
# > > on_received() just after the line "x = self.data" executes, then the new
# > > data will never be seen.
# > >
# >
# > Can you explain why not? self.data is still bound to the same list as x. At
# > least if the execution sequence is
# >     x = self.data
# >     self.data.append(a_piece_of_data)
# >     self.data = []
# >
# > ISTM it should work.
# >
# > I'm not arguing in favor of the original code, I'm just trying to understand
# > your specific failure mode.
# >
# > Thanks,
# > Kent
# >
# Here's the original code:
#
# class Collector(object):
#     def __init__(self):
#         self.data = []
#         spawn_work_bees(callback=self.on_received)
#
#     def on_received(self, a_piece_of_data):
#         """This callback is executed in work bee threads!"""
#         self.data.append(a_piece_of_data)
#
#     def get_data(self):
#         x = self.data
#         self.data = []
#         return x
#
# The more I look at this, the more I'm not sure whether data loss will occur.
# For me, that's good enough reason to rewrite this code. I'd rather be clear
# and certain than clever any day.
# So, let's say you have a thread T1 which starts in ``get_data()`` and makes it
# as far as ``x = self.data``. Then another thread T2 comes along in
# ``on_received()`` and gets as far as ``self.data.append(a_piece_of_data)``.
# ``x`` in T1's ``get_data()`` (as you pointed out) is still pointing to the list
# that T2 just appended to and T1 will return that list. But what happens if
# you get multiple guys in ``get_data()`` and multiple guys in
# ``on_received()``? I can't prove it, but it seems like you're going to have
# an uncertain outcome. If you're just dealing with 2 threads, I can't see how
# that would be unsafe. Maybe someone could come up with a use case that would
# disprove that. But if you've got, say, 4 threads, 2 in each method....that's
# gonna get messy.
# And, honestly, I'm trying *really* hard to come up with a scenario that would
# lose data and I can't. Maybe someone like Peter or Aahz or some little 13
# year old in Topeka who's smarter than me can come up with something. But I do
# know this - the more I think about this as to whether this is unsafe or not is
# making my head hurt. If you have a piece of code that you have to spend that
# much time on trying to figure out if it is threadsafe or not, why would you
# leave it as is? Maybe the rest of you are more confident in your thinking and
# programming skills than I am, but I would quickly slap a Queue in there. If
# for nothing else than to rest from simulating in my head 1, 2, 3, 5, 10
# threads in the ``get_data()`` method while various threads are in the
# ``on_received()`` method. Aaaagghhh.....need....motrin......
#
#
# Jeremy Jones
#

I may be wrong here, but shouldn't you just use a stack, or in other
words, use the list as a stack and just pop the data off the top? I
believe there is a method pop() already supplied for you. Since
you wouldn't require a self.data = [], this should allow you to safely
remove the data you've already seen without accidentally removing data
that may have been added in the meantime.

---
James Tanis
jt****@pycoder.org
http://pycoder.org
Jul 19 '05 #4
James Tanis wrote:
I may be wrong here, but shouldn't you just use a stack, or in other
words, use the list as a stack and just pop the data off the top. I
believe there is a method pop() already supplied for you.


Just a note on terminology here. I believe the word "stack" generally
refers to a LIFO (last-in first-out) structure, not what the OP needs
which is a FIFO (first-in first-out).

Assuming you would refer to the .append() operation as "putting data on
the bottom", then to pop off the "top" you would use pop(0), not just
pop().

Normally though, I think one would refer to these as the head and tail
(not top and bottom), and probably call the whole thing a queue, rather
than a stack.
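In Python terms: `list.pop()` removes the most recently appended item (LIFO, a stack), while `list.pop(0)` removes the oldest item (FIFO) but has to shift every remaining element, so it costs O(n). The standard library's `collections.deque` provides an O(1) `popleft()` for FIFO use, and its documentation describes appends and pops from either end as thread-safe. A small illustration:

```python
from collections import deque

stack = [1, 2, 3]
print(stack.pop())      # 3  (LIFO: last in, first out)

as_list = [1, 2, 3]
print(as_list.pop(0))   # 1  (FIFO on a list; shifts the remaining items, O(n))

fifo = deque([1, 2, 3])
print(fifo.popleft())   # 1  (FIFO on a deque; O(1) at either end)
print(list(fifo))       # [2, 3]
```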

-Peter
Jul 19 '05 #5
Kent Johnson wrote:
Peter Hansen wrote:
That will not work, and you will get data loss, as Jeremy points out.

Can you explain why not? self.data is still bound to the same list as x.
At least if the execution sequence is
    x = self.data
    self.data.append(a_piece_of_data)
    self.data = []


Ah, since the entire list is being returned, you appear to be correct.
Interesting... this means the OP's code is actually appending things to
a list, over and over (presumably), then returning a reference to that
list and rebinding the internal variable to a new list. If another
thread calls on_received() and causes new data to be appended to "the
list" between those two statements, then it will show up in the returned
list (rather magically, at least to my way of looking at it) and will
not in fact be lost.

Good catch Kent. :-)

-Peter
Jul 19 '05 #6
Peter Hansen wrote:
James Tanis wrote:
I may be wrong here, but shouldn't you just use a stack, or in other
words, use the list as a stack and just pop the data off the top. I
believe there is a method pop() already supplied for you.


Just a note on terminology here. I believe the word "stack" generally
refers to a LIFO (last-in first-out) structure, not what the OP needs
which is a FIFO (first-in first-out).


Or perhaps he doesn't need either... As Kent points out (I should have
read his post before replying above), this isn't the structure James and I
both assumed it was, but something a little less usual...

-Peter
Jul 19 '05 #7
James Tanis wrote:
# > > > A class Collector, it spawns several threads to read from serial port.
# > > > Collector.get_data() will get all the data they have read since last
# > > > call. Who can tell me whether my implementation correct?
# > > >
# Here's the original code:
#
# class Collector(object):
#     def __init__(self):
#         self.data = []
#         spawn_work_bees(callback=self.on_received)
#
#     def on_received(self, a_piece_of_data):
#         """This callback is executed in work bee threads!"""
#         self.data.append(a_piece_of_data)
#
#     def get_data(self):
#         x = self.data
#         self.data = []
#         return x
#
I may be wrong here, but shouldn't you just use a stack, or in other
words, use the list as a stack and just pop the data off the top. I
believe there is a method pop() already supplied for you. Since
you wouldn't require an self.data = [] this should allow you to safely
remove the data you've already seen without accidentally removing data
that may have been added in the mean time.


I am the original poster.

I actually had considered Queue and pop() before I wrote the above code.
However, because there is a lot of data to fetch every time I call
get_data(), I wanted a more CPU-friendly way that avoids the while loop and
the emptiness checks, and that is how the code above came about. But I am
not sure whether it will cause serious problems, so I am asking here. If
anyone can prove it is correct, I'll use it in my program; otherwise I'll go
back to the Queue solution.

To Jeremy Jones: I am very sorry to have cost you so much effort on this
odd code. I should have made it clear that there is only *one* thread (the
main thread in my application) that calls the get_data() method,
periodically, driven by a timer. As for on_received(), up to 16 threads
may call it simultaneously.
--
Qiangning Hong

Jul 19 '05 #8
On Tuesday 14 June 2005 17:47, Peter Hansen wrote:
Kent Johnson wrote:
Peter Hansen wrote:
That will not work, and you will get data loss, as Jeremy points out.

Can you explain why not? self.data is still bound to the same list as x.
At least if the execution sequence is
    x = self.data
    self.data.append(a_piece_of_data)
    self.data = []


Ah, since the entire list is being returned, you appear to be correct.
Interesting... this means the OP's code is actually appending things to
a list, over and over (presumably), then returning a reference to that
list and rebinding the internal variable to a new list. If another
thread calls on_received() and causes new data to be appended to "the
list" between those two statements, then it will show up in the returned
list (rather magically, at least to my way of looking at it) and will
not in fact be lost.


But it might not "show up" until too late.

The consumer thread that called get_data presumably does something with that
list, such as iterating over its contents. It might only "show up" after that
iteration has finished, when the consumer has discarded its reference to the
shared list.

--
Toby Dickenson
Jul 19 '05 #9
Qiangning Hong wrote:
I actually had considered Queue and pop() before I wrote the above code.
However, because there is a lot of data to get every time I call
get_data(), I want a more CPU friendly way to avoid the while-loop and
empty checking, and then the above code comes out. But I am not very
sure whether it will cause serious problem or not, so I ask here. If
anyone can prove it is correct, I'll use it in my program, else I'll go
back to the Queue solution.


OK, here is a real failure mode. Here is the code and the disassembly:
>>> class Collector(object):
...     def __init__(self):
...         self.data = []
...     def on_received(self, a_piece_of_data):
...         """This callback is executed in work bee threads!"""
...         self.data.append(a_piece_of_data)
...     def get_data(self):
...         x = self.data
...         self.data = []
...         return x
...
>>> import dis
>>> dis.dis(Collector.on_received)
  6           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                1 (data)
              6 LOAD_ATTR                2 (append)
              9 LOAD_FAST                1 (a_piece_of_data)
             12 CALL_FUNCTION            1
             15 POP_TOP
             16 LOAD_CONST               1 (None)
             19 RETURN_VALUE
>>> dis.dis(Collector.get_data)
  8           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                1 (data)
              6 STORE_FAST               1 (x)

  9           9 BUILD_LIST               0
             12 LOAD_FAST                0 (self)
             15 STORE_ATTR               1 (data)

 10          18 LOAD_FAST                1 (x)
             21 RETURN_VALUE

Imagine the thread calling on_received() gets as far as LOAD_ATTR (data), LOAD_ATTR (append) or LOAD_FAST (a_piece_of_data), so it has a reference to self.data; then it blocks and the get_data() thread runs. The get_data() thread could call get_data() and *finish processing the returned list* before the on_received() thread runs again and actually appends to the list. The appended value will never be processed.
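This failure mode can be replayed deterministically in a single thread: the producer evaluates the bound method `self.data.append` (the LOAD_ATTR steps), the consumer then runs `get_data()` and finishes with the returned list, and only afterwards does the producer call the method it already holds. This simulates one interleaving rather than running real threads; `pending_append` and `processed` are illustrative names, and `Collector` is the original class minus the hypothetical `spawn_work_bees()` call.

```python
class Collector(object):
    # The original post's class, without the hypothetical spawn_work_bees().
    def __init__(self):
        self.data = []

    def on_received(self, a_piece_of_data):
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        self.data = []
        return x


c = Collector()
c.on_received("early")

# Producer: has evaluated self.data.append, then is preempted before the call.
pending_append = c.data.append

# Consumer: fetches the batch and finishes processing it (here, copying it).
processed = list(c.get_data())

# Producer resumes and appends to the old list, which nobody reads any more.
pending_append("late")

print(processed)  # ['early']
print(c.data)     # []  ("late" never reaches a later get_data() call)
```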

If you want to avoid the overhead of a Queue.get() for each data element you could just put your own mutex into on_received() and get_data().
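A minimal sketch of that mutex suggestion, using `threading.Lock` (the class name `LockedCollector` is illustrative, not from the thread). Producers take the lock for each append; get_data() takes it once per call and swaps the whole list out, so the consumer pays one lock acquisition per batch rather than one Queue.get() per item. Because both the lookup of the list and the append happen while the lock is held, a producer cannot be left holding a stale reference to a list the consumer has already taken.

```python
import threading


class LockedCollector(object):
    """Collector guarded by a single lock (illustrative, not from the thread)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = []

    def on_received(self, a_piece_of_data):
        # Called from worker threads: the list lookup and the append both
        # happen while the lock is held.
        with self._lock:
            self._data.append(a_piece_of_data)

    def get_data(self):
        # Swap in a fresh list under the lock; no producer can be part-way
        # through appending to the list being handed back.
        with self._lock:
            result, self._data = self._data, []
        return result
```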

Kent
Jul 19 '05 #10
