473,699 Members | 2,254 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

selecting dictionaries to maximize counts

I have a list of dictionaries. Each dictionary holds counts of various
'words', e.g.:

py> countdicts = [
.... dict(a=9, b=9, c=9),
.... dict(a=8, b=7),
.... dict(a=4, b=5, c=12)]

I need to select dicts with the constraint that the number of each
'word' totalled over all selected dicts doesn't exceed a given
MAX_VALUE. Right now, I do this by:

py> def select(dicts, n):
.... result = []
.... counts = {}
.... def doadd(d):
.... for label, count in d.iteritems():
.... if counts.get(labe l, 0) + count > n:
.... return False
.... return True
.... for d in dicts:
.... if doadd(d):
.... result.append(d )
.... for label, count in d.iteritems():
.... counts[label] = counts.get(labe l, 0) + count
.... return result
....

Basically, I use a greedy approach -- adding a dict each time if I can.
This leads to some suboptimal solutions given that, while the total
counts must not exceed MAX_VALUE, I'd like them to be as close to
MAX_VALUE as possible. An example of data on which my select function
produces such a suboptimal solution:

py> countdicts = [
.... dict(a=9, b=9, c=9),
.... dict(a=8, b=7),
.... dict(a=4, b=5, c=12)]
py> select(countdic ts, 12)
[{'a': 9, 'c': 9, 'b': 9}]

The better solution would have been to select the second two dicts --
then all 'words' would have count 12.

I don't need an optimal solution; in fact, the solution I have right now
is tolerable. But if anyone knows a simple way to improve my
performance, I'd appreciate it. Thanks!

STeVe

P.S. No, I'm not just making these problems up! I'm sick, but not that
sick. ;) It has to do with resampling a set of data... If you're
really interested, I can provide the full details, but they'll only make
the problem even _more_ complicated. =)
Jul 18 '05 #1
5 1751

Steven Bethard wrote:
I have a list of dictionaries. Each dictionary holds counts of various 'words', e.g.: Basically, I use a greedy approach -- adding a dict each time if I can. This leads to some suboptimal solutions given that, while the total counts must not exceed MAX_VALUE, I'd like them to be as close to
MAX_VALUE as possible. An example of data on which my select function produces such a suboptimal solution:

py> countdicts = [
... dict(a=9, b=9, c=9),
... dict(a=8, b=7),
... dict(a=4, b=5, c=12)]
py> select(countdic ts, 12)
[{'a': 9, 'c': 9, 'b': 9}]

The better solution would have been to select the second two dicts -- then all 'words' would have count 12.

I don't need an optimal solution; in fact, the solution I have right now is tolerable. But if anyone knows a simple way to improve my
performance, I'd appreciate it. Thanks!


This is more than a bit OT but you've sucked me in :-)

You should specify a function that assigns a score to valid solutions;
it's not apparent from the wording above and the example exactly what
that function might be. Thinking through the alternatives might help
you solve the problem yourself :-)

Jul 18 '05 #2
John Machin wrote:
Steven Bethard wrote:
I have a list of dictionaries. Each dictionary holds counts of
various 'words'...
...
Basically, I use a greedy approach -- adding a dict each time if I
can.
...
This leads to some suboptimal solutions given that, while the total
counts must not exceed MAX_VALUE, I'd like them to be as close to
MAX_VALUE as possible. An example of data on which my select
function produces such a suboptimal solution:

py> countdicts = [
... dict(a=9, b=9, c=9),
... dict(a=8, b=7),
... dict(a=4, b=5, c=12)]
py> select(countdic ts, 12)
[{'a': 9, 'c': 9, 'b': 9}]

The better solution would have been to select the second two dicts --
then all 'words' would have count 12.

I don't need an optimal solution; in fact, the solution I have right
now is tolerable. But if anyone knows a simple way to improve my
performance, I'd appreciate it. Thanks!


You should specify a function that assigns a score to valid solutions;
it's not apparent from the wording above and the example exactly what
that function might be. Thinking through the alternatives might help
you solve the problem yourself :-)


Yeah, I guess that's a good idea. Here's a scoring function that should
give the right range of scores (where 1.0 is perfect and 0.0 is failure):

py> from __future__ import division
py> def score(origdicts , selecteddicts, n):
.... origwords = set(word for d in origdicts for word in d)
.... counts = {}
.... for d in selecteddicts:
.... for word, count in d.iteritems():
.... counts[word] = counts.get(word , 0) + count
.... if counts[word] > n:
.... return 0.0
.... return sum(counts.iter values())/len(origwords)/n
....
py> countdicts = [
.... dict(a=9, b=9, c=9),
.... dict(a=8, b=7),
.... dict(a=4, b=5, c=12)]
py> score(countdict s, countdicts, 12)
0.0
py> score(countdict s, select(countdic ts, 12), 12)
0.75
py> score(countdict s, [dict(a=8, b=7), dict(a=4, b=5, c=12)], 12)
1.0

It's pretty easy to build a configuration that can't be solved with a
perfect score:

py> def subsets(lst):
.... if not lst:
.... yield []
.... else:
.... first = lst[:1]
.... for subset in subsets(lst[1:]):
.... yield subset
.... yield first + subset
....
py> countdicts = [
.... dict(a=1, b=2, c=3),
.... dict(a=4, b=5, c=6),
.... dict(a=7, b=8, c=9)]
py> for dicts in subsets(countdi cts):
.... print score(countdict s, dicts, 9), dicts
....
0.0 []
0.222222222222 [{'a': 1, 'c': 3, 'b': 2}]
0.555555555556 [{'a': 4, 'c': 6, 'b': 5}]
0.777777777778 [{'a': 1, 'c': 3, 'b': 2}, {'a': 4, 'c': 6, 'b': 5}]
0.888888888889 [{'a': 7, 'c': 9, 'b': 8}]
0.0 [{'a': 1, 'c': 3, 'b': 2}, {'a': 7, 'c': 9, 'b': 8}]
0.0 [{'a': 4, 'c': 6, 'b': 5}, {'a': 7, 'c': 9, 'b': 8}]
0.0 [{'a': 1, 'c': 3, 'b': 2}, {'a': 4, 'c': 6, 'b': 5}, {'a': 7, 'c':
9, 'b': 8}]

Hmmm... I can't do an exhaustive search like this one here -- I have
~8000 dicts, so 2**N is definitely too large of a space to search. Now
that I have a scoring routine maybe I could do an A* search though...

Thanks for the input!

STeVe
Jul 18 '05 #3
Steven Bethard wrote:
I have a list of dictionaries. Each dictionary holds counts of various
'words', e.g.:

py> countdicts = [
... dict(a=9, b=9, c=9),
... dict(a=8, b=7),
... dict(a=4, b=5, c=12)]

I need to select dicts with the constraint that the number of each
'word' totalled over all selected dicts doesn't exceed a given
MAX_VALUE. Right now, I do this by:


Not that you can't still improve performance of course, but this is an
NP-complete problem if you didn't know, so don't bang your head too hard...

--
Brian Beck
Adventurer of the First Order
Jul 18 '05 #4
Brian Beck wrote:
Steven Bethard wrote:
I have a list of dictionaries. Each dictionary holds counts of
various 'words', e.g.:

py> countdicts = [
... dict(a=9, b=9, c=9),
... dict(a=8, b=7),
... dict(a=4, b=5, c=12)]

I need to select dicts with the constraint that the number of each
'word' totalled over all selected dicts doesn't exceed a given
MAX_VALUE. Right now, I do this by:


Not that you can't still improve performance of course, but this is an
NP-complete problem if you didn't know, so don't bang your head too hard...


Yeah, it seemed a lot like one of those box-packing type problems, so if
it's NP-complete, it wouldn't surprise me. That's one of the reasons I
mentioned trying A* in one of my other responses. However, in
retrospect, I don't think I can apply A* because I don't really have a
"goal state". I could call a selection where all "words" had MAX_VALUE
counts a "goal state", but I'm not guaranteed that such a state always
exists (and if it doesn't A* just degenerates into an exhaustive search).

Anyway, do you know what name this problem is usually discussed under?
If I knew what to google for, I could probably find at least a few
simple heuristics to try...

Thanks again,

STeVe
Jul 18 '05 #5
Steven Bethard wrote:
Anyway, do you know what name this problem is usually discussed under?
If I knew what to google for, I could probably find at least a few
simple heuristics to try...


I think the closest thing would be the 'knapsack problem' or the 'subset
sum problem.'

http://en.wikipedia.org/wiki/Subset_sum_problem
http://en.wikipedia.org/wiki/Knapsack_problem

--
Brian Beck
Adventurer of the First Order
Jul 18 '05 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
5468
by: Colleyville Alan | last post by:
I have an app that uses Access to grab various PowerPoint slides using the followhyperlink command. I have set the PPT window to run in a minimized state: FollowHyperlink link Set oPres = oPPT.ActivePresentation oPPT.WindowState = ppWindowMinimized Then I do the copy and paste part: With oPres
4
6842
by: lauren quantrell | last post by:
I have an Access 2K popup form to which I have added buttons to run DoCmd.Maximize The problem is the form opens full height on the screen, with the bottom of the form hidden under the Windows taskbar. Is there a way to maximize the form so that a portion of it is not hidden under the taskbar? thanks, lq
1
2639
by: Brandon | last post by:
Hello there. I'm currently working on a moderately complex Visual C# windows application that I have run into a bit of a problem on. To start things off, the application has normally been run under Windows XP with the theme mode set to Windows Classic (disabling the newer XP theme) at a resolution of 1280 x 1024 on a dual monitor system. The application has two sizes; one is roughly 800 x 600, and the second is full maximize mode. It...
210
10469
by: Christoph Zwerschke | last post by:
This is probably a FAQ, but I dare to ask it nevertheless since I haven't found a satisfying answer yet: Why isn't there an "ordered dictionary" class at least in the standard list? Time and again I am missing that feature. Maybe there is something wrong with my programming style, but I rather think it is generally useful. I fully agree with the following posting where somebody complains why so very basic and useful things are not part...
5
3200
by: Mrozu | last post by:
Hi When I maximize a form in VB.Net 2003 the bottom of the form gets hidden by the start bar (so my status bar is invisible). How can I get my app to maximize to the usable screen area above the start bar? Thx Mrozu
3
4458
by: =?Utf-8?B?Qm9iQWNoZ2lsbA==?= | last post by:
I give my user a button to change the properties of the size property of maximize screen size. How can I programmatically cause the maximize screen function to happen after changing the property so the new max screen property will be realized? Right now I have to tell the user to unmaximize the screen and remaximize the screen to have the property take effect. :o( Thanks! Bob
2
2109
by: Gabriel Genellina | last post by:
En Wed, 30 Jul 2008 21:29:39 -0300, <python@bdurham.comescribi�: You could use a different hash algorithm yielding a smaller value (crc32, by example, fits on an integer). At the expense of having more collisions, and more processing time to check those possible duplicates. -- Gabriel Genellina
9
1260
by: Brandon | last post by:
Hi all, I am not altogether experienced in Python, but I haven't been able to find a good example of the syntax that I'm looking for in any tutorial that I've seen. Hope somebody can point me in the right direction. This should be pretty simple: I have two dictionaries, foo and bar. I am certain that all keys in bar belong to foo as well, but I also know that not all keys in foo exist in bar. All the keys in both foo and bar are...
6
1105
by: Mike P | last post by:
Hi All i have a CSV file that i'm reading in and each line has the look of the below {None: } {None: } {None: } i want to make a dictionary of items in the form {576460847178667334:1, 576460847178632334:8, ..... } for all rows in
0
8685
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8613
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9172
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
8880
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6532
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
4374
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4626
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
2344
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2008
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.