I have a list of dictionaries. Each dictionary holds counts of various
'words', e.g.:
py> countdicts = [
.... dict(a=9, b=9, c=9),
.... dict(a=8, b=7),
.... dict(a=4, b=5, c=12)]
I need to select dicts with the constraint that the number of each
'word' totalled over all selected dicts doesn't exceed a given
MAX_VALUE. Right now, I do this by:
py> def select(dicts, n):
.... result = []
.... counts = {}
.... def doadd(d):
.... for label, count in d.iteritems():
.... if counts.get(labe l, 0) + count > n:
.... return False
.... return True
.... for d in dicts:
.... if doadd(d):
.... result.append(d )
.... for label, count in d.iteritems():
.... counts[label] = counts.get(labe l, 0) + count
.... return result
....
Basically, I use a greedy approach -- adding a dict each time if I can.
This leads to some suboptimal solutions given that, while the total
counts must not exceed MAX_VALUE, I'd like them to be as close to
MAX_VALUE as possible. An example of data on which my select function
produces such a suboptimal solution:
py> countdicts = [
.... dict(a=9, b=9, c=9),
.... dict(a=8, b=7),
.... dict(a=4, b=5, c=12)]
py> select(countdic ts, 12)
[{'a': 9, 'c': 9, 'b': 9}]
The better solution would have been to select the second two dicts --
then all 'words' would have count 12.
I don't need an optimal solution; in fact, the solution I have right now
is tolerable. But if anyone knows a simple way to improve my
performance, I'd appreciate it. Thanks!
STeVe
P.S. No, I'm not just making these problems up! I'm sick, but not that
sick. ;) It has to do with resampling a set of data... If you're
really interested, I can provide the full details, but they'll only make
the problem even _more_ complicated. =) 5 1751
Steven Bethard wrote: I have a list of dictionaries. Each dictionary holds counts of
various 'words', e.g.:
Basically, I use a greedy approach -- adding a dict each time if I
can. This leads to some suboptimal solutions given that, while the total
counts must not exceed MAX_VALUE, I'd like them to be as close to MAX_VALUE as possible. An example of data on which my select
function produces such a suboptimal solution:
py> countdicts = [ ... dict(a=9, b=9, c=9), ... dict(a=8, b=7), ... dict(a=4, b=5, c=12)] py> select(countdic ts, 12) [{'a': 9, 'c': 9, 'b': 9}]
The better solution would have been to select the second two dicts --
then all 'words' would have count 12.
I don't need an optimal solution; in fact, the solution I have right
now is tolerable. But if anyone knows a simple way to improve my performance, I'd appreciate it. Thanks!
This is more than a bit OT but you've sucked me in :-)
You should specify a function that assigns a score to valid solutions;
it's not apparent from the wording above and the example exactly what
that function might be. Thinking through the alternatives might help
you solve the problem yourself :-)
John Machin wrote: Steven Bethard wrote:
I have a list of dictionaries. Each dictionary holds counts of various 'words'... ... Basically, I use a greedy approach -- adding a dict each time if I can. ... This leads to some suboptimal solutions given that, while the total counts must not exceed MAX_VALUE, I'd like them to be as close to MAX_VALUE as possible. An example of data on which my select function produces such a suboptimal solution:
py> countdicts = [ ... dict(a=9, b=9, c=9), ... dict(a=8, b=7), ... dict(a=4, b=5, c=12)] py> select(countdic ts, 12) [{'a': 9, 'c': 9, 'b': 9}]
The better solution would have been to select the second two dicts -- then all 'words' would have count 12.
I don't need an optimal solution; in fact, the solution I have right now is tolerable. But if anyone knows a simple way to improve my performance, I'd appreciate it. Thanks!
You should specify a function that assigns a score to valid solutions; it's not apparent from the wording above and the example exactly what that function might be. Thinking through the alternatives might help you solve the problem yourself :-)
Yeah, I guess that's a good idea. Here's a scoring function that should
give the right range of scores (where 1.0 is perfect and 0.0 is failure):
py> from __future__ import division
py> def score(origdicts , selecteddicts, n):
.... origwords = set(word for d in origdicts for word in d)
.... counts = {}
.... for d in selecteddicts:
.... for word, count in d.iteritems():
.... counts[word] = counts.get(word , 0) + count
.... if counts[word] > n:
.... return 0.0
.... return sum(counts.iter values())/len(origwords)/n
....
py> countdicts = [
.... dict(a=9, b=9, c=9),
.... dict(a=8, b=7),
.... dict(a=4, b=5, c=12)]
py> score(countdict s, countdicts, 12)
0.0
py> score(countdict s, select(countdic ts, 12), 12)
0.75
py> score(countdict s, [dict(a=8, b=7), dict(a=4, b=5, c=12)], 12)
1.0
It's pretty easy to build a configuration that can't be solved with a
perfect score:
py> def subsets(lst):
.... if not lst:
.... yield []
.... else:
.... first = lst[:1]
.... for subset in subsets(lst[1:]):
.... yield subset
.... yield first + subset
....
py> countdicts = [
.... dict(a=1, b=2, c=3),
.... dict(a=4, b=5, c=6),
.... dict(a=7, b=8, c=9)]
py> for dicts in subsets(countdi cts):
.... print score(countdict s, dicts, 9), dicts
....
0.0 []
0.222222222222 [{'a': 1, 'c': 3, 'b': 2}]
0.555555555556 [{'a': 4, 'c': 6, 'b': 5}]
0.777777777778 [{'a': 1, 'c': 3, 'b': 2}, {'a': 4, 'c': 6, 'b': 5}]
0.888888888889 [{'a': 7, 'c': 9, 'b': 8}]
0.0 [{'a': 1, 'c': 3, 'b': 2}, {'a': 7, 'c': 9, 'b': 8}]
0.0 [{'a': 4, 'c': 6, 'b': 5}, {'a': 7, 'c': 9, 'b': 8}]
0.0 [{'a': 1, 'c': 3, 'b': 2}, {'a': 4, 'c': 6, 'b': 5}, {'a': 7, 'c':
9, 'b': 8}]
Hmmm... I can't do an exhaustive search like this one here -- I have
~8000 dicts, so 2**N is definitely too large of a space to search. Now
that I have a scoring routine maybe I could do an A* search though...
Thanks for the input!
STeVe
Steven Bethard wrote: I have a list of dictionaries. Each dictionary holds counts of various 'words', e.g.:
py> countdicts = [ ... dict(a=9, b=9, c=9), ... dict(a=8, b=7), ... dict(a=4, b=5, c=12)]
I need to select dicts with the constraint that the number of each 'word' totalled over all selected dicts doesn't exceed a given MAX_VALUE. Right now, I do this by:
Not that you can't still improve performance of course, but this is an
NP-complete problem if you didn't know, so don't bang your head too hard...
--
Brian Beck
Adventurer of the First Order
Brian Beck wrote: Steven Bethard wrote:
I have a list of dictionaries. Each dictionary holds counts of various 'words', e.g.:
py> countdicts = [ ... dict(a=9, b=9, c=9), ... dict(a=8, b=7), ... dict(a=4, b=5, c=12)]
I need to select dicts with the constraint that the number of each 'word' totalled over all selected dicts doesn't exceed a given MAX_VALUE. Right now, I do this by:
Not that you can't still improve performance of course, but this is an NP-complete problem if you didn't know, so don't bang your head too hard...
Yeah, it seemed a lot like one of those box-packing type problems, so if
it's NP-complete, it wouldn't surprise me. That's one of the reasons I
mentioned trying A* in one of my other responses. However, in
retrospect, I don't think I can apply A* because I don't really have a
"goal state". I could call a selection where all "words" had MAX_VALUE
counts a "goal state", but I'm not guaranteed that such a state always
exists (and if it doesn't A* just degenerates into an exhaustive search).
Anyway, do you know what name this problem is usually discussed under?
If I knew what to google for, I could probably find at least a few
simple heuristics to try...
Thanks again,
STeVe This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Colleyville Alan |
last post by:
I have an app that uses Access to grab various PowerPoint slides using the
followhyperlink command. I have set the PPT window to run in a minimized
state:
FollowHyperlink link
Set oPres = oPPT.ActivePresentation
oPPT.WindowState = ppWindowMinimized
Then I do the copy and paste part:
With oPres
|
by: lauren quantrell |
last post by:
I have an Access 2K popup form to which I have added buttons to run
DoCmd.Maximize
The problem is the form opens full height on the screen, with the
bottom of the form hidden under the Windows taskbar.
Is there a way to maximize the form so that a portion of it is not
hidden under the taskbar?
thanks,
lq
|
by: Brandon |
last post by:
Hello there.
I'm currently working on a moderately complex Visual C# windows application
that I have run into a bit of a problem on. To start things off, the
application has normally been run under Windows XP with the theme mode set to
Windows Classic (disabling the newer XP theme) at a resolution of 1280 x 1024
on a dual monitor system. The application has two sizes; one is roughly 800
x 600, and the second is full maximize mode. It...
|
by: Christoph Zwerschke |
last post by:
This is probably a FAQ, but I dare to ask it nevertheless since I
haven't found a satisfying answer yet: Why isn't there an "ordered
dictionary" class at least in the standard list? Time and again I am
missing that feature. Maybe there is something wrong with my programming
style, but I rather think it is generally useful.
I fully agree with the following posting where somebody complains why so
very basic and useful things are not part...
|
by: Mrozu |
last post by:
Hi
When I maximize a form in VB.Net 2003 the bottom of the form gets
hidden
by the start bar (so my status bar is invisible). How can I get my app
to maximize to the usable screen area above the start bar?
Thx Mrozu
| |
by: =?Utf-8?B?Qm9iQWNoZ2lsbA==?= |
last post by:
I give my user a button to change the properties of the size property of
maximize screen size. How can I programmatically cause the maximize screen
function to happen after changing the property so the new max screen property
will be realized? Right now I have to tell the user to unmaximize the screen
and remaximize the screen to have the property take effect. :o(
Thanks!
Bob
|
by: Gabriel Genellina |
last post by:
En Wed, 30 Jul 2008 21:29:39 -0300, <python@bdurham.comescribi�:
You could use a different hash algorithm yielding a smaller value (crc32,
by example, fits on an integer). At the expense of having more collisions,
and more processing time to check those possible duplicates.
--
Gabriel Genellina
|
by: Brandon |
last post by:
Hi all,
I am not altogether experienced in Python, but I haven't been able to
find a good example of the syntax that I'm looking for in any tutorial
that I've seen. Hope somebody can point me in the right direction.
This should be pretty simple: I have two dictionaries, foo and bar.
I am certain that all keys in bar belong to foo as well, but I also
know that not all keys in foo exist in bar. All the keys in both foo
and bar are...
|
by: Mike P |
last post by:
Hi All
i have a CSV file that i'm reading in and each line has the look of
the below
{None: }
{None: }
{None: }
i want to make a dictionary of items in the form
{576460847178667334:1, 576460847178632334:8, ..... } for all rows in
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
| |
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |