Re: dict generator question

Simon Mullis wrote:

Hi,

Let's say I have an arbitrary list of minor software versions of an
imaginary software product:

l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

I'd like to create a dict with major_version : count.

(So, in this case:

dict_of_counts = { "1.1" : "1",
"1.2" : "2",
"1.3" : "2" }

[...]
data = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

from itertools import groupby

datadict = \
dict((k, len(list(g))) for k,g in groupby(data, lambda s: s[:3]))
print datadict

Sep 18 '08 #1


George Sakkis

On Sep 18, 11:43 am, Gerard flanagan <grflana...@gmail.com> wrote:

Simon Mullis wrote:
Hi,

Let's say I have an arbitrary list of minor software versions of an
imaginary software product:

l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

I'd like to create a dict with major_version : count.

(So, in this case:

dict_of_counts = { "1.1" : "1",
"1.2" : "2",
"1.3" : "2" }

[...]
data = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

from itertools import groupby

datadict = \
dict((k, len(list(g))) for k,g in groupby(data, lambda s: s[:3]))
print datadict

Note that this works correctly only if the versions are already sorted
by major version.
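
A small sketch of why the ordering matters: groupby() only merges adjacent items that share a key, so unsorted input produces several runs for the same major version, and dict() then keeps only the last one. The helper name major() and the sample list are illustrative assumptions, not part of the original posts.

```python
from itertools import groupby

def major(version):
    # "1.2.2.3" -> "1.2" (hypothetical helper for this sketch)
    return ".".join(version.split(".")[:2])

unsorted_data = ["1.2.2.2", "1.1.1.1", "1.2.2.3"]

# groupby() only merges adjacent items, so "1.2" shows up as two runs.
runs = [(k, len(list(g))) for k, g in groupby(unsorted_data, major)]

# dict() keeps the last run per key and silently drops the first "1.2".
wrong = dict(runs)

# Sorting by the same key first puts equal keys next to each other.
right = dict((k, len(list(g)))
             for k, g in groupby(sorted(unsorted_data, key=major), major))
```

Here wrong under-counts "1.2" while right holds the intended totals.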

George

Sep 18 '08 #2

Gerard flanagan

George Sakkis wrote:

On Sep 18, 11:43 am, Gerard flanagan <grflana...@gmail.com> wrote:

Simon Mullis wrote:
Hi,

Let's say I have an arbitrary list of minor software versions of an
imaginary software product:

l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

I'd like to create a dict with major_version : count.

(So, in this case:

dict_of_counts = { "1.1" : "1",
"1.2" : "2",
"1.3" : "2" }

[...]
data = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

from itertools import groupby

datadict = \
dict((k, len(list(g))) for k,g in groupby(data, lambda s: s[:3]))
print datadict

Note that this works correctly only if the versions are already sorted
by major version.

Yes, I should have mentioned it. Here's a fuller example below. There's
maybe better ways of sorting version numbers, but this is what I do.
data = [ "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.1.1.1", "1.3.14.5",
"1.3.21.6" ]

from itertools import groupby
import re

RXBUILDSORT = re.compile(r'\d+|[a-zA-Z]')

def versionsort(s):
    key = []
    for part in RXBUILDSORT.findall(s.lower()):
        try:
            key.append(int(part))
        except ValueError:
            key.append(ord(part))
    return tuple(key)

data.sort(key=versionsort)
print data

datadict = \
dict((k, len(list(g))) for k,g in groupby(data, lambda s: s[:3]))
print datadict
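
One caveat worth illustrating: the s[:3] slice assumes single-digit components, so a major version such as "1.10" would be cut to "1.1". The sketch below assumes purely numeric version strings (unlike versionsort above, it does not handle letters) and uses hypothetical helper names version_key() and major().

```python
from itertools import groupby

def version_key(version):
    # Compare components numerically so "1.10" sorts after "1.9".
    # Assumes every component is a plain integer.
    return tuple(int(part) for part in version.split("."))

def major(version):
    # Take the first two dot-separated components, whatever their width.
    return ".".join(version.split(".")[:2])

data = ["1.10.0.1", "1.9.3.2", "1.10.2.0", "1.9.0.0"]
data.sort(key=version_key)

counts = dict((k, len(list(g))) for k, g in groupby(data, major))
```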

Sep 18 '08 #3

bearophileHUGS

Gerard flanagan:

data.sort()
datadict = \
dict((k, len(list(g))) for k,g in groupby(data, lambda s:
     '.'.join(s.split('.',2)[:2])))

That code may run correctly, but it's quite unreadable, while good
Python programmers value high readability. So the right thing to do is
to split that line into parts, giving meaningful names, and maybe even
add comments.

len(list(g)) looks like a good job for my little leniter() function
(or better, just an extension to the semantics of len) that some time
ago some people here judged useless, while I use it often in both
Python and D ;-)
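
The thread never shows leniter() itself, so the following is only a guess at what such a helper might look like: a function that counts items by consuming the iterable. The body is an assumption, not bearophile's actual code.

```python
from itertools import groupby

def leniter(iterable):
    """Count items by consuming the iterable (hypothetical sketch)."""
    count = 0
    for _ in iterable:
        count += 1
    return count

data = ["1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

# Same grouping as above, without building a throwaway list per group.
datadict = dict((k, leniter(g)) for k, g in groupby(data, lambda s: s[:3]))
```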

Bye,
bearophile

Sep 19 '08 #4

MRAB

On Sep 19, 2:01 pm, bearophileH...@lycos.com wrote:

Gerard flanagan:

data.sort()
datadict = \
dict((k, len(list(g))) for k,g in groupby(data, lambda s:
     '.'.join(s.split('.',2)[:2])))

That code may run correctly, but it's quite unreadable, while good
Python programmers value high readability. So the right thing to do is
to split that line into parts, giving meaningful names, and maybe even
add comments.

len(list(g)) looks like a good job for my little leniter() function
(or better, just an extension to the semantics of len) that some time
ago some people here judged useless, while I use it often in both
Python and D ;-)

Extending len() to support iterables sounds like a good idea, except
that it could be misleading when:

len(file(path))

returns the number of lines and /not/ the length in bytes as you might
first think! :-)

Anyway, here's another possible implementation using bags (multisets):

def major_version(version_string):
    "convert '1.2.3.2' to '1.2'"
    return '.'.join(version_string.split('.')[:2])

versions = ["1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

bag_of_versions = bag(major_version(x) for x in versions)
dict_of_counts = dict(bag_of_versions.items())

Here's my implementation of the bag class in Python (sorry about the
length):

class bag(object):
    def __init__(self, iterable=None):
        # Maps each item to the number of times it occurs.
        self._counts = {}
        if isinstance(iterable, dict):
            for x, n in iterable.items():
                if not isinstance(n, int):
                    raise TypeError()
                if n < 0:
                    raise ValueError()
                self._counts[x] = n
        elif iterable:
            for x in iterable:
                try:
                    self._counts[x] += 1
                except KeyError:
                    self._counts[x] = 1
    def __and__(self, other):
        # Intersection: keep the smaller count of each shared item.
        new_counts = {}
        for x, n in other._counts.items():
            try:
                new_counts[x] = min(self._counts[x], n)
            except KeyError:
                pass
        result = bag()
        result._counts = new_counts
        return result
    def __iand__(self, other):
        new_counts = {}
        for x, n in other._counts.items():
            try:
                new_counts[x] = min(self._counts[x], n)
            except KeyError:
                pass
        self._counts = new_counts
        return self
    def __or__(self, other):
        # Union: keep the larger count of each item.
        new_counts = self._counts.copy()
        for x, n in other._counts.items():
            try:
                new_counts[x] = max(new_counts[x], n)
            except KeyError:
                new_counts[x] = n
        result = bag()
        result._counts = new_counts
        return result
    def __ior__(self, other):
        for x, n in other._counts.items():
            try:
                self._counts[x] = max(self._counts[x], n)
            except KeyError:
                self._counts[x] = n
        return self
    def __len__(self):
        return sum(self._counts.values())
    def __list__(self):
        result = []
        for x, n in self._counts.items():
            result.extend([x] * n)
        return result
    def __repr__(self):
        return "bag([%s])" % ", ".join(", ".join([repr(x)] * n)
                                       for x, n in self._counts.items())
    def __iter__(self):
        for x, n in self._counts.items():
            for i in range(n):
                yield x
    def keys(self):
        return self._counts.keys()
    def values(self):
        return self._counts.values()
    def items(self):
        return self._counts.items()
    def __contains__(self, x):
        return x in self._counts
    def add(self, x):
        try:
            self._counts[x] += 1
        except KeyError:
            self._counts[x] = 1
    def __add__(self, other):
        new_counts = self._counts.copy()
        for x, n in other.items():
            try:
                new_counts[x] += n
            except KeyError:
                new_counts[x] = n
        result = bag()
        result._counts = new_counts
        return result
    def __sub__(self, other):
        new_counts = self._counts.copy()
        for x, n in other.items():
            try:
                new_counts[x] -= n
                if new_counts[x] < 1:
                    del new_counts[x]
            except KeyError:
                pass
        result = bag()
        result._counts = new_counts
        return result
    def __iadd__(self, other):
        for x, n in other.items():
            try:
                self._counts[x] += n
            except KeyError:
                self._counts[x] = n
        return self
    def __isub__(self, other):
        for x, n in other.items():
            try:
                self._counts[x] -= n
                if self._counts[x] < 1:
                    del self._counts[x]
            except KeyError:
                pass
        return self
    def clear(self):
        self._counts = {}
    def count(self, x):
        return self._counts.get(x, 0)
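
For readers on Python 2.7 or later, the standard library's collections.Counter (added after this thread was written) covers the same counting use case. This is a swapped-in alternative to the hand-written bag class above, not MRAB's implementation.

```python
from collections import Counter

def major_version(version_string):
    "convert '1.2.3.2' to '1.2'"
    return ".".join(version_string.split(".")[:2])

versions = ["1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

# Counter tallies hashable items from any iterable; dict() turns the
# result into a plain mapping of major version to count.
dict_of_counts = dict(Counter(major_version(v) for v in versions))
```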

Sep 20 '08 #5

Steven D'Aprano

On Fri, 19 Sep 2008 17:00:56 -0700, MRAB wrote:

Extending len() to support iterables sounds like a good idea, except
that it could be misleading when:

len(file(path))

returns the number of lines and /not/ the length in bytes as you might
first think!

Extending len() to support iterables sounds like a good idea, except that
it's not.

Here are two iterables:

def yes():  # like the Unix yes command
    while True:
        yield "y"

def rand(total):
    "Return random numbers up to a given total."
    from random import random
    tot = 0.0
    while tot < total:
        x = random()
        yield x
        tot += x

What should len(yes()) and len(rand(100)) return?

--
Steven

Sep 20 '08 #6

bearophileHUGS

MRAB:

except that it could be misleading when:
len(file(path))
returns the number of lines and /not/ the length in bytes as you might
first think! :-)

Well, file(...) returns an iterable of lines, so its len is the number
of lines :-)
I think I am able to always remember this fact.

Anyway, here's another possible implementation using bags (multisets):

This function looks safer/faster:

def major_version(version_string):
    "convert '1.2.3.2' to '1.2'"
    return '.'.join(version_string.strip().split('.', 2)[:2])

Another version:

import re
from collections import defaultdict

patt = re.compile(r"^(\d+\.\d+)")

dict_of_counts = defaultdict(int)
for ver in versions:
    dict_of_counts[patt.match(ver).group(1)] += 1

print dict_of_counts

Bye,
bearophile

Sep 20 '08 #7

Miles

On Fri, Sep 19, 2008 at 9:51 PM, Steven D'Aprano
<st***@remove-this-cybersource.com.au> wrote:

Extending len() to support iterables sounds like a good idea, except that
it's not.

Here are two iterables:

def yes():  # like the Unix yes command
    while True:
        yield "y"

def rand(total):
    "Return random numbers up to a given total."
    from random import random
    tot = 0.0
    while tot < total:
        x = random()
        yield x
        tot += x

What should len(yes()) and len(rand(100)) return?

Clearly, len(yes()) would never return, and len(rand(100)) would
return a random integer not less than 101.
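
The "not less than 101" bound can be checked by counting the items directly: random() returns values strictly below 1.0, so 100 of them cannot reach a total of 100. A rough sketch, using sum(1 for ...) as the counting idiom:

```python
from random import random

def rand(total):
    "Return random numbers up to a given total."
    tot = 0.0
    while tot < total:
        x = random()
        yield x
        tot += x

# The only way to learn the length is to run the generator to the end.
count = sum(1 for _ in rand(100))
```

The exact value of count differs from run to run; only the lower bound is fixed.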

-Miles

Sep 22 '08 #8

bearophileHUGS

Steven D'Aprano:

>Extending len() to support iterables sounds like a good idea, except that it's not.<

Python language lately has shifted toward more and more usage of lazy
iterables (see range lazy by default, etc). So they are now quite
common. So extending len() to make it act like leniter() too is a way
to adapt a basic Python construct to the changes of the other parts of
the language.

In languages like Haskell you can count how many items a lazy sequence
has. But those sequences are generally immutable, so they can be
accessed many times, so len(iterable) doesn't exhaust them like in
Python. So in Python it's less useful.
This is a common situation where I care only about the length of the
group g:
[leniter(g) for h,g in groupby(iterable)]

There are other situations where I may be interested only in how many
items there are:
leniter(ifilter(predicate, iterable))
leniter(el for el in iterable if predicate(el))

For my usage I have written a version of the itertools module in D (a
lot of work, but the result is quite useful and flexible, even if I
miss the generator/iterator syntax a lot), and later I have written a
len() able to count the length of lazy iterables too (if the given
variable has a length attribute/property then it returns that value),
and I have found that it's useful often enough (almost as the
string.xsplit()). But in Python there is less need for a len() that
counts lazy iterables too because you can use the following syntax
that isn't bad (and isn't available in D):

[sum(1 for x in g) for h,g in groupby(iterable)]
sum(1 for x in ifilter(predicate, iterable))
sum(1 for el in iterable if predicate(el))

So you and Python designers may choose to not extend the semantics of
len() for various good reasons, but you will have a hard time
convincing me it's a useless capability :-)

Bye,
bearophile

Sep 22 '08 #9

Steven D'Aprano

On Mon, 22 Sep 2008 04:21:12 -0700, bearophileHUGS wrote:

Steven D'Aprano:

>>Extending len() to support iterables sounds like a good idea, except
that it's not.<

Python language lately has shifted toward more and more usage of lazy
iterables (see range lazy by default, etc). So they are now quite
common. So extending len() to make it act like leniter() too is a way to
adapt a basic Python construct to the changes of the other parts of the
language.

I'm sorry, I don't recognise leniter(). Did I miss something?

In languages like Haskell you can count how many items a lazy sequence
has. But those sequences are generally immutable, so they can be
accessed many times, so len(iterable) doesn't exhaust them like in
Python. So in Python it's less useful.

In Python, xrange() is a lazy sequence that isn't exhausted, but that's a
special case: it actually has a __len__ method, and presumably the length
is calculated from the xrange arguments, not by generating all the items
and counting them. How would you count the number of items in a generic
lazy sequence without actually generating the items first?
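
A short illustration of that distinction, assuming Python 3, where range() plays the role of xrange(): the range object reports its length from its arguments, while a generator over the same values has no __len__ at all.

```python
big = range(0, 10**9, 3)    # lazy: no items are created here

# len() is computed from start, stop and step, not by iterating.
n = len(big)

gen = (x for x in big)      # a generic generator over the same values
try:
    len(gen)
except TypeError:
    has_len = False         # generators do not define __len__
else:
    has_len = True
```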

This is a common situation where I care only about the length of the
group g:
[leniter(g) for h,g in groupby(iterable)]

There are other situations where I may be interested only in how many
items there are:
leniter(ifilter(predicate, iterable))
leniter(el for el in iterable if predicate(el))

For my usage I have written a version of the itertools module in D (a
lot of work, but the result is quite useful and flexible, even if I miss
the generator/iterator syntax a lot), and later I have written a len()
able to count the length of lazy iterables too (if the given variable
has a length attribute/property then it returns that value),

I'm not saying that no iterables can accurately predict how many items
they will produce. If they can, then len() should support iterables with
a __len__ attribute. But in general there's no way of predicting how many
items the iterable will produce without iterating over it, and len()
shouldn't do that.

and I have
found that it's useful often enough (almost as the string.xsplit()). But
in Python there is less need for a len() that counts lazy iterables too
because you can use the following syntax that isn't bad (and isn't
available in D):

[sum(1 for x in g) for h,g in groupby(iterable)]
sum(1 for x in ifilter(predicate, iterable))
sum(1 for el in iterable if predicate(el))

I think the idiom sum(1 for item in iterable) is, in general, a mistake.
For starters, it doesn't work for arbitrary iterables, only sequences
(lazy or otherwise) and your choice of variable name may fool people into
thinking they can pass a use-once iterator to your code and have it work.

Secondly, it's not clear what sum(1 for item in iterable) does without
reading over it carefully. Since you're generating the entire length
anyway, len(list(iterable)) is more readable and almost as efficient for
most practical cases.
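
The use-once concern can be seen in a few lines. Whichever counting idiom is used, a one-shot iterator is exhausted afterwards, whereas a real sequence can be counted again. The variable names below are illustrative only.

```python
words = ["a", "bb", "ccc"]
one_shot = iter(words)

# Counting consumes the iterator...
n = sum(1 for _ in one_shot)

# ...so nothing is left for any later code that wanted the items.
leftover = list(one_shot)

# A sequence can be counted repeatedly without side effects.
n_again = len(list(words))
```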

As things stand now, list(iterable) is a "dangerous" operation, as it may
consume arbitrarily huge resources. But len() isn't[1], because len()
doesn't operate on arbitrary iterables. This is a good thing.

So you and Python designers may choose to not extend the semantics of
len() for various good reasons, but you will have a hard time convincing
me it's a useless capability :-)

I didn't say that knowing the length of iterators up front was useless.
Sometimes it may be useful, but it is rarely (never?) essential.

[1] len(x) may call x.__len__() which might do anything. But the expected
semantics of __len__ is that it is expected to return an int, and do it
quickly with minimal effort. Methods that do something else are an abuse
of __len__ and should be treated as a bug.

--
Steven

Sep 22 '08 #10
