470,815 Members | 2,263 Online

# Unique Elements in a List

Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something like a Hash Table approach!!
I would like to get this done without unnecessary overhead.And the list
could be essentially anything strings,floats,int etc...

Or is it already avaliable as an attr to a list or an array?
I dont remember seeing anything like that.

Jul 19 '05 #1
24 3984 OK I need to be more clear I guess!Unique Elements I mean, elements
that are non repeating. so in the above list 0.4, 0.9 are unique as
they exist only once in the list.

Jul 19 '05 #2
This is not the most beautiful idiom, but it works...

d = {}
for k in data:
try:
d[k] += 1
except:
d[k] = 1

for k,v in d.items():
if v == 1:
print k

Jul 19 '05 #3
su*******@gmail.com wrote:
Is there an easy way to grab the Unique elements from a list?

Yes.
The set data type.
Look in the reference of python 2.4.
--
Best regards
Roie Kerstein

Jul 19 '05 #4
On Monday 09 May 2005 03:15 pm, su*******@gmail.com wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something like a Hash Table approach!!
I would like to get this done without unnecessary overhead.And the list
could be essentially anything strings,floats,int etc...

Or is it already avaliable as an attr to a list or an array?
I dont remember seeing anything like that.

from sets import Set

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

[x for x in Set(data) if data.count(x) == 1]

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Jul 19 '05 #5
runes:
d = {}
for k in data:
try:
d[k] += 1
except:
d[k] = 1

for k,v in d.items():
if v == 1:
print k

For this problem, if duplicated keys are common enough, this version is
probably the faster.

With this *different* problem, it seems that sometimes the faster
solution is using "in":
http://pyalot.blogspot.com/2005/04/d...ary-speed.html

Bye,
Bearophile

Jul 19 '05 #6
"su*******@gmail.com" <su*******@gmail.com> wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something like a Hash Table approach!!
I would like to get this done without unnecessary overhead.And the list
could be essentially anything strings,floats,int etc...

Or is it already avaliable as an attr to a list or an array?
I dont remember seeing anything like that.

of the input sequence which occur exactly once, and return not only
these elements, but also their positions.

One reasonable solution might be as follows:

def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p) for (x, p) in elts.iteritems()
if len(p) == 1 ]

This returns a list of tuples of the form (x, pos), where x is an
element of seq that occurs exactly once, and pos is its index in the
original sequence. This implementation traverses the input sequence
exactly once, and requires storage proportional to the length of the
input.

-M

--
Michael J. Fromberger | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
Jul 19 '05 #7
runes <ru*********@gmail.com> wrote:

This is not the most beautiful idiom, but it works...

d = {}
for k in data:
try:
d[k] += 1
except:
d[k] = 1

for k,v in d.items():
if v == 1:
print k

Only if "works" does not include "order preserving".
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"And if that makes me an elitist...I couldn't be happier." --JMS
Jul 19 '05 #8
"su*******@gmail.com" <su*******@gmail.com> writes:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

Untested: here's an iterator that gives you the index and value,
without reordering. Uses dicts instead of sets for backwards compatibility.

def unique_elements(data):
seen = {}
for n,x in data:
if x not in seen:
seen[x] = 1
yield n,x
Jul 19 '05 #9
You're right. I somehow missed the index part :-o. It's easy to fix
though.

Jul 19 '05 #10
Paul Rubin wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
Untested: here's an iterator that gives you the index and value,
without reordering. Uses dicts instead of sets for backwards compatibility.

def unique_elements(data):
seen = {}
for n,x in data:
if x not in seen:
seen[x] = 1
yield n,x

you forgot enumerate()

and if you fix that, you'll notice that the output doesn't quite match
the OP's spec:
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.

here's a straight-forward variation that gives the specified output,
in the original order:

def unique_elements(data):
count = {}
data = list(enumerate(data))
for n,x in data:
count[x] = count.setdefault(x, 0) + 1
for n,x in data:
if count[x] == 1:
yield x, n # value with index

depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like

from operator import itemgetter

def unique_elements(data):
seen = {}; index = {}
for n, x in enumerate(data):
if x in seen:
del index[x]
else:
index[x] = seen[x] = n
index = index.items()
index.sort(key=itemgetter(1)) # leave this out if order doesn't matter
return index

could work.

</F>

Jul 19 '05 #11
James Stroud <js*****@mbi.ucla.edu> writes:
from sets import Set

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

[x for x in Set(data) if data.count(x) == 1]

Um.

....I must have missed something, but I'll post nevertheless:

wouldn't just

[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:
[x for x in Set(data) if data.count(x) == 1] [0.90000000000000002, 0.40000000000000002]
[x for x in data if data.count(x) == 1]

[0.40000000000000002, 0.90000000000000002]

Though I'll admit I also thought of Sets first, because I didn't remember
there was such a nice method list.count().

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!
One day, when he was naughty, Mr Bunnsy looked over the hedge into Farmer
Fred's field and it was full of fresh green lettuces. Mr Bunnsy, however, was
not full of lettuces. This did not seem fair. --Mr Bunnsy has an adventure
Jul 19 '05 #12
Fredrik Lundh wrote:
depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like

form sets import Set

data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))
--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
Jul 19 '05 #13
"Michael J. Fromberger" <Mi******************@Clothing.Dartmouth.EDU> writes:
One reasonable solution might be as follows:

def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p) for (x, p) in elts.iteritems()
if len(p) == 1 ]

This is neat. Being lazy (the wrong way) I've never payed much attention why
would I need dict.setdefault() when I have things as dict.get(k, [def]). This
was a nice example of good use for setdefault() because it works like
dict.get() except it also sets the value if it didn't exist.

+1 IOTW (idiom of the week).

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

\$_ = '456476617264204d616a616b6172692c20612043687269737 469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";
Jul 19 '05 #14
Max M wrote:
depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like

form sets import Set

data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))

</F>

Jul 19 '05 #15
On 9 May 2005 15:36:37 -0700, "su*******@gmail.com" <su*******@gmail.com> wrote:
OK I need to be more clear I guess!Unique Elements I mean, elements
that are non repeating. so in the above list 0.4, 0.9 are unique as
they exist only once in the list.

You want to be careful of your definitions, especially with floating point values,
which may surprise the uninitiated. Dicts and sets hash numerically equal values
to the same hash, and then do equality test, so you get legitimate results like:
set([9*.1-8*.1, .1]) set([0.10000000000000001, 0.099999999999999978]) set([123, 123.0, 123L]) set() set([123.0, 123, 123L]) set([123.0]) set([123L, 123, 123.0]) set([123L])
You may want to consider creating representations other than the original data

If you are happy with the way dicts and sets compare elements, you could do
a dict that keeps counts, or FTHOI something different (hot off untested griddle,
so test before relying ;-)
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
once=set()
more=set()
for el in data: ... if el in once: more.add(el)
... once-more

set([0.90000000000000002, 0.40000000000000002])

Not the most efficient space-wise, not sure about speed.

Regards,
Bengt Richter
Jul 19 '05 #16
Fredrik Lundh wrote:
Max M wrote:

depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like

form sets import Set

data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))

Well if you want to haggle over minor details like a spec, it's easy to
be critical!
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

freqs = {}
for position in xrange(len(data)):
freqs.setdefault(data[position], []).append(position)
unique = [(value, positions) for (value, positions) in freqs.items()
if len(positions) == 1]

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
Jul 19 '05 #17
"Fredrik Lundh" <fr*****@pythonware.com> writes:
you forgot enumerate()
Oh, oops.
and if you fix that, you'll notice that the output doesn't quite match
the OP's spec:
> what I am looking for is the unique elements 0.4 and 0.9 with their
> index from the list.

Oh, I see, he wanted the elements that only occur once. I misunderstood.
Jul 19 '05 #18
In article <87************@titan.staselog.com>,
Edvard Majakari <ed*********@majakari.net> wrote:
James Stroud <js*****@mbi.ucla.edu> writes:

from sets import Set

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

[x for x in Set(data) if data.count(x) == 1]

Um.

...I must have missed something, but I'll post nevertheless:

wouldn't just

[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:

Only for small datasets -- this is an O(N^2) algorithm.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"And if that makes me an elitist...I couldn't be happier." --JMS
Jul 19 '05 #19
su*******@gmail.com wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.

This is basically the same as Fredrik Lundh's solution, but here it is
anyway:

py> def f(lst):
.... counts = dicttools.counts(lst)
.... return [(i, elem)
.... for i, elem in enumerate(lst)
.... if counts[elem] == 1]
....
py> f([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9])
[(3, 0.40000000000000002), (7, 0.90000000000000002)]

Where dicttools.counts is defined as:

def counts(iterable, key=None):
result = {}
for item in iterable:
# apply key function if necessary
if key is None:
k = item
else:
k = key(item)
# increment key's count
try:
result[k] += 1
except KeyError:
result[k] = 1
return result

(I use this so often that I have a module called dicttools to hold it.
You don't actually need the key= argument for your problem, but I just
copied the code directly.)

STeVe
Jul 19 '05 #20
aa**@pythoncraft.com (Aahz) writes:
[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:

Only for small datasets -- this is an O(N^2) algorithm.

I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input.

Eg. if the input size is i and it takes p seconds to compute it, then given
input size 10*i, the computing time would be 100*p. These notions can apply
for memory usage as well, but the problem in here is the computing time:
list.count() must iterate through the list each time, and as such the loop

[x for x in data if data.count(x) == 1]

iterates through each item in data (x for x in data), and for each item it
will again iterate through each item in data to see how many times it
occurred. If data contains 200 items, this idiom would iterate the structure
40 000 times. With today's computers one wouldn't notice it, unless each item
requires heavy processing (eg. launching a separate process per item
etc). However, if the length of the data can be thousands or even tens of
thousands, this idiom would become unusable. If data contained 75 000 items,
the loop would do 25 625 000 000 iterations, effectively bringing cpu to
halt..

So, generally one should avoid using exponential O(n^k) (where k > 1)
algorithms, unless faster O(n) or O(n*lg(n)) solution is either impossible (or
too complex, and inputs are known to be small etc).

Wikipedia has good resources and pointers to related things, see
http://en.wikipedia.org/wiki/Analysis_of_algorithms

--
#!/usr/bin/perl -w
\$h={23,69,28,'6e',2,64,3,76,7,20,13,61,8,'4d',24,7 3,10,'6a',12,'6b',21,68,14,
72,16,'2c',17,20,9,61,11,61,25,74,4,61,1,45,29,20, 5,72,18,61,15,69,20,43,26,
69,19,20,6,64,27,61,22,72};\$_=join'',map{chr hex \$h->{\$_}}sort{\$a<=>\$b}
keys%\$h;m/(\w).*\s(\w+)/x;\$_.=uc substr(crypt(join('',60,28,14,49),join'',
map{lc}(\$1,substr \$2,4,1)),2,4)."\n"; print;
Jul 19 '05 #21
Edvard Majakari wrote:
I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input. Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are hallways characterized by their largest exponent,
So you never call something O(N^3 - N^2) Since, as N gets large enough,
The N^2 term shrinks to non-existence.

http://en.wikipedia.org/wiki/Exponential_time
... generally one should avoid using exponential O(n^k) (where k > 1)

Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time
--Scott David Daniels
Sc***********@Acm.Org

Jul 19 '05 #22
Scott David Daniels <Sc***********@Acm.Org> writes:
Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are hallways characterized by their largest exponent,
So you never call something O(N^3 - N^2) Since, as N gets large enough,
The N^2 term shrinks to non-existence.

Yup, you are of course, completely correct. I was thinking of "exponent here
is two" and mistakenly named in exponential.

my_text.replace('exponent','polynom'), there :)

Reminding of ignoring terms with smaller exponent was good, too.

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

\$_ = '456476617264204d616a616b6172692c20612043687269737 469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";
Jul 19 '05 #23
Scott David Daniels wrote:
Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time

Being a stickler (I develop software after all :) quantum computers
can do better than that. For example, Grover's algorithm
http://en.wikipedia.org/wiki/Grover%27s_algorithm
for searching an unsorted list solves the problem in O(N**0.5) time.

Being even more picky, I think the largest N that's been tested
so far is on the order of 5.

Andrew
da***@dalkescientific.com

Jul 19 '05 #24
> One reasonable solution might be as follows:

def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p) for (x, p) in elts.iteritems()
if len(p) == 1 ]

Minor tweak to conserve space:

def bachelor_filter(iter_over_hashables):
B={}
for index, elem in enumerate(iter_over_hashables):
if B.setdefault(elem, index) != index:
B[elem]=None
return [pair for pair in B.items() if pair is not None]

Jul 19 '05 #25

### This discussion thread is closed

Replies have been disabled for this discussion.

### Similar topics

 2 posts views Thread by kevin parks | last post: by 15 posts views Thread by les_ander | last post: by 1 post views Thread by James Stroud | last post: by 22 posts views Thread by Claudio Jolowicz | last post: by 5 posts views Thread by Paulers | last post: by 9 posts views Thread by Brian Tkatch | last post: by 6 posts views Thread by Tekkaman | last post: by 1 post views Thread by Asko Telinen | last post: by 5 posts views Thread by =?Utf-8?B?UGF1bA==?= | last post: by reply views Thread by AlexandraMT | last post: by reply views Thread by prepawan | last post: by 1 post views Thread by DANILIN | last post: by reply views Thread by tracyyun | last post: by reply views Thread by sjain6 | last post: by reply views Thread by xykovyci | last post: by reply views Thread by mihailmihai484 | last post: by reply views Thread by patriciajames | last post: by reply views Thread by milkinvl | last post: by