472,980 Members | 1,567 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,980 software developers and data experts.

Unique Elements in a List

Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something like a Hash Table approach!!
I would like to get this done without unnecessary overhead.And the list
could be essentially anything strings,floats,int etc...

Or is it already avaliable as an attr to a list or an array?
I dont remember seeing anything like that.

Jul 19 '05 #1
24 4097
OK I need to be more clear I guess!Unique Elements I mean, elements
that are non repeating. so in the above list 0.4, 0.9 are unique as
they exist only once in the list.

Jul 19 '05 #2
This is not the most beautiful idiom, but it works...

d = {}
for k in data:
try:
d[k] += 1
except:
d[k] = 1

for k,v in d.items():
if v == 1:
print k

Jul 19 '05 #3
su*******@gmail.com wrote:
Is there an easy way to grab the Unique elements from a list?

Yes.
The set data type.
Look in the reference of python 2.4.
--
Best regards
Roie Kerstein

Jul 19 '05 #4
On Monday 09 May 2005 03:15 pm, su*******@gmail.com wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something like a Hash Table approach!!
I would like to get this done without unnecessary overhead.And the list
could be essentially anything strings,floats,int etc...

Or is it already avaliable as an attr to a list or an array?
I dont remember seeing anything like that.


from sets import Set

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

[x for x in Set(data) if data.count(x) == 1]

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Jul 19 '05 #5
runes:
d = {}
for k in data:
try:
d[k] += 1
except:
d[k] = 1

for k,v in d.items():
if v == 1:
print k


For this problem, if duplicated keys are common enough, this version is
probably the faster.

With this *different* problem, it seems that sometimes the faster
solution is using "in":
http://pyalot.blogspot.com/2005/04/d...ary-speed.html

Bye,
Bearophile

Jul 19 '05 #6
In article <11**********************@z14g2000cwz.googlegroups .com>,
"su*******@gmail.com" <su*******@gmail.com> wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something like a Hash Table approach!!
I would like to get this done without unnecessary overhead.And the list
could be essentially anything strings,floats,int etc...

Or is it already avaliable as an attr to a list or an array?
I dont remember seeing anything like that.


From your comments downthread, it seems you want to find those elements
of the input sequence which occur exactly once, and return not only
these elements, but also their positions.

One reasonable solution might be as follows:

def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p[0]) for (x, p) in elts.iteritems()
if len(p) == 1 ]

This returns a list of tuples of the form (x, pos), where x is an
element of seq that occurs exactly once, and pos is its index in the
original sequence. This implementation traverses the input sequence
exactly once, and requires storage proportional to the length of the
input.

-M

--
Michael J. Fromberger | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
Jul 19 '05 #7
In article <11**********************@f14g2000cwb.googlegroups .com>,
runes <ru*********@gmail.com> wrote:

This is not the most beautiful idiom, but it works...

d = {}
for k in data:
try:
d[k] += 1
except:
d[k] = 1

for k,v in d.items():
if v == 1:
print k


Only if "works" does not include "order preserving".
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"And if that makes me an elitist...I couldn't be happier." --JMS
Jul 19 '05 #8
"su*******@gmail.com" <su*******@gmail.com> writes:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]


Untested: here's an iterator that gives you the index and value,
without reordering. Uses dicts instead of sets for backwards compatibility.

def unique_elements(data):
seen = {}
for n,x in data:
if x not in seen:
seen[x] = 1
yield n,x
Jul 19 '05 #9
You're right. I somehow missed the index part :-o. It's easy to fix
though.

Jul 19 '05 #10
Paul Rubin wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
Untested: here's an iterator that gives you the index and value,
without reordering. Uses dicts instead of sets for backwards compatibility.

def unique_elements(data):
seen = {}
for n,x in data:
if x not in seen:
seen[x] = 1
yield n,x


you forgot enumerate()

and if you fix that, you'll notice that the output doesn't quite match
the OP's spec:
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.


here's a straight-forward variation that gives the specified output,
in the original order:

def unique_elements(data):
count = {}
data = list(enumerate(data))
for n,x in data:
count[x] = count.setdefault(x, 0) + 1
for n,x in data:
if count[x] == 1:
yield x, n # value with index

depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like

from operator import itemgetter

def unique_elements(data):
seen = {}; index = {}
for n, x in enumerate(data):
if x in seen:
del index[x]
else:
index[x] = seen[x] = n
index = index.items()
index.sort(key=itemgetter(1)) # leave this out if order doesn't matter
return index

could work.

</F>

Jul 19 '05 #11
James Stroud <js*****@mbi.ucla.edu> writes:
from sets import Set

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

[x for x in Set(data) if data.count(x) == 1]


Um.

....I must have missed something, but I'll post nevertheless:

wouldn't just

[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:
[x for x in Set(data) if data.count(x) == 1] [0.90000000000000002, 0.40000000000000002]
[x for x in data if data.count(x) == 1]

[0.40000000000000002, 0.90000000000000002]

Though I'll admit I also thought of Sets first, because I didn't remember
there was such a nice method list.count().

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!
One day, when he was naughty, Mr Bunnsy looked over the hedge into Farmer
Fred's field and it was full of fresh green lettuces. Mr Bunnsy, however, was
not full of lettuces. This did not seem fair. --Mr Bunnsy has an adventure
Jul 19 '05 #12
Fredrik Lundh wrote:
depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like

form sets import Set

data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))
--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
Jul 19 '05 #13
"Michael J. Fromberger" <Mi******************@Clothing.Dartmouth.EDU> writes:
One reasonable solution might be as follows:

def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p[0]) for (x, p) in elts.iteritems()
if len(p) == 1 ]


This is neat. Being lazy (the wrong way) I've never payed much attention why
would I need dict.setdefault() when I have things as dict.get(k, [def]). This
was a nice example of good use for setdefault() because it works like
dict.get() except it also sets the value if it didn't exist.

+1 IOTW (idiom of the week).

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737 469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";
Jul 19 '05 #14
Max M wrote:
depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like


form sets import Set

data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))


read the OP's spec again.

</F>

Jul 19 '05 #15
On 9 May 2005 15:36:37 -0700, "su*******@gmail.com" <su*******@gmail.com> wrote:
OK I need to be more clear I guess!Unique Elements I mean, elements
that are non repeating. so in the above list 0.4, 0.9 are unique as
they exist only once in the list.

You want to be careful of your definitions, especially with floating point values,
which may surprise the uninitiated. Dicts and sets hash numerically equal values
to the same hash, and then do equality test, so you get legitimate results like:
set([9*.1-8*.1, .1]) set([0.10000000000000001, 0.099999999999999978]) set([123, 123.0, 123L]) set([123]) set([123.0, 123, 123L]) set([123.0]) set([123L, 123, 123.0]) set([123L])
You may want to consider creating representations other than the original data
to use in your uniqueness testing, depending on your definition.

If you are happy with the way dicts and sets compare elements, you could do
a dict that keeps counts, or FTHOI something different (hot off untested griddle,
so test before relying ;-)
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
once=set()
more=set()
for el in data: ... if el in once: more.add(el)
... else: once.add(el)
... once-more

set([0.90000000000000002, 0.40000000000000002])

Not the most efficient space-wise, not sure about speed.

Regards,
Bengt Richter
Jul 19 '05 #16
Fredrik Lundh wrote:
Max M wrote:

depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like


form sets import Set

data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))

read the OP's spec again.


Well if you want to haggle over minor details like a spec, it's easy to
be critical!
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

freqs = {}
for position in xrange(len(data)):
freqs.setdefault(data[position], []).append(position)
unique = [(value, positions[0]) for (value, positions) in freqs.items()
if len(positions) == 1]

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
Jul 19 '05 #17
"Fredrik Lundh" <fr*****@pythonware.com> writes:
you forgot enumerate()
Oh, oops.
and if you fix that, you'll notice that the output doesn't quite match
the OP's spec:
> what I am looking for is the unique elements 0.4 and 0.9 with their
> index from the list.


Oh, I see, he wanted the elements that only occur once. I misunderstood.
Jul 19 '05 #18
In article <87************@titan.staselog.com>,
Edvard Majakari <ed*********@majakari.net> wrote:
James Stroud <js*****@mbi.ucla.edu> writes:

from sets import Set

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

[x for x in Set(data) if data.count(x) == 1]


Um.

...I must have missed something, but I'll post nevertheless:

wouldn't just

[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:


Only for small datasets -- this is an O(N^2) algorithm.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"And if that makes me an elitist...I couldn't be happier." --JMS
Jul 19 '05 #19
su*******@gmail.com wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.


This is basically the same as Fredrik Lundh's solution, but here it is
anyway:

py> def f(lst):
.... counts = dicttools.counts(lst)
.... return [(i, elem)
.... for i, elem in enumerate(lst)
.... if counts[elem] == 1]
....
py> f([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9])
[(3, 0.40000000000000002), (7, 0.90000000000000002)]

Where dicttools.counts is defined as:

def counts(iterable, key=None):
result = {}
for item in iterable:
# apply key function if necessary
if key is None:
k = item
else:
k = key(item)
# increment key's count
try:
result[k] += 1
except KeyError:
result[k] = 1
return result

(I use this so often that I have a module called dicttools to hold it.
You don't actually need the key= argument for your problem, but I just
copied the code directly.)

STeVe
Jul 19 '05 #20
aa**@pythoncraft.com (Aahz) writes:
[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:


Only for small datasets -- this is an O(N^2) algorithm.


I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input.

Eg. if the input size is i and it takes p seconds to compute it, then given
input size 10*i, the computing time would be 100*p. These notions can apply
for memory usage as well, but the problem in here is the computing time:
list.count() must iterate through the list each time, and as such the loop

[x for x in data if data.count(x) == 1]

iterates through each item in data (x for x in data), and for each item it
will again iterate through each item in data to see how many times it
occurred. If data contains 200 items, this idiom would iterate the structure
40 000 times. With today's computers one wouldn't notice it, unless each item
requires heavy processing (eg. launching a separate process per item
etc). However, if the length of the data can be thousands or even tens of
thousands, this idiom would become unusable. If data contained 75 000 items,
the loop would do 25 625 000 000 iterations, effectively bringing cpu to
halt..

So, generally one should avoid using exponential O(n^k) (where k > 1)
algorithms, unless faster O(n) or O(n*lg(n)) solution is either impossible (or
too complex, and inputs are known to be small etc).

Wikipedia has good resources and pointers to related things, see
http://en.wikipedia.org/wiki/Analysis_of_algorithms

--
#!/usr/bin/perl -w
$h={23,69,28,'6e',2,64,3,76,7,20,13,61,8,'4d',24,7 3,10,'6a',12,'6b',21,68,14,
72,16,'2c',17,20,9,61,11,61,25,74,4,61,1,45,29,20, 5,72,18,61,15,69,20,43,26,
69,19,20,6,64,27,61,22,72};$_=join'',map{chr hex $h->{$_}}sort{$a<=>$b}
keys%$h;m/(\w).*\s(\w+)/x;$_.=uc substr(crypt(join('',60,28,14,49),join'',
map{lc}($1,substr $2,4,1)),2,4)."\n"; print;
Jul 19 '05 #21
Edvard Majakari wrote:
I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input. Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are hallways characterized by their largest exponent,
So you never call something O(N^3 - N^2) Since, as N gets large enough,
The N^2 term shrinks to non-existence.

http://en.wikipedia.org/wiki/Exponential_time
... generally one should avoid using exponential O(n^k) (where k > 1)


Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time
to read the problem.
--Scott David Daniels
Sc***********@Acm.Org

Jul 19 '05 #22
Scott David Daniels <Sc***********@Acm.Org> writes:
Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are hallways characterized by their largest exponent,
So you never call something O(N^3 - N^2) Since, as N gets large enough,
The N^2 term shrinks to non-existence.


Yup, you are of course, completely correct. I was thinking of "exponent here
is two" and mistakenly named in exponential.

my_text.replace('exponent','polynom'), there :)

Reminding of ignoring terms with smaller exponent was good, too.

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737 469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";
Jul 19 '05 #23
Scott David Daniels wrote:
Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time
to read the problem.


Being a stickler (I develop software after all :) quantum computers
can do better than that. For example, Grover's algorithm
http://en.wikipedia.org/wiki/Grover%27s_algorithm
for searching an unsorted list solves the problem in O(N**0.5) time.

Being even more picky, I think the largest N that's been tested
so far is on the order of 5.

Andrew
da***@dalkescientific.com

Jul 19 '05 #24
> One reasonable solution might be as follows:

def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p[0]) for (x, p) in elts.iteritems()
if len(p) == 1 ]


Minor tweak to conserve space:

def bachelor_filter(iter_over_hashables):
B={}
for index, elem in enumerate(iter_over_hashables):
if B.setdefault(elem, index) != index:
B[elem]=None
return [pair for pair in B.items() if pair[1] is not None]

Jul 19 '05 #25

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: kevin parks | last post by:
hi. I've been banging my head against this one a while and have asked around, and i am throwing this one out there in the hopes that some one can shed some light on what has turned out to be a...
15
by: les_ander | last post by:
Hi, I have many set objects some of which can contain same group of object while others can be subset of the other. Given a list of sets, I need to get a list of unique sets such that non of the...
1
by: James Stroud | last post by:
On Monday 09 May 2005 03:15 pm, superprad@gmail.com wrote: > Is there an easy way to grab the Unique elements from a list? from sets import Set data = --
22
by: Claudio Jolowicz | last post by:
Is it possible to store unique objects in an STL container? Suppose an object of class C is unique: class C { public: C() {} ~C() {} private:
5
by: Paulers | last post by:
Hello all, I have a string array with duplicate elements. I need to create a new string array containing only the unique elements. Is there an easy way to do this? I have tried looping through...
9
by: Brian Tkatch | last post by:
I'm looking for a simple way to unique an array of strings. I came up with this. Does it make sense? Am i missing anything? (Testing seems to show it to work.) Public Function Unique(ByVal...
6
by: Tekkaman | last post by:
I have a list of lists and I want to define an iterator (let's call that uniter) over all unique elements, in any order. For example, calling: sorted(uniter(, , ])) must return . I tried the...
1
by: Asko Telinen | last post by:
Hi all. Im a bit newbie writing xml schemas. Is it possible to define xml element that must have unique attribute values in same level. For example if i have a xml - document: <list>...
5
by: =?Utf-8?B?UGF1bA==?= | last post by:
Hi I have a list of type object. The object has an ID as one of the elements and I would like to create another list that just has objects with unique IDs. For example in the list if I have...
2
by: DJRhino | last post by:
Was curious if anyone else was having this same issue or not.... I was just Up/Down graded to windows 11 and now my access combo boxes are not acting right. With win 10 I could start typing...
0
by: Aliciasmith | last post by:
In an age dominated by smartphones, having a mobile app for your business is no longer an option; it's a necessity. Whether you're a startup or an established enterprise, finding the right mobile app...
2
by: giovanniandrean | last post by:
The energy model is structured as follows and uses excel sheets to give input data: 1-Utility.py contains all the functions needed to calculate the variables and other minor things (mentions...
3
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be using a very simple database which has Form (clsForm) & Report (clsReport) classes that simply handle making the calling Form invisible until the Form, or all...
1
by: Teri B | last post by:
Hi, I have created a sub-form Roles. In my course form the user selects the roles assigned to the course. 0ne-to-many. One course many roles. Then I created a report based on the Course form and...
3
by: nia12 | last post by:
Hi there, I am very new to Access so apologies if any of this is obvious/not clear. I am creating a data collection tool for health care employees to complete. It consists of a number of...
0
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it...
0
isladogs
by: isladogs | last post by:
The next online meeting of the Access Europe User Group will be on Wednesday 6 Dec 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, Mike...
4
by: GKJR | last post by:
Does anyone have a recommendation to build a standalone application to replace an Access database? I have my bookkeeping software I developed in Access that I would like to make available to other...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.