473,320 Members | 1,854 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

Unique Elements in a List

Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something like a Hash Table approach!!
I would like to get this done without unnecessary overhead.And the list
could be essentially anything strings,floats,int etc...

Or is it already avaliable as an attr to a list or an array?
I dont remember seeing anything like that.

Jul 19 '05 #1
24 4125
OK I need to be more clear I guess!Unique Elements I mean, elements
that are non repeating. so in the above list 0.4, 0.9 are unique as
they exist only once in the list.

Jul 19 '05 #2
This is not the most beautiful idiom, but it works...

d = {}
for k in data:
try:
d[k] += 1
except:
d[k] = 1

for k,v in d.items():
if v == 1:
print k

Jul 19 '05 #3
su*******@gmail.com wrote:
Is there an easy way to grab the Unique elements from a list?

Yes.
The set data type.
Look in the reference of python 2.4.
--
Best regards
Roie Kerstein

Jul 19 '05 #4
On Monday 09 May 2005 03:15 pm, su*******@gmail.com wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something like a Hash Table approach!!
I would like to get this done without unnecessary overhead.And the list
could be essentially anything strings,floats,int etc...

Or is it already avaliable as an attr to a list or an array?
I dont remember seeing anything like that.


from sets import Set

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

[x for x in Set(data) if data.count(x) == 1]

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
Jul 19 '05 #5
runes:
d = {}
for k in data:
try:
d[k] += 1
except:
d[k] = 1

for k,v in d.items():
if v == 1:
print k


For this problem, if duplicated keys are common enough, this version is
probably the faster.

With this *different* problem, it seems that sometimes the faster
solution is using "in":
http://pyalot.blogspot.com/2005/04/d...ary-speed.html

Bye,
Bearophile

Jul 19 '05 #6
In article <11**********************@z14g2000cwz.googlegroups .com>,
"su*******@gmail.com" <su*******@gmail.com> wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something like a Hash Table approach!!
I would like to get this done without unnecessary overhead.And the list
could be essentially anything strings,floats,int etc...

Or is it already avaliable as an attr to a list or an array?
I dont remember seeing anything like that.


From your comments downthread, it seems you want to find those elements
of the input sequence which occur exactly once, and return not only
these elements, but also their positions.

One reasonable solution might be as follows:

def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p[0]) for (x, p) in elts.iteritems()
if len(p) == 1 ]

This returns a list of tuples of the form (x, pos), where x is an
element of seq that occurs exactly once, and pos is its index in the
original sequence. This implementation traverses the input sequence
exactly once, and requires storage proportional to the length of the
input.

-M

--
Michael J. Fromberger | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
Jul 19 '05 #7
In article <11**********************@f14g2000cwb.googlegroups .com>,
runes <ru*********@gmail.com> wrote:

This is not the most beautiful idiom, but it works...

d = {}
for k in data:
try:
d[k] += 1
except:
d[k] = 1

for k,v in d.items():
if v == 1:
print k


Only if "works" does not include "order preserving".
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"And if that makes me an elitist...I couldn't be happier." --JMS
Jul 19 '05 #8
"su*******@gmail.com" <su*******@gmail.com> writes:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]


Untested: here's an iterator that gives you the index and value,
without reordering. Uses dicts instead of sets for backwards compatibility.

def unique_elements(data):
seen = {}
for n,x in data:
if x not in seen:
seen[x] = 1
yield n,x
Jul 19 '05 #9
You're right. I somehow missed the index part :-o. It's easy to fix
though.

Jul 19 '05 #10
Paul Rubin wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
Untested: here's an iterator that gives you the index and value,
without reordering. Uses dicts instead of sets for backwards compatibility.

def unique_elements(data):
seen = {}
for n,x in data:
if x not in seen:
seen[x] = 1
yield n,x


you forgot enumerate()

and if you fix that, you'll notice that the output doesn't quite match
the OP's spec:
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.


here's a straight-forward variation that gives the specified output,
in the original order:

def unique_elements(data):
count = {}
data = list(enumerate(data))
for n,x in data:
count[x] = count.setdefault(x, 0) + 1
for n,x in data:
if count[x] == 1:
yield x, n # value with index

depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like

from operator import itemgetter

def unique_elements(data):
seen = {}; index = {}
for n, x in enumerate(data):
if x in seen:
del index[x]
else:
index[x] = seen[x] = n
index = index.items()
index.sort(key=itemgetter(1)) # leave this out if order doesn't matter
return index

could work.

</F>

Jul 19 '05 #11
James Stroud <js*****@mbi.ucla.edu> writes:
from sets import Set

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

[x for x in Set(data) if data.count(x) == 1]


Um.

....I must have missed something, but I'll post nevertheless:

wouldn't just

[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:
[x for x in Set(data) if data.count(x) == 1] [0.90000000000000002, 0.40000000000000002]
[x for x in data if data.count(x) == 1]

[0.40000000000000002, 0.90000000000000002]

Though I'll admit I also thought of Sets first, because I didn't remember
there was such a nice method list.count().

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!
One day, when he was naughty, Mr Bunnsy looked over the hedge into Farmer
Fred's field and it was full of fresh green lettuces. Mr Bunnsy, however, was
not full of lettuces. This did not seem fair. --Mr Bunnsy has an adventure
Jul 19 '05 #12
Fredrik Lundh wrote:
depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like

form sets import Set

data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))
--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
Jul 19 '05 #13
"Michael J. Fromberger" <Mi******************@Clothing.Dartmouth.EDU> writes:
One reasonable solution might be as follows:

def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p[0]) for (x, p) in elts.iteritems()
if len(p) == 1 ]


This is neat. Being lazy (the wrong way) I've never payed much attention why
would I need dict.setdefault() when I have things as dict.get(k, [def]). This
was a nice example of good use for setdefault() because it works like
dict.get() except it also sets the value if it didn't exist.

+1 IOTW (idiom of the week).

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737 469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";
Jul 19 '05 #14
Max M wrote:
depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like


form sets import Set

data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))


read the OP's spec again.

</F>

Jul 19 '05 #15
On 9 May 2005 15:36:37 -0700, "su*******@gmail.com" <su*******@gmail.com> wrote:
OK I need to be more clear I guess!Unique Elements I mean, elements
that are non repeating. so in the above list 0.4, 0.9 are unique as
they exist only once in the list.

You want to be careful of your definitions, especially with floating point values,
which may surprise the uninitiated. Dicts and sets hash numerically equal values
to the same hash, and then do equality test, so you get legitimate results like:
set([9*.1-8*.1, .1]) set([0.10000000000000001, 0.099999999999999978]) set([123, 123.0, 123L]) set([123]) set([123.0, 123, 123L]) set([123.0]) set([123L, 123, 123.0]) set([123L])
You may want to consider creating representations other than the original data
to use in your uniqueness testing, depending on your definition.

If you are happy with the way dicts and sets compare elements, you could do
a dict that keeps counts, or FTHOI something different (hot off untested griddle,
so test before relying ;-)
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
once=set()
more=set()
for el in data: ... if el in once: more.add(el)
... else: once.add(el)
... once-more

set([0.90000000000000002, 0.40000000000000002])

Not the most efficient space-wise, not sure about speed.

Regards,
Bengt Richter
Jul 19 '05 #16
Fredrik Lundh wrote:
Max M wrote:

depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like


form sets import Set

data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))

read the OP's spec again.


Well if you want to haggle over minor details like a spec, it's easy to
be critical!
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

freqs = {}
for position in xrange(len(data)):
freqs.setdefault(data[position], []).append(position)
unique = [(value, positions[0]) for (value, positions) in freqs.items()
if len(positions) == 1]

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
Jul 19 '05 #17
"Fredrik Lundh" <fr*****@pythonware.com> writes:
you forgot enumerate()
Oh, oops.
and if you fix that, you'll notice that the output doesn't quite match
the OP's spec:
> what I am looking for is the unique elements 0.4 and 0.9 with their
> index from the list.


Oh, I see, he wanted the elements that only occur once. I misunderstood.
Jul 19 '05 #18
In article <87************@titan.staselog.com>,
Edvard Majakari <ed*********@majakari.net> wrote:
James Stroud <js*****@mbi.ucla.edu> writes:

from sets import Set

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

[x for x in Set(data) if data.count(x) == 1]


Um.

...I must have missed something, but I'll post nevertheless:

wouldn't just

[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:


Only for small datasets -- this is an O(N^2) algorithm.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"And if that makes me an elitist...I couldn't be happier." --JMS
Jul 19 '05 #19
su*******@gmail.com wrote:
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.


This is basically the same as Fredrik Lundh's solution, but here it is
anyway:

py> def f(lst):
.... counts = dicttools.counts(lst)
.... return [(i, elem)
.... for i, elem in enumerate(lst)
.... if counts[elem] == 1]
....
py> f([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9])
[(3, 0.40000000000000002), (7, 0.90000000000000002)]

Where dicttools.counts is defined as:

def counts(iterable, key=None):
result = {}
for item in iterable:
# apply key function if necessary
if key is None:
k = item
else:
k = key(item)
# increment key's count
try:
result[k] += 1
except KeyError:
result[k] = 1
return result

(I use this so often that I have a module called dicttools to hold it.
You don't actually need the key= argument for your problem, but I just
copied the code directly.)

STeVe
Jul 19 '05 #20
aa**@pythoncraft.com (Aahz) writes:
[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:


Only for small datasets -- this is an O(N^2) algorithm.


I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input.

Eg. if the input size is i and it takes p seconds to compute it, then given
input size 10*i, the computing time would be 100*p. These notions can apply
for memory usage as well, but the problem in here is the computing time:
list.count() must iterate through the list each time, and as such the loop

[x for x in data if data.count(x) == 1]

iterates through each item in data (x for x in data), and for each item it
will again iterate through each item in data to see how many times it
occurred. If data contains 200 items, this idiom would iterate the structure
40 000 times. With today's computers one wouldn't notice it, unless each item
requires heavy processing (eg. launching a separate process per item
etc). However, if the length of the data can be thousands or even tens of
thousands, this idiom would become unusable. If data contained 75 000 items,
the loop would do 25 625 000 000 iterations, effectively bringing cpu to
halt..

So, generally one should avoid using exponential O(n^k) (where k > 1)
algorithms, unless faster O(n) or O(n*lg(n)) solution is either impossible (or
too complex, and inputs are known to be small etc).

Wikipedia has good resources and pointers to related things, see
http://en.wikipedia.org/wiki/Analysis_of_algorithms

--
#!/usr/bin/perl -w
$h={23,69,28,'6e',2,64,3,76,7,20,13,61,8,'4d',24,7 3,10,'6a',12,'6b',21,68,14,
72,16,'2c',17,20,9,61,11,61,25,74,4,61,1,45,29,20, 5,72,18,61,15,69,20,43,26,
69,19,20,6,64,27,61,22,72};$_=join'',map{chr hex $h->{$_}}sort{$a<=>$b}
keys%$h;m/(\w).*\s(\w+)/x;$_.=uc substr(crypt(join('',60,28,14,49),join'',
map{lc}($1,substr $2,4,1)),2,4)."\n"; print;
Jul 19 '05 #21
Edvard Majakari wrote:
I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input. Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are hallways characterized by their largest exponent,
So you never call something O(N^3 - N^2) Since, as N gets large enough,
The N^2 term shrinks to non-existence.

http://en.wikipedia.org/wiki/Exponential_time
... generally one should avoid using exponential O(n^k) (where k > 1)


Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time
to read the problem.
--Scott David Daniels
Sc***********@Acm.Org

Jul 19 '05 #22
Scott David Daniels <Sc***********@Acm.Org> writes:
Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are hallways characterized by their largest exponent,
So you never call something O(N^3 - N^2) Since, as N gets large enough,
The N^2 term shrinks to non-existence.


Yup, you are of course, completely correct. I was thinking of "exponent here
is two" and mistakenly named in exponential.

my_text.replace('exponent','polynom'), there :)

Reminding of ignoring terms with smaller exponent was good, too.

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737 469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";
Jul 19 '05 #23
Scott David Daniels wrote:
Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time
to read the problem.


Being a stickler (I develop software after all :) quantum computers
can do better than that. For example, Grover's algorithm
http://en.wikipedia.org/wiki/Grover%27s_algorithm
for searching an unsorted list solves the problem in O(N**0.5) time.

Being even more picky, I think the largest N that's been tested
so far is on the order of 5.

Andrew
da***@dalkescientific.com

Jul 19 '05 #24
> One reasonable solution might be as follows:

def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p[0]) for (x, p) in elts.iteritems()
if len(p) == 1 ]


Minor tweak to conserve space:

def bachelor_filter(iter_over_hashables):
B={}
for index, elem in enumerate(iter_over_hashables):
if B.setdefault(elem, index) != index:
B[elem]=None
return [pair for pair in B.items() if pair[1] is not None]

Jul 19 '05 #25

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: kevin parks | last post by:
hi. I've been banging my head against this one a while and have asked around, and i am throwing this one out there in the hopes that some one can shed some light on what has turned out to be a...
15
by: les_ander | last post by:
Hi, I have many set objects some of which can contain same group of object while others can be subset of the other. Given a list of sets, I need to get a list of unique sets such that non of the...
1
by: James Stroud | last post by:
On Monday 09 May 2005 03:15 pm, superprad@gmail.com wrote: > Is there an easy way to grab the Unique elements from a list? from sets import Set data = --
22
by: Claudio Jolowicz | last post by:
Is it possible to store unique objects in an STL container? Suppose an object of class C is unique: class C { public: C() {} ~C() {} private:
5
by: Paulers | last post by:
Hello all, I have a string array with duplicate elements. I need to create a new string array containing only the unique elements. Is there an easy way to do this? I have tried looping through...
9
by: Brian Tkatch | last post by:
I'm looking for a simple way to unique an array of strings. I came up with this. Does it make sense? Am i missing anything? (Testing seems to show it to work.) Public Function Unique(ByVal...
6
by: Tekkaman | last post by:
I have a list of lists and I want to define an iterator (let's call that uniter) over all unique elements, in any order. For example, calling: sorted(uniter(, , ])) must return . I tried the...
1
by: Asko Telinen | last post by:
Hi all. I´m a bit newbie writing xml schemas. Is it possible to define xml element that must have unique attribute values in same level. For example if i have a xml - document: <list>...
5
by: =?Utf-8?B?UGF1bA==?= | last post by:
Hi I have a list of type object. The object has an ID as one of the elements and I would like to create another list that just has objects with unique IDs. For example in the list if I have...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
0
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.