Searching for uniqueness in a list of data

Hi all,

I am having a bit of difficulty in figuring out an efficient way to
split up my data and identify the unique pieces of it.

list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']

Now I want to split each item up on the "_" and compare it with all
the others on the list. If there is a difference, I want to create a list
of the possible choices and ask the user which choice from the list they
want. I have the questioning part under control. I can't seem to get
my head around the logic - the list could be 2 items or 100 long. The
point of this is that I am trying to narrow a decision down for an end
user. In other words, the end user needs to select one of the list
items, and by breaking it down for them I hope to simplify this.

list=['1p2m_3.3-1.8v_sal_ms','1p6m_3.3-1.8_sal_log']
would only question the first data set ['1p2m', '1p6m' ]

list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
If, from the list ['1p2m','1p2m','1p3m'], the user selected 1p2m, then the
next list would only be ['sal','pol'],
but if the user initially selected 1p3m they would be done.

I hope this clarifies what I am trying to do. I just can't seem to get
my head around this - so an explanation of the logic would really be
helpful. I picture a 2d list but I can't seem to get it.

Mar 1 '06 #1
rh0dium wrote:
(snip)

<code>
list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
dictQlevel_1={}
dictQlevel_2={}
dictQlevel_3={}
for item in list:
    splitted = item.split('_')
    dictQlevel_1[splitted[0]] = True
    dictQlevel_2[splitted[1]] = True
    dictQlevel_3[splitted[2]] = True

print 'choose one of: '
for key_1 in dictQlevel_1.keys():
    print key_1
print
usrInput = raw_input()

if usrInput == '':
    print 'choose one of: '
    for key_1 in dictQlevel_1.keys():
        for key_2 in dictQlevel_2.keys():
            print key_1, key_2
    print
    usrInput = raw_input()
else:
    pass
    # or do something

# etc.
</code>

Hope it is what you are looking for.

Claudio
Mar 1 '06 #2
"rh0dium" <st**********@gmail.com> writes:
(snip)

The easiest way to do this is to have a nested dictionary of prefixes: for
each prefix as key, add a nested dictionary of the rest of the split as value,
or an empty dict if the rest of the split is empty. Accessing the dict with the
user's input will give you all the possible next choices.

Spoiler Warning -- sample implementation follows below.










(mostly untested)

def addSplit(d, split):
    if len(split):
        if split[0] not in d:
            d[split[0]] = addSplit({}, split[1:])
        else:
            addSplit(d[split[0]], split[1:])
    return d
def queryUser(chosen, choices):
    next = raw_input('So far: %s\nNow type one of %s: ' %
                     (chosen,choices.keys()))
    return chosen+next, choices[next]
wordList=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
choices = reduce(addSplit,(s.split('_') for s in wordList), {})
chosen = ""
while choices:
    chosen, choices = queryUser(chosen, choices)
print "You chose:", chosen
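
For readers on Python 3, the nested-dictionary idea can be sketched roughly as below. This is only an illustration, not the poster's code: the names `add_split`, `build_tree` and `walk` are made up for this example, and the `pick` callback is an assumed stand-in for the interactive prompt so that the snippet runs without user input.

```python
from functools import reduce


def add_split(tree, parts):
    """Insert one already-split name into the nested dict and return the dict."""
    if parts:
        # setdefault creates the child dict the first time a prefix is seen.
        add_split(tree.setdefault(parts[0], {}), parts[1:])
    return tree


def build_tree(names, sep="_"):
    """Fold every name into one nested dict keyed by successive segments."""
    return reduce(add_split, (name.split(sep) for name in names), {})


def walk(tree, pick, sep="_"):
    """Descend the tree, calling pick(options) at each level.

    `pick` is a hypothetical callback that receives the sorted list of
    options and returns one of them.
    """
    chosen = []
    while tree:
        key = pick(sorted(tree))
        chosen.append(key)
        tree = tree[key]
    return sep.join(chosen)


word_list = ['1p2m_3.3-1.8v_sal_ms', '1p2m_3.3-1.8v_pol_ms', '1p3m_3.3-18.v_sal_ms']
tree = build_tree(word_list)
```

For interactive use, something like `walk(tree, lambda options: input("Pick one of %s: " % options))` should work, provided the user types one of the listed keys exactly. Note that this version still asks at levels that offer only one option.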

'as
Mar 1 '06 #3
You can come quite close to what you want without splitting the string
at all. It sounds like you are asking the user to build up a string,
and you want to keep checking through your list to find any items that
begin with the string built up by the user. Try something like this:

mylist = ['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']
sofar = ""

loop = True
while loop:
    selections = [ x[len(sofar):x.index("_", len(sofar) + 1)]
                   for x in mylist if x.startswith(sofar) ]
    loop = len(selections) > 1
    if loop:
        print selections
        sofar += raw_input("Pick one of those: ")
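
As posted, the loop lists duplicate segments, and `x.index` raises ValueError once no further "_" is left in an item. A hedged Python 3 variation on the same prefix-matching idea is sketched below; `next_choices`, `narrow` and the `pick` callback are names invented for this example, and it assumes that no item is a strict prefix of another item.

```python
def next_choices(names, sofar, sep="_"):
    """Return the distinct next segments of every name that extends `sofar`."""
    prefix = sofar + sep if sofar else ""
    choices = []
    for name in names:
        # Skip names that are already complete or that do not match so far.
        if name == sofar or not name.startswith(prefix):
            continue
        segment = name[len(prefix):].split(sep, 1)[0]
        if segment not in choices:
            choices.append(segment)
    return choices


def narrow(names, pick, sep="_"):
    """Build up one full name, asking `pick` only where there is a real choice."""
    sofar = ""
    while True:
        choices = next_choices(names, sofar, sep)
        if not choices:
            return sofar
        # A single remaining option needs no question.
        segment = choices[0] if len(choices) == 1 else pick(choices)
        sofar = sofar + sep + segment if sofar else segment


names = ['1p2m_3.3-1.8v_sal_ms', '1p2m_3.3-1.8v_pol_ms', '1p3m_3.3-18.v_sal_ms']
first_level = next_choices(names, "")
```

With these three names the first question offers '1p2m' and '1p3m'; picking '1p3m' finishes immediately, while picking '1p2m' leads to a single further question between 'sal' and 'pol'.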

Mar 1 '06 #4
Alexander Schmolck <a.********@gmail.com> writes:
The easiest way to do this is to have a nested dictionary of prefixes: for
each prefix as key, add a nested dictionary of the rest of the split as value,
or an empty dict if the rest of the split is empty. Accessing the dict with the
user's input will give you all the possible next choices.


Oops I was reading this too hastily -- forgot to compact and take care of sep.
You might also want to google 'trie', BTW.
(again, not really tested)
def addSplit(d, split):
    if len(split):
        if split[0] not in d:
            d[split[0]] = addSplit({}, split[1:])
        else:
            addSplit(d[split[0]], split[1:])
    return d
def compactify(choices, parentKey='', sep=''):
    if len(choices) == 1:
        return compactify(choices.values()[0],
                          parentKey+sep+choices.keys()[0], sep)
    else:
        for key in choices.keys():
            newKey, newValue = compactify(choices[key], key, sep)
            if newKey != key: del choices[key]
            choices[newKey] = newValue
        return (parentKey, choices)
def queryUser(chosen, choices, sep=''):
    next = raw_input('So far: %s\nNow type one of %s: ' %
                     (chosen,choices.keys()))
    return chosen+sep+next, choices[next]
wordList=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
choices = compactify(reduce(addSplit,(s.split('_') for s in wordList), {}),
                     sep='_')[1]
chosen = ""

while choices:
    chosen, choices = queryUser(chosen, choices, '_')
print "You chose:", chosen
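
The compaction step relies on Python 2 behaviour: `keys()` returns a list there, so the dict can be changed while looping over it, which Python 3 does not allow. A rough Python 3 sketch of the same idea, building a new dict instead of editing in place, might look like this. The function name `compact` and the literal `tree` are assumptions made for the example; unlike `compactify`, it returns only the dict rather than a (key, dict) pair.

```python
def compact(tree, sep="_"):
    """Return a new nested dict with every single-child chain merged into one key."""
    result = {}
    for key, subtree in tree.items():
        # Follow the chain for as long as there is exactly one way to continue.
        while len(subtree) == 1:
            child, grandchildren = next(iter(subtree.items()))
            key = key + sep + child
            subtree = grandchildren
        result[key] = compact(subtree, sep)
    return result


# Nested dict for the three sample names in this thread, written out by hand.
tree = {
    '1p2m': {'3.3-1.8v': {'sal': {'ms': {}}, 'pol': {'ms': {}}}},
    '1p3m': {'3.3-18.v': {'sal': {'ms': {}}}},
}
compacted = compact(tree)
```

After compaction the user would first choose between '1p2m_3.3-1.8v' and '1p3m_3.3-18.v_sal_ms', and only the first of those leads to a second question.
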
Mar 1 '06 #5
"rh0dium" <st**********@gmail.com> wrote in message
news:11********************@j33g2000cwa.googlegroups.com...
<snip>

Check out difflib.
>>> data=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']
>>> data[0].split("_")
['1p2m', '3.3-1.8v', 'sal', 'ms']
>>> data[1].split("_")
['1p2m', '3.3-1.8', 'sal', 'log']
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, data[0].split("_"), data[1].split("_"))
>>> s.get_matching_blocks()
[(0, 0, 1), (2, 2, 1), (4, 4, 0)]

I believe one interprets the tuples returned by get_matching_blocks() as:
(seq1index,seq2index,numberOfMatchingItems)

In your case, the sequences have a matching element 0 and matching element
2, each of length 1. I don't fully grok the meaning of the (4,4,0) tuple,
unless this is intended to show that both sequences have the same length.

Perhaps from here, you could locate the gaps between the blocks returned by
SequenceMatcher.get_matching_blocks(), and prompt for the user's choice.
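
On the (4,4,0) tuple: the difflib documentation describes the last triple as a dummy with the value (len(a), len(b), 0), and it is the only triple whose length is 0. One possible way to find the gaps, sketched here for Python 3, is to use `get_opcodes()` and keep everything that is not tagged 'equal'. The variable names are chosen for this example only.

```python
from difflib import SequenceMatcher

a = '1p2m_3.3-1.8v_sal_ms'.split('_')
b = '1p2m_3.3-1.8_sal_log'.split('_')

matcher = SequenceMatcher(None, a, b)

# In Python 3 the blocks are Match namedtuples; convert to plain tuples.
blocks = [tuple(block) for block in matcher.get_matching_blocks()]

# Every opcode that is not 'equal' marks a stretch where the two names differ.
differences = [
    (a[i1:i2], b[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != 'equal'
]
```

Here `differences` pairs up '3.3-1.8v' with '3.3-1.8' and 'ms' with 'log', which are the positions a user would have to choose between. SequenceMatcher compares two sequences at a time, so a longer list would need pairwise comparison or one of the other approaches in this thread.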

-- Paul
Mar 1 '06 #6
Claudio Grondi wrote:
(snip)
<code>
list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']


Avoid using 'list' as an identifier.

(snip)
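
A small illustration of why, written for Python 3 with made-up sample values: once the name `list` is rebound, later calls to the built-in type in the same scope stop working.

```python
list = ['a_b', 'c_d']      # rebinding the name hides the built-in type

try:
    list('abc')            # this now tries to call the list object itself
except TypeError as exc:
    error = str(exc)

del list                   # remove the shadowing name again
```

A neutral name such as `names` or `word_list` avoids the problem.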

Mar 1 '06 #7
