Hi,
I have the following in a text file:
1,a
1,b
1,b
2,a
2,c
2,a
2,c
etc....
I have the following code to open the text file and create a list from the data inside. I am trying to create a dictionary like:
{[1:a], [1:b], [1:b], [2:a], [2:c], [2:a], [2:c]}
I am using the following:
infile = open('input.txt', 'r')
records = infile.readlines()
infile.close()
records = [s.replace('\n', '') for s in records]
finalrecords = map(string.split() ,records)
However I keep getting the following error:
"pythontest.py", line 5, in <module>
finalrecords = map(string.split() ,records)
NameError: name 'string' is not defined
Any advice? Also, moving forward, I would like to create from the dictionary a count associated with each unique instance of a key:value relationship, so using the above data I would write to a file:
KEY UNIQUE INSTANCES
1 2 (sum for unique key value instance 1:a and 1:b)
2 2 (sum for unique key value instance 2:a and 2:c)
I can do this in SQL but would prefer to do in python for speed and flexibility with computations.
Any advice is greatly appreciated.
GTXY20
The string.split function (from the string module) is deprecated.
Use the split() method of the string itself instead.
e.g.
s = "test , test1"
s.split()
By the way, you can't create a dictionary with duplicate keys. Dictionary keys must be unique.
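For what it's worth, the NameError in the question arises because the string module was never imported, and even with the import, string.split() would be called with no arguments before map() ever ran. A minimal sketch of the same step using the string method, written for Python 3 and assuming each line holds exactly one comma (parse_records is a made-up helper name, not something from the original code):

```python
def parse_records(lines):
    """Split each 'key,value' line into a [key, value] pair."""
    # strip() drops the trailing newline; split(',') separates key from value.
    # Blank lines are skipped.
    return [line.strip().split(',') for line in lines if line.strip()]


# Sample data in the shape readlines() would return it
sample = ['1,a\n', '1,b\n', '1,b\n', '2,a\n']
pairs = parse_records(sample)
print(pairs)
```

The result is a list of pairs, not a dictionary, so repeated keys are all kept at this stage.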
Thanks - if they need to be unique, how do I import the data so that I keep the unique key but assign the multiple associated values, so that I get:
{[1:a,b], [2:a,c]}
thanks again..
My friend ghostdog74 is correct. Given that data, you'd end up with a very small dictionary:
>>> records = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c' # often missing the last newline
>>> lines = records.split()
>>> lines
['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']
>>> dd = dict((key, value) for key, value in (line.split(',') for line in lines))
>>> dd
{'1': 'b', '2': 'c'}
>>>
OK - if i just run the first part:
infile = open('input.txt', 'r')
records = infile.readlines()
infile.close()
records
['1,a\n', '1,b\n', '1,c\n', '1,a\n', '1,c\n', '1,a\n', '1,b\n', '2,a\n', '2,b\n', '2,c\n', '2,a\n', '3,c\n', '3,a\n', '3,b\n', '4,a\n', '4,a\n', '4,c\n', '4,c\n']
so when I try and:
lines = records.split
I am thrown:
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
AttributeError: 'list' object has no attribute 'split'
I think it is because records is:
['1,a\n', '1,b\n', '1,c\n', '1,a\n', '1,c\n', '1,a\n', '1,b\n', '2,a\n', '2,b\n', '2,c\n', '2,a\n', '3,c\n', '3,a\n', '3,b\n', '4,a\n', '4,a\n', '4,c\n', '4,c\n']
and not:
'1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c'
How can I open a text file and store records as above without using readlines?
As for the dictionary I would like to have it so that I get:
{[1:a,b], [2:a,c]}
Any ideas? Sorry, I'm new to Python and used to just working in SQL.
G.
>>> records = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c' # often missing the last newline
>>> lines = records.split()
>>> lines
['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']
>>> dd = {}
>>> for line in lines:
...     key, value = line.split(',')
...     if key in dd:
...         oldvalue = dd[key]
...         if value not in oldvalue:
...             dd[key] = '%s,%s' %(oldvalue, value)
...     else:
...         dd[key] = value
...
>>> dd
{'1': 'a,b', '2': 'a,c'}
>>>
OK - if i just run the first part:
infile = open('input.txt', 'r')
records = infile.readlines()
infile.close()
records
Use
instead.
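A minimal sketch of two ways to get the lines without trailing newlines, assuming Python 3 and a small throwaway file that is created here only so the example runs on its own. read() returns the whole file as one string, which can then be split like the records string used in the other examples in this thread; iterating over the file and stripping each line is an alternative. Whether either matches the suggestion above is an assumption.

```python
import os
import tempfile

# Create a small stand-in for input.txt (hypothetical sample data)
path = os.path.join(tempfile.mkdtemp(), 'input.txt')
with open(path, 'w') as outfile:
    outfile.write('1,a\n1,b\n1,b\n2,a\n2,c\n')

# Option 1: read the whole file as one string, then split on whitespace
with open(path) as infile:
    records = infile.read()
lines = records.split()

# Option 2: iterate over the file, stripping the newline from each line
with open(path) as infile:
    lines_again = [line.strip() for line in infile]

print(lines)
```

Both options give the same list here; option 2 avoids holding the raw text and the list in memory at the same time, which may matter for a large file.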
Even better: Use a tuple in the value:
>>> records = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c' # often missing the last newline
>>> lines = records.split()
>>> lines
['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']
>>> dd = {}
>>> for line in lines:
...     key, value = line.split(',')
...     if key in dd:
...         oldvalue = dd[key]
...         if value not in oldvalue:
...             dd[key] = oldvalue + (value,) # tuple addition
...     else:
...         dd[key] = (value,) # a tuple of one
...
>>> dd
{'1': ('a', 'b'), '2': ('a', 'c')}
>>>
This allows any type of conversion on the text before it is stored.
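As an illustration of that point, here is a small sketch (Python 3) that converts the key to an integer and upper-cases the value before storing it. The particular conversions are arbitrary choices for the example, not something the original data requires.

```python
lines = ['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']

dd = {}
for line in lines:
    key, value = line.split(',')
    key = int(key)          # convert the key text to an integer
    value = value.upper()   # any conversion of the value works here too
    existing = dd.get(key, ())      # empty tuple if the key is new
    if value not in existing:
        dd[key] = existing + (value,)   # tuple addition, as above

print(dd)
```

Because each value is its own tuple element, the membership test compares whole values and not substrings, which the comma-joined string version cannot guarantee for longer values.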
This is perfect!!!
I assume you can also sort the values so that they would always appear in order, like a,b,c or a,c or a,b, depending on the values?
Finally I need to do two more things:
1. If I wanted to list the quantity of unique value combinations based on keys within a dictionary so for example I have the following dictionary:
{'1': 'a,b,c', '3': 'a,b,c', '2': 'a,b,c', '4': 'a,c'}
I would need:
QTY VALUE COMBINATION
3 a,b,c
1 a,c
2. Get the total number of values for a key:
{'1': 'a,b,c', '3': 'a,b,c', '2': 'a,b,c', '4': 'a,c'}
I would need:
KEY NUMBER OF VALUES
1 3
3 3
2 3
4 2
Thank you so much, this is so helpful and far more efficient than what I could come up with using SQL and VB. Do you know if there are any size limitations of a dictionary in Python? I am thinking I may eventually have 2 million keys with a variety of values (average of about 5 values per key).
G.
In order to sort, you'll need a list in the value:
>>> records = '1,b\n1,a\n1,b\n2,c\n2,a\n2,a\n2,c' # reordered elements
>>> lines = records.split()
>>> lines
['1,b', '1,a', '1,b', '2,c', '2,a', '2,a', '2,c']
>>> dd = {}
>>> for line in lines:
...     key, value = line.split(',')
...     if key in dd:
...         valueList = dd[key]
...         if value not in valueList:
...             valueList.append(value)
...     else:
...         dd[key] = [value] # a list of one
...
>>> dd
{'1': ['b', 'a'], '2': ['c', 'a']}
>>> for key, valueList in dd.items():
...     valueList.sort()
...
>>> dd
{'1': ['a', 'b'], '2': ['a', 'c']}
Since dictionaries are not ordered containers, you'll want to work with a sorted() list of its keys:
>>> for key in sorted(dd.keys()):
...     print key, len(dd[key])
...
1 2
2 2
>>>
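For the "KEY NUMBER OF VALUES" report asked about earlier, a minimal Python 3 sketch along the same lines, using the four-key dictionary from the question (the report list is just an illustrative way to collect the output lines):

```python
dd = {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'],
      '2': ['a', 'b', 'c'], '4': ['a', 'c']}

report = ['KEY NUMBER OF VALUES']
for key in sorted(dd):      # sorted() over a dict iterates its keys
    report.append('%s %d' % (key, len(dd[key])))

print('\n'.join(report))
```

Note that the keys here are strings, so they sort as text ('10' would come before '2'). Passing key=int to sorted() is one way to get numeric order if every key is a number.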
Size limit, huh? With Python, memory is usually the limiting factor (as with long integers, where a single value can grow large enough to fill available memory - try it sometime!).
I was able to sort by KEY with the following:
sorted(dd.items(), key=lambda(k,v):(v,k))
I thought that
would be sufficient.
Your way:
>>> sorted(dd.items(), key=lambda(k,v):(v,k))
[('1', ['a', 'b']), ('2', ['a', 'c'])]
>>>
actually creates a list of tuples with one tuple for each entry in the dictionary.
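For readers on Python 3, where tuple parameters in a lambda such as lambda(k,v) are no longer accepted, here is a minimal sketch of the same sort. Note that the key function (v, k) orders by value first and by key second, while a plain sorted(dd.items()) orders by key. The variable names are illustrative.

```python
dd = {'2': ['a', 'c'], '1': ['a', 'b']}

# Tuples compare element by element, so (key, value) pairs sort by key first
by_key = sorted(dd.items())

# Index into the pair instead of unpacking it in the lambda signature
by_value_then_key = sorted(dd.items(), key=lambda kv: (kv[1], kv[0]))

print(by_key)
print(by_value_then_key)
```

With this data both orderings happen to agree, because the value lists sort in the same order as the keys.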
PS: It's actually a rule on this site that you use the [code] tags around your code, as instructed on the right hand side of the page when posting or replying.
Thanks again - point taken about the code tags I will do this moving forward - too excited about this working out and got caught up with everything.
Hi there,
Sorry for all the questions - this is enlightening...
Is there any way to display the count of each value across the value lists? Here is my dictionary: {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}
I would like to display the count as follows, and I would not know all the values in the value lists in advance:
Value QTY
a 4
b 3
c 4
Also, is there any way to display the count of the value list combinations? Here again is my dictionary: {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}
And I would like to display it as follows:
QTY Value List Combination
3 a,b,c
1 a,c
Once again all help is much appreciated.
G.
Here's a neat trick that will give you a place to start:
>>> dd = {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}
>>> uniques = set(tuple(value) for key, value in dd.items())
>>> uniques
set([('a', 'b', 'c'), ('a', 'c')])
>>>
Then, for the last part, use list.count() on a list of values:
>>> all = [tuple(value) for key, value in dd.items()]
>>> all
[('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'c')]
>>> for item in uniques:
...     print item, all.count(item)
...
('a', 'b', 'c') 3
('a', 'c') 1
>>>
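An alternative sketch of the same combination count, assuming Python 3 and the standard library's collections.Counter, which tallies in a single pass instead of calling list.count() once per unique item:

```python
from collections import Counter

dd = {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'],
      '2': ['a', 'b', 'c'], '4': ['a', 'c']}

# Lists are not hashable, so convert each value list to a tuple before counting
combo_counts = Counter(tuple(values) for values in dd.values())

print('QTY Value List Combination')
for combo, qty in combo_counts.most_common():   # highest count first
    print(qty, ','.join(combo))
```

With a couple of million keys, the single pass is likely to matter, since list.count() rescans the whole list for every unique combination.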
And this may just do the first part nicely:
>>> uniques = list(uniques)
>>> uniques
[('a', 'b', 'c'), ('a', 'c')]
>>> # Assumes only two results above! Needs work for a longer list!
>>> bits = set.union(set(uniques[0]), set(uniques[1]))
>>> bits
set(['a', 'c', 'b'])
>>> counts = [0 for i in range(len(bits))]
>>> counts
[0, 0, 0]
>>> for item in all:
...     for i, bit in enumerate(bits):
...         if bit in item:
...             counts[i] += 1
...
>>> zip(bits, counts)
[('a', 4), ('c', 4), ('b', 3)]
>>>
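The snippet above assumes exactly two unique combinations. One possible way to lift that restriction, sketched for Python 3 with collections.Counter (a swapped-in technique, not the approach used above), is to count values directly from the dictionary:

```python
from collections import Counter

dd = {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'],
      '2': ['a', 'b', 'c'], '4': ['a', 'c']}

value_counts = Counter()
for values in dd.values():
    # set() guards against a value being counted twice for the same key
    value_counts.update(set(values))

print('Value QTY')
for value in sorted(value_counts):
    print(value, value_counts[value])
```

This works for any number of keys and combinations, and the values do not need to be known in advance.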