Convert list to dictionary problem

GTXY20
29
Hi,

I have the following in a text file:

1,a
1,b
1,b
2,a
2,c
2,a
2,c
etc....

I have the following code to open the text file and create a list from the data inside. I am trying to create a dictionary like:

{[1:a], [1:b], [1:b], [2:a], [2:c], [2:a], [2:c]}

I am using the following:
infile = open('input.txt', 'r')
records = infile.readlines()
infile.close()
records = [s.replace('\n', '') for s in records]
finalrecords = map(string.split() ,records)

However I keep getting the following error:

"pythontest.py", line 5, in <module>
finalrecords = map(string.split() ,records)
NameError: name 'string' is not defined

Any advice? Also, moving forward, I would like to derive from the dictionary a count of the unique key:value pairs for each key, so that using the above data I would write to a file:

KEY UNIQUE INSTANCES
1 2 (sum for unique key value instance 1:a and 1:b)
2 2 (sum for unique key value instance 2:a and 2:c)

I can do this in SQL but would prefer to do in python for speed and flexibility with computations.

Any advice is greatly appreciated.

GTXY20
Oct 1 '07 #1
ghostdog74
511 Expert 256MB
string.split is deprecated.
Use <string>.split() instead, i.e. call split() on the string object itself.
e.g.
s = "test , test1"
s.split()
By the way, you can't create a dictionary with duplicate keys. Dictionary keys must be unique.
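For the NameError in the original post, a minimal sketch of one possible fix, assuming Python 3 and comma-separated lines. The sample list is made up to mirror the question's data; it is not read from the real input.txt.

```python
# Hypothetical sample standing in for the result of infile.readlines().
records = ['1,a\n', '1,b\n', '1,b\n', '2,a\n', '2,c\n']

# Strip the trailing newline, then split each record on the comma.
# No "string" module is needed: split() is a method of every str object.
finalrecords = [line.strip().split(',') for line in records]

print(finalrecords)
```

Each entry of finalrecords is a two-item list of the form [key, value], which the later replies turn into a dictionary.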
Oct 1 '07 #2
GTXY20
29
Thanks - if they need to be unique, how do I import the data so that I keep the unique key but assign the multiple associated values, so that I get:

{[1:a,b], [2:a,c]}

thanks again..
Oct 1 '07 #3
bartonc
6,596 Expert 4TB
My friend ghostdog74 is correct. Given that data, you'd end up with a very small dictionary:
>>> records = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c' # often missing the last newline
>>> lines = records.split()
>>> lines
['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']
>>> dd = dict((key, value) for key, value in (line.split(',') for line in lines))
>>> dd
{'1': 'b', '2': 'c'}
Oct 1 '07 #4
GTXY20
29
OK - if i just run the first part:

infile = open('input.txt', 'r')
records = infile.readlines()
infile.close()
records

['1,a\n', '1,b\n', '1,c\n', '1,a\n', '1,c\n', '1,a\n', '1,b\n', '2,a\n', '2,b\n', '2,c\n', '2,a\n', '3,c\n', '3,a\n', '3,b\n', '4,a\n', '4,a\n', '4,c\n', '4,c\n']

so when I try and:

lines = records.split

I am thrown:

Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
AttributeError: 'list' object has no attribute 'split'

I think it is because records is:

['1,a\n', '1,b\n', '1,c\n', '1,a\n', '1,c\n', '1,a\n', '1,b\n', '2,a\n', '2,b\n', '2,c\n', '2,a\n', '3,c\n', '3,a\n', '3,b\n', '4,a\n', '4,a\n', '4,c\n', '4,c\n']

and not:

'1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c'

How can I open a text file and store records as above without using readlines?

As for the dictionary, I would like to have it so that I get:

{[1:a,b], [2:a,c]}

Any ideas - sorry new to Python and used to just working in SQL.

G.
Oct 1 '07 #5
bartonc
6,596 Expert 4TB
>>> records = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c' # often missing the last newline
>>> lines = records.split()
>>> lines
['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']
>>> dd = {}
>>> for line in lines:
...     key, value = line.split(',')
...     if key in dd:
...         oldvalue = dd[key]
...         if value not in oldvalue:
...             dd[key] = '%s,%s' %(oldvalue, value)
...     else:
...         dd[key] = value
...
>>> dd
{'1': 'a,b', '2': 'a,c'}
Oct 1 '07 #6
bartonc
6,596 Expert 4TB
Use infile.read() instead of infile.readlines().
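As a rough illustration of that suggestion, assuming Python 3: read() returns the whole file as one string, which can then be split into lines. The temporary file below is a made-up stand-in for input.txt so the sketch runs on its own.

```python
import os
import tempfile

# Create a throwaway file holding sample data (hypothetical stand-in for input.txt).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1,a\n1,b\n1,b\n2,a\n2,c\n')
    path = tmp.name

# read() returns a single string; splitlines() breaks it up and drops the newlines.
with open(path) as infile:
    lines = infile.read().splitlines()

os.remove(path)  # clean up the throwaway file
print(lines)
```

The with statement closes the file automatically, so no explicit infile.close() is needed.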
Oct 1 '07 #7
bartonc
6,596 Expert 4TB
Even better: Use a tuple in the value:
>>> records = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c' # often missing the last newline
>>> lines = records.split()
>>> lines
['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']
>>> dd = {}
>>> for line in lines:
...     key, value = line.split(',')
...     if key in dd:
...         oldvalue = dd[key]
...         if value not in oldvalue:
...             dd[key] = oldvalue + (value,) # tuple addition
...     else:
...         dd[key] = (value,) # a tuple of one
...
>>> dd
{'1': ('a', 'b'), '2': ('a', 'c')}
This allows any type of conversion on the text prior to being stored.
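A possible alternative to the loop above, sketched under the assumption of Python 3. It swaps in collections.defaultdict with a set as the value, which is not the technique used in the post: the set takes care of both the first-seen key and the duplicate values.

```python
from collections import defaultdict

records = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c'

dd = defaultdict(set)
for line in records.split():
    key, value = line.split(',')
    dd[key].add(value)  # a set silently ignores values it already holds

# Convert each set to a sorted tuple so the result has a stable order.
grouped = {key: tuple(sorted(values)) for key, values in dd.items()}

print(grouped)
```

The trade-off is that a set does not remember insertion order, so the values are sorted at the end instead.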
Oct 1 '07 #8
GTXY20
29
This is perfect!!!

I assume you can also sort the values so that they would always appear like a,b,c or a,c or a,b depending on the value?

Finally I need to do two more things:

1. If I wanted to list the quantity of unique value combinations based on keys within a dictionary so for example I have the following dictionary:

{'1': 'a,b,c', '3': 'a,b,c', '2': 'a,b,c', '4': 'a,c'}

I would need:

QTY VALUE COMBINATION
3 a,b,c
1 a,c

2. Get the total number of values for a key:

{'1': 'a,b,c', '3': 'a,b,c', '2': 'a,b,c', '4': 'a,c'}

I would need:

KEY NUMBER OF VALUES
1 3
3 3
2 3
4 2

Thank you so much this is so helpful and incredibly more efficient than using SQL and VB to come up with. Do you know if there are any size limitations of a dictionary in python - I am thinking I may eventually have 2 million keys with a variety of values (average of about 5 values per key).

G.
Oct 1 '07 #9
bartonc
6,596 Expert 4TB
In order to sort, you'll need a list in the value:
>>> records = '1,b\n1,a\n1,b\n2,c\n2,a\n2,a\n2,c' # reordered elements
>>> lines = records.split()
>>> lines
['1,b', '1,a', '1,b', '2,c', '2,a', '2,a', '2,c']
>>> dd = {}
>>> for line in lines:
...     key, value = line.split(',')
...     if key in dd:
...         valueList = dd[key]
...         if value not in valueList:
...             valueList.append(value)
...     else:
...         dd[key] = [value] # a list of one
...
>>> dd
{'1': ['b', 'a'], '2': ['c', 'a']}
>>> for key, valueList in dd.items():
...     valueList.sort()
...
>>> dd
{'1': ['a', 'b'], '2': ['a', 'c']}
Since dictionaries are not ordered containers, you'll want to work with a sorted() list of its keys:
>>> for key in sorted(dd.keys()):
...     print key, len(dd[key])
...
1 2
2 2
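For readers on Python 3, where print is a function, the same key/count report might look like the sketch below. The dictionary is the four-key example from the question, and sorting with key=int is an assumption that every key is a numeric string.

```python
dd = {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}

# Keys are strings, so sort them numerically (assumes every key parses as an int).
rows = [(key, len(dd[key])) for key in sorted(dd, key=int)]

print('KEY NUMBER OF VALUES')
for key, count in rows:
    print(key, count)
```

Without key=int the keys would sort as text, which would put '10' before '2' once the data grows.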
Size limit, huh? With Python, memory is usually the limiting factor (as in (L)ong integers, which can contain a single value large enough to fill available memory - try it sometime!).
Oct 1 '07 #10
GTXY20
29
I was able to sort by KEY with the following:

sorted(dd.items(), key=lambda(k,v):(v,k))
Oct 1 '07 #11
bartonc
6,596 Expert 4TB
I thought that
sorted(dd.keys())
would be sufficient.

Your way:
>>> sorted(dd.items(), key=lambda(k,v):(v,k))
[('1', ['a', 'b']), ('2', ['a', 'c'])]
actually creates a list of tuples with one tuple for each entry in the dictionary.
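
A side note for readers on Python 3, offered as a sketch rather than a correction of either post: the lambda(k,v) form is Python 2 syntax and is a syntax error in Python 3. The key function (v,k) also orders by value first and key second, which matches key order only for some data. The dictionary below is a made-up example chosen so the two orderings differ.

```python
# Hypothetical data where key order and value order disagree.
dd = {'1': ['a', 'c'], '2': ['a', 'b']}

# Python 3 removed tuple-unpacking lambdas, so index into the (key, value) pair.
by_value_then_key = sorted(dd.items(), key=lambda kv: (kv[1], kv[0]))

# Plain sorted() on items compares tuples, and the key is the first element.
by_key = sorted(dd.items())

print(by_value_then_key)
print(by_key)
```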


PS: It's actually a rule on this site that you use the [code] tags around your code, as instructed on the right hand side of the page when posting or replying.
Oct 1 '07 #12
GTXY20
29
Thanks again - point taken about the code tags I will do this moving forward - too excited about this working out and got caught up with everything.
Oct 1 '07 #13
GTXY20
29
Hi there,

Sorry for all the questions - this is enlightening...

Any way to display the count of the values in the values list so here is my dictionary:

{'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}
I would like to display count as follows and I would not know all the values in the values list:

Value QTY
a 4
b 3
c 4

Also is there anyway to display the count of the values list combinations so here again is my dictionary:

{'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}
And I would like to display as follows

QTY Value List Combination
3 a,b,c
1 a,c

Once again all help is much appreciated.

G.
Oct 1 '07 #14
bartonc
6,596 Expert 4TB
Here's a neat trick that will give you a place to start:
>>> dd = {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}
>>> uniques = set(tuple(value) for key, value in dd.items())
>>> uniques
set([('a', 'b', 'c'), ('a', 'c')])
Then, for the last part, use list.count() on a list of values:
>>> all = [tuple(value) for key, value in dd.items()]
>>> all
[('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'c')]
>>> for item in uniques:
...     print item, all.count(item)
...
('a', 'b', 'c') 3
('a', 'c') 1
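The same combination count could also be sketched with collections.Counter, assuming Python 3. This swaps in a different tool from the set plus list.count() approach above and does the counting in one pass.

```python
from collections import Counter

dd = {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}

# Lists are unhashable, so convert each value list to a tuple before counting.
combo_counts = Counter(tuple(values) for values in dd.values())

print('QTY Value List Combination')
for combo, qty in combo_counts.most_common():
    print(qty, ','.join(combo))
```

most_common() returns the combinations from most to least frequent, which matches the layout asked for in the question.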
Oct 1 '07 #15
bartonc
6,596 Expert 4TB
And this may just do the first part nicely:
>>> uniques = list(uniques)
>>> uniques
[('a', 'b', 'c'), ('a', 'c')]
>>> # Assumes only two results above! Needs work for a longer list!
>>> bits = set.union(set(uniques[0]), set(uniques[1]))
>>> bits
set(['a', 'c', 'b'])
>>> counts = [0 for i in range(len(bits))]
>>> counts
[0, 0, 0]
>>> for item in all:
...     for i, bit in enumerate(bits):
...         if bit in item:
...             counts[i] += 1
...
>>> zip(bits, counts)
[('a', 4), ('c', 4), ('b', 3)]
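One way to lift the "only two results" restriction noted in the comment above, sketched with collections.Counter and assuming Python 3. It counts every value across all value lists, so the distinct values do not need to be known in advance.

```python
from collections import Counter

dd = {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}

# Flatten all value lists into one stream of values and count each one.
value_counts = Counter(value for values in dd.values() for value in values)

print('Value QTY')
for value in sorted(value_counts):
    print(value, value_counts[value])
```

This relies on each value appearing at most once per key, which holds here because duplicates were dropped when the dictionary was built.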
Oct 1 '07 #16