Iterating over a string

P: 19
Hi everybody,

Does anyone know if the adict.has_key(k) command can be used to match a string against a dictionary key? I'm trying to append a value from my dictionary to a string when it is found.

String example:

browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21

Dictionary example:

'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU'
'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU'

What I want:

browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
etc.

Thanks,

Mark
Aug 3 '07 #1
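A note on the question itself: has_key(k), like the `in` operator, only tests whether k is exactly equal to one of the keys, so it cannot find a key buried inside a longer line. Here is a minimal sketch of the difference. It is written for Python 3 (where has_key no longer exists, so `in` is used instead) and uses the sample data from the post; the variable names are only illustrative.

```python
# Sample data copied from the question above.
sequences = {
    "Musmusculuslet-7g": "UGAGGUAGUAGUUUGUACAGU",
    "Musmusculuslet-7i": "UGAGGUAGUAGUUUGUGCUGU",
}
line = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21"

# Membership is an exact comparison against the keys, so the whole line is not a key.
whole_line_is_key = line in sequences

# Isolate the candidate field first (assumed here to be the third word), then test it.
name = line.split()[2]
name_is_key = name in sequences
```

So the name has to be pulled out of the line before the lookup, for example by splitting on whitespace or with a regular expression, as the replies below do.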
13 Replies


ilikepython
Expert 100+
P: 844
I'm not sure if this is exactly what you need:
    import re

    # adict is the name -> sequence dictionary from the question
    patt = re.compile("Musmusculuslet-..")

    teststr = "browser details Musmusculuslet-7g    21     1    21    21 100.0%    22   +   46884872  46884892     21"
    match = patt.findall(teststr)

    if match:
        if adict.has_key(match[0]):
            ind = teststr.index(match[0])
            # insert a space and the sequence right after the matched name
            finalstring = "%s %s%s" % (teststr[:ind+len(match[0])], adict[match[0]], teststr[ind+len(match[0]):])
Aug 4 '07 #2

bvdet
Expert Mod 2.5K+
P: 2,851
Following are a couple of ways:
    # dd is the name -> sequence dictionary (printed first in the output below)
    print dd
    import re

    s1 = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21"

    # First way: find candidate names with a regex, then look each one up
    patt = re.compile(r'Musmusculuslet-[0-9a-z]+|MusmusculusmiR-\d+')
    strList = patt.findall(s1)
    s2 = s1
    for item in strList:
        if dd.has_key(item):
            s2 = s2.replace(item, '%s %s' % (item, dd[item]))

    print s2

    print

    # Second way: test every key against the string
    s3 = s1
    for key in dd:
        if key in s3:
            s3 = s3.replace(key, '%s %s' % (key, dd[key]))

    print s3
Output:
>>> {'MusmusculusmiR-1': 'UGGAAUGUAAAGAAGUAUGUA', 'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU', 'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU'}
browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21

browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21
>>>
Aug 4 '07 #3

P: 19
Thanks for the help ilikepython and bvdet. I'm running into only one problem. I am getting multiple matches for certain strings, e.g. the key
MusmusculusmiR-1 also matches with MusmusculusmiR-146b, so I get the following output:

browser details MusmusculusmiR-1 UGGAAUGUAAAGAAGUAUGUA46b UGAGAACUGAAUUCCAUAGGCU 22 1 22 22 100.0% 26 - 20924724 20924745 22

when the original string is:

browser details MusmusculusmiR-146b 22 1 22 22 100.0% 26 - 20924724 20924745 22

is there a way to prevent this?

my full list of keys is:
['MusmusculusmiR-106a', 'MusmusculusmiR-433-3p', 'MusmusculusmiR-126-5p', 'MusmusculusmiR-106b', 'MusmusculusmiR-216a', 'MusmusculusmiR-324-5p', 'MusmusculusmiR-762', 'MusmusculusmiR-7121', 'MusmusculusmiR-760', 'MusmusculusmiR-200b', 'MusmusculusmiR-200c', 'MusmusculusmiR-200a', 'MusmusculusmiR-241', 'MusmusculusmiR-30a-5p', 'MusmusculusmiR-802', 'MusmusculusmiR-801', 'MusmusculusmiR-805', 'MusmusculusmiR-804', 'MusmusculusmiR-216b', 'MusmusculusmiR-667', 'MusmusculusmiR-666', 'MusmusculusmiR-665', 'MusmusculusmiR-741', 'MusmusculusmiR-742', 'MusmusculusmiR-668', 'MusmusculusmiR-744', 'MusmusculusmiR-1401', 'MusmusculusmiR-34a', 'MusmusculusmiR-34b', 'MusmusculusmiR-34c', 'MusmusculusmiR-592', 'MusmusculusmiR-455-5p', 'MusmusculusmiR-698', 'MusmusculusmiR-376a1', 'MusmusculusmiR-344', 'MusmusculusmiR-697', 'MusmusculusmiR-694', 'MusmusculusmiR-695', 'MusmusculusmiR-340', 'MusmusculusmiR-341', 'MusmusculusmiR-342', 'MusmusculusmiR-691', 'MusmusculusmiR-542-5p', 'MusmusculusmiR-764-5p', 'MusmusculusmiR-122a', 'MusmusculusmiR-142-5p', 'MusmusculusmiR-449', 'MusmusculusmiR-448', 'MusmusculusmiR-23a', 'MusmusculusmiR-23b', 'MusmusculusmiR-6741', 'MusmusculusmiR-135b', 'MusmusculusmiR-135a', 'MusmusculusmiR-301b', 'MusmusculusmiR-129-5p', 'MusmusculusmiR-30b', 'MusmusculusmiR-30c', 'MusmusculusmiR-30d', 'MusmusculusmiR-30e', 'MusmusculusmiR-292-3p', 'MusmusculusmiR-713', 'MusmusculusmiR-499', 'MusmusculusmiR-711', 'MusmusculusmiR-710', 'MusmusculusmiR-717', 'MusmusculusmiR-715', 'MusmusculusmiR-714', 'MusmusculusmiR-490', 'MusmusculusmiR-491', 'MusmusculusmiR-719', 'MusmusculusmiR-718', 'MusmusculusmiR-494', 'MusmusculusmiR-495', 'MusmusculusmiR-496', 'MusmusculusmiR-497', 'MusmusculusmiR-297b', 'MusmusculusmiR-485-5p', 'MusmusculusmiR-300', 'MusmusculusmiR-301', 'MusmusculusmiR-302', 'MusmusculusmiR-422b', 'MusmusculusmiR-33', 'MusmusculusmiR-32', 'MusmusculusmiR-31', 'MusmusculusmiR-181d', 'MusmusculusmiR-27a', 'MusmusculusmiR-27b', 'MusmusculusmiR-450b1', 
'MusmusculusmiR-551b', 'MusmusculusmiR-302b1', 'MusmusculusmiR-155', 'MusmusculusmiR-154', 'MusmusculusmiR-151', 'MusmusculusmiR-150', 'MusmusculusmiR-153', 'MusmusculusmiR-152', 'MusmusculusmiR-409', 'MusmusculusmiR-470', 'MusmusculusmiR-471', 'MusmusculusmiR-15a', 'MusmusculusmiR-15b', 'MusmusculusmiR-675-3p', 'MusmusculusmiR-712', 'MusmusculusmiR-199a', 'MusmusculusmiR-199b', 'MusmusculusmiR-148b', 'MusmusculusmiR-148a', 'MusmusculusmiR-615', 'MusmusculusmiR-759', 'MusmusculusmiR-758', 'MusmusculusmiR-30e1', 'MusmusculusmiR-374-3p', 'MusmusculusmiR-291a-5p', 'MusmusculusmiR-488', 'MusmusculusmiR-689', 'MusmusculusmiR-688', 'MusmusculusmiR-685', 'MusmusculusmiR-684', 'MusmusculusmiR-687', 'MusmusculusmiR-686', 'MusmusculusmiR-681', 'MusmusculusmiR-680', 'MusmusculusmiR-683', 'MusmusculusmiR-682', 'MusmusculusmiR-351', 'MusmusculusmiR-350', 'MusmusculusmiR-720', 'MusmusculusmiR-721', 'MusmusculusmiR-4671', 'MusmusculusmiR-181a1', 'MusmusculusmiR-7b', 'MusmusculusmiR-130a', 'MusmusculusmiR-130b', 'MusmusculusmiR-4881', 'MusmusculusmiR-380-5p', 'MusmusculusmiR-127', 'MusmusculusmiR-467b', 'MusmusculusmiR-467a', 'MusmusculusmiR-431', 'MusmusculusmiR-291b-5p', 'MusmusculusmiR-532', 'MusmusculusmiR-539', 'MusmusculusmiR-128a', 'MusmusculusmiR-128b', 'MusmusculusmiR-543', 'MusmusculusmiR-540', 'MusmusculusmiR-542-3p', 'MusmusculusmiR-546', 'MusmusculusmiR-547', 'MusmusculusmiR-223', 'MusmusculusmiR-222', 'MusmusculusmiR-693-5p', 'MusmusculusmiR-224', 'MusmusculusmiR-91', 'MusmusculusmiR-93', 'MusmusculusmiR-92', 'MusmusculusmiR-96', 'MusmusculusmiR-98', 'MusmusculusmiR-99b', 'MusmusculusmiR-17-5p', 'MusmusculusmiR-434-3p', 'MusmusculusmiR-770-3p', 'MusmusculusmiR-763', 'MusmusculusmiR-489', 'MusmusculusmiR-761', 'MusmusculusmiR-486', 'MusmusculusmiR-484', 'MusmusculusmiR-483', 'MusmusculusmiR-652', 'MusmusculusmiR-21', 'MusmusculusmiR-22', 'MusmusculusmiR-24', 'MusmusculusmiR-25', 'MusmusculusmiR-146b', 'MusmusculusmiR-28', 'MusmusculusmiR-362', 'MusmusculusmiR-363', 
'MusmusculusmiR-361', 'MusmusculusmiR-367', 'MusmusculusmiR-365', 'MusmusculusmiR-302c1', 'MusmusculusmiR-692', 'MusmusculusmiR-182', 'MusmusculusmiR-183', 'MusmusculusmiR-186', 'MusmusculusmiR-187', 'MusmusculusmiR-184', 'MusmusculusmiR-185', 'MusmusculusmiR-324-3p', 'MusmusculusmiR-188', 'MusmusculusmiR-124a', 'MusmusculusmiR-463', 'MusmusculusmiR-464', 'MusmusculusmiR-466', 'MusmusculusmiR-469', 'MusmusculusmiR-468', 'MusmusculusmiR-505', 'MusmusculusmiR-503', 'MusmusculusmiR-500', 'MusmusculusmiR-501', 'MusmusculusmiR-212', 'MusmusculusmiR-210', 'MusmusculusmiR-211', 'MusmusculusmiR-26b', 'MusmusculusmiR-26a', 'MusmusculusmiR-215', 'MusmusculusmiR-218', 'MusmusculusmiR-219', 'MusmusculusmiR-465-3p', 'MusmusculusmiR-376a', 'MusmusculusmiR-376b', 'MusmusculusmiR-376c', 'MusmusculusmiR-369-5p', 'MusmusculusmiR-133a', 'MusmusculusmiR-133b', 'MusmusculusmiR-6761', 'MusmusculusmiR-9', 'MusmusculusmiR-129-3p', 'MusmusculusmiR-1', 'MusmusculusmiR-7', 'MusmusculusmiR-675-5p', 'MusmusculusmiR-101a', 'MusmusculusmiR-101b', 'MusmusculusmiR-217', 'MusmusculusmiR-214', 'MusmusculusmiR-699', 'MusmusculusmiR-326', 'MusmusculusmiR-696', 'MusmusculusmiR-325', 'MusmusculusmiR-322', 'MusmusculusmiR-323', 'MusmusculusmiR-320', 'MusmusculusmiR-345', 'MusmusculusmiR-346', 'MusmusculusmiR-328', 'MusmusculusmiR-329', 'MusmusculusmiR-18', 'MusmusculusmiR-764-3p', 'MusmusculusmiR-16', 'MusmusculusmiR-690', 'MusmusculusmiR-429', 'MusmusculusmiR-425', 'MusmusculusmiR-424', 'MusmusculusmiR-423', 'MusmusculusmiR-132', 'MusmusculusmiR-137', 'MusmusculusmiR-136', 'MusmusculusmiR-134', 'MusmusculusmiR-139', 'MusmusculusmiR-138', 'MusmusculusmiR-30a-3p', 'MusmusculusmiR-541', 'MusmusculusmiR-199a1', 'MusmusculusmiR-291b-3p', 'MusmusculusmiR-221', 'MusmusculusmiR-292-5p', 'MusmusculusmiR-450b', 'MusmusculusmiR-455-3p', 'MusmusculusmiR-181b', 'MusmusculusmiR-708', 'MusmusculusmiR-709', 'MusmusculusmiR-704', 'MusmusculusmiR-705', 'MusmusculusmiR-376b1', 'MusmusculusmiR-706', 
'MusmusculusmiR-291a-3p', 'MusmusculusmiR-700', 'MusmusculusmiR-701', 'MusmusculusmiR-485-3p', 'MusmusculusmiR-678', 'MusmusculusmiR-679', 'MusmusculusmiR-674', 'MusmusculusmiR-676', 'MusmusculusmiR-677', 'MusmusculusmiR-670', 'MusmusculusmiR-671', 'MusmusculusmiR-672', 'MusmusculusmiR-673', 'MusmusculusmiR-19a', 'MusmusculusmiR-19b', 'MusmusculusmiR-379', 'MusmusculusmiR-378', 'MusmusculusmiR-29b', 'MusmusculusmiR-370', 'MusmusculusmiR-29a', 'MusmusculusmiR-375', 'MusmusculusmiR-377', 'MusmusculusmiR-10b', 'MusmusculusmiR-10a', 'MusmusculusmiR-487b', 'MusmusculusmiR-702', 'MusmusculusmiR-191', 'MusmusculusmiR-190', 'MusmusculusmiR-193', 'MusmusculusmiR-192', 'MusmusculusmiR-195', 'MusmusculusmiR-194', 'MusmusculusmiR-380-3p', 'MusmusculusmiR-450', 'MusmusculusmiR-451', 'MusmusculusmiR-452', 'MusmusculusmiR-126-3p', 'MusmusculusmiR-103', 'MusmusculusmiR-100', 'MusmusculusmiR-107', 'MusmusculusmiR-133a1', 'MusmusculusmiR-298', 'MusmusculusmiR-299', 'MusmusculusmiR-293', 'MusmusculusmiR-290', 'MusmusculusmiR-296', 'MusmusculusmiR-297', 'MusmusculusmiR-294', 'MusmusculusmiR-295', 'MusmusculusmiR-743', 'MusmusculusmiR-201', 'MusmusculusmiR-203', 'MusmusculusmiR-202', 'MusmusculusmiR-205', 'MusmusculusmiR-204', 'MusmusculusmiR-207', 'MusmusculusmiR-206', 'MusmusculusmiR-208', 'MusmusculusmiR-433-5p', 'MusmusculusmiR-693-3p', 'Musmusculuslet-7d1', 'MusmusculusmiR-125b', 'MusmusculusmiR-125a', 'MusmusculusmiR-381', 'MusmusculusmiR-99a', 'MusmusculusmiR-434-5p', 'MusmusculusmiR-17-3p', 'MusmusculusmiR-5011', 'MusmusculusmiR-374-5p', 'MusmusculusmiR-465-5p', 'MusmusculusmiR-142-3p', 'MusmusculusmiR-20a', 'MusmusculusmiR-20b', 'MusmusculusmiR-146', 'MusmusculusmiR-144', 'MusmusculusmiR-335', 'MusmusculusmiR-181a', 'MusmusculusmiR-337', 'MusmusculusmiR-181c', 'MusmusculusmiR-331', 'MusmusculusmiR-330', 'MusmusculusmiR-669c', 'MusmusculusmiR-669b', 'MusmusculusmiR-669a', 'MusmusculusmiR-707', 'MusmusculusmiR-339', 'MusmusculusmiR-338', 'MusmusculusmiR-369-3p', 
'MusmusculusmiR-703', 'MusmusculusmiR-302c', 'MusmusculusmiR-302b', 'MusmusculusmiR-141', 'MusmusculusmiR-302d', 'Musmusculuslet-7b', 'Musmusculuslet-7c', 'Musmusculuslet-7a', 'Musmusculuslet-7f', 'Musmusculuslet-7g', 'Musmusculuslet-7d', 'Musmusculuslet-7e', 'Musmusculuslet-7i', 'MusmusculusmiR-449b', 'MusmusculusmiR-382', 'MusmusculusmiR-383', 'MusmusculusmiR-384', 'MusmusculusmiR-410', 'MusmusculusmiR-411', 'MusmusculusmiR-412', 'MusmusculusmiR-145', 'MusmusculusmiR-143', 'MusmusculusmiR-140', 'MusmusculusmiR-29c', 'MusmusculusmiR-196a', 'MusmusculusmiR-196b', 'MusmusculusmiR-149'

thanks,

Mark
Aug 4 '07 #4
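The garbled line above is what substring replacement produces when one key is a prefix of another. The sketch below reproduces it under stated assumptions: it is Python 3, annotate_by_substring is a hypothetical stand-in for the "for key in dd" loop from the earlier reply, and the two-entry dictionary is ordered so that the longer key is visited first.

```python
def annotate_by_substring(line, table):
    """Mimic the earlier loop: replace every key that occurs anywhere in the line."""
    for key in table:
        if key in line:  # substring test, so miR-1 is found inside miR-146b
            line = line.replace(key, "%s %s" % (key, table[key]))
    return line

# Assumed visiting order: dicts keep insertion order in Python 3.7+.
table = {
    "MusmusculusmiR-146b": "UGAGAACUGAAUUCCAUAGGCU",
    "MusmusculusmiR-1": "UGGAAUGUAAAGAAGUAUGUA",
}
line = "browser details MusmusculusmiR-146b 22 1 22 22 100.0% 26 - 20924724 20924745 22"

# First pass appends the miR-146b sequence; second pass splits the name at "miR-1".
garbled = annotate_by_substring(line, table)
```

The first replacement is correct, but the second one still finds "MusmusculusmiR-1" inside "MusmusculusmiR-146b" and cuts the name in two. Comparing whole fields instead of substrings avoids this.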

ilikepython
Expert 100+
P: 844
This is similar to Bv's second way:
    teststr = "browser details MusmusculusmiR-146b    22     1    22    22 100.0%    26   -   20924724  20924745     22"
    words = teststr.split()

    key = words[2]    # will the key always be the third word?
    if key in adict:
        finalstring = teststr.replace(key, "%s %s" % (key, adict[key]))

If the key is not always the third word, you could check every word, provided there is only one key per string.
Aug 5 '07 #5
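The "check every word" idea can be sketched as follows. This is only an illustration in Python 3; annotate_fields is a hypothetical helper name, and the two dictionary entries are taken from values that appear earlier in the thread.

```python
def annotate_fields(line, table):
    """Insert table[word] after every whitespace-separated word that is a key."""
    out = []
    for word in line.split():
        out.append(word)
        if word in table:  # exact comparison, so miR-1 never matches miR-146b
            out.append(table[word])
    return " ".join(out)

table = {
    "MusmusculusmiR-1": "UGGAAUGUAAAGAAGUAUGUA",
    "MusmusculusmiR-146b": "UGAGAACUGAAUUCCAUAGGCU",
}
line = "browser details MusmusculusmiR-146b    22     1    22    22 100.0%    26   -   20924724  20924745     22"
result = annotate_fields(line, table)
```

One side effect worth knowing: split() followed by " ".join() collapses the runs of spaces in the original line into single spaces. If the column alignment matters, a regex substitution keeps the spacing intact.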

P: 19
I tried your suggestion but received the same result. Is there a statement I could write that checks each line for a capital A, T, C, or G? If I could put that into an 'if' statement, then maybe it wouldn't re-format a line that has already been formatted. Of course, there would still be the question of whether it replaced the name with Mus..R-1 or with Mus..R-106a, etc. Is there an order, or is it random because I am using a dictionary?

Mark
Aug 5 '07 #6
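On the ordering question: in the Python 2.5 used in this thread a dict has no defined iteration order, so a loop that calls replace() once per key can give different results depending on which key happens to come first. One way to make the order irrelevant is to scan the text once and look up each complete name as it is found. The sketch below is Python 3; the NAME pattern is an assumption based on the keys posted above, and annotate is a hypothetical helper.

```python
import re

# Assumed name format, based on the keys posted in this thread.
NAME = re.compile(r"\bMusmusculus(?:let|miR)-[0-9a-z-]+\b")

def annotate(text, table):
    """Append the sequence after every complete name that is a key of table."""
    def add_sequence(match):
        name = match.group(0)
        if name in table:
            return "%s %s" % (name, table[name])
        return name  # unknown name: leave the text unchanged
    return NAME.sub(add_sequence, text)

line = "browser details MusmusculusmiR-146b 22 1 22 22 100.0% 26 - 20924724 20924745 22"
forward = {
    "MusmusculusmiR-1": "UGGAAUGUAAAGAAGUAUGUA",
    "MusmusculusmiR-146b": "UGAGAACUGAAUUCCAUAGGCU",
}
backward = dict(reversed(list(forward.items())))  # same entries, opposite order

result_forward = annotate(line, forward)
result_backward = annotate(line, backward)
```

Because the text is scanned only once and the inserted sequence is never scanned again within that call, both dictionaries give the same line. Calling annotate a second time on its own output would still append the sequence again, so this does not replace a check for lines that were formatted in an earlier run.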

ilikepython
Expert 100+
P: 844
I'm not really sure what you mean. Are you checking each string more than once? Every time you finish formatting a string, you can append it to a list; the next time, if the string is already in the list, don't format it. I don't think you should have a problem with matching the wrong key. Could you post the code you used?
Aug 5 '07 #7

bvdet
Expert Mod 2.5K+
P: 2,851
Try this regex solution and see if it works for you. '\b' matches the empty string at the beginning or end of a word, so a name is only found when it is bounded on both sides. The string is then split on the space character, so the replacement only happens where a field matches the name in full:
    print dd

    import re

    s1 = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21\nbrowser details MusmusculusmiR-314-5p 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21"
    patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b              # Matches "Musmusculuslet-" followed by alphanumeric
                                                                    # characters at word borderlines
                          |\bMusmusculusmiR-[0-9a-z\-]+\b           # Matches "MusmusculusmiR-" followed by alphanumeric
                                                                    # characters or dashes at word borderlines
                          ''', re.VERBOSE)

    strList = patt.findall(s1)
    s2 = s1
    for item in strList:
        if dd.has_key(item):
            s2List = s2.split(' ')
            idx = s2List.index(item)
            s2List[idx] = '%s %s' % (item, dd[item])
            s2 = ' '.join(s2List)

    print s2
Output:
>>> {'MusmusculusmiR-1': 'UGGAAUGUAAAGAAGUAUGUA', 'MusmusculusmiR-314-5p': 'UGAGGUAGUAGUUUGUACAGU', 'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU', 'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU', 'MusmusculusmiR-31': 'UGGAAUGUAAAGAAGUAUGUA'}
browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21
browser details MusmusculusmiR-314-5p UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details MusmusculusmiR-31 UGGAAUGUAAAGAAGUAUGUA 21 1 21 21 100.0% 22 + 46884872 46884892 21

>>>
Aug 5 '07 #8
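To see what '\b' buys you, here is a small Python 3 sketch. The names are shortened to "miR-..." purely to keep the example readable; that shortening is not from the thread's data.

```python
import re

text = "miR-1 miR-146b miR-126-5p"

# Without boundaries, the short name is found inside every longer name.
without_boundary = re.findall(r"miR-1", text)

# With \b on both sides, only the standalone name matches.
with_boundary = re.findall(r"\bmiR-1\b", text)

# Caveat: '-' is not a word character, so a boundary exists between "6" and "-".
hyphen_case = re.findall(r"\bmiR-126\b", text)

# Including '-' in the character class makes the match run to the end of the name.
full_names = re.findall(r"\bmiR-[0-9a-z\-]+\b", text)
```

The hyphen case is why the miR half of the pattern above uses [0-9a-z\-]+ rather than a bare literal name: '\b' alone would still let "miR-126" match inside "miR-126-5p".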

P: 19
This is the code I am currently using; I am still getting the same problem:

    def EditFile ( s1, dd ):
        print dd
        import re
        patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b+|\bMusmusculusmiR-[0-9a-z\-]+\b''', re.VERBOSE)
        strList = patt.findall(s1)
        s2 = s1
        for item in strList:
            if dd.has_key(item):
                s2List = s2.split(' ')
                idx = s2List.index(item)
                s2List[idx] = '%s %s' % (item, dd[item])
                s2 = ' '.join(s2List)
        print s2
    ##      print
    ##    s3 = s1
    ##    words = s3.split()
    ##    key = words[2]
    ##    for key in dd:
    ##        if key in s3:
    ##            s3 = s3.replace(key, '%s %s' % (key, dd[key]))
    ##    print s3
        f = open('editted BLAT Search Results-Mouse.txt', 'w')
        f.writelines(s2)
        f.close()
        return s2

It seems to choke on the following matches:

MusmusculusmiR-1 is read when it reads MusmusculusmiR-124a, thus it gets written twice with two separate values from two separate keys:

UGGAAUGUAAAGAAGUAUGUA24a
followed by UAAGGCACGCGGUGAAUGCC

The first is the value for key MusmusculusmiR-1 (without the 24a that is at the end); the second is the value for key MusmusculusmiR-124a.

It is also still choking on the following matches:

MusmusculusmiR-126-5p (weird, since it doesn't mind MusmusculusmiR-126-3p)
MusmusculusmiR-127, MusmusculusmiR-128a, MusmusculusmiR-130, MusmusculusmiR-129-5p
and MusmusculusmiR-324-3p because there is a MusmusculusmiR-32.

I ran the above code and got the same results I did with the previous code, which is strange. Did I miss something in my transcription? What exactly does the \b do in your code?

Mark
Aug 6 '07 #9

P: 19
Okay, now I am getting a new error:

Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    newfile = EditFile ( data, mouse )
  File "BatchEditor.py", line 45, in EditFile
    patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b+|\bMusmusculusmiR-[0-9a-z\-]+\b''', re.VERBOSE)
  File "C:\Python25\lib\re.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python25\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: nothing to repeat

I edited the code, so it is now like this:

    def EditFile ( s1, dd ):

        #print dd
        import re
        patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b+|\bMusmusculusmiR-[0-9a-z\-]+\b''', re.VERBOSE)
        strList = patt.findall(s1)
        s2 = s1
        print strList
        for item in strList:
            if dd.has_key(item):
                s2List = s2.split(' ')
                idx = s2List.index(item)
                s2List[idx] = '%s %s' % (item, dd[item])
                s2 = ' '.join(s2List)
        print s2
    ##      print
    ##    s3 = s1
    ##    words = s3.split()
    ##    key = words[2]
    ##    for key in dd:
    ##        if key in s3:
    ##            s3 = s3.replace(key, '%s %s' % (key, dd[key]))
    ##    print s3
        f = open('editted BLAT Search Results-Mouse.txt', 'w')
        f.writelines(s2)
        f.close()
        return s2

I should ask: is that a single quote followed by a double quote at the beginning and end of the pattern in the re.compile statement? I had it set as three single quotes and then realized that is probably wrong.

Mark
Aug 6 '07 #10

bvdet
Expert Mod 2.5K+
P: 2,851
The error you received is caused by the additional '+' character after '\b'. '\b' is a zero-width assertion: it matches the empty string at a word boundary and consumes no characters, so there is nothing to repeat.

Three single quotes or three double quotes would be correct.
Aug 6 '07 #11
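For anyone who wants to reproduce the error in isolation, here is a minimal Python 3 sketch. It uses only the let-7 half of the pattern from the traceback; the variable names are illustrative.

```python
import re

broken = r"\bMusmusculuslet-[0-9a-z]+\b+"  # stray '+' after the final \b
fixed = r"\bMusmusculuslet-[0-9a-z]+\b"

# A quantifier needs something that consumes characters; \b consumes none.
try:
    re.compile(broken)
    broken_compiles = True
except re.error:
    broken_compiles = False

# With the stray '+' removed, the pattern compiles and finds the whole name.
names = re.findall(fixed, "browser details Musmusculuslet-7g 21 1 21")
```

The '+' after [0-9a-z] is fine because a character class consumes a character on each repetition; only the one after '\b' has to go.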

P: 19
I re-copied and re-pasted the code, and it is working much better now. The program is no longer splitting the keys, but it is pasting multiple values back-to-back instead of next to the key for multiple matches:

browser details MusmusculusmiR-450b1 AUUGGGAACAUUUUGCAUGCAU AUUGGGAACAUUUUGCAUGCAU 20 1 22 22 95.5% Un.003.104 - 440337 440358 22
browser details MusmusculusmiR-450b1 20 1 22 22 95.5% Un.003.104 - 440652 440673 22

This is something I can live with, unless there is some easy way to fix it. I am going to import the whole thing into Access for a database when I am through.

Thanks again,
Mark
Aug 6 '07 #12
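The doubled value is consistent with how list.index() behaves: it always returns the position of the first matching item, so when the same name occurs on two lines, both loop passes edit the first line. The sketch below is a Python 3 condensation of the loop from the earlier code; annotate_first_match is a hypothetical name, and the sequence is the one shown in the output above.

```python
import re

def annotate_first_match(text, table):
    """Condensed version of the split/index/join loop from the earlier code."""
    names = re.findall(r"\bMusmusculusmiR-[0-9a-z\-]+\b", text)
    for name in names:
        if name in table:
            fields = text.split(" ")
            idx = fields.index(name)  # always the FIRST occurrence in the text
            fields[idx] = "%s %s" % (name, table[name])
            text = " ".join(fields)
    return text

table = {"MusmusculusmiR-450b1": "AUUGGGAACAUUUUGCAUGCAU"}
text = (
    "browser details MusmusculusmiR-450b1 20 1 22 22 95.5% Un.003.104 - 440337 440358 22\n"
    "browser details MusmusculusmiR-450b1 20 1 22 22 95.5% Un.003.104 - 440652 440673 22"
)
first_line, second_line = annotate_first_match(text, table).split("\n")
```

After the first pass the name is still the third field of line one, so the second pass finds it there again and appends the sequence a second time, while line two is never touched. Processing the text one line at a time avoids this.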

bvdet
Expert Mod 2.5K+
P: 2,851
Do you have multiple occurrences of the key 'MusmusculusmiR-450b1' in the string? That would explain the double values. Try this:
    print dd

    import re

    s1 = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21\nbrowser details MusmusculusmiR-314-5p 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21"
    patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b              # Matches "Musmusculuslet-" followed by alphanumeric
                                                                    # characters at word borderlines
                          |\bMusmusculusmiR-[0-9a-z\-]+\b           # Matches "MusmusculusmiR-" followed by alphanumeric
                                                                    # characters or dashes at word borderlines
                          ''', re.VERBOSE)

    sList = s1.split('\n')
    outList = []
    for item in sList:
        tem = patt.search(item)
        if tem:
            if dd.has_key(tem.group(0)):
                item = item.replace(tem.group(0), '%s %s' % (tem.group(0), dd[tem.group(0)]))
        outList.append(item)

    s2 = '\n'.join(outList)
    print s2
Output:
>>> {'MusmusculusmiR-1': 'UGGAAUGUAAAGAAGUAUGUA', 'MusmusculusmiR-314-5p': 'UGAGGUAGUAGUUUGUACAGU', 'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU', 'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU', 'MusmusculusmiR-31': 'UGGAAUGUAAAGAAGUAUGUA'}
browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21
browser details MusmusculusmiR-314-5p UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details MusmusculusmiR-31 UGGAAUGUAAAGAAGUAUGUA 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details MusmusculusmiR-31 UGGAAUGUAAAGAAGUAUGUA 21 1 21 21 100.0% 22 + 46884872 46884892 21
>>>
Aug 6 '07 #13

P: 19
That has done the trick! Thanks for all of the help, I didn't even know about Python having a regex module. Still wondering why the triple quotes, but I will go to python.org and read up on it.

Mark
Aug 6 '07 #14
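On the triple quotes: they are ordinary Python string delimiters that allow a string literal to span several lines, and ''' and """ are interchangeable. They are handy with re.VERBOSE, which tells the regex compiler to ignore whitespace and '#' comments inside the pattern. The Python 3 sketch below compares a one-line pattern with a commented multi-line version; the sample text is made up from names in this thread.

```python
import re

# The pattern written on one line, in ordinary quotes.
compact = re.compile(r"\bMusmusculuslet-[0-9a-z]+\b|\bMusmusculusmiR-[0-9a-z\-]+\b")

# The same pattern spread over several lines; re.VERBOSE skips the
# whitespace and the comments, and the triple quotes allow the line breaks.
verbose = re.compile(r'''
    \bMusmusculuslet-[0-9a-z]+\b      # let-7 family names
    |
    \bMusmusculusmiR-[0-9a-z\-]+\b    # miR names, which may contain dashes
    ''', re.VERBOSE)

sample = "browser details Musmusculuslet-7g 21\nbrowser details MusmusculusmiR-126-5p 22"
compact_names = compact.findall(sample)
verbose_names = verbose.findall(sample)
```

Both patterns find the same names, so the triple-quoted form is a readability choice, not a requirement. One thing to watch: under re.VERBOSE a literal space in the pattern has to be escaped or put in a character class, because unescaped whitespace is ignored.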
