Iterating over a string

Hi everybody,

Does anyone know if the adict.has_key(k) command can be used to match a string against a dictionary key? I'm trying to append a value from my dictionary to a string when it is found.

String example:

browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21

Dictionary example:

'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU'
'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU'

What I want:

browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
etc.

Thanks,

Mark
Aug 3 '07 #1
ilikepython
I'm not sure if this is exactly what you need:
    import re

    # adict is the dictionary from the question (name -> sequence)
    patt = re.compile("Musmusculuslet-..")

    teststr = "browser details Musmusculuslet-7g    21     1    21    21 100.0%    22   +   46884872  46884892     21"
    match = patt.findall(teststr)

    if match:
        if adict.has_key(match[0]):
            # position just past the matched key; the value is inserted there
            end = teststr.index(match[0]) + len(match[0])
            finalstring = "%s %s%s" % (teststr[:end], adict[match[0]], teststr[end:])
Aug 4 '07 #2
bvdet
Following are a couple of ways:
    print dd    # dd is the dictionary (shown in the output below)
    import re

    s1 = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21"
    patt = re.compile(r'Musmusculuslet-[0-9a-z]+|MusmusculusmiR-\d+')

    # Way 1: find candidate names with a regex, then annotate each one that is a key in dd
    strList = patt.findall(s1)
    s2 = s1
    for item in strList:
        if dd.has_key(item):
            s2 = s2.replace(item, '%s %s' % (item, dd[item]))

    print s2

    print

    # Way 2: loop over the dictionary keys and annotate every key that occurs in the string
    s3 = s1
    for key in dd:
        if key in s3:
            s3 = s3.replace(key, '%s %s' % (key, dd[key]))

    print s3
Output:
>>> {'MusmusculusmiR-1': 'UGGAAUGUAAAGAAGUAUGUA', 'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU', 'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU'}
browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21

browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21
>>>
Aug 4 '07 #3
Thanks for the help, ilikepython and bvdet. I'm running into only one problem: I am getting multiple matches for certain strings. For example, the key MusmusculusmiR-1 also matches MusmusculusmiR-146b, so I get the following output:

browser details MusmusculusmiR-1 UGGAAUGUAAAGAAGUAUGUA46b UGAGAACUGAAUUCCAUAGGCU 22 1 22 22 100.0% 26 - 20924724 20924745 22

when the original string is:

browser details MusmusculusmiR-146b 22 1 22 22 100.0% 26 - 20924724 20924745 22

Is there a way to prevent this?

My full list of keys is:
['MusmusculusmiR-106a', 'MusmusculusmiR-433-3p', 'MusmusculusmiR-126-5p', 'MusmusculusmiR-106b', 'MusmusculusmiR-216a', 'MusmusculusmiR-324-5p', 'MusmusculusmiR-762', 'MusmusculusmiR-7121', 'MusmusculusmiR-760', 'MusmusculusmiR-200b', 'MusmusculusmiR-200c', 'MusmusculusmiR-200a', 'MusmusculusmiR-241', 'MusmusculusmiR-30a-5p', 'MusmusculusmiR-802', 'MusmusculusmiR-801', 'MusmusculusmiR-805', 'MusmusculusmiR-804', 'MusmusculusmiR-216b', 'MusmusculusmiR-667', 'MusmusculusmiR-666', 'MusmusculusmiR-665', 'MusmusculusmiR-741', 'MusmusculusmiR-742', 'MusmusculusmiR-668', 'MusmusculusmiR-744', 'MusmusculusmiR-1401', 'MusmusculusmiR-34a', 'MusmusculusmiR-34b', 'MusmusculusmiR-34c', 'MusmusculusmiR-592', 'MusmusculusmiR-455-5p', 'MusmusculusmiR-698', 'MusmusculusmiR-376a1', 'MusmusculusmiR-344', 'MusmusculusmiR-697', 'MusmusculusmiR-694', 'MusmusculusmiR-695', 'MusmusculusmiR-340', 'MusmusculusmiR-341', 'MusmusculusmiR-342', 'MusmusculusmiR-691', 'MusmusculusmiR-542-5p', 'MusmusculusmiR-764-5p', 'MusmusculusmiR-122a', 'MusmusculusmiR-142-5p', 'MusmusculusmiR-449', 'MusmusculusmiR-448', 'MusmusculusmiR-23a', 'MusmusculusmiR-23b', 'MusmusculusmiR-6741', 'MusmusculusmiR-135b', 'MusmusculusmiR-135a', 'MusmusculusmiR-301b', 'MusmusculusmiR-129-5p', 'MusmusculusmiR-30b', 'MusmusculusmiR-30c', 'MusmusculusmiR-30d', 'MusmusculusmiR-30e', 'MusmusculusmiR-292-3p', 'MusmusculusmiR-713', 'MusmusculusmiR-499', 'MusmusculusmiR-711', 'MusmusculusmiR-710', 'MusmusculusmiR-717', 'MusmusculusmiR-715', 'MusmusculusmiR-714', 'MusmusculusmiR-490', 'MusmusculusmiR-491', 'MusmusculusmiR-719', 'MusmusculusmiR-718', 'MusmusculusmiR-494', 'MusmusculusmiR-495', 'MusmusculusmiR-496', 'MusmusculusmiR-497', 'MusmusculusmiR-297b', 'MusmusculusmiR-485-5p', 'MusmusculusmiR-300', 'MusmusculusmiR-301', 'MusmusculusmiR-302', 'MusmusculusmiR-422b', 'MusmusculusmiR-33', 'MusmusculusmiR-32', 'MusmusculusmiR-31', 'MusmusculusmiR-181d', 'MusmusculusmiR-27a', 'MusmusculusmiR-27b', 'MusmusculusmiR-450b1', 
'MusmusculusmiR-551b', 'MusmusculusmiR-302b1', 'MusmusculusmiR-155', 'MusmusculusmiR-154', 'MusmusculusmiR-151', 'MusmusculusmiR-150', 'MusmusculusmiR-153', 'MusmusculusmiR-152', 'MusmusculusmiR-409', 'MusmusculusmiR-470', 'MusmusculusmiR-471', 'MusmusculusmiR-15a', 'MusmusculusmiR-15b', 'MusmusculusmiR-675-3p', 'MusmusculusmiR-712', 'MusmusculusmiR-199a', 'MusmusculusmiR-199b', 'MusmusculusmiR-148b', 'MusmusculusmiR-148a', 'MusmusculusmiR-615', 'MusmusculusmiR-759', 'MusmusculusmiR-758', 'MusmusculusmiR-30e1', 'MusmusculusmiR-374-3p', 'MusmusculusmiR-291a-5p', 'MusmusculusmiR-488', 'MusmusculusmiR-689', 'MusmusculusmiR-688', 'MusmusculusmiR-685', 'MusmusculusmiR-684', 'MusmusculusmiR-687', 'MusmusculusmiR-686', 'MusmusculusmiR-681', 'MusmusculusmiR-680', 'MusmusculusmiR-683', 'MusmusculusmiR-682', 'MusmusculusmiR-351', 'MusmusculusmiR-350', 'MusmusculusmiR-720', 'MusmusculusmiR-721', 'MusmusculusmiR-4671', 'MusmusculusmiR-181a1', 'MusmusculusmiR-7b', 'MusmusculusmiR-130a', 'MusmusculusmiR-130b', 'MusmusculusmiR-4881', 'MusmusculusmiR-380-5p', 'MusmusculusmiR-127', 'MusmusculusmiR-467b', 'MusmusculusmiR-467a', 'MusmusculusmiR-431', 'MusmusculusmiR-291b-5p', 'MusmusculusmiR-532', 'MusmusculusmiR-539', 'MusmusculusmiR-128a', 'MusmusculusmiR-128b', 'MusmusculusmiR-543', 'MusmusculusmiR-540', 'MusmusculusmiR-542-3p', 'MusmusculusmiR-546', 'MusmusculusmiR-547', 'MusmusculusmiR-223', 'MusmusculusmiR-222', 'MusmusculusmiR-693-5p', 'MusmusculusmiR-224', 'MusmusculusmiR-91', 'MusmusculusmiR-93', 'MusmusculusmiR-92', 'MusmusculusmiR-96', 'MusmusculusmiR-98', 'MusmusculusmiR-99b', 'MusmusculusmiR-17-5p', 'MusmusculusmiR-434-3p', 'MusmusculusmiR-770-3p', 'MusmusculusmiR-763', 'MusmusculusmiR-489', 'MusmusculusmiR-761', 'MusmusculusmiR-486', 'MusmusculusmiR-484', 'MusmusculusmiR-483', 'MusmusculusmiR-652', 'MusmusculusmiR-21', 'MusmusculusmiR-22', 'MusmusculusmiR-24', 'MusmusculusmiR-25', 'MusmusculusmiR-146b', 'MusmusculusmiR-28', 'MusmusculusmiR-362', 'MusmusculusmiR-363', 
'MusmusculusmiR-361', 'MusmusculusmiR-367', 'MusmusculusmiR-365', 'MusmusculusmiR-302c1', 'MusmusculusmiR-692', 'MusmusculusmiR-182', 'MusmusculusmiR-183', 'MusmusculusmiR-186', 'MusmusculusmiR-187', 'MusmusculusmiR-184', 'MusmusculusmiR-185', 'MusmusculusmiR-324-3p', 'MusmusculusmiR-188', 'MusmusculusmiR-124a', 'MusmusculusmiR-463', 'MusmusculusmiR-464', 'MusmusculusmiR-466', 'MusmusculusmiR-469', 'MusmusculusmiR-468', 'MusmusculusmiR-505', 'MusmusculusmiR-503', 'MusmusculusmiR-500', 'MusmusculusmiR-501', 'MusmusculusmiR-212', 'MusmusculusmiR-210', 'MusmusculusmiR-211', 'MusmusculusmiR-26b', 'MusmusculusmiR-26a', 'MusmusculusmiR-215', 'MusmusculusmiR-218', 'MusmusculusmiR-219', 'MusmusculusmiR-465-3p', 'MusmusculusmiR-376a', 'MusmusculusmiR-376b', 'MusmusculusmiR-376c', 'MusmusculusmiR-369-5p', 'MusmusculusmiR-133a', 'MusmusculusmiR-133b', 'MusmusculusmiR-6761', 'MusmusculusmiR-9', 'MusmusculusmiR-129-3p', 'MusmusculusmiR-1', 'MusmusculusmiR-7', 'MusmusculusmiR-675-5p', 'MusmusculusmiR-101a', 'MusmusculusmiR-101b', 'MusmusculusmiR-217', 'MusmusculusmiR-214', 'MusmusculusmiR-699', 'MusmusculusmiR-326', 'MusmusculusmiR-696', 'MusmusculusmiR-325', 'MusmusculusmiR-322', 'MusmusculusmiR-323', 'MusmusculusmiR-320', 'MusmusculusmiR-345', 'MusmusculusmiR-346', 'MusmusculusmiR-328', 'MusmusculusmiR-329', 'MusmusculusmiR-18', 'MusmusculusmiR-764-3p', 'MusmusculusmiR-16', 'MusmusculusmiR-690', 'MusmusculusmiR-429', 'MusmusculusmiR-425', 'MusmusculusmiR-424', 'MusmusculusmiR-423', 'MusmusculusmiR-132', 'MusmusculusmiR-137', 'MusmusculusmiR-136', 'MusmusculusmiR-134', 'MusmusculusmiR-139', 'MusmusculusmiR-138', 'MusmusculusmiR-30a-3p', 'MusmusculusmiR-541', 'MusmusculusmiR-199a1', 'MusmusculusmiR-291b-3p', 'MusmusculusmiR-221', 'MusmusculusmiR-292-5p', 'MusmusculusmiR-450b', 'MusmusculusmiR-455-3p', 'MusmusculusmiR-181b', 'MusmusculusmiR-708', 'MusmusculusmiR-709', 'MusmusculusmiR-704', 'MusmusculusmiR-705', 'MusmusculusmiR-376b1', 'MusmusculusmiR-706', 
'MusmusculusmiR-291a-3p', 'MusmusculusmiR-700', 'MusmusculusmiR-701', 'MusmusculusmiR-485-3p', 'MusmusculusmiR-678', 'MusmusculusmiR-679', 'MusmusculusmiR-674', 'MusmusculusmiR-676', 'MusmusculusmiR-677', 'MusmusculusmiR-670', 'MusmusculusmiR-671', 'MusmusculusmiR-672', 'MusmusculusmiR-673', 'MusmusculusmiR-19a', 'MusmusculusmiR-19b', 'MusmusculusmiR-379', 'MusmusculusmiR-378', 'MusmusculusmiR-29b', 'MusmusculusmiR-370', 'MusmusculusmiR-29a', 'MusmusculusmiR-375', 'MusmusculusmiR-377', 'MusmusculusmiR-10b', 'MusmusculusmiR-10a', 'MusmusculusmiR-487b', 'MusmusculusmiR-702', 'MusmusculusmiR-191', 'MusmusculusmiR-190', 'MusmusculusmiR-193', 'MusmusculusmiR-192', 'MusmusculusmiR-195', 'MusmusculusmiR-194', 'MusmusculusmiR-380-3p', 'MusmusculusmiR-450', 'MusmusculusmiR-451', 'MusmusculusmiR-452', 'MusmusculusmiR-126-3p', 'MusmusculusmiR-103', 'MusmusculusmiR-100', 'MusmusculusmiR-107', 'MusmusculusmiR-133a1', 'MusmusculusmiR-298', 'MusmusculusmiR-299', 'MusmusculusmiR-293', 'MusmusculusmiR-290', 'MusmusculusmiR-296', 'MusmusculusmiR-297', 'MusmusculusmiR-294', 'MusmusculusmiR-295', 'MusmusculusmiR-743', 'MusmusculusmiR-201', 'MusmusculusmiR-203', 'MusmusculusmiR-202', 'MusmusculusmiR-205', 'MusmusculusmiR-204', 'MusmusculusmiR-207', 'MusmusculusmiR-206', 'MusmusculusmiR-208', 'MusmusculusmiR-433-5p', 'MusmusculusmiR-693-3p', 'Musmusculuslet-7d1', 'MusmusculusmiR-125b', 'MusmusculusmiR-125a', 'MusmusculusmiR-381', 'MusmusculusmiR-99a', 'MusmusculusmiR-434-5p', 'MusmusculusmiR-17-3p', 'MusmusculusmiR-5011', 'MusmusculusmiR-374-5p', 'MusmusculusmiR-465-5p', 'MusmusculusmiR-142-3p', 'MusmusculusmiR-20a', 'MusmusculusmiR-20b', 'MusmusculusmiR-146', 'MusmusculusmiR-144', 'MusmusculusmiR-335', 'MusmusculusmiR-181a', 'MusmusculusmiR-337', 'MusmusculusmiR-181c', 'MusmusculusmiR-331', 'MusmusculusmiR-330', 'MusmusculusmiR-669c', 'MusmusculusmiR-669b', 'MusmusculusmiR-669a', 'MusmusculusmiR-707', 'MusmusculusmiR-339', 'MusmusculusmiR-338', 'MusmusculusmiR-369-3p', 
'MusmusculusmiR-703', 'MusmusculusmiR-302c', 'MusmusculusmiR-302b', 'MusmusculusmiR-141', 'MusmusculusmiR-302d', 'Musmusculuslet-7b', 'Musmusculuslet-7c', 'Musmusculuslet-7a', 'Musmusculuslet-7f', 'Musmusculuslet-7g', 'Musmusculuslet-7d', 'Musmusculuslet-7e', 'Musmusculuslet-7i', 'MusmusculusmiR-449b', 'MusmusculusmiR-382', 'MusmusculusmiR-383', 'MusmusculusmiR-384', 'MusmusculusmiR-410', 'MusmusculusmiR-411', 'MusmusculusmiR-412', 'MusmusculusmiR-145', 'MusmusculusmiR-143', 'MusmusculusmiR-140', 'MusmusculusmiR-29c', 'MusmusculusmiR-196a', 'MusmusculusmiR-196b', 'MusmusculusmiR-149'

Thanks,

Mark
Aug 4 '07 #4
ilikepython
This is similar to Bv's second way:
    teststr = "browser details MusmusculusmiR-146b    22     1    22    22 100.0%    26   -   20924724  20924745     22"
    words = teststr.split()

    key = words[2]    # will the key always be the third word?
    if key in adict:
        finalstring = teststr.replace(key, "%s %s" % (key, adict[key]))

If the key is not always the third word, you could check every word, as long as there is only one key per string.
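
Checking every word could look like the sketch below. It is written for Python 3, where `has_key` no longer exists and `word in adict` is the membership test. The function name `annotate_line` and the two-entry `adict` are made up for illustration; the two sequences are the ones that appear in the output quoted earlier in the thread. Because whole words are compared with the keys, MusmusculusmiR-1 cannot match inside MusmusculusmiR-146b.

```python
# Hypothetical two-entry dictionary; the sequences are taken from the thread.
adict = {
    "MusmusculusmiR-1": "UGGAAUGUAAAGAAGUAUGUA",
    "MusmusculusmiR-146b": "UGAGAACUGAAUUCCAUAGGCU",
}


def annotate_line(line, adict):
    """Insert the dictionary value after the first word that is a key."""
    words = line.split()
    for i, word in enumerate(words):
        if word in adict:                  # exact, whole-word comparison
            words.insert(i + 1, adict[word])
            break                          # assumes at most one key per line
    return " ".join(words)                 # runs of spaces collapse to one


teststr = "browser details MusmusculusmiR-146b    22     1    22    22 100.0%    26   -   20924724  20924745     22"
print(annotate_line(teststr, adict))
# browser details MusmusculusmiR-146b UGAGAACUGAAUUCCAUAGGCU 22 1 22 22 100.0% 26 - 20924724 20924745 22
```

One side effect of `split()` followed by `join()` is that the original column spacing is lost, which may or may not matter for the later import.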
Aug 5 '07 #5
I tried your suggestion but received the same result. Is there a statement I could write that checks each line for a capital A, T, C, or G? If I could put that into an 'if' statement, then maybe it wouldn't re-format a line that has already been formatted. Of course, there would then be the problem of whether it replaced it with Mus..R-1 or with Mus..R-106a, etc. Is there an order, or is it random because I am using a dictionary?

Mark
Aug 5 '07 #6
ilikepython
I'm not really sure what you mean. Are you checking each string more than once? Every time you finish formatting a string, you can append it to a list, and the next time, if it is already in the list, don't format it again. I don't think you should have a problem with matching the wrong key. Could you post the code you used?
Aug 5 '07 #7
bvdet
Try this regex solution to see if it works for you. The \b in the pattern matches the empty string at the beginning or end of a word, so the pattern picks up complete names only. The string is then split on the space character, and the replacement is made only on a full match:
    print dd

    import re

    s1 = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21\nbrowser details MusmusculusmiR-314-5p 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21"
    patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b              # Matches "Musmusculuslet-" followed by alphanumeric
                                                                    # characters at word borderlines
                          |\bMusmusculusmiR-[0-9a-z\-]+\b           # Matches "MusmusculusmiR-" followed by alphanumeric
                                                                    # characters or dashes at word borderlines
                          ''', re.VERBOSE)

    strList = patt.findall(s1)
    s2 = s1
    for item in strList:
        if dd.has_key(item):
            s2List = s2.split(' ')
            idx = s2List.index(item)
            s2List[idx] = '%s %s' % (item, dd[item])
            s2 = ' '.join(s2List)

    print s2
Output:
>>> {'MusmusculusmiR-1': 'UGGAAUGUAAAGAAGUAUGUA', 'MusmusculusmiR-314-5p': 'UGAGGUAGUAGUUUGUACAGU', 'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU', 'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU', 'MusmusculusmiR-31': 'UGGAAUGUAAAGAAGUAUGUA'}
browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21
browser details MusmusculusmiR-314-5p UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details MusmusculusmiR-31 UGGAAUGUAAAGAAGUAUGUA 21 1 21 21 100.0% 22 + 46884872 46884892 21

>>>
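
To see what the word-boundary anchors change, the sketch below (Python 3 syntax) runs the first pattern posted in this thread and the `\b` pattern over the problem line from post #4. `\b` consumes no characters; it only asserts that a word character (letter, digit, or underscore) sits on one side of the position and a non-word character or the end of the string on the other. The short `miR-1` strings at the end are invented test inputs, not data from the thread.

```python
import re

line = "browser details MusmusculusmiR-146b 22 1 22 22 100.0% 26 - 20924724 20924745 22"

# First pattern from the thread: \d+ stops at the first non-digit, so the "b" is lost.
loose = re.compile(r"Musmusculuslet-[0-9a-z]+|MusmusculusmiR-\d+")
# Pattern with word boundaries and a class that also allows letters and dashes.
strict = re.compile(r"\bMusmusculuslet-[0-9a-z]+\b|\bMusmusculusmiR-[0-9a-z\-]+\b")

loose_hits = loose.findall(line)
strict_hits = strict.findall(line)
print(loose_hits)    # ['MusmusculusmiR-146']
print(strict_hits)   # ['MusmusculusmiR-146b']

# A plain substring test is what lets the short key hit the longer name.
short_key_in_line = "MusmusculusmiR-1" in line
print(short_key_in_line)   # True

# \b on its own: "1" followed by "4" is not a boundary, so there is no match ...
no_hit = re.findall(r"\bmiR-1\b", "miR-146b")
print(no_hit)        # []
# ... but a dash is a non-word character, so "1" followed by "-" is a boundary.
dash_hit = re.findall(r"\bmiR-1\b", "miR-1-5p")
print(dash_hit)      # ['miR-1']
```

The last case is presumably why the character class in the miR branch includes the dash: the greedy class swallows the `-5p` suffix before the closing `\b` is tested.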
Aug 5 '07 #8
The code I am currently using and still getting the same problem:

    def EditFile ( s1, dd ):
        print dd
        import re
        patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b+|\bMusmusculusmiR-[0-9a-z\-]+\b''', re.VERBOSE)
        strList = patt.findall(s1)
        s2 = s1
        for item in strList:
            if dd.has_key(item):
                s2List = s2.split(' ')
                idx = s2List.index(item)
                s2List[idx] = '%s %s' % (item, dd[item])
                s2 = ' '.join(s2List)
        print s2
    ##      print
    ##    s3 = s1
    ##    words = s3.split()
    ##    key = words[2]
    ##    for key in dd:
    ##        if key in s3:
    ##            s3 = s3.replace(key, '%s %s' % (key, dd[key]))
    ##    print s3
        f = open('editted BLAT Search Results-Mouse.txt', 'w')
        f.writelines(s2)
        f.close()
        return s2

It seems to choke on the following matches:

MusmusculusmiR-1 is read when it reads MusmusculusmiR-124a, thus it gets written twice with two separate values from two separate keys:

UGGAAUGUAAAGAAGUAUGUA24a
followed by UAAGGCACGCGGUGAAUGCC

The first is the value for key MusmusculusmiR-1 (without the 24a that is at the end); the second is the value for key MusmusculusmiR-124a.

It is also still choking on the following matches:

MusmusculusmiR-126-5p (weird, since it doesn't mind MusmusculusmiR-126-3p)
MusmusculusmiR-127, MusmusculusmiR-128a, MusmusculusmiR-130, MusmusculusmiR-129-5p
and MusmusculusmiR-324-3p because there is a MusmusculusmiR-32.

I ran the above code and got the same results I did with the previous code, which is strange. Did I miss something in my transcription? What exactly does the \b do in your code?

Mark
Aug 6 '07 #9
Okay, now I am getting a new error:

Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    newfile = EditFile ( data, mouse )
  File "BatchEditor.py", line 45, in EditFile
    patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b+|\bMusmusculusmiR-[0-9a-z\-]+\b''', re.VERBOSE)
  File "C:\Python25\lib\re.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python25\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: nothing to repeat

I edited the code, so it is now like this:

    def EditFile ( s1, dd ):

        #print dd
        import re
        patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b+|\bMusmusculusmiR-[0-9a-z\-]+\b''', re.VERBOSE)
        strList = patt.findall(s1)
        s2 = s1
        print strList
        for item in strList:
            if dd.has_key(item):
                s2List = s2.split(' ')
                idx = s2List.index(item)
                s2List[idx] = '%s %s' % (item, dd[item])
                s2 = ' '.join(s2List)
        print s2
    ##      print
    ##    s3 = s1
    ##    words = s3.split()
    ##    key = words[2]
    ##    for key in dd:
    ##        if key in s3:
    ##            s3 = s3.replace(key, '%s %s' % (key, dd[key]))
    ##    print s3
        f = open('editted BLAT Search Results-Mouse.txt', 'w')
        f.writelines(s2)
        f.close()
        return s2

I should ask: is that a single quote followed by a double quote at the beginning and end of the re.compile statement? I had it set as three single quotes and then realized that is probably wrong.

Mark
Aug 6 '07 #10
bvdet
The error you received is caused by the additional '+' character after '\b'. Since '\b' is a zero-width assertion that matches the empty position at a word boundary rather than any character, there is nothing to repeat.

Three single quotes or three double quotes would be correct.
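
Both points can be checked in a few lines. This is a sketch in Python 3 syntax, and the exact wording of the error message may differ between Python versions. The `+` after `\b` is rejected when the pattern is compiled, and the triple quotes only let the string literal span several lines, which pairs with `re.VERBOSE` so the pattern can carry comments. The sample `text` is made up from names in the key list.

```python
import re

# A quantifier placed directly after \b is rejected at compile time.
try:
    re.compile(r"\bMusmusculuslet-[0-9a-z]+\b+")
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)

# Triple quotes allow a multi-line literal; with re.VERBOSE the regex engine
# ignores the whitespace and the # comments inside the pattern.
verbose = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b      # let-7 family names
                         |\bMusmusculusmiR-[0-9a-z\-]+\b  # miR names, dashes allowed
                      ''', re.VERBOSE)

# The same pattern on one line with ordinary quotes and no flag.
compact = re.compile(r"\bMusmusculuslet-[0-9a-z]+\b|\bMusmusculusmiR-[0-9a-z\-]+\b")

text = "browser details Musmusculuslet-7g 21\nbrowser details MusmusculusmiR-30a-5p 22"
same = verbose.findall(text) == compact.findall(text)
print(same)   # True
```

So a single-line pattern does not need triple quotes at all; they are only a convenience for the commented, multi-line form.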
Aug 6 '07 #11
I re-copied and re-pasted the code, and it is working much better now. The program is no longer splitting the keys, but it is pasting multiple values back-to-back instead of next to the key for multiple matches:

browser details MusmusculusmiR-450b1 AUUGGGAACAUUUUGCAUGCAU AUUGGGAACAUUUUGCAUGCAU 20 1 22 22 95.5% Un.003.104 - 440337 440358 22
browser details MusmusculusmiR-450b1 20 1 22 22 95.5% Un.003.104 - 440652 440673 22

This is something I can live with, unless there is some easy way to fix it. I am going to import the whole thing into Access for a database when I am through.

Thanks again,
Mark
Aug 6 '07 #12
bvdet
Do you have multiple occurrences of the key 'MusmusculusmiR-450b1' in the string? That would explain the double values. Try this:
    print dd

    import re

    s1 = "browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details Musmusculuslet-7i 21 1 21 21 100.0% 5 + 50605174 50605194 21\nbrowser details MusmusculusmiR-314-5p 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21\nbrowser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21"
    patt = re.compile(r'''\bMusmusculuslet-[0-9a-z]+\b              # Matches "Musmusculuslet-" followed by alphanumeric
                                                                    # characters at word borderlines
                          |\bMusmusculusmiR-[0-9a-z\-]+\b           # Matches "MusmusculusmiR-" followed by alphanumeric
                                                                    # characters or dashes at word borderlines
                          ''', re.VERBOSE)

    # Work on one line at a time so that a repeated key is annotated on its own line
    sList = s1.split('\n')
    outList = []
    for item in sList:
        tem = patt.search(item)
        if tem:
            key = tem.group(0)
            if dd.has_key(key):
                item = item.replace(key, '%s %s' % (key, dd[key]))
        outList.append(item)

    s2 = '\n'.join(outList)
    print s2

Output:
>>> {'MusmusculusmiR-1': 'UGGAAUGUAAAGAAGUAUGUA', 'MusmusculusmiR-314-5p': 'UGAGGUAGUAGUUUGUACAGU', 'Musmusculuslet-7i': 'UGAGGUAGUAGUUUGUGCUGU', 'Musmusculuslet-7g': 'UGAGGUAGUAGUUUGUACAGU', 'MusmusculusmiR-31': 'UGGAAUGUAAAGAAGUAUGUA'}
browser details Musmusculuslet-7g UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details Musmusculuslet-7i UGAGGUAGUAGUUUGUGCUGU 21 1 21 21 100.0% 5 + 50605174 50605194 21
browser details MusmusculusmiR-314-5p UGAGGUAGUAGUUUGUACAGU 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details MusmusculusmiR-31 UGGAAUGUAAAGAAGUAUGUA 21 1 21 21 100.0% 22 + 46884872 46884892 21
browser details MusmusculusmiR-31 UGGAAUGUAAAGAAGUAUGUA 21 1 21 21 100.0% 22 + 46884872 46884892 21
>>>
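
For anyone running this on Python 3, where `print` is a function and `dict.has_key` no longer exists, a rough equivalent of the line-by-line approach is sketched below. It swaps the search-then-replace step for `re.sub` with a callback, which is a different technique from the code above. The names `annotate` and `add_sequence` are invented here, and `dd` holds only a few of the entries shown in the output.

```python
import re

# Subset of the dictionary printed in the output above.
dd = {
    "Musmusculuslet-7g": "UGAGGUAGUAGUUUGUACAGU",
    "MusmusculusmiR-314-5p": "UGAGGUAGUAGUUUGUACAGU",
    "MusmusculusmiR-31": "UGGAAUGUAAAGAAGUAUGUA",
}

patt = re.compile(r"\bMusmusculuslet-[0-9a-z]+\b|\bMusmusculusmiR-[0-9a-z\-]+\b")


def add_sequence(match):
    """Callback for re.sub: append the sequence when the matched name is a key."""
    name = match.group(0)
    if name in dd:                 # Python 3 replacement for dd.has_key(name)
        return "%s %s" % (name, dd[name])
    return name                    # names that are not keys are left untouched


def annotate(text):
    """Annotate the first name found on each line of text."""
    return "\n".join(
        patt.sub(add_sequence, line, count=1) for line in text.split("\n")
    )


s1 = ("browser details Musmusculuslet-7g 21 1 21 21 100.0% 22 + 46884872 46884892 21\n"
      "browser details MusmusculusmiR-314-5p 21 1 21 21 100.0% 22 + 46884872 46884892 21\n"
      "browser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21\n"
      "browser details MusmusculusmiR-31 21 1 21 21 100.0% 22 + 46884872 46884892 21")
result = annotate(s1)
print(result)
```

With `count=1` only the first name on a line is annotated, whereas `str.replace` in the version above changes every occurrence on the line; for input with one name per line the two behave the same.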
Aug 6 '07 #13
That has done the trick! Thanks for all of the help; I didn't even know Python had a regex module. I'm still wondering why the triple quotes are needed, but I will go to python.org and read up on it.

Mark
Aug 6 '07 #14
