By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
464,737 Members | 1,280 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 464,737 IT Pros & Developers. It's quick & easy.

Matching parts of lines in a file (python)

P: 3
I currently have a list of genes in a file. Each line has a chromosome with it's information. Such an entry appears as:

NM_198212 chr7 + 115926679 115935830 115927071 11593344 2 115926679,'115933260', 115927221,'115935830',

The sequence for the chromosome starts at base 115926679 and continues up to(but not including) base 115935830

If we want the spliced sequence, we use the exons.The first extends from 115926679 to 155927221, and the second goes from '115933260' to '115935830'

However, I have run across a problem when on a complementary sequence such as:

NM_001005286 chr1 - 245941755 245942680 245941755 245942680 1 245941755, '245942680'

Since column 3 is a '-', these coordinates are in reference to the anti-sense strand (the complement to the strand). The first base (in bold) matches the last base on the sense strand (in italics). Since the file only has the sense stand, I need to try to translate coordinates on the anti-sense strand to the sense strand, pick out the right sequence and then reverse-complement it.

That said, I have only been programming for about half a year and and not sure how to starts going about doing this.

I have written a regular expression:
'(NM_\d+)\s+(chr\d+)([(\+)|(-)])\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+), (\d+),s+(\d+),(\d+),'

I just made some bold, some italics, and some in quotes to show the different parts I was trying to use

but am now unsure as to how to start this function... If anyone can help me get started at all on this, perhaps making me see how to do this, I would very much appreciate it.
Apr 21 '12 #1
Share this Question
Share on Google+
1 Reply

Expert 100+
P: 626
You want to split on the empty space and go from there.
Expand|Select|Wrap|Line Numbers
  1. rec="NM_198212 chr7 + 115926679 115935830 115927071 11593344 2"
  2. print rec.split() 
Tutorial on using lists.
Apr 22 '12 #2

Post your reply

Sign in to post your reply or Sign up for a free account.