Checking for bad dna sequences

7 New Member
hi all,
I have an assignment where I have to check multiple human DNA sequences that look like this:

ORIGIN
1 ttgctgcaga cgctcacccc agacactcac tgcaccggag tgagcgcgac catcatgtcc
61 atgctcgtgg tctttctctt gctgtggggt gtcacctggg gcccagtgac agaagcagcc
121 atattttatg agacgcagcc cagcctgtgg gcagagtccg aatcactgct gaaacccttg
181 gccaatgtga cgctgacgtg ccaggcccac ctggagactc cagacttcca gctgttcaag
241 aatggggtgg cccaggagcc tgtgcacctt gactcacctg ccatcaagca ccagttcctg
301 ctgacgggtg acacccaggg ccgctaccgc tgccgctcgg gcttgtccac aggatggacc //
The DNA starts at ORIGIN and ends at //.
What I have to find are sequences that represent diseases, for example this one:
gcttgtccac atattttatg agacgcagcc

This isn't that hard in itself, but the sequences to be found can extend over several lines, and I have to be able to report on which lines they were encountered and at which position.

thanx in advance

adriaan
Dec 15 '07 #1
eWish
971 Recognized Expert Contributor
Welcome to TSDN!

Is this homework? If so, please read our Posting Guidelines. Please post the code that you have tried. Also, have a look at CPAN for a module to assist in what you are doing.

--Kevin
Dec 15 '07 #2
adriaan
7 New Member
Thanks for the reply.
Your link to
http://serach.cpan.org/
doesn't seem to work, so I don't know what you mean by the module.
It's not really homework; it's more of an exercise I was advised to try.
This is the code I've written to get the evil sequences out of the database.
I don't really have any useful code yet for the analysis part, as I'm still trying to figure out how to do it.

sub haalDataBaseOp
{
    # open the disease database and stop with a message if that fails
    open(my $db, '<', 'database.txt') or die "Cannot open database.txt: $!";

    my @data = <$db>;
    close($db);

    foreach my $ziekte (@data)
    {
        my $code;

        # capture the disease code: the first word made only of c, t, g, a,
        # plus the rest of the line
        if ($ziekte =~ m/(\b[ctga]+\b)(.*)/)
        {
            $code = $1.$2;
        }

        # extract the number and the name from the string
        if ($ziekte =~ m/(\d+)(.*?)(\b[gtac]+\b)/)
        {
            # build a hash with the number as key to the disease name
            $ziektenaam{$1} = $2;

            print $1."\n";
            print $code."\n";
            print $2."\n";

            # also build a hash in which the number refers to the disease code found
            $ziektecode{$1} = $code;
        }
    }
}
Oh yes, I'm Belgian, so I natively speak Dutch and use it in my code.
Dec 15 '07 #3
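For the analysis step that is still missing, one possible starting point is to flatten the ORIGIN block into a single string of bases while remembering which line each base came from. A plain index() search then finds a target no matter how it is split across lines. This is only a sketch: flatten_record and find_sequence are hypothetical names, the record is assumed to start at ORIGIN, and lines are numbered from the first sequence line after ORIGIN.

```perl
use strict;
use warnings;

# Flatten the text between ORIGIN and // into one string of bases and
# remember, for every base, the sequence line it came from.
sub flatten_record {
    my ($record) = @_;
    my $bases   = '';
    my @line_of;
    my $line_no = 0;
    for my $line (split /\n/, $record) {
        next if $line =~ /^\s*ORIGIN/;          # header line carries no bases
        (my $seq = lc $line) =~ tr/acgt//cd;    # drop digits, spaces and the closing //
        next unless length($seq);               # skip lines without bases
        $line_no++;
        $bases .= $seq;
        push @line_of, ($line_no) x length($seq);
    }
    return ($bases, \@line_of);
}

# Return one [position, first_line, last_line] triple per occurrence.
# Positions are 1-based, like the numbers printed in the record.
sub find_sequence {
    my ($record, $target) = @_;
    (my $needle = lc $target) =~ tr/acgt//cd;   # ignore spaces in the target
    return () unless length($needle);
    my ($bases, $line_of) = flatten_record($record);
    my @hits;
    my $from = 0;
    while ((my $at = index($bases, $needle, $from)) >= 0) {
        my $last = $at + length($needle) - 1;
        push @hits, [ $at + 1, $line_of->[$at], $line_of->[$last] ];
        $from = $at + 1;                        # keep looking for further hits
    }
    return @hits;
}

# Two lines taken from the sample record in the first post.
my $record = <<'END';
ORIGIN
1 ttgctgcaga cgctcacccc agacactcac tgcaccggag tgagcgcgac catcatgtcc
61 atgctcgtgg tctttctctt gctgtggggt gtcacctggg gcccagtgac agaagcagcc //
END

for my $hit (find_sequence($record, 'catcatgtcc atgctcgtgg')) {
    printf "position %d, lines %d to %d\n", @$hit;
}
```

The target used here ends line 1 and continues on line 2 of the sample, so the sketch reports a hit that starts on one line and ends on the next. Because the search runs on the flattened string, it does not depend on the target lining up with the 10-base columns of the file.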
eWish
971 Recognized Expert Contributor
Sorry about the link to CPAN. I have corrected it. There are several bioinformatics modules available that are designed to handle your request. Also, check out BioPerl.org; in the long run it will be a better solution.

--Kevin
Dec 15 '07 #4
nithinpes
410 Recognized Expert Contributor
As a reply to your initial posting, where you wanted to search for a pattern such as
gcttgtccac atattttatg agacgcagcc that can extend across multiple lines, and to return the line number and position, the following code should work:
use strict;
use warnings;

$/ = "//";  # input record separator: each sequence ends with //
open(my $db, '<', 'database.txt') or die "sorry: $!";
# a 10-base block may be followed by a line break plus the next line's
# position number, so allow that optional gap between any two blocks
my $gap = qr/(?:\s*\n\s*\d+)?\s+/;
while (my $record = <$db>)
{
    while ($record =~ /\bgcttgtccac\b$gap\batattttatg\b$gap\bagacgcagcc\b/g)
    {
        my $prev = substr($record, 0, $-[0]);  # text preceding the match
        my $line = 1 + ($prev =~ tr/\n//);     # one more line per newline seen so far
        my $pos  = ($prev =~ tr/atgc//);       # number of residues preceding the match
        print "\n line:$line";
        print "\n position: $pos";
    }
}
close($db);
Regards,
Nithin
Dec 24 '07 #5
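The pattern in the post above is written out by hand for one sequence. A possible generalization, sketched here under stated assumptions, builds the pattern from any disease string by allowing whitespace and position numbers between every pair of bases, then loops over a hash shaped like the %ziektecode hash from the earlier post. The sub name gap_tolerant_regex, the %found hash and the two sample entries are made up for illustration. ORIGIN counts as line 1, and positions are 1-based.

```perl
use strict;
use warnings;

# Turn a disease sequence into a regex that tolerates the spaces, line
# breaks and position numbers that may sit between any two bases.
sub gap_tolerant_regex {
    my ($target) = @_;
    (my $bases = lc $target) =~ tr/acgt//cd;    # keep only the bases
    die "no bases in '$target'" unless length($bases);
    my $pattern = join '[\s\d]*', split //, $bases;
    return qr/$pattern/;
}

# Stand-in for the hash filled by haalDataBaseOp; the numbers and
# sequences here are illustrative, not taken from a real database.
my %ziektecode = (
    1 => 'catcatgtcc atgctcgtgg',
    2 => 'gcccagtgac agaagcagcc',
);

# Two lines taken from the sample record in the first post.
my $record = <<'END';
ORIGIN
1 ttgctgcaga cgctcacccc agacactcac tgcaccggag tgagcgcgac catcatgtcc
61 atgctcgtgg tctttctctt gctgtggggt gtcacctggg gcccagtgac agaagcagcc //
END

my %found;    # disease number => list of "line:position" strings
for my $nummer (sort { $a <=> $b } keys %ziektecode) {
    my $re = gap_tolerant_regex($ziektecode{$nummer});
    while ($record =~ /$re/g) {
        my $prev = substr($record, 0, $-[0]);   # text before the match
        my $line = 1 + ($prev =~ tr/\n//);      # ORIGIN counts as line 1
        my $pos  = 1 + ($prev =~ tr/acgt//);    # 1-based position of the first base
        push @{ $found{$nummer} }, "$line:$pos";
        print "disease $nummer: line $line, position $pos\n";
    }
}
```

Counting lowercase bases in the text before the match only gives the right position when the record contains nothing but ORIGIN and sequence lines; a full record with header text would need the header stripped first.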
numberwhun
3,509 Recognized Expert Moderator Specialist
First, when posting code in the forums, please be sure to use the proper code tags. That way, we moderators don't have to clean up behind you and add them to what you just posted (as I have done here).

Next, just out of curiosity, have you checked out the BioPerl website? I have seen this site recommended to others working with genomics and such, and they have always said it was very helpful.

Regards,

Jeff
Dec 24 '07 #6
nithinpes
410 Recognized Expert Contributor
Hi Jeff,

I'm sorry about that. I have checked the BioPerl website; it is indeed very helpful in the long run.

Regards,
Nithin
Dec 26 '07 #7
