Hi all,
I have an assignment where I have to check multiple human DNA sequences that look like this:
ORIGIN
1 ttgctgcaga cgctcacccc agacactcac tgcaccggag tgagcgcgac catcatgtcc
61 atgctcgtgg tctttctctt gctgtggggt gtcacctggg gcccagtgac agaagcagcc
121 atattttatg agacgcagcc cagcctgtgg gcagagtccg aatcactgct gaaacccttg
181 gccaatgtga cgctgacgtg ccaggcccac ctggagactc cagacttcca gctgttcaag
241 aatggggtgg cccaggagcc tgtgcacctt gactcacctg ccatcaagca ccagttcctg
301 ctgacgggtg acacccaggg ccgctaccgc tgccgctcgg gcttgtccac aggatggacc
//
The DNA starts at ORIGIN and ends at //.
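Reading that layout comes down to dropping the position numbers and spaces and gluing the 10-base blocks together. A minimal Perl sketch, assuming each record holds only the ORIGIN line, the numbered sequence lines and the // terminator (no other header lines); `origin_to_sequence` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one "ORIGIN ... //" record into one base string.
# Assumes the record holds only the ORIGIN line, numbered sequence lines and
# the // terminator, as in the example above.
sub origin_to_sequence {
    my ($record) = @_;
    my $seq = '';
    for my $line (split /\n/, $record) {
        (my $bases = $line) =~ s{^\s*ORIGIN}{};   # drop the ORIGIN keyword
        my $done = ($bases =~ s{//.*}{});         # drop the terminator, if present
        $bases =~ tr/a-zA-Z//cd;                  # drop position numbers and spaces
        $seq .= lc $bases;
        last if $done;
    }
    return $seq;
}

my $record = <<'END';
ORIGIN
1 ttgctgcaga cgctcacccc agacactcac tgcaccggag tgagcgcgac catcatgtcc
61 atgctcgtgg tctttctctt gctgtggggt gtcacctggg gcccagtgac agaagcagcc
//
END

my $seq = origin_to_sequence($record);
print length($seq), "\n";          # 120
print substr($seq, 0, 10), "\n";   # ttgctgcaga
```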
What I have to find are sequences that represent diseases, for example this one:
gcttgtccac atattttatg agacgcagcc
This isn't that hard, but the problem is that the sequences to be found can be spread over several lines, and I have to be able to return on which lines they were found and at which position.
Thanks in advance,
Adriaan
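One way to handle matches that wrap onto the next line is to join the bases into one string, remember for every base which text line it came from, and then search the joined string. A minimal Perl sketch under the same assumption (a record holding just the ORIGIN block); `find_pattern` is a hypothetical name, spaces in the search pattern are ignored, and overlapping hits are reported too:

```perl
use strict;
use warnings;

# Hypothetical helper: find $pattern in one "ORIGIN ... //" record.
# Returns a list of [first_line, last_line, position]: lines count from 1 at
# the top of $record, position is the 1-based base number (the ORIGIN column).
sub find_pattern {
    my ($record, $pattern) = @_;
    (my $needle = lc $pattern) =~ tr/a-z//cd;     # ignore spaces in the pattern
    return () unless length($needle);

    my $seq = '';
    my @line_of;                                  # $line_of[$i]: line of base $i
    my $line_no = 0;
    for my $line (split /\n/, $record) {
        $line_no++;
        (my $bases = $line) =~ s{^\s*ORIGIN|//.*}{}g;  # drop keyword and terminator
        $bases = lc $bases;
        $bases =~ tr/a-z//cd;                     # keep only the bases
        $seq .= $bases;
        push @line_of, ($line_no) x length($bases);
    }

    my @hits;
    my $from = 0;
    while ((my $i = index($seq, $needle, $from)) >= 0) {
        push @hits, [ $line_of[$i], $line_of[$i + length($needle) - 1], $i + 1 ];
        $from = $i + 1;                           # also report overlapping hits
    }
    return @hits;
}

my $record = <<'END';
ORIGIN
1 ttgctgcaga cgctcacccc agacactcac tgcaccggag tgagcgcgac catcatgtcc
61 atgctcgtgg tctttctctt gctgtggggt gtcacctggg gcccagtgac agaagcagcc
121 atattttatg agacgcagcc cagcctgtgg gcagagtccg aatcactgct gaaacccttg
//
END

for my $hit (find_pattern($record, 'agaagcagcc atattttatg agacgcagcc')) {
    my ($first, $last, $pos) = @$hit;
    print "lines $first-$last, position $pos\n";   # lines 3-4, position 111
}
```

Searching the joined string with `index` means the hit no longer has to line up with the 10-base blocks, and the `@line_of` array gives both the line where the match starts and the line where it ends.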
eWish, Recognized Expert Contributor
Welcome to TSDN!
Is this homework? If so, please read our Posting Guidelines. Please post the code you have tried. Also, have a look at CPAN for a module to assist with what you are doing.
--Kevin
Thanks for the reply.
Your link to http://serach.cpan.org/ doesn't seem to work, so I don't know what you mean by the module thing.
It's not really homework; it's more of an exercise I was advised to try out.
This is the code I've written to get the evil sequences out of the database. I don't really have any useful code for the analysis part yet, as I'm still trying to figure out how to do it:
sub haalDataBaseOp
{
    open(my $db, '<', 'database.txt') or die "Cannot open database.txt: $!";
    my @data = <$db>;
    close($db);

    my (%ziektenaam, %ziektecode);
    foreach my $ziekte (@data)
    {
        my $code;
        # store the disease code: the first a/c/g/t run plus the rest of the line
        if ($ziekte =~ m/(\b[ctga]+\b)(.*)/)
        {
            $code = $1.$2;
            # print $code."\n";
        }

        # get the number and the name out of the string
        if ($ziekte =~ m/(\d+)(.*?)(\b[gtac]+\b)/)
        {
            # build a hash with the number as key and the disease name as value
            $ziektenaam{$1} = $2;

            print $1."\n";
            print $code."\n";
            print $2."\n";

            # also build a hash in which the number points to the disease code found
            $ziektecode{$1} = $code;
        }
    }
    # return both hashes so the caller can use them
    return (\%ziektenaam, \%ziektecode);
}
Oh yes, I'm Belgian, so I natively speak Dutch and use it in my comments.
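Since the layout of database.txt isn't shown, here is a minimal Perl sketch that assumes (hypothetically) one disease per line in the form `<number> <name> <sequence blocks>`, which is what the two regexes in haalDataBaseOp seem to expect. It glues the sequence blocks into one string so it can be searched directly. `read_disease_db` is a hypothetical name, and the sketch assumes the disease name does not end in a word made only of the letters a, c, g and t:

```perl
use strict;
use warnings;

# Hypothetical reader for database.txt. Assumed line layout (not confirmed):
#   <number> <disease name> <sequence, possibly split into blocks>
# e.g. "1 Example disease gcttgtccac atattttatg agacgcagcc"
sub read_disease_db {
    my ($fh) = @_;
    my (%name_of, %code_of);
    while (my $line = <$fh>) {
        chomp $line;
        # number, then the shortest name that leaves only a/c/g/t blocks behind
        next unless $line =~ /^\s*(\d+)\s+(.*?)\s+([acgt]+(?:\s+[acgt]+)*)\s*$/;
        my ($number, $name, $code) = ($1, $2, $3);
        $code =~ tr/acgt//cd;                 # glue the blocks together
        $name_of{$number} = $name;
        $code_of{$number} = $code;
    }
    return (\%name_of, \%code_of);
}

# In-memory example data standing in for database.txt
my $data = "1 Example disease gcttgtccac atattttatg agacgcagcc\n"
         . "2 Another example acgtacgtac\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";
my ($names, $codes) = read_disease_db($fh);
close($fh);

for my $n (sort { $a <=> $b } keys %$names) {
    print "$n: $names->{$n} => $codes->{$n}\n";
}
```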
eWish, Recognized Expert Contributor
Sorry about the link to CPAN; I have corrected it. There are several bioinformatics modules available that are designed to handle your request. Also, check out BioPerl.org; in the long run it will be a better solution.
--Kevin
As a reply to your initial post, where you wanted to search for a pattern such as
gcttgtccac atattttatg agacgcagcc, which can extend across multiple lines, and return the line number and position, the following code works:
use strict;
use warnings;

$/ = "//";    ## input record separator: each sequence ends with //

open(my $db, '<', 'database.txt') or die "sorry: $!";

my $lines_before = 0;    # lines in all earlier records
while (my $record = <$db>)
{
    # Between two 10-base blocks there is whitespace; where the match wraps
    # onto the next line there is also that line's position number (e.g. 121).
    while ($record =~ /\bgcttgtccac\s+(?:\d+\s+)?atattttatg\s+(?:\d+\s+)?agacgcagcc\b/g)
    {
        my $prev = substr($record, 0, $-[0]);                 # text preceding the match
        my $line = $lines_before + 1 + ($prev =~ tr/\n//);    # count the newlines
        my $pos  = 1 + ($prev =~ tr/acgt//);                  # residues before the match, plus one
        print "line: $line\n";
        print "position: $pos\n";
    }
    $lines_before += ($record =~ tr/\n//);
}
close($db);
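The regex in the code above only finds the pattern when its pieces line up with the 10-base blocks of the ORIGIN layout. One alternative (not what Nithin posted) is to build the regex from the pattern itself and allow whitespace and position numbers between any two bases. A minimal Perl sketch; `wrap_tolerant_regex` is a hypothetical name, and the sketch assumes the record contains only the ORIGIN block, so every lowercase a/c/g/t before a match is a base:

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex for $pattern that tolerates spaces,
# newlines and ORIGIN position numbers between any two bases.
sub wrap_tolerant_regex {
    my ($pattern) = @_;
    (my $bases = lc $pattern) =~ tr/a-z//cd;      # ignore spaces in the pattern
    my $gap = '[\s\d]*';                          # whitespace and line numbers
    my $re  = join $gap, map { quotemeta($_) } split //, $bases;
    return qr/$re/;
}

my $record = <<'END';
ORIGIN
1 ttgctgcaga cgctcacccc agacactcac tgcaccggag tgagcgcgac catcatgtcc
61 atgctcgtgg tctttctctt gctgtggggt gtcacctggg gcccagtgac agaagcagcc
121 atattttatg agacgcagcc cagcctgtgg gcagagtccg aatcactgct gaaacccttg
//
END

my @hits;
my $re = wrap_tolerant_regex('gcagcc atattttatg');   # not aligned to a block
while ($record =~ /$re/g) {
    my $before = substr($record, 0, $-[0]);      # text preceding the match
    my $line   = 1 + ($before =~ tr/\n//);        # newlines before it, plus one
    my $pos    = 1 + ($before =~ tr/acgt//);      # bases before it, plus one
    push @hits, [$line, $pos];
    print "line $line, position $pos\n";          # line 3, position 115
}
```

The trade-off is a longer regex with many optional gaps, so for a large database the join-and-`index` approach may be simpler to reason about.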
Regards,
Nithin
numberwhun, Recognized Expert Moderator Specialist
First, when posting code in the forums, please be sure to use the proper code tags. That way, we moderators don't have to clean up after you and add them to what you just posted (as I have done here).
Next, just out of curiosity, have you checked out the BioPerl website? I have seen it recommended to others working with genomics, and they have always said it was very helpful.
Regards,
Jeff
Hi Jeff,
I'm sorry about that. I have checked the BioPerl website; it is indeed very helpful in the long run.
Regards,
Nithin