Looping over two files and counting occurrences of strings from the 2nd file in lines of the first file

I need to generate permutations of some letters (A T G C, actually nucleotides) for di-composition (e.g. AA AT AG AC), tri-composition (AAA AAT AAC AAG), tetra, penta, etc. (one length at a time), and then count the occurrences of each permutation in another file that contains sequences with some values.

I generated the permutation list. Now I need to loop through the sequences only (splitting the sequences from the values), count each of the permutations generated above, and write the output to a new file. But I'm getting the counts for only the first sequence and not for the other sequences.

The logic of the program I tried to follow is:

1. Generate the permutations of ATCG in file1 (e.g. AT AG AC AA ...).
2. Read the generated file1 and the sequence#value file (DNA_seq_val.txt).
3. Read the sequences and separate the sequences from the values.
4. Loop through the sequences for the permutations and print their occurrences with the values (each separated by a comma) in the results file.

The input test file is DNA_seq_val.txt:

AAAATTTT#99
CCCCGGGG#77
ATATATCGCGCG#88

Output I got is:

2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
77 CCCCGGGG
88 ATATATCGCGCG

Output needed is:

2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,77 CCCCGGGG
x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,88 ATATATCGCGCG

(where x = the corresponding counts, as in the first line)

```python
from itertools import product
import os

f2 = open('TRYYY', 'a')

#********Generate the permutations start********
# ACGT = nucleotides; 2 = dinucleotides (replace 2 with 3 for trinucleotides and so on)
per = product('ACGT', repeat=2)
f = open('myfile', 'w')
p = ""
for p in per:
    p = "".join(p)
    f.write(p + "\n")
f.close()
#********Generate the permutations ENDS********

with open('DNA_seq_val.txt', 'r+') as SEQ, open('myfile', 'r+') as TET:  # open two files
    SEQ_lines = sum(1 for line in open('DNA_seq_val.txt'))  # count lines in sequences file
    #print (SEQ_lines)
    compo_lines = sum(1 for line in open('myfile'))  # count lines in composition
    #print (compo_lines)
    for lines in SEQ:
        line, val1 = lines.split("#")
        val2 = val1.rstrip('\n')
        val = str(val2)
        line = line.rstrip('\n')
        length = len(line)
        #print (line)
        #print (val)
        LIN = line, val
        #print (LIN)
        newstr = "".join((line))
        print(newstr)
        #while True:        # infinite loop
        for PER in TET:
            #print (line)
            PER = PER.rstrip('\n')
            length2 = len(PER)
            #print (length2)
            #print (line)
            #print (PER)
            C_PER = str(line.count(PER))
            #print (C_PER)
            for R in C_PER:
                R1 = "".join(R)
                f2.write(R1 + ",")
        f2.write(val,)
        f2.write('\t')
        f2.write(line)
        f2.write('\n')
    #exit()
```

Mar 1 '18 #1
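A possible explanation for the missing counts, offered as a sketch rather than a definitive answer: a file object such as TET can be iterated only once. After the inner `for PER in TET` loop has run for the first sequence, the file is at its end, so the loop body never executes for the second and third sequences and only the value and sequence get written. Calling `TET.seek(0)` before each inner loop would be one way around this. The version below instead keeps the words in a list, which can be reused for every sequence. The function names `kmers`, `format_row` and `write_counts` are hypothetical helpers introduced for illustration, not part of the original code. It also joins each count as a whole string, because looping over the characters of `C_PER` would split a count of 10 or more into separate digits.

```python
from itertools import product


def kmers(alphabet="ACGT", k=2):
    """Return every length-k word over the alphabet, in product() order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def format_row(record, words):
    """Turn one 'SEQUENCE#VALUE' record into 'c1,c2,...,VALUE<TAB>SEQUENCE'."""
    seq, val = record.strip().split("#")
    # str.count counts non-overlapping occurrences, as in the original code.
    counts = [str(seq.count(w)) for w in words]
    return ",".join(counts + [val]) + "\t" + seq


def write_counts(in_path, out_path, k=2):
    """Read 'SEQUENCE#VALUE' lines from in_path and write one row per line."""
    words = kmers(k=k)  # a list in memory, so it is reusable for every sequence
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in src:
            if record.strip():  # skip blank lines
                dst.write(format_row(record, words) + "\n")


if __name__ == "__main__":
    # The three test records from the question, printed instead of written to a file.
    for rec in ["AAAATTTT#99", "CCCCGGGG#77", "ATATATCGCGCG#88"]:
        print(format_row(rec, kmers()))
```

With the test file from the question, `write_counts('DNA_seq_val.txt', 'TRYYY')` should then produce a full row of 16 counts for each of the three sequences. Changing `k` to 3 or 4 gives the tri- and tetra-compositions. Opening the output with `'w'` rather than `'a'` is a design choice here, so repeated runs do not pile up in the same file.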