
# Looping over two files and counting string occurrences of the 2nd file in lines of the first file
I need to generate permutations of the nucleotides A, T, G and C for di-composition (e.g. AA, AT, AG, AC), tri-composition (AAA, AAT, AAC, AAG), tetra, penta, etc. (one length at a time), and then count the occurrences of each permutation in the sequences of a second file, which holds sequences with associated values. I have generated the permutation list. Now I need to loop through the sequences only (splitting each sequence from its value), count each of the permutations generated above, and write the output to a new file. But I'm getting counts for only the first sequence and not for the other sequences.
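For reference, the word list described above can be generated for any length k with `itertools.product`. Strictly speaking these are k-mers (all length-k strings over A, C, G, T, with repetition allowed) rather than permutations. A minimal sketch; the helper name `kmers` is illustrative and not from the original post:

```python
from itertools import product

def kmers(k, alphabet="ACGT"):
    """Return all length-k strings over alphabet, in product() order."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]

dinucleotides = kmers(2)
print(len(dinucleotides))   # 4**2 = 16
print(dinucleotides[:4])    # ['AA', 'AC', 'AG', 'AT']
```

Changing `k` to 3, 4 or 5 gives the tri-, tetra- and penta-compositions (64, 256 and 1024 words).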

The logic of the program I tried to follow is:

1. Generate the permutations of ATCG in file1 (e.g. AT AG AC AA ...).
2. Read the generated file1 and the sequence#value file (DNA_seq_val.txt).
3. Read the sequences and separate the sequences from the values.
4. Loop through the sequences for the permutations and print their occurrence counts with the values (each separated by a comma) in the results file.

The input test file is DNA_seq_val.txt:
AAAATTTT#99
CCCCGGGG#77
ATATATCGCGCG#88
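The step of separating each sequence from its value can be sketched as below, assuming every non-empty line has the form `SEQUENCE#VALUE` as in the sample. The function name `parse_record` and the in-memory `sample` list are stand-ins for illustration, not part of the original program:

```python
def parse_record(line):
    """Split 'SEQUENCE#VALUE' into (sequence, value), both as strings."""
    sequence, value = line.strip().split("#", 1)
    return sequence, value

# Stand-in for the lines of DNA_seq_val.txt
sample = ["AAAATTTT#99\n", "CCCCGGGG#77\n", "ATATATCGCGCG#88\n"]
records = [parse_record(line) for line in sample if line.strip()]
print(records)
# [('AAAATTTT', '99'), ('CCCCGGGG', '77'), ('ATATATCGCGCG', '88')]
```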

Output I got:
2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
77 CCCCGGGG
88 ATATATCGCGCG

Output needed:
2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,77 CCCCGGGG
x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,88 ATATATCGCGCG
(where x = the corresponding counts, as in the first line)
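To make the row format concrete, here is a small sketch that builds one output row for a single sequence: sixteen dinucleotide counts in `product('ACGT', repeat=2)` order, then the value, a tab, and the sequence. The function name `count_row` is hypothetical, and the tab separator is an assumption based on the posted code:

```python
from itertools import product

def count_row(sequence, value, k=2):
    """Build 'c1,c2,...,cN,value<TAB>sequence' for all length-k words."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = [str(sequence.count(w)) for w in words]
    return ",".join(counts) + "," + value + "\t" + sequence

print(count_row("AAAATTTT", "99"))
# 2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,99	AAAATTTT
```

For the first sequence this reproduces the row shown above; calling it once per sequence would yield the remaining rows.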

from itertools import product
import os

f2 = open('TRYYY', 'a')

#********Generate the permutations start********
per = product('ACGT', repeat=2)    # ACGT = nucleotides; 2 = dinucleotides (replace 2 with 3 for trinucleotides, and so on)
f = open('myfile', 'w')
p = ""
for p in per:
    p = "".join(p)
    f.write(p + "\n")
f.close()

#********Generate the permutations ENDS********

with open('DNA_seq_val.txt', 'r+') as SEQ, open('myfile', 'r+') as TET: #open two files
    SEQ_lines = sum(1 for line in open('DNA_seq_val.txt'))        #count lines in sequences file
    #print (SEQ_lines)
    compo_lines = sum(1 for line in open('myfile'))        #count lines in composition
    #print (compo_lines)
    for lines in SEQ:
        line,val1 = lines.split("#")
        val2 = val1.rstrip('\n')
        val = str(val2)
        line = line.rstrip('\n')
        length =len(line)
        #print (line)
        #print (val)
        LIN = line, val
        #print (LIN)
        newstr = "".join((line))
        print (newstr)
        #while True:        # infinite loop
        for PER in TET:
            #print (line)
            PER = PER.rstrip('\n')
            length2 =len(PER)
            #print (length2)
            #print (line)
            #print (PER)
            C_PER  = str(line.count(PER))
            #print (C_PER)
            for R in C_PER:
                R1 = "".join(R)
                f2.write(R1+ ",")
        f2.write(val,)
        f2.write('\t')
        f2.write(line)
        f2.write('\n')
    #exit()
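One likely explanation for the output above, offered as a reading of the posted code since the thread contains no confirmed answer: a file object is an iterator, so the inner `for PER in TET` loop consumes `myfile` while the first sequence is processed and yields nothing for the later sequences. The sketch below reproduces this with `io.StringIO` standing in for the file (an assumption made to keep the example self-contained), then shows one possible fix, which is to read the words into a list once and reuse the list:

```python
import io

words_file = io.StringIO("AA\nAT\nTT\n")      # stand-in for 'myfile'
sequences = ["AAAATTTT", "CCCCGGGG"]

# Problem: iterating the same file object inside an outer loop.
passes = []
for seq in sequences:
    passes.append([w.strip() for w in words_file])
print(passes)   # [['AA', 'AT', 'TT'], []] -- second pass is empty

# One fix: read the words into a list once, then reuse the list.
words_file.seek(0)
words = [w.strip() for w in words_file]
rows = [[seq.count(w) for w in words] for seq in sequences]
print(rows)     # [[2, 1, 2], [0, 0, 0]]
```

Calling `TET.seek(0)` before each inner loop would be another option. Separately, `for R in C_PER` in the posted code iterates over the characters of the count string, so a count of 12 would be written as `1,2,`; writing the whole string at once avoids that.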
Mar 1 '18 #1
dwblas (Expert)
> Output I got:
> 2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
> 77 CCCCGGGG
> 88 ATATATCGCGCG
>
> Output needed:
> 2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
> x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,77 CCCCGGGG
> x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,88 ATATATCGCGCG
> (where x = the corresponding counts, as in the first line)

That's nice, but how are we to help you get this from an unknown input? What do all these numbers mean (2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2), and what about x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x? Counting occurrences is relatively simple, but there just isn't enough info here.
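On the point that counting occurrences is relatively simple, one detail is worth stating: `str.count` counts non-overlapping matches, which is why the first row shows 2 for AA in AAAATTTT. The thread does not say whether overlapping occurrences are wanted; if they are, a sliding-window count is one option. The helper name `count_overlapping` below is hypothetical:

```python
def count_overlapping(sequence, word):
    """Count occurrences of word in sequence, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(sequence) - k + 1)
               if sequence[i:i + k] == word)

print("AAAATTTT".count("AA"))                 # 2 (non-overlapping)
print(count_overlapping("AAAATTTT", "AA"))    # 3 (overlapping)
```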
Mar 1 '18 #2
