how to access the individual elements of a matrix in python

111 100+
My file is of the form (where "\t" stands for a tab character):

01 "\t" 10.19 "\t" 0.00 "\t" 10.65
02 "\t" 11.19 "\t" 10.12 "\t" 99.99


I need to access the individual floating-point numbers in it.
For example, the first number is 10.19; I want to access it and add one to it.
filename=open("half.transfac","r")
file_content=filename.readlines()
sam=""
for line in file_content:
    for char in line:
        if char=="\tchar\t\n":
            sam+=char
            print sam
But "for char in line" accesses every single character, not the whole numbers ("10.19", "0.00", etc.). How do I do this? Please help.
Jul 5 '07
60 4319
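One way to get whole numbers rather than single characters is to split each line on whitespace and convert the fields with float(). The sketch below uses Python 3 syntax (the code in this thread is Python 2) and assumes the file contains literal tab characters. The in-memory sample and the names sample and rows are illustrative stand-ins for the poster's half.transfac file, not part of the original code.

```python
import io

# Hypothetical stand-in for the poster's file, using real tab characters.
sample = io.StringIO("01\t10.19\t0.00\t10.65\n02\t11.19\t10.12\t99.99\n")

rows = {}
for line in sample:
    fields = line.split()      # splits on any whitespace, tabs included
    if not fields:
        continue               # skip blank lines
    # First field is the row label; the rest are the numbers.
    rows[fields[0]] = [float(x) for x in fields[1:]]

# Access the first number of row "01" and add one to it.
rows["01"][0] += 1.0
```

After this runs, rows["01"][0] holds approximately 11.19, and every other entry is still reachable by row label and column index.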
aboxylica
111 100+
Thanks for that. Now, instead of the sequence I had, I am generating a random sequence and calculating the score for it.
What should happen is this: suppose my sequence contains 50 letters. Each iteration should consider 16 letters, so the first iteration covers the first 16 letters, the next covers the 16 letters starting from the second letter, and so on, until the remaining sequence is exactly 16 letters long (anything shorter than 16 is omitted).
I have the program that calculates the score (the same one I kept calculating from my input file). This score is called "res" in my code.
For the same sequence I calculate another "score" by assigning a specific value to each letter, and then I compute log(res/score). I get an error that doesn't make any sense to me. Please tell me what I should change.
Here is my code:
from math import *
import random
f=open("deeps1.txt","r")
line=f.next()
while not line.startswith('PO'):
    line=f.next()

headerlist=line.strip().split()[1:]
linelist=[]

line=f.next().strip()
while not line.startswith('/'):
    if line != '':
        linelist.append(line.strip().split())
    line=f.next().strip()

keys=[i[0] for i in linelist]
values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
array={}
linedict=dict(zip(keys,values))
keys = linedict.keys()
keys.sort()
for key in keys:
    array=[key,linedict[key]]

datadict={}
datadict1={}
for i,item in enumerate(headerlist):
    datadict[item]={}
    for key_ in linedict:
        datadict[item][key_]=linedict[key_][i]

for keymain in datadict:
    for keysub in datadict[keymain]:
        datadict[keymain][keysub]+=1.0

def random_seq():
    seq=""
    ch=""
    for i in range(0,1000):
        ch=random.choice(("ATGC"))
        seq=seq+ch
    return seq

p=random_seq()

#def my_rand():
 #   
    #print p
  #  part=""
   # q=len(p)
   # seqq=""

   # for i in range(0,q):
    #    part= p[i:i+16]
    #    if len(part)==16:
     #       seqq=part
      #      return seqq

#my_seq=my_rand()
#print len(my_seq)

res=1
part=""
q=len(p)
seqq=""
for i in range(0,q):
    part=p[i:i+16]
    if len(part)==16:
        seqq=part
        for i in range(0,16):
            key=p[i]
            print p[i]
            res*=datadict[key]["%02d"%(i+1)]
        print res,"&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&"
    #score=1
    #value={"A":"0.3","T":"0.3","C":"0.2","G":"0.2"}
  #  for it in value:
   #     for key in p:
   #         if it==key:
    #            score=score*float(value[it])
#log_ratio=(res/score)
#print log(log_ratio)
The problem is that instead of printing a value for res, it prints something like
inf &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
I think there is an error in this line:
res*=datadict[key]["%02d"%(i+1)]
Please help.
Waiting for your reply,
cheers!
Jul 11 '07 #51
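The poster did not say what the mistake turned out to be. One plausible cause, judging from the code above, is that res is never reset between windows, so the product keeps growing across all windows until the float overflows to inf (the inner loop also reuses i and reads p[i] rather than seqq[i]). The sketch below, in Python 3 syntax, contrasts the two behaviours. The function window_scores, its reset flag, and the uniform weights table are hypothetical and only stand in for datadict after one has been added to every entry.

```python
import math

def window_scores(seq, weights, width=16, reset=True):
    """Multiply the per-position weights for every window of `width` letters."""
    scores = []
    res = 1.0
    for start in range(len(seq) - width + 1):
        if reset:
            res = 1.0                       # start a fresh product per window
        window = seq[start:start + width]
        for j, base in enumerate(window):   # j is the position inside the window
            res *= weights[base]["%02d" % (j + 1)]
        scores.append(res)
    return scores

# Hypothetical weights: every entry is 4.67 (3.67 plus the added one).
weights = {b: {"%02d" % k: 4.67 for k in range(1, 17)} for b in "ACGT"}
seq = "ACGT" * 250                          # 1000 letters, like random_seq()

leaky = window_scores(seq, weights, reset=False)  # product carried across windows
clean = window_scores(seq, weights, reset=True)   # product reset per window
```

With reset=False the running product eventually exceeds the largest representable float and becomes inf, whereas with reset=True every window score stays finite.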
aboxylica
111 100+
Hey,
sorry, I found the mistake and got the output! :)
Cheers!
Jul 11 '07 #52
aboxylica
111 100+
There is one more thing I need to do in my code, and I don't know how to do it.
After adding one to every element in the input file (which I have already done), I have to normalize the rows: every element should be divided by the sum of the elements in its row.
My input file is:
NA bap
PO A C G T
01 0.00 3.67 0.00 0.00
02 0.00 0.00 3.67 0.00
03 0.00 0.00 0.00 3.67
04 0.00 3.67 0.00 0.00
05 3.67 0.00 0.00 0.00
06 3.46 0.00 0.22 0.00
07 0.00 0.00 3.67 0.00
08 0.00 0.00 0.00 3.67
09 0.00 0.00 0.00 3.67
10 0.00 3.67 0.00 0.00
11 3.67 0.00 0.00 0.00
12 3.67 0.00 0.00 0.00
13 0.00 0.00 3.67 0.00
14 0.00 0.00 0.00 3.67
15 0.00 0.00 3.67 0.00
16 0.00 3.67 0.00 0.00
//
//
A[01] = 1.0 (because I have already added one to the element) / [1.0+4.67+1.0+1.0]
The same has to be done for every element in every row.

The basic formula:
formula = element / (sum of the elements of that row)
My code, with one already added, is:
from math import *
import random
f=open("deeps1.txt","r")
line=f.next()
while not line.startswith('PO'):
    line=f.next()

headerlist=line.strip().split()[1:]
linelist=[]

line=f.next().strip()
while not line.startswith('/'):
    if line != '':
        linelist.append(line.strip().split())
    line=f.next().strip()

keys=[i[0] for i in linelist]
values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
array={}
linedict=dict(zip(keys,values))
keys = linedict.keys()
keys.sort()
for key in keys:
    array=[key,linedict[key]]

datadict={}
datadict1={}
for i,item in enumerate(headerlist):
    datadict[item]={}
    for key_ in linedict:
        datadict[item][key_]=linedict[key_][i]

for keymain in datadict:
    for keysub in datadict[keymain]:
        datadict[keymain][keysub]+=1.0
# here one has been added to all elements now how do i normalize it?
Waiting for your reply,
cheers!
Jul 12 '07 #53
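The normalization asked about above can be written out as a formula with a worked example for row 01. The notation is an assumption introduced here, not the poster's: c_{i,b} is the raw entry for row i and base b, and p_{i,b} is the normalized value after one has been added to every entry.

```latex
p_{i,b} = \frac{c_{i,b} + 1}{\sum_{b' \in \{A,C,G,T\}} \left( c_{i,b'} + 1 \right)}

% Row 01 of the input file: A = 0.00, C = 3.67, G = 0.00, T = 0.00
p_{01,A} = \frac{1.0}{1.0 + 4.67 + 1.0 + 1.0} = \frac{1.0}{7.67} \approx 0.130
\qquad
p_{01,C} = \frac{4.67}{7.67} \approx 0.609
```

The four normalized values of a row add up to 1, which is a quick check that the normalization was applied exactly once.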
bvdet
2,851 Expert Mod 2GB
Create a new list of the sums of the items on each row of the original data. The +4 accounts for the 1.0 that was added to each of the four elements in a row:
valueSums = [sum(item)+4 for item in values]
Since there are 16 lines in the first data set, valueSums should have 16 elements. Keep in mind that lists are ordered and dictionaries are not. Iterate over each subdictionary of datadict, create a sorted list of the subdictionary keys, iterate over that sorted list (use enumerate), and update each element using the indexing operator.
Jul 12 '07 #54
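The steps described above can be sketched as follows, in Python 3 syntax. The two-row excerpt and the names value_sums and idx are illustrative. The approach assumes that the rows appear in the file in the same order as the sorted keys, so that position idx in the sorted key list corresponds to row idx of values.

```python
# Hypothetical two-row excerpt of the poster's matrix.
header = ["A", "C", "G", "T"]
keys = ["01", "02"]
values = [[0.00, 3.67, 0.00, 0.00],
          [3.46, 0.00, 0.22, 0.00]]

# Column-oriented dictionary with one already added to every element.
datadict = {base: {k: row[i] + 1.0 for k, row in zip(keys, values)}
            for i, base in enumerate(header)}

# One sum per row; len(row) accounts for the 1.0 added to each element.
value_sums = [sum(row) + len(row) for row in values]

for base in datadict:
    # Sorting the keys restores the row order that the dictionary does not promise.
    for idx, key in enumerate(sorted(datadict[base])):
        datadict[base][key] /= value_sums[idx]
```

Each row of the normalized matrix should now add up to 1 across the four bases.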
elbin
27
Or, much easier:

datadict1 = datadict.copy()
for keymain in datadict:
    for keysub in datadict[keymain]:
        datadict1[keymain][keysub] = datadict[keymain][keysub] / (sum(values[int(keysub) - 1]) + 4)
I take it the second dictionary is for the normalized values, but the copy does not change anything: dict.copy() is shallow, so both dictionaries share the same inner dictionaries. You use the old keys and subkeys from datadict directly and fill out the copy. If you fill in the same dictionary, you don't need the copy.
Jul 12 '07 #55
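The remark about the copy can be checked directly. The sketch below, in Python 3 syntax with a hypothetical two-entry dictionary, shows that dict.copy() shares the inner dictionaries with the original, while copy.deepcopy() does not. Using deepcopy is a suggestion added here for the case where the unnormalized values must be kept; it is not something the posters proposed.

```python
import copy

datadict = {"A": {"01": 1.0}, "C": {"01": 4.67}}

# Shallow copy: a new outer dictionary, but the same inner dictionaries.
shallow = datadict.copy()
shallow["A"]["01"] /= 7.67
shared = shallow["A"] is datadict["A"]

# Deep copy: the inner dictionaries are duplicated as well.
deep = copy.deepcopy(datadict)
deep["C"]["01"] /= 7.67
```

After this runs, datadict["A"]["01"] has changed along with the shallow copy, while datadict["C"]["01"] is still 4.67.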
aboxylica
111 100+
from math import *
import random
f=open("deeps1.txt","r")
line=f.next()
while not line.startswith('PO'):
    line=f.next()

headerlist=line.strip().split()[1:]
linelist=[]

line=f.next().strip()
while not line.startswith('/'):
    if line != '':
        linelist.append(line.strip().split())
    line=f.next().strip()

keys=[i[0] for i in linelist]
values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

array={}
linedict=dict(zip(keys,values))
keys = linedict.keys()
keys.sort()
for key in keys:
    array=[key,linedict[key]]

datadict={}
datadict1={}
for i,item in enumerate(headerlist):
    datadict[item]={}
    for key_ in linedict:
        datadict[item][key_]=linedict[key_][i]

for keymain in datadict:
    for keysub in datadict[keymain]:
        datadict[keymain][keysub]+=1.0
        datadict1=datadict.copy()
        for keysub in datadict:
            for keysub in datadict[keymain]:
                datadict1[keymain][keysub]=datadict[keymain][keysub]/(sum(values[int(keysub)-1])+4)

def random_seq(nchars,insertat,astring):
    seq=""

    for i in range(nchars):
      if i== insertat:
          seq+=astring
      ch=random.choice(("ATGC"))
      seq+=ch
    print seq
    return seq
thestring="CGTCAAGTTCAAGTGCAAAA"
count=50-len(thestring)
p=random_seq(count,15,thestring)
file=open("temp.txt",'w')
#consensus="CGTCAAGTTCAAGTGCAAAA"
#file.write(consensus)
file.write(str(p))
file.close()

def file_chk():
    file=open("temp.txt","r")
    file_content=file.read()
    return file_content

#p=file_chk()

#def my_rand():
 #   
    #print p
  #  part=""
   # q=len(p)
   # seqq=""

   # for i in range(0,q):
    #    part= p[i:i+16]
    #    if len(part)==16:
     #       seqq=part
      #      return seqq

#my_seq=my_rand()
#print len(my_seq)

res=1
part=""
q=len(p)
seqq=""
for i in range(0,q):
    part=p[i:i+16]
    if len(part)==16:
        seqq=part
        res=1
        for j in range(0,16):
            key=seqq[j]
            res=res*datadict[key]["%02d"%(j+1)]
            print res
            score=1
            value={"A":"0.3","T":"0.3","C":"0.2","G":"0.2"}
            for it in value:
                for key in seqq:
                    if it==key:
                        score=score*float(value[it])
        #print score,"*******************",res
        log_ratio=log10(res/score)
        #print i,log_ratio
This is my full code, where I calculate the scores, divide by a background value, and finally take a log. Because of the normalisation, some values are becoming zero: when I print the normalised values, some of them are zero.
Sorry, I am not able to paste my output file, and I don't know why this is happening.
Jul 12 '07 #56
bvdet
2,851 Expert Mod 2GB
Yep, you can index on int(keySub)-1:
valueSums = [sum(item)+4 for item in values]

for keyMain in datadict:
    for keySub in datadict[keyMain]:
        datadict[keyMain][keySub] /= valueSums[int(keySub)-1]
Jul 12 '07 #57
aboxylica
111 100+
from math import *
import random
f=open("deeps1.txt","r")
line=f.next()
while not line.startswith('PO'):
    line=f.next()

headerlist=line.strip().split()[1:]
linelist=[]

line=f.next().strip()
while not line.startswith('/'):
    if line != '':
        linelist.append(line.strip().split())
    line=f.next().strip()

keys=[i[0] for i in linelist]
values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
valueSums = [sum(item)+4 for item in values]

array={}
linedict=dict(zip(keys,values))
keys = linedict.keys()
keys.sort()
for key in keys:
    array=[key,linedict[key]]

datadict={}
datadict1={}
for i,item in enumerate(headerlist):
    datadict[item]={}
    for key_ in linedict:
        datadict[item][key_]=linedict[key_][i]

for keymain in datadict:
     for keysub in datadict[keymain]:
        datadict[keymain][keysub]+=1.0
        for keyMain in datadict:
            for keySub in datadict[keyMain]:
                datadict[keyMain][keySub] /= valueSums[int(keySub)-1]

def random_seq(nchars,insertat,astring):
    seq=""

    for i in range(nchars):
      if i== insertat:
          seq+=astring
      ch=random.choice(("ATGC"))
      seq+=ch
    print seq
    return seq
thestring="CGTCAAGTTCAAGTGCAAAA"
count=50-len(thestring)
p=random_seq(count,15,thestring)
file=open("temp.txt",'w')
#consensus="CGTCAAGTTCAAGTGCAAAA"
#file.write(consensus)
file.write(str(p))
file.close()

def file_chk():
    file=open("temp.txt","r")
    file_content=file.read()
    return file_content

#p=file_chk()

#def my_rand():
 #   
    #print p
  #  part=""
   # q=len(p)
   # seqq=""

   # for i in range(0,q):
    #    part= p[i:i+16]
    #    if len(part)==16:
     #       seqq=part
      #      return seqq

#my_seq=my_rand()
#print len(my_seq)

res=1
part=""
q=len(p)
seqq=""
for i in range(0,q):
    part=p[i:i+16]
    if len(part)==16:
        seqq=part
        res=1
        for j in range(0,16):
            key=seqq[j]
            res=res*datadict[key]["%02d"%(j+1)]
            print res
            score=1
            value={"A":"0.3","T":"0.3","C":"0.2","G":"0.2"}
            for it in value:
                for key in seqq:
                    if it==key:
                        score=score*float(value[it])
        #print score,"*******************",res
        log_ratio=log(res/score)
        #print i,log_ratio
Since we are adding one to each element, I don't think my res value could be zero. Do you see any mistake? Or is it because it is going negative? Please help.
Waiting for your reply,
cheers!
Jul 12 '07 #58
aboxylica
111 100+
When I print valueSums, many of those values are zero, but this is not possible, right?
Jul 12 '07 #59
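A likely explanation for the vanishing values, offered here as a reading of the posted code rather than a confirmed diagnosis, is that the normalization loop sits inside the loop that adds one. Every pass of the outer loop then divides the whole matrix again, so most entries are divided many times over and shrink toward zero. The sketch below, in Python 3 syntax with a hypothetical two-row excerpt and a helper named build, compares the nested version with two separate passes.

```python
header = ["A", "C", "G", "T"]
keys = ["01", "02"]
values = [[0.00, 3.67, 0.00, 0.00],
          [3.46, 0.00, 0.22, 0.00]]
value_sums = [sum(row) + 4 for row in values]

def build():
    """Return a fresh column-oriented dictionary of the raw values."""
    return {base: {k: row[i] for k, row in zip(keys, values)}
            for i, base in enumerate(header)}

# Nested version: the whole matrix is re-normalized after every single +1.
nested = build()
for keymain in nested:
    for keysub in nested[keymain]:
        nested[keymain][keysub] += 1.0
        for km in nested:
            for ks in nested[km]:
                nested[km][ks] /= value_sums[int(ks) - 1]

# Separate passes: first add one everywhere, then normalize everything once.
flat = build()
for keymain in flat:
    for keysub in flat[keymain]:
        flat[keymain][keysub] += 1.0
for keymain in flat:
    for keysub in flat[keymain]:
        flat[keymain][keysub] /= value_sums[int(keysub) - 1]

nested_row = sum(nested[b]["01"] for b in header)
flat_row = sum(flat[b]["01"] for b in header)
```

Here flat_row comes out at 1 (up to rounding), while nested_row is far smaller. With the full 16-row matrix the repeated division is more severe still, and a product of 16 such entries can underflow to zero, which makes the logarithm fail.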
elbin
27
from math import *
import random
f=open("deeps1.txt","r")
line=f.next()
while not line.startswith('PO'):
    line=f.next()

headerlist=line.strip().split()[1:]
linelist=[]

line=f.next().strip()
while not line.startswith('/'):
    if line != '':
        linelist.append(line.strip().split())
    line=f.next().strip()

keys=[i[0] for i in linelist]
values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

array={}
linedict=dict(zip(keys,values))
keys = linedict.keys()
keys.sort()
for key in keys:
    array=[key,linedict[key]]

datadict={}
datadict1={}
for i,item in enumerate(headerlist):
    datadict[item]={}
    for key_ in linedict:
        datadict[item][key_]=linedict[key_][i]

# first pass: add one to every element
for keymain in datadict:
    for keysub in datadict[keymain]:
        datadict[keymain][keysub]+=1.0

# second pass: divide every element by its row sum, exactly once
# (the outer loop must rebind keymain, otherwise only the last column is touched)
datadict1=datadict.copy()
for keymain in datadict:
    for keysub in datadict[keymain]:
        datadict1[keymain][keysub]=datadict[keymain][keysub]/(sum(values[int(keysub)-1])+4)

def random_seq(nchars,insertat,astring):
    seq=""
    for i in range(nchars):
      if i== insertat:
          seq+=astring
      ch=random.choice(("ATGC"))
      seq+=ch
    print seq
    return seq

thestring="CGTCAAGTTCAAGTGCAAAA"
count=50-len(thestring)
p=random_seq(count,15,thestring)
file=open("temp.txt",'w')
##consensus="CGTCAAGTTCAAGTGCAAAA"
##file.write(consensus)
file.write(str(p))
file.close()

def file_chk():
    f=open("temp.txt","r")
    file_content=f.read()
    return file_content

#p=file_chk()

#def my_rand():
 #   
    #print p
  #  part=""
   # q=len(p)
   # seqq=""

   # for i in range(0,q):
    #    part= p[i:i+16]
    #    if len(part)==16:
     #       seqq=part
      #      return seqq

#my_seq=my_rand()
#print len(my_seq)

res=1
part=""
q=len(p)
seqq=""

value={"A":0.3,"T":0.3,"C":0.2,"G":0.2}
# q-15 start positions, so the final 16-letter window is included
for i in range(q-15):
    part=p[i:i+16]
    seqq=part
    res=1
    score=1
    for j in range(16):
        key=seqq[j]
        res=res*datadict1[key]["%02d"%(j+1)]
        #print res
    for key in seqq:
        score=score * value[key]
    #print score,"*******************",res
    log_ratio=log10(res/score)
    print i,log_ratio
I think you had some problems with indentation, and I simplified a lot of the last part with the score and log. I think it is ok now. I don't know why you got 0's, but I think I know what you want to do, so it looks good now.
Jul 12 '07 #60
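The scoring step at the end of the code above can also be wrapped in a function. The following is a minimal sketch in Python 3 syntax, not the posters' code: the function name window_log_ratios, its parameters, and the uniform matrix pwm are hypothetical, and the matrix is assumed to be normalized already.

```python
from math import log10

def window_log_ratios(seq, pwm, background, width=16):
    """Return log10(res/score) for every window of `width` letters in seq."""
    ratios = []
    # len(seq) - width + 1 start positions, so the final window is included.
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        res = 1.0      # product of the matrix entries for this window
        score = 1.0    # product of the background values for this window
        for j, base in enumerate(window):
            res *= pwm[base]["%02d" % (j + 1)]
            score *= background[base]
        ratios.append(log10(res / score))
    return ratios

# Hypothetical uniform matrix: each base has probability 0.25 at every position.
pwm = {b: {"%02d" % k: 0.25 for k in range(1, 17)} for b in "ACGT"}
background = {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2}
seq = "CGTCAAGTTCAAGTGCAAAA"

ratios = window_log_ratios(seq, pwm, background)
```

For the 20-letter example sequence this produces five values, one per 16-letter window.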
aboxylica
111 100+
Thanks a lot, buddy! :)
It's working!
:)
Cheers!
Jul 12 '07 #61
