transfer genome text file into excel file and select each cell to blast in NCBI

I would like to know how to take each item of data from a text file and put it into its own column in an Excel file.
                     source          1..159662
                     /organism="Candidatus Carsonella ruddii PV"
                     /mol_type="genomic DNA"
                     /strain="PV"
                     /specific_host="Pachypsylla venusta"
                     /db_xref="taxon:387662"
     gene            1..1317
                     /locus_tag="CRP_001"
                     /db_xref="GeneID:4414829"
     CDS             1..1317
                     /locus_tag="CRP_001"
                     /codon_start=1
                     /transl_table=11
                     /product="tRNA modification GTPase"
                     /protein_id="YP_802398.1"
                     /db_xref="GI:116334903"
                     /db_xref="GeneID:4414829"
After that, how can I select each gene (one per cell) from the Excel file and BLAST (align) it against the NCBI database?
Feb 21 '07 #1
bartonc
6,596 Expert 4TB
I'm not sure if this is a Python question. Are you using Visual Basic or Python?

Feb 21 '07 #2
I use Python, but I'm just a beginner at programming. OK, I think my first problem is this: how can I select only the entries that say "gene locus = xxx" from the data set in the text file and make a new data set following this pattern: "gene locus1 =xxx", "gene locus2 =xxx", "gene locus3 =xxx", ...?

Feb 21 '07 #3
bartonc
6,596 Expert 4TB
Did your post copy your text file format accurately? It looks a bit odd to me.
Feb 21 '07 #4
bvdet
2,851 Expert Mod 2GB
One option that should be straightforward is to read each line in your data file, extract the text you need, and write the results to a tab-delimited file formatted into rows and columns. This file can be imported directly into Excel. If you want us to help you with this, you should post a portion of a representative file and the format you require for the output (rows and columns of the spreadsheet).
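A minimal sketch of that read, extract, and write idea, assuming the qualifiers look like `/locus_tag="CRP_001"` as in the first post. The function names `extract_locus_tags` and `to_tab_delimited` are made up for illustration; the tab-delimited formatting uses Python's standard `csv` module.

```python
import csv
import io

def extract_locus_tags(lines):
    """Return each /locus_tag="..." value once, in the order it appears."""
    tags = []
    for line in lines:
        text = line.strip()
        if text.startswith('/locus_tag='):
            # Take what follows the first '=' and drop the surrounding quotes
            value = text.split('=', 1)[1].strip('"')
            # The same tag is listed under both gene and CDS; skip the repeat
            if not tags or tags[-1] != value:
                tags.append(value)
    return tags

def to_tab_delimited(rows):
    """Format rows (lists of strings) as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()

# Sample lines copied from the posted data (assumed layout)
sample = [
    'gene 1..1317',
    '/locus_tag="CRP_001"',
    '/db_xref="GeneID:4414829"',
    'CDS 1..1317',
    '/locus_tag="CRP_001"',
]
rows = [['No.', 'locus_tag']]
for number, tag in enumerate(extract_locus_tags(sample), start=1):
    rows.append([str(number), tag])
print(to_tab_delimited(rows))
```

Saving the returned text to a file (for example one opened with `open(path, 'w', newline='')`) gives something Excel can import with tab as the column separator.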
Feb 21 '07 #5
Thanks for the suggestion, here is part of my raw data from NCBI database.


gene 1..1317
/locus_tag="CRP_001"
/db_xref="GeneID:4414829"
CDS 1..1317
/locus_tag="CRP_001"
/codon_start=1
/transl_table=11
/product="tRNA modification GTPase"
/protein_id="YP_802398.1"
/db_xref="GI:116334903"
/db_xref="GeneID:4414829"
/translation="KNLKCFINKIVDNKDFSKNNYSDVKILFNKFSF"

gene 1314..2816
/locus_tag="CRP_002"
/db_xref="GeneID:4414830"
CDS 1314..2816
/locus_tag="CRP_002"
/codon_start=1
/transl_table=11
/product="glucose inhibited division protein A"
/protein_id="YP_802399.1"
/db_xref="GI:116334904"
/db_xref="GeneID:4414830"
/translation="KIKLFDNFYLFKLIIIMSKYYGYIKKKYFK"

gene 2785..3477
/locus_tag="CRP_003"
/db_xref="GeneID:4414831"
CDS 2785..3477
/locus_tag="CRP_003"
/codon_start=1
/transl_table=11
/product="F0F1-type ATP synthase A subunit"
/protein_id="YP_802400.1"
/db_xref="GI:116334905"
/db_xref="GeneID:4414831"
/translation="MVILKKNILNNFLNFKIIDLNLIILL"


And I need to transform it into the pattern as seen below.

No., gene, locus_tag, protein_id, GeneID,
1, 1..1317, CRP_001, YP_802398.1, 4414829,
2, 1314..2816, CRP_002, YP_802399.1, 4414830,
3, 2785..3477, CRP_003, YP_802400.1, 4414831,
4, ................ .............. .................. ...........

Once it is in this pattern, I will be able to convert it into an Excel file.
Feb 21 '07 #6
bvdet
2,851 Expert Mod 2GB
This is not elegant, but see if it works for you:
def geneData(fn, fOut):
    """Write one CSV row per gene: No., location, locus_tag, protein_id, GeneID."""
    labelLst = ['No.', 'gene', 'locus_tag', 'protein_id', 'GeneID']
    lineLst = []
    row = None  # fields collected for the gene currently being read
    with open(fn, 'r') as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == 'gene' and len(parts) > 1:
                # Start of a new gene feature; parts[1] is its location
                row = {'gene': parts[1]}
            elif row is None:
                continue  # ignore anything before the first gene feature
            elif parts[0].startswith('/locus_tag='):
                # locus_tag is listed under both gene and CDS; keep the first
                row.setdefault('locus_tag', parts[0].split('=', 1)[1].strip('"'))
            elif parts[0].startswith('/protein_id='):
                row['protein_id'] = parts[0].split('=', 1)[1].strip('"')
            elif parts[0].startswith('/db_xref="GeneID:') and 'protein_id' in row:
                # The GeneID that follows protein_id closes the record
                gene_id = parts[0].split(':', 1)[1].strip('"')
                lineLst.append([str(len(lineLst) + 1), row['gene'],
                                row.get('locus_tag', ''), row['protein_id'], gene_id])
                row = None
    with open(fOut, 'w') as f:
        f.write(','.join(labelLst) + '\n')
        for item in lineLst:
            f.write(','.join(item) + '\n')
    # Header row followed by one list per gene
    return [labelLst] + lineLst


if __name__ == '__main__':
    geneData('your_in_file', 'your_out_file')
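The script above only covers the spreadsheet step. For the BLAST step, one possible approach (an assumption on my part, nobody in this thread has tried it) is to write each `/translation` sequence out in FASTA format, labelled with its `protein_id`, since the NCBI BLAST web page accepts FASTA input. This sketch assumes each translation sits on a single line, as in the posted sample; in a full GenBank file the sequence usually wraps over several lines and would need to be joined first. The name `to_fasta` is hypothetical.

```python
def to_fasta(lines):
    """Build FASTA text from /protein_id and /translation qualifiers."""
    records = []
    protein_id = None
    for line in lines:
        text = line.strip()
        if text.startswith('/protein_id='):
            # Remember the ID until its translation shows up
            protein_id = text.split('=', 1)[1].strip('"')
        elif text.startswith('/translation=') and protein_id is not None:
            sequence = text.split('=', 1)[1].strip('"')
            # FASTA record: a '>' header line, then the sequence
            records.append('>' + protein_id + '\n' + sequence + '\n')
            protein_id = None
    return ''.join(records)

# One CDS copied from the posted data (assumed single-line translation)
sample = [
    'CDS 1..1317',
    '/locus_tag="CRP_001"',
    '/protein_id="YP_802398.1"',
    '/db_xref="GeneID:4414829"',
    '/translation="KNLKCFINKIVDNKDFSKNNYSDVKILFNKFSF"',
]
print(to_fasta(sample))
```

The resulting text can be pasted into the protein BLAST query box, one record per gene.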
Feb 21 '07 #7
Thank you very much. I will try and learn it from your script.


Feb 21 '07 #8
