Transfer a genome text file into an Excel file and select each cell to BLAST in NCBI

I would like to know how to take each item of data from a text file and put it into its own column in an Excel file.

     source          1..159662
                     /organism="Candidatus Carsonella ruddii PV"
                     /mol_type="genomic DNA"
                     /strain="PV"
                     /specific_host="Pachypsylla venusta"
                     /db_xref="taxon:387662"
     gene            1..1317
                     /locus_tag="CRP_001"
                     /db_xref="GeneID:4414829"
     CDS             1..1317
                     /locus_tag="CRP_001"
                     /codon_start=1
                     /transl_table=11
                     /product="tRNA modification GTPase"
                     /protein_id="YP_802398.1"
                     /db_xref="GI:116334903"
                     /db_xref="GeneID:4414829"

After that, how can I select each gene (one per cell) from the Excel file and BLAST (align) it against the NCBI database?

Feb 21 '07 #1

bartonc

6,596

Expert 4TB

I'm not sure if this is a python question. Are you using Visual Basic or Python?

Feb 21 '07 #2

khunohm

I use Python; however, I'm just a beginner at programming. OK, I think my first problem is this: how can I select only each entry that says "gene locus = xxx" from the data set in the text file and make a new data set in this pattern => "gene locus1 =xxx", "gene locus2 =xxx", "gene locus3 =xxx", ......

Feb 21 '07 #3

bartonc

6,596

Expert 4TB

Did your post copy your text file format accurately? It looks a bit odd to me.

Feb 21 '07 #4

bvdet

2,851

Expert Mod 2GB

One option that should be straightforward is to read each line in your data file, extract the text you need, and write the results to a tab-delimited file formatted into rows and columns. This file can be imported directly into Excel. If you want us to help you with this, you should post a portion of a representative file and the format you require for the output (rows and columns of the spreadsheet).
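As a rough illustration of that idea, here is a minimal sketch that reads lines, pulls out two fields, and writes tab-delimited rows using Python's standard csv module. The sample lines and the helper names extract_rows and write_tab_delimited are made up for the example, and it writes to an in-memory buffer instead of a real file, so treat it as a starting point and not a finished solution.

```python
import csv
import io

# Hypothetical sample lines, shaped like a GenBank feature table.
SAMPLE = '''\
gene 1..1317
/locus_tag="CRP_001"
/db_xref="GeneID:4414829"
gene 1314..2816
/locus_tag="CRP_002"
/db_xref="GeneID:4414830"
'''


def extract_rows(lines):
    """Yield one [range, locus_tag] row per gene feature."""
    row = None
    for line in lines:
        words = line.split()
        if not words:
            continue
        if words[0] == 'gene' and len(words) > 1:
            # A feature line such as "gene 1..1317" starts a new row.
            row = [words[1]]
        elif row is not None and words[0].startswith('/locus_tag='):
            # Keep the quoted value after the first '='.
            row.append(words[0].split('=', 1)[1].strip('"'))
            yield row
            row = None


def write_tab_delimited(rows, out):
    """Write a header and the rows, separated by tabs."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    writer.writerow(['gene', 'locus_tag'])
    writer.writerows(rows)


buffer = io.StringIO()
write_tab_delimited(extract_rows(SAMPLE.splitlines()), buffer)
print(buffer.getvalue())
```

To produce a file that Excel can open, the same write_tab_delimited call could be given a file opened with open('genes.txt', 'w', newline='') in place of the buffer (the file name is only a placeholder).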

Feb 21 '07 #5

khunohm

Thanks for the suggestion. Here is part of my raw data from the NCBI database.

gene 1..1317
/locus_tag="CRP_001"
/db_xref="GeneID:4414829"
CDS 1..1317
/locus_tag="CRP_001"
/codon_start=1
/transl_table=11
/product="tRNA modification GTPase"
/protein_id="YP_802398.1"
/db_xref="GI:116334903"
/db_xref="GeneID:4414829"
/translation="KNLKCFINKIVDNKDFSKNNYSDVKILFNKFSF"

gene 1314..2816
/locus_tag="CRP_002"
/db_xref="GeneID:4414830"
CDS 1314..2816
/locus_tag="CRP_002"
/codon_start=1
/transl_table=11
/product="glucose inhibited division protein A"
/protein_id="YP_802399.1"
/db_xref="GI:116334904"
/db_xref="GeneID:4414830"
/translation="KIKLFDNFYLFKLIIIMSKYYGYIKKKYFK"

gene 2785..3477
/locus_tag="CRP_003"
/db_xref="GeneID:4414831"
CDS 2785..3477
/locus_tag="CRP_003"
/codon_start=1
/transl_table=11
/product="F0F1-type ATP synthase A subunit"
/protein_id="YP_802400.1"
/db_xref="GI:116334905"
/db_xref="GeneID:4414831"
/translation="MVILKKNILNNFLNFKIIDLNLIILL"

I need to transform it into the pattern shown below.

No., gene, locus_tag, protein_id, GeneID,
1, 1..1317, CRP_001, YP_802398.1, 4414829,
2, 1314..2816, CRP_002, YP_802399.1, 4414830,
3, 2785..3477, CRP_003, YP_802400.1, 4414831,
4, ................ .............. .................. ...........

Once the data is in this pattern, I will be able to convert it into an Excel file.

Feb 21 '07 #6

bvdet

2,851

Expert Mod 2GB

This is not elegant, but see if it works for you:

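import csv


def geneData(fn, fOut):
    """Write one CSV row per gene: No., gene range, locus_tag, protein_id, GeneID."""
    labelLst = ['No.', 'gene', 'locus_tag', 'protein_id', 'GeneID']
    lineLst = []
    record = None  # fields of the gene being read; None until a gene line is seen

    def value(text):
        # '/locus_tag="CRP_001"' -> 'CRP_001'
        return text.split('=', 1)[1].strip().strip('"')

    with open(fn, 'r') as f:
        for line in f:
            words = line.split()
            if not words:
                continue
            if words[0] == 'gene' and len(words) > 1:
                # A feature line such as "gene 1..1317" starts a new record.
                record = {'gene': words[1]}
            elif record is None:
                continue
            elif words[0].startswith('/locus_tag='):
                # locus_tag is listed under both gene and CDS; keep the first.
                record.setdefault('locus_tag', value(line))
            elif words[0].startswith('/protein_id='):
                record['protein_id'] = value(line)
            elif words[0].startswith('/db_xref="GeneID:') and 'protein_id' in record:
                # The GeneID that follows protein_id completes the record.
                record['GeneID'] = value(line).split(':', 1)[1]
                fields = [record.get(key, '') for key in labelLst[1:]]
                lineLst.append([str(len(lineLst) + 1)] + fields)
                record = None

    with open(fOut, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(labelLst)
        writer.writerows(lineLst)

    # Header row followed by one row per gene.
    return [labelLst] + lineLst


if __name__ == '__main__':
    geneData('your_in_file', 'your_out_file')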
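
For the second half of the original question (taking each cell and BLASTing it at NCBI), one possible approach is sketched below. It reads rows in the column layout requested earlier in the thread and builds one submission URL per protein_id. The endpoint and the CMD, PROGRAM, DATABASE and QUERY parameters are my understanding of NCBI's BLAST URL API and should be checked against NCBI's documentation; the helper names blast_put_url and protein_queries are made up for the example. The sketch only builds the URLs and does not send anything to NCBI.

```python
import csv
import io
from urllib.parse import urlencode

# Sample rows in the layout requested earlier in the thread.
CSV_TEXT = '''\
No.,gene,locus_tag,protein_id,GeneID
1,1..1317,CRP_001,YP_802398.1,4414829
2,1314..2816,CRP_002,YP_802399.1,4414830
'''

# Assumption: endpoint of NCBI's BLAST URL API.
BLAST_URL = 'https://blast.ncbi.nlm.nih.gov/Blast.cgi'


def blast_put_url(query, program='blastp', database='nr'):
    """Build a search-submission URL for one accession or sequence."""
    params = {
        'CMD': 'Put',          # assumed: "submit a search" command
        'PROGRAM': program,    # blastp because protein_id names a protein
        'DATABASE': database,
        'QUERY': query,
    }
    return BLAST_URL + '?' + urlencode(params)


def protein_queries(csv_file):
    """Map each locus_tag to a BLAST submission URL for its protein_id."""
    reader = csv.DictReader(csv_file)
    return {row['locus_tag']: blast_put_url(row['protein_id']) for row in reader}


urls = protein_queries(io.StringIO(CSV_TEXT))
for locus_tag, url in urls.items():
    print(locus_tag, url)
```

A real run would still need to submit each URL, wait for the search to finish, and fetch the results, while keeping to NCBI's usage limits. With many genes, a library such as Biopython may be a more convenient route than hand-built URLs.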

Feb 21 '07 #7

khunohm

Thank you very much. I will try and learn it from your script.

Feb 21 '07 #8
