Processing text using python

Hey everyone! I'm hoping someone will be able to help me, because I
haven't had success searching on the web so far... I have large chunks
of text (all in a long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?
I'm going to be optimistic and thank you for your help in advance!
Samantha.

Feb 20 '06 #1
nuttydevil <sj***@sussex.ac.uk> wrote:
Hey everyone! I'm hoping someone will be able to help me, cause I
haven't had success searching on the web so far... I have large chunks
of text ( all in a long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?


Open each file and call thefile.read(3) in a loop, move to the next file
when the current one is exhausted. What part of this is giving you
problems?
Alex
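
A minimal sketch of the loop described above, assuming Python 3 and plain-text files. The function name iter_codons, the file names, and the sample contents are made up for illustration. Note that the last chunk of a file may be shorter than three characters.

```python
import os
import tempfile


def iter_codons(paths, size=3):
    """Yield successive `size`-character chunks from each file in turn."""
    for path in paths:
        with open(path) as handle:
            # iter(callable, sentinel) keeps calling read() until it returns ''.
            for chunk in iter(lambda: handle.read(size), ''):
                yield chunk


# Demo with two throwaway files (hypothetical names and contents).
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("seq1.txt", "ATGGCC"), ("seq2.txt", "TTTAA")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as out:
        out.write(text)
    paths.append(path)

print(list(iter_codons(paths)))  # ['ATG', 'GCC', 'TTT', 'AA']
```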
Feb 20 '06 #2
Since you're reading from files, the "read" operation of file-like
objects takes an argument specifying the number of characters to read
from the stream, e.g.:

>>> f = open("stuff.txt")
>>> f.read(3)
'car'
>>> f.read(3)
'act'
>>> f.read()
'erization'

Would that be enough for what you need?
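
The same behaviour can be tried without a file on disk. This is a small sketch assuming Python 3, with io.StringIO standing in for the opened file and the sample text chosen to match the session above. Once the stream is exhausted, read() returns an empty string, which is what makes it usable as a loop condition.

```python
import io

# io.StringIO stands in for an open text file (an assumption for the demo).
f = io.StringIO("caracterization")

first = f.read(3)     # the first three characters
second = f.read(3)    # the next three
rest = f.read()       # everything that is left
at_end = f.read(3)    # nothing left, so this is the empty string

print(first, second, rest, repr(at_end))
```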
Feb 20 '06 #3
Don't reinvent the wheel. Take a look at http://www.biopython.org/.
Feb 20 '06 #4
I think this is what you want:

file = open(r'c:/test.txt', 'r')

c = file.read(3)
while c:
    print(c)
    c = file.read(3)

file.close()

Feb 20 '06 #5
da********@yahoo.com wrote:
I think this is what you want:

file = open(r'c:/test.txt', 'r')

c = file.read(3)
while c:
    print(c)
    c = file.read(3)

file.close()

Or:

def read3():
    return file.read(3)

for chars in iter(read3, ''):
    pass  # ... do something with chars ...

STeVe
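
For anyone unfamiliar with the two-argument form of iter(): it calls the given function repeatedly and stops as soon as the function returns the sentinel value (here the empty string). A self-contained sketch, assuming Python 3, with io.StringIO standing in for the open file and functools.partial replacing the helper function:

```python
import io
from functools import partial

handle = io.StringIO("ATGGCCTTT")   # stand-in for an open text file

chunks = []
# partial(handle.read, 3) is a zero-argument callable equivalent to read3().
for chars in iter(partial(handle.read, 3), ''):
    chunks.append(chars)

print(chunks)  # ['ATG', 'GCC', 'TTT']
```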
Feb 20 '06 #6
did you read the string chapter in the tutorial?

http://docs.python.org/tut/node5.htm...00000000000000

around the middle of that chapter, there's a section on slicing:

"substrings can be specified with the slice notation: two indices
separated by a colon"
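
A short illustration of that slice notation, assuming Python 3 and a made-up sequence string. The first index is included, the second is excluded, and a slice that runs past the end of the string is quietly clamped rather than raising an error.

```python
s = "ATGGCCTTTAA"     # hypothetical sequence, 11 characters long

first = s[0:3]        # characters at indices 0, 1, 2
second = s[3:6]       # indices 3, 4, 5
tail = s[9:12]        # only indices 9 and 10 exist, so two characters come back

print(first, second, tail)
```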

</F>

Feb 20 '06 #7
If you have already read the string into memory and want a convenient
way to loop through it 3 characters at a time, check out the "batch" recipe:

http://aspn.activestate.com/ASPN/Coo.../Recipe/303279

It uses itertools to make an iterator over the string, returning 3
characters at a time. Cool stuff.
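
The link above is truncated, so the following is not the recipe's code. It is only a sketch of the same idea, assuming Python 3 and itertools.islice; the name batch and the sample string are chosen for illustration.

```python
from itertools import islice


def batch(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# A string is an iterable of characters, so each batch is a list of characters.
codons = ["".join(chunk) for chunk in batch("ATGGCCTTTAA", 3)]
print(codons)  # ['ATG', 'GCC', 'TTT', 'AA']
```

Because it works on any iterable, the same function could be fed characters from a file object or a generator instead of a string already in memory.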
Feb 20 '06 #8
data1 = '''FOOTFALLSECHOINTHEMEMORY
DOWNTHEPASSAGEWHICHWEDIDNOTTAKE
TOWARDSTHEDOORWENEVEROPENED'''

num_codons = len(data1) // 3

codons = [ data1[3*i:3*(i+1)] for i in range( num_codons ) ]

print(codons)

class Codon(object):
    #__slots__ = ['alpha', 'beta', 'gamma']
    def __init__(self, a, b, c):
        self.alpha = a
        self.beta = b
        self.gamma = c

codons = [ Codon(*codon) for codon in codons ]

print(codons[0].alpha, codons[0].beta, codons[0].gamma)

###output####

['FOO', 'TFA', 'LLS', 'ECH', 'OIN', 'THE', 'MEM', 'ORY', '\nDO', 'WNT',
'HEP', 'ASS', 'AGE', 'WHI', 'CHW', 'EDI', 'DNO', 'TTA', 'KE\n', 'TOW',
'ARD', 'STH', 'EDO', 'ORW', 'ENE', 'VER', 'OPE', 'NED']
F O O
Gerard
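
Note that the output above contains pieces such as '\nDO' and 'KE\n', because the line breaks inside the triple-quoted string count as characters. If the line breaks are not meant to be part of the sequence (an assumption about the data), one possible variation, assuming Python 3, strips whitespace first and keeps any incomplete trailing group separate:

```python
data1 = '''FOOTFALLSECHOINTHEMEMORY
DOWNTHEPASSAGEWHICHWEDIDNOTTAKE
TOWARDSTHEDOORWENEVEROPENED'''

# Drop newlines (and any other whitespace) so no codon straddles a line break.
clean = "".join(data1.split())

# Only step over complete groups of three; the remainder is kept separately.
usable = len(clean) - len(clean) % 3
codons = [clean[i:i + 3] for i in range(0, usable, 3)]
leftover = clean[usable:]

print(codons[:3], len(codons), repr(leftover))
```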

Feb 20 '06 #9
Sure. There's probably a thousand ways to do this.

Feb 20 '06 #10
Hi,

You have plenty of good responses. I thought I would add one more:

def data_iter(file_name):
    data = open(file_name)
    while True:
        value = data.read(3)
        if not value:
            break
        yield value
    data.close()

With the above, you can grab the entire data set (3 characters at a
time) like so:

data_set = [ d for d in data_iter('data') ]

Or iterate over it:

for d in data_iter('data'):
    pass  # do stuff

Enjoy!
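
One caveat with the generator above: data.close() only runs if the loop is iterated to the end. A possible variation, assuming Python 3, wraps the file in a with block so it is also closed when the generator is abandoned early and later closed or garbage-collected. The size parameter and the demo file are additions for illustration.

```python
import os
import tempfile


def data_iter(file_name, size=3):
    """Yield `size`-character chunks; the with block closes the file."""
    with open(file_name) as data:
        while True:
            value = data.read(size)
            if not value:
                break
            yield value


# Demo on a throwaway file (hypothetical contents).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("ATGGCCTTT")

data_set = list(data_iter(path))
print(data_set)  # ['ATG', 'GCC', 'TTT']
os.remove(path)
```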

Feb 20 '06 #11
Fredrik Lundh wrote:
did you read the string chapter in the tutorial ?

http://docs.python.org/tut/node5.htm...00000000000000

around the middle of that chapter, there's a section on slicing:

"substrings can be specified with the slice notation: two indices
separated by a colon"

Fredrik, how would you use slices to split a string by groups of 3
characters?
Feb 20 '06 #12
Xavier Morel <xa**********@masklinn.net> wrote:
Fredrik Lundh wrote:
did you read the string chapter in the tutorial ?

http://docs.python.org/tut/node5.htm...00000000000000

around the middle of that chapter, there's a section on slicing:

"substrings can be specified with the slice notation: two indices
separated by a colon"

Fredrik, how would you use slices to split a string by groups of 3
characters?


I can't answer for him, but maybe:

[s[i:i+3] for i in range(0, len(s), 3)]

....?
Alex
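
Wrapped in a function, the slicing approach might look like the sketch below (Python 3 assumed). The name split_codons and the keep_partial flag are hypothetical; the flag only decides what happens when the string length is not a multiple of three.

```python
def split_codons(s, size=3, keep_partial=True):
    """Split s into size-character pieces using slices."""
    pieces = [s[i:i + size] for i in range(0, len(s), size)]
    if not keep_partial and pieces and len(pieces[-1]) < size:
        pieces.pop()   # discard an incomplete final group
    return pieces


print(split_codons("ATGGCCTTTAA"))                      # ['ATG', 'GCC', 'TTT', 'AA']
print(split_codons("ATGGCCTTTAA", keep_partial=False))  # ['ATG', 'GCC', 'TTT']
```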
Feb 20 '06 #13
