
Processing text using python

Hey everyone! I'm hoping someone will be able to help me, cause I
haven't had success searching on the web so far... I have large chunks
of text ( all in a long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?
I'm going to be optimistic and thank you for your help in advance!
Samantha.

Feb 20 '06 #1
12 Replies


nuttydevil <sj***@sussex.ac.uk> wrote:
Hey everyone! I'm hoping someone will be able to help me, cause I
haven't had success searching on the web so far... I have large chunks
of text ( all in a long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?


Open each file and call thefile.read(3) in a loop, move to the next file
when the current one is exhausted. What part of this is giving you
problems?
Alex
Feb 20 '06 #2
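
A minimal sketch of the loop Alex describes, written for Python 3 (the rest of the thread uses Python 2). The function name read_codons and its parameters are made up for illustration, and it assumes each file holds nothing but sequence characters:

```python
def read_codons(paths, size=3):
    """Yield successive `size`-character chunks from each file in turn."""
    for path in paths:
        with open(path) as handle:
            while True:
                chunk = handle.read(size)
                if not chunk:  # read() returns '' at end of file
                    break
                yield chunk
```

Chunks never span two files, so the last chunk of a file can be shorter than three characters if the file length is not a multiple of three. Any newline in the file is counted as a character too.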

nuttydevil wrote:
Hey everyone! I'm hoping someone will be able to help me, cause I
haven't had success searching on the web so far... I have large chunks
of text ( all in a long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?
I'm going to be optimistic and thank you for your help in advance!
Samantha.

Since you're reading from files, the "read" operation of file-like
objects takes an argument specifying the number of characters to read
from the stream e.g.
>>> f = file("stuff.txt")
>>> f.read(3)
'car'
>>> f.read(3)
'act'
>>> f.read()
'erization'

Would that be enough for what you need?
Feb 20 '06 #3

In article <11*********************@g43g2000cwa.googlegroups.com>,
"nuttydevil" <sj***@sussex.ac.uk> wrote:
Hey everyone! I'm hoping someone will be able to help me, cause I
haven't had success searching on the web so far... I have large chunks
of text ( all in a long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?


Don't reinvent the wheel. Take a look at http://www.biopython.org/.
Feb 20 '06 #4

I think this is what you want:

file = open(r'c:/test.txt','r')

c = file.read(3)
while c:
    print c
    c = file.read(3)

file.close()

Feb 20 '06 #5

da********@yahoo.com wrote:
I think this is what you want:

file = open(r'c:/test.txt','r')

c = file.read(3)
while c:
    print c
    c = file.read(3)

file.close()

Or:

def read3():
    return file.read(3)
for chars in iter(read3, ''):
    ... do something with chars ...

STeVe
Feb 20 '06 #6
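
For anyone reading this on Python 3, here is a rough equivalent of the two-argument iter() trick above. It swaps the read3 helper for functools.partial, which is my substitution and not what STeVe posted; the name chunks_from is hypothetical, and io.StringIO only stands in for a real open file:

```python
import io
from functools import partial


def chunks_from(handle, size=3):
    """Call handle.read(size) repeatedly until it returns the sentinel ''."""
    return iter(partial(handle.read, size), "")


handle = io.StringIO("ATGGCCTTA")  # stand-in for an open text file
codons = list(chunks_from(handle))
print(codons)  # ['ATG', 'GCC', 'TTA']
```

The sentinel has to match what read() returns at end of file: '' for a file opened in text mode, b'' for one opened in binary mode.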

"nuttydevil" <sj***@sussex.ac.uk> wrote:
Hey everyone! I'm hoping someone will be able to help me, cause I
haven't had success searching on the web so far... I have large chunks
of text ( all in a long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?


did you read the string chapter in the tutorial ?

http://docs.python.org/tut/node5.htm...00000000000000

around the middle of that chapter, there's a section on slicing:

"substrings can be specified with the slice notation: two indices
separated by a colon"

</F>

Feb 20 '06 #7

If you have already read the string into memory and want a convenient
way to loop through it 3 characters at a time, check out the "batch" recipe:

http://aspn.activestate.com/ASPN/Coo.../Recipe/303279

It uses itertools to make an iterator over the string, returning 3
characters at a time. Cool stuff.
nuttydevil wrote:
Hey everyone! I'm hoping someone will be able to help me, cause I
haven't had success searching on the web so far... I have large chunks
of text ( all in a long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?
I'm going to be optimistic and thank you for your help in advance!
Samantha.

Feb 20 '06 #8
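
The recipe link above is truncated, so the following is not the recipe's code, just a sketch of the same idea in Python 3 using itertools.islice. The function name batch and its size parameter are assumptions made for the example:

```python
from itertools import islice


def batch(iterable, size=3):
    """Yield lists of up to `size` consecutive items from any iterable."""
    iterator = iter(iterable)
    while True:
        group = list(islice(iterator, size))
        if not group:  # iterator exhausted
            return
        yield group


# A string iterates character by character, so each group is a list of
# characters that can be joined back into a codon.
codons = ["".join(group) for group in batch("ATGGCCTTAG")]
print(codons)  # ['ATG', 'GCC', 'TTA', 'G']
```

Because it works on any iterable, the same function can batch characters from a generator without loading the whole sequence into memory first.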

nuttydevil wrote:
Hey everyone! I'm hoping someone will be able to help me, cause I
haven't had success searching on the web so far... I have large chunks
of text ( all in a long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?
I'm going to be optimistic and thank you for your help in advance!
Samantha.

data1 = '''FOOTFALLSECHOINTHEMEMORY
DOWNTHEPASSAGEWHICHWEDIDNOTTAKE
TOWARDSTHEDOORWENEVEROPENED'''

num_codons = len(data1) // 3

codons = [ data1[3*i:3*(i+1)] for i in range( num_codons ) ]

print codons

class Codon(object):
    #__slots__ = ['alpha', 'beta', 'gamma']
    def __init__(self, a, b, c):
        self.alpha = a
        self.beta = b
        self.gamma = c

codons = [ Codon(*codon) for codon in codons ]

print codons[0].alpha, codons[0].beta, codons[0].gamma

###output####

['FOO', 'TFA', 'LLS', 'ECH', 'OIN', 'THE', 'MEM', 'ORY', '\nDO', 'WNT',
'HEP', 'ASS', 'AGE', 'WHI', 'CHW', 'EDI', 'DNO', 'TTA', 'KE\n', 'TOW',
'ARD', 'STH', 'EDO', 'ORW', 'ENE', 'VER', 'OPE', 'NED']
F O O
Gerard

Feb 20 '06 #9
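
Note that the output above contains '\nDO' and 'KE\n': the line breaks in data1 are counted as characters and shift the reading frame. A possible fix, sketched in Python 3 and assuming that whitespace is never part of the sequence, is to strip it before slicing:

```python
data1 = '''FOOTFALLSECHOINTHEMEMORY
DOWNTHEPASSAGEWHICHWEDIDNOTTAKE
TOWARDSTHEDOORWENEVEROPENED'''

# str.split() with no arguments splits on any run of whitespace, so
# joining the pieces removes newlines, spaces and tabs in one step.
cleaned = "".join(data1.split())

# Ignore a trailing partial codon, as len(data1) // 3 does above.
usable = len(cleaned) - len(cleaned) % 3
codons = [cleaned[i:i + 3] for i in range(0, usable, 3)]

print(codons[:9])
# ['FOO', 'TFA', 'LLS', 'ECH', 'OIN', 'THE', 'MEM', 'ORY', 'DOW']
```

With the newlines gone, the ninth codon is 'DOW' instead of '\nDO', and every later codon moves accordingly.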

Sure. There's probably a thousand ways to do this.

Feb 20 '06 #10

Hi,

you have plenty of good responses. I thought I would add one more:

def data_iter(file_name):
    data = file(file_name)
    while True:
        value = data.read(3)
        if not value:
            break
        yield value
    data.close()

With the above, you can grab the entire data set (3 characters at a
time) like so:

data_set = [ d for d in data_iter('data') ]

Or iterate over it:

for d in data_iter('data'):
    # do stuff

Enjoy!

Feb 20 '06 #11
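
One caveat with the generator above: data.close() only runs if the caller iterates all the way to the end. A Python 3 variant using a with block is sketched below; the size parameter is an addition of mine, not part of the original post:

```python
def data_iter(file_name, size=3):
    """Yield `size`-character chunks from the named text file."""
    # The with block closes the file when the loop ends, and also when
    # the generator is closed early or garbage-collected.
    with open(file_name) as data:
        while True:
            value = data.read(size)
            if not value:
                break
            yield value
```

It is used the same way as the original, for example list(data_iter('data')) to collect every chunk at once.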

Fredrik Lundh wrote:
did you read the string chapter in the tutorial ?

http://docs.python.org/tut/node5.htm...00000000000000

around the middle of that chapter, there's a section on slicing:

"substrings can be specified with the slice notation: two indices
separated by a colon"

Fredrik, how would you use slices to split a string by groups of 3
characters?
Feb 20 '06 #12

Xavier Morel <xa**********@masklinn.net> wrote:
Fredrik Lundh wrote:
did you read the string chapter in the tutorial ?

http://docs.python.org/tut/node5.htm...00000000000000

around the middle of that chapter, there's a section on slicing:

"substrings can be specified with the slice notation: two indices
separated by a colon"

Fredrik, how would you use slices to split a string by groups of 3
characters?


I can't answer for him, but maybe:

[s[i:i+3] for i in xrange(0, len(s), 3)]

....?
Alex
Feb 20 '06 #13
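
The same slicing idea in Python 3, where range replaces xrange. This is only a sketch: the function name split_codons and the keep_partial flag are invented here to show one way of dealing with a sequence whose length is not a multiple of three:

```python
def split_codons(s, size=3, keep_partial=True):
    """Split s into consecutive slices of `size` characters."""
    chunks = [s[i:i + size] for i in range(0, len(s), size)]
    # Slicing past the end of a string is safe, so the last chunk is
    # simply shorter when len(s) is not a multiple of size.
    if not keep_partial and chunks and len(chunks[-1]) < size:
        chunks.pop()
    return chunks


print(split_codons("ATGGCCTTAG"))                      # ['ATG', 'GCC', 'TTA', 'G']
print(split_codons("ATGGCCTTAG", keep_partial=False))  # ['ATG', 'GCC', 'TTA']
```

Whether to keep or drop the trailing fragment depends on the analysis; dropping it matches the len(data1) // 3 approach shown earlier in the thread.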
