Processing text using python

Hey everyone! I'm hoping someone will be able to help me, cause I
haven't had success searching on the web so far... I have large chunks
of text (all in one long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?
I'm going to be optimistic and thank you for your help in advance!
Samantha.

Feb 20 '06 #1
nuttydevil <sj***@sussex.ac.uk> wrote:
Hey everyone! I'm hoping someone will be able to help me, cause I
haven't had success searching on the web so far... I have large chunks
of text (all in one long string) that are currently all in separate
notebook files. I want to use python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
I need to read and analyse each sequence one codon at a time
effectively.) Does anyone have any idea of how to do this using python?


Open each file and call thefile.read(3) in a loop, move to the next file
when the current one is exhausted. What part of this is giving you
problems?
Alex
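
A minimal sketch of the loop described above, assuming plain text files. The function name codons_from_files, the demo file names, and the demo contents are made up for illustration. Note that the last chunk of a file can be shorter than three characters if the file length is not a multiple of three.

```python
import os
import tempfile


def codons_from_files(paths, size=3):
    """Yield successive `size`-character chunks from each file in turn."""
    for path in paths:
        with open(path) as handle:
            while True:
                chunk = handle.read(size)
                if not chunk:  # empty string: this file is exhausted
                    break
                yield chunk


# Demo with two throwaway files (hypothetical contents, not real data).
tmpdir = tempfile.mkdtemp()
demo_paths = []
for name, text in [("a.txt", "ATGGCC"), ("b.txt", "TTTAA")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as out:
        out.write(text)
    demo_paths.append(path)

chunks = list(codons_from_files(demo_paths))
print(chunks)
```

Because it is a generator, only one chunk is held in memory at a time, which may matter for long sequences.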
Feb 20 '06 #2
Since you're reading from files, the "read" operation of file-like
objects takes an argument specifying the number of characters to read
from the stream, e.g.:

>>> f = open("stuff.txt")
>>> f.read(3)
'car'
>>> f.read(3)
'act'
>>> f.read()
'erization'

Would that be enough for what you need?
Feb 20 '06 #3
Don't reinvent the wheel. Take a look at http://www.biopython.org/.
Feb 20 '06 #4
I think this is what you want:

file = open(r'c:/test.txt','r')

# read(3) returns an empty string at end of file, which ends the loop
c = file.read(3)
while c:
    print(c)
    c = file.read(3)

file.close()

Feb 20 '06 #5
da********@yahoo.com wrote:
I think this is what you want:

file = open(r'c:/test.txt','r')

c = file.read(3)
while c:
    print(c)
    c = file.read(3)

file.close()

Or:

def read3():
    return file.read(3)

# iter(callable, sentinel) calls read3() until it returns ''
for chars in iter(read3, ''):
    # ... do something with chars ...
    pass

STeVe
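
For anyone who wants to try the two-argument iter() idiom without a file on disk, here is a self-contained sketch. It assumes io.StringIO as a stand-in for an open text file and uses functools.partial in place of the read3 helper; the sequence is made up.

```python
import io
from functools import partial

stream = io.StringIO("ATGGCCTTT")  # stand-in for an open text file

# iter(callable, sentinel) keeps calling stream.read(3) until it returns ''.
triplets = list(iter(partial(stream.read, 3), ""))
print(triplets)
```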
Feb 20 '06 #6
did you read the string chapter in the tutorial?

http://docs.python.org/tut/node5.htm...00000000000000

around the middle of that chapter, there's a section on slicing:

"substrings can be specified with the slice notation: two indices
separated by a colon"

</F>
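
A small sketch of how slice notation could be applied here, assuming the whole sequence is already in memory as a string. The helper name split_codons and the sample sequence are illustrative only.

```python
def split_codons(sequence, size=3):
    """Return consecutive `size`-character slices of `sequence`."""
    # range(start, stop, step) gives the index where each slice begins.
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


seq = "ATGGCCTTTAA"
first = seq[0:3]            # two indices separated by a colon
pieces = split_codons(seq)
print(first)
print(pieces)
```

Slicing past the end of a string does not raise an error, so the final piece is simply shorter when the length is not a multiple of three.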

Feb 20 '06 #7
If you have already read the string into memory and want a convenient
way to loop through it 3 characters at a time, check out the "batch" recipe:

http://aspn.activestate.com/ASPN/Coo.../Recipe/303279

It uses itertools to make an iterator over the string, returning 3
characters at a time. Cool stuff.
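
The following is not the linked recipe itself, just a rough sketch of the same idea with itertools.islice. The function name batch is borrowed from the recipe's title, the sample string is made up, and this version only handles iterables of characters because it joins each group back into a string.

```python
from itertools import islice


def batch(iterable, size=3):
    """Yield strings of up to `size` characters taken from `iterable`."""
    iterator = iter(iterable)
    while True:
        # islice pulls at most `size` items from the shared iterator.
        group = "".join(islice(iterator, size))
        if not group:  # nothing left to consume
            return
        yield group


batches = list(batch("ATGGCCTTTAA"))
print(batches)
```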
Feb 20 '06 #8
data1 = '''FOOTFALLSECHOINTHEMEMORY
DOWNTHEPASSAGEWHICHWEDIDNOTTAKE
TOWARDSTHEDOORWENEVEROPENED'''

# Number of complete three-character groups; any remainder is dropped.
# The newlines inside data1 are kept, so they show up in some codons.
num_codons = len(data1) // 3

codons = [ data1[3*i:3*(i+1)] for i in range( num_codons ) ]

print(codons)

class Codon(object):
    #__slots__ = ['alpha', 'beta', 'gamma']
    def __init__(self, a, b, c):
        self.alpha = a
        self.beta = b
        self.gamma = c

# Unpack each three-character string into the three positions of a Codon
codons = [ Codon(*codon) for codon in codons ]

print(codons[0].alpha, codons[0].beta, codons[0].gamma)

###output####

['FOO', 'TFA', 'LLS', 'ECH', 'OIN', 'THE', 'MEM', 'ORY', '\nDO', 'WNT',
'HEP', 'ASS', 'AGE', 'WHI', 'CHW', 'EDI', 'DNO', 'TTA', 'KE\n', 'TOW',
'ARD', 'STH', 'EDO', 'ORW', 'ENE', 'VER', 'OPE', 'NED']
F O O
Gerard
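
The output above contains entries such as '\nDO' and 'KE\n' because the line breaks in the data are counted as characters. One possible way around that, sketched under the assumption that whitespace is never meaningful in the sequence, is to strip it before slicing. The helper name clean_codons is hypothetical.

```python
def clean_codons(raw, size=3):
    """Strip all whitespace, then split into complete codons.

    Returns (codons, leftover), where leftover holds any trailing
    characters that do not fill a whole codon.
    """
    sequence = "".join(raw.split())  # drops spaces, tabs and newlines
    usable = len(sequence) - len(sequence) % size
    codons = [sequence[i:i + size] for i in range(0, usable, size)]
    return codons, sequence[usable:]


raw = "FOOTFALLSECHOINTHEMEMORY\nDOWNTHEPASSAGEWHICHWEDIDNOTTAKE\n"
codons, leftover = clean_codons(raw)
print(codons)
print(repr(leftover))
```

Returning the leftover separately makes it easy to notice when a sequence length is not a multiple of three, rather than silently dropping the tail.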

Feb 20 '06 #9
Sure. There are probably a thousand ways to do this.

Feb 20 '06 #10
