Text processing and file creation

I have a text source file of about 20,000 lines.

From this file, I would like to write the first 5 lines to a new file, close
that file, grab the next 5 lines and write these to a new file... grabbing
5 lines and creating new files until all 20,000 lines have been
processed.
Is there an efficient way to do this in Python?
Thanks in advance for your help.

Sep 5 '07 #1


kyosohma

On Sep 5, 11:13 am, "malibus...@gmail.com" <malibus...@gmail.com>
wrote:

I have a text source file of about 20.000 lines.
From this file, I like to write the first 5 lines to a new file. Close
that file, grab the next 5 lines write these to a new file... grabbing
5 lines and creating new files until processing of all 20.000 lines is
done.
Is there an efficient way to do this in Python?
In advance, thanks for your help.

I would use a counter in a for loop using the readline method to
iterate over the 20,000 line file. Reset the counter every 5
lines/iterations and close the file. To give the files unique names, use
the time module. Something like this:

import time
x = 'filename-%s.txt' % time.time()
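
A minimal sketch of the counter approach described above, assuming Python 3. The function name `split_by_counter`, its parameters, and the running number added to the file name are illustrative assumptions, not part of the original suggestion; the running number is there because two calls to `time.time()` can return the same value on a coarse clock.

```python
import os
import time


def split_by_counter(src_path, out_dir, lines_per_file=5):
    """Write every `lines_per_file` lines of src_path to a new file in out_dir."""
    created = []
    out = None
    counter = 0
    with open(src_path) as src:
        for line in src:
            if out is None:
                # Timestamp as suggested above, plus a running number
                # (assumption) so names stay unique on a coarse clock.
                name = 'filename-%s-%05d.txt' % (time.time(), len(created))
                path = os.path.join(out_dir, name)
                out = open(path, 'w')
                created.append(path)
            out.write(line)
            counter += 1
            if counter == lines_per_file:
                # Chunk is full: close it and reset the counter.
                out.close()
                out = None
                counter = 0
    if out is not None:
        out.close()  # last, partially filled chunk
    return created
```

The function returns the list of paths it created, so the caller can check how many chunks were written.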

Have fun!

Mike

Sep 5 '07 #2

Arnau Sanchez

ma********@gmail.com wrote:

I have a text source file of about 20.000 lines.
From this file, I like to write the first 5 lines to a new file. Close
that file, grab the next 5 lines write these to a new file... grabbing
5 lines and creating new files until processing of all 20.000 lines is
done.
Is there an efficient way to do this in Python?

Perhaps you could provide some code to see how you approached it?

Sep 5 '07 #3

Bjoern Schliessmann

ky******@gmail.com wrote:

I would use a counter in a for loop using the readline method to
iterate over the 20,000 line file.

File objects are iterable themselves, so there's no need to call a
method for that.

Reset the counter every 5 lines/ iterations and close the file.

I'd use a generator that fetches five lines of the file per
iteration and iterate over it instead of the file directly.
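
One way the generator idea above could look, as a sketch assuming Python 3. The names `chunks_of` and `split_file`, and the output name pattern, are made up for illustration; they are not from the post.

```python
from itertools import islice


def chunks_of(lines, size=5):
    """Yield lists of up to `size` items from any iterable of lines."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # input exhausted
        yield chunk


def split_file(src_path, name_pattern='chunk_%05d.txt', size=5):
    """Write each chunk of src_path to its own file; return the chunk count."""
    count = 0
    with open(src_path) as src:
        # The file object is itself an iterator over lines.
        for count, chunk in enumerate(chunks_of(src, size), start=1):
            with open(name_pattern % count, 'w') as out:
                out.writelines(chunk)
    return count
```

The loop body only ever holds one chunk in memory, and the last file simply receives fewer lines if the input length is not a multiple of `size`.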

Have fun!

Definitely -- and also do your homework yourself :)

Regards,
Björn

--
BOFH excuse #339:

manager in the cable duct

Sep 5 '07 #4

Shawn Milochik

On 9/5/07, ma********@gmail.com <ma********@gmail.com> wrote:

I have a text source file of about 20.000 lines.
From this file, I like to write the first 5 lines to a new file. Close
that file, grab the next 5 lines write these to a new file... grabbing
5 lines and creating new files until processing of all 20.000 lines is
done.
Is there an efficient way to do this in Python?
In advance, thanks for your help.

I have written a working test of this. Here's the basic setup:

open the input file

function newFileName:
    generate a filename (starting with 00001.tmp).
    If filename exists, increment and test again (00002.tmp and so on).
    return fileName

read a line until input file is empty:

    test to see whether I have written five lines. If so, get a new
    file name, close file, and open new file

    write line to file

close output file final time

Once you get some code running, feel free to post it and we'll help.

Sep 5 '07 #5

kyosohma

On Sep 5, 11:57 am, Bjoern Schliessmann <usenet-mail-0306.20.chr0n...@spamgourmet.com> wrote:

kyoso...@gmail.com wrote:
I would use a counter in a for loop using the readline method to
iterate over the 20,000 line file.

file objects are iterables themselves, so there's no need to do that
by using a method.

Very true! Darn it!

Reset the counter every 5 lines/ iterations and close the file.

I'd use a generator that fetches five lines of the file per
iteration and iterate over it instead of the file directly.

I still haven't figured out how to use generators, so this didn't even
come to mind. I usually see something like this example for reading a
file:

f = open(somefile)
for line in f:
    # do something

http://docs.python.org/tut/node9.html

Okay, so they didn't use readline. I wonder where I saw that.


Mike

Sep 5 '07 #6

Paddy

On Sep 5, 5:13 pm, "malibus...@gmail.com" <malibus...@gmail.com>
wrote:

I have a text source file of about 20.000 lines.
From this file, I like to write the first 5 lines to a new file. Close
that file, grab the next 5 lines write these to a new file... grabbing
5 lines and creating new files until processing of all 20.000 lines is
done.
Is there an efficient way to do this in Python?
In advance, thanks for your help.

If it's on Unix: use split.
If it's your homework: show us what you have so far...

- Paddy.

Sep 5 '07 #7

malibuster

On Sep 5, 1:28 pm, Paddy <paddy3...@googlemail.com> wrote:

On Sep 5, 5:13 pm, "malibus...@gmail.com" <malibus...@gmail.com>
wrote:

I have a text source file of about 20.000 lines.
From this file, I like to write the first 5 lines to a new file. Close
that file, grab the next 5 lines write these to a new file... grabbing
5 lines and creating new files until processing of all 20.000 lines is
done.
Is there an efficient way to do this in Python?
In advance, thanks for your help.

If its on unix: use split.
If its your homework: show us what you have so far...

- Paddy.

Paddy,

Thanks for making me aware of the (UNIX) split command (split -l 5
inFile.txt), it's short, it's fast, it's beautiful.

I am still wondering how to do this efficiently in Python (being kind
of new to it... and it's not for homework).

-- Martin.
I am still wondering how to do this in Python (being new to Python)

Sep 5 '07 #8

Arnaud Delobelle

On Sep 5, 5:13 pm, "malibus...@gmail.com" <malibus...@gmail.com>
wrote:

I have a text source file of about 20.000 lines.
From this file, I like to write the first 5 lines to a new file. Close
that file, grab the next 5 lines write these to a new file... grabbing
5 lines and creating new files until processing of all 20.000 lines is
done.
Is there an efficient way to do this in Python?

Sure!

In advance, thanks for your help.

from my_useful_functions import (new_file, write_first_5_lines,
    done_processing_file, grab_next_5_lines, another_new_file, write_these)

in_f = open('myfile')
out_f = new_file()
write_first_5_lines(in_f, out_f) # write first 5 lines
close(out_f)
while not done_processing_file(in_f): # until done processing
    lines = grab_next_5_lines(in_f) # grab next 5 lines
    out_f = another_new_file()
    write_these(lines, out_f) # write these
    close(out_f)
print "all done!" # All done
print "Now there are 4000 files in this directory..."

Python 3.0 - ready (I've used open() instead of file())

HTH

--
Arnaud

Sep 5 '07 #9

Steve Holden

Arnaud Delobelle wrote:
[...]

from my_useful_functions import (new_file, write_first_5_lines,
    done_processing_file, grab_next_5_lines, another_new_file, write_these)

in_f = open('myfile')
out_f = new_file()
write_first_5_lines(in_f, out_f) # write first 5 lines
close(out_f)
while not done_processing_file(in_f): # until done processing
    lines = grab_next_5_lines(in_f) # grab next 5 lines
    out_f = another_new_file()
    write_these(lines, out_f) # write these
    close(out_f)
print "all done!" # All done
print "Now there are 4000 files in this directory..."

Python 3.0 - ready (I've used open() instead of file())

bzzzzzzzzzzt!

Python 3.0a1 (py3k:57844, Aug 31 2007, 16:54:27) ...
Type "help", "copyright", "credits" or "license" for more information.
>>> print "all done!" # All done
  File "<stdin>", line 1
    print "all done!" # All done
                    ^
SyntaxError: invalid syntax
>>>

Close, but no cigar ;-)
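
For comparison, a small sketch of the same two messages in a form that runs on Python 3, where print is a function. The `summary` helper and its parameter names are made up for illustration; the 4000 follows from 20,000 lines split into files of 5.

```python
def summary(total_lines=20000, lines_per_file=5):
    """Return the closing messages for a split of total_lines lines."""
    # Ceiling division: a partial final chunk still needs its own file.
    n_files = -(-total_lines // lines_per_file)
    return ["all done!",
            "Now there are %d files in this directory..." % n_files]


for message in summary():
    print(message)  # print is a function in Python 3, hence the parentheses
```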

regards
Steve

Sep 5 '07 #10

Arnaud Delobelle

On Sep 6, 12:46 am, Steve Holden <st...@holdenweb.com> wrote:

Arnaud Delobelle wrote:

[...]

print "all done!" # All done
print "Now there are 4000 files in this directory..."

Python 3.0 - ready (I've used open() instead of file())

bzzzzzzzzzzt!

Python 3.0a1 (py3k:57844, Aug 31 2007, 16:54:27) ...
Type "help", "copyright", "credits" or "license" for more information.
>>> print "all done!" # All done
  File "<stdin>", line 1
    print "all done!" # All done
                    ^
SyntaxError: invalid syntax
>>>

Damn! That'll teach me to make such bold claims.
At least I'm unlikely to forget again now...

--
Arnaud

Sep 6 '07 #11

Alberto Griggio

Thanks for making me aware of the (UNIX) split command (split -l 5
inFile.txt), it's short, it's fast, it's beautiful.

I am still wondering how to do this efficiently in Python (being kind
of new to it... and it's not for homework).

Something like this should do the job:

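def nlines(num, fileobj):
    # Yield one sub-generator per group of up to num lines.
    done = [False]
    def doit():
        for i in range(num):
            l = fileobj.readline()
            if not l:
                done[0] = True
                return
            yield l
    while not done[0]:
        yield doit()

# Note: if the line count is an exact multiple of 5, the last chunk file is empty.
for i, group in enumerate(nlines(5, open('bigfile.txt'))):
    out = open('chunk_%d.txt' % i, 'w')  # 'w': the chunk file is written, not read
    for line in group:
        out.write(line)
    out.close()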

I am still wondering how to do this in Python (being new to Python)

This is just one way of doing it, but not as concise as using split...

Alberto

Sep 6 '07 #12

Arnau Sanchez

ma********@gmail.com wrote:

I am still wondering how to do this efficiently in Python (being kind
of new to it... and it's not for homework).

You should post some code anyway; it would make it easier to give useful advice (it
would also demonstrate that you put some effort into it).

Anyway, here is an option. Text-file objects are line-iterable, so you could use
itertools (perhaps a somewhat difficult module for a newbie...):

from itertools import islice, takewhile, repeat

def take(it, n):
    # Return the next n items of the iterator as a list (fewer at the end).
    return list(islice(it, n))

def readnlines(fd, n):
    # Keep taking groups of n lines until an empty group signals end of file.
    return takewhile(bool, (take(fd, n) for _ in repeat(None)))

def splitfile(path, prefix, nlines, suffix_digits):
    sformat = "%%0%dd" % suffix_digits
    for index, lines in enumerate(readnlines(open(path), nlines)):
        open("%s_%s" % (prefix, sformat % index), "w").writelines(lines)

splitfile("/etc/services", "out", 5, 4)

arnau

Sep 6 '07 #13

Shawn Milochik

Here's my solution, for what it's worth:

#!/usr/bin/env python

import os

input = open("test.txt", "r")

counter = 0
fileNum = 0
fileName = ""

def newFileName():
    global fileNum, fileName
    while os.path.exists(fileName) or fileName == "":
        fileNum += 1
        x = "%0.5d" % fileNum
        fileName = "%s.tmp" % x
    return fileName

for line in input:
    if (fileName == "") or (counter == 5):
        if fileName:
            output.close()
        fileName = newFileName()
        counter = 0
        output = open(fileName, "w")
    output.write(line)
    counter += 1

output.close()

Sep 6 '07 #14

Ricardo Aráoz

Shawn Milochik wrote:

On 9/5/07, ma********@gmail.com <ma********@gmail.com> wrote:
I have a text source file of about 20.000 lines.
From this file, I like to write the first 5 lines to a new file. Close
that file, grab the next 5 lines write these to a new file... grabbing
5 lines and creating new files until processing of all 20.000 lines is
done.
Is there an efficient way to do this in Python?
In advance, thanks for your help.

Maybe (untested):

def read5Lines(f):
    L = f.readline()
    while L:
        yield (L, f.readline(), f.readline(), f.readline(), f.readline())
        L = f.readline()

# 'in' is a reserved word, so the input file gets another name;
# raw strings keep the backslashes in the Windows paths literal.
infile = open(r'C:\YourFile', 'rb')
for fileNo, fiveLines in enumerate(read5Lines(infile)):
    out = open(r'c:\OutFile' + str(fileNo), 'wb')
    out.writelines(fiveLines)
    out.close()

or something similar? (Note that at end of file readline() returns empty
strings, so the last output file may simply end up with fewer than five lines.)

Sep 7 '07 #15

George Sakkis

On Sep 5, 5:17 pm, "malibus...@gmail.com" <malibus...@gmail.com>
wrote:

On Sep 5, 1:28 pm, Paddy <paddy3...@googlemail.com> wrote:

On Sep 5, 5:13 pm, "malibus...@gmail.com" <malibus...@gmail.com>
wrote:

I have a text source file of about 20.000 lines.
From this file, I like to write the first 5 lines to a new file. Close
that file, grab the next 5 lines write these to a new file... grabbing
5 lines and creating new files until processing of all 20.000 lines is
done.
Is there an efficient way to do this in Python?
In advance, thanks for your help.

If its on unix: use split.
If its your homework: show us what you have so far...

- Paddy.

Paddy,

Thanks for making me aware of the (UNIX) split command (split -l 5
inFile.txt), it's short, it's fast, it's beautiful.

I am still wondering how to do this efficiently in Python (being kind
of new to it... and it's not for homework).

-- Martin.

I am still wondering how to do this in Python (being new to Python)

If this was a code golf challenge, a decent entry (146 chars) could
be:

import itertools as it
for i,g in it.groupby(enumerate(open('input.txt')),lambda(i,_):i/5):open("output.%d.txt"%i,'w').writelines(s for _,s in g)

or a bit less cryptically:

import itertools as it
for chunk,enum_lines in it.groupby(enumerate(open('input.txt')),
                                   lambda (i,line): i//5):
    open("output.%d.txt" % chunk, 'w').writelines(
        line for _,line in enum_lines)

George
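
The tuple-unpacking lambda used above is Python 2 syntax. A sketch of the same groupby idea for Python 3 follows; the function name `split_with_groupby` and the `pattern` and `size` parameters are assumptions for illustration, not part of the post.

```python
import itertools


def split_with_groupby(src_path, pattern='output.%d.txt', size=5):
    """Group numbered lines by index // size and write each group to a file."""
    written = []
    with open(src_path) as src:
        numbered = enumerate(src)
        # Lines 0-4 share key 0, lines 5-9 share key 1, and so on.
        groups = itertools.groupby(numbered, lambda pair: pair[0] // size)
        for chunk, pairs in groups:
            name = pattern % chunk
            with open(name, 'w') as out:
                out.writelines(line for _, line in pairs)
            written.append(name)
    return written
```

Using `with` closes each output file explicitly instead of relying on the interpreter to do it when the file object is discarded.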

Sep 7 '07 #16

Paddy

On Sep 7, 3:50 am, George Sakkis <george.sak...@gmail.com> wrote:

On Sep 5, 5:17 pm, "malibus...@gmail.com" <malibus...@gmail.com>
wrote:
If this was a code golf challenge,

I'd choose the Unix split solution and be both maintainable as well as
concise :-)

- Paddy.

Sep 7 '07 #17
