Deleting lines from a file

Hi,

I need to write a program which reads an external text file. Each time
it reads, then it needs to delete some lines, for instance from second
line to 55th line. The file is really big, so what do you think is the
fastest method to delete specific lines in a text file ?

Thanks

Dec 17 '07 #1
Horacius ReX wrote:
> Hi,
>
> I need to write a program which reads an external text file. Each time
> it reads, then it needs to delete some lines, for instance from second
> line to 55th line. The file is really big, so what do you think is the
> fastest method to delete specific lines in a text file ?

Don't use a file, use a database instead. If that's not possible, you can't
do anything but open/read/filter/write - filesystems (at least the ones I
know of) don't support deleting a range from the middle of a file.

Diez
Dec 17 '07 #2
Horacius ReX wrote:
> Hi,
>
> I need to write a program which reads an external text file. Each time
> it reads, then it needs to delete some lines, for instance from second
> line to 55th line. The file is really big, so what do you think is the
> fastest method to delete specific lines in a text file ?
>
> Thanks

One way would be to "mark" the lines as being deleted by either:

1) replacing them with some known character sequence that you treat as deleted.
This assumes that the lines are long enough.

or

2) keeping a separate dictionary that maps each line number to a deleted flag.
Pickle and dump this dictionary before the program exits. Load it again when
the program starts.

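import pickle

# line number -> deleted flag
deletedFlags = {1: False, 2: True}  # ...and so on, one entry per line

def load():
    # read the flags saved by the previous run
    pFiles = "deletedLines.toc"
    fp = open(pFiles, 'rb')
    deletedFlags = pickle.load(fp)
    fp.close()
    return deletedFlags

def dump(deletedFlags):
    # save the flags before the program exits
    pFiles = "deletedLines.toc"
    fp = open(pFiles, 'wb')
    pickle.dump(deletedFlags, fp)
    fp.close()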

Caveats:

1) you must write EXACTLY the same number of bytes (padded with spaces, etc.) on
top of deleted lines. This method doesn't work if any of the lines
are too short to hold your <DELETED> flag string.

2) You must be very careful to maintain consistency of the deletedFlags
dictionary and the data file (by using try/except/finally around your entire
process).

Personally I would employ method #2 and periodically "pack" the file with a
separate process. That could run unattended (e.g. at night). Or, if I did this
a lot, I would use a database instead.
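
A minimal sketch of method #2 with a separate "pack" step, assuming the flags are kept as a set of 1-based line numbers instead of a dictionary. The function names (load_flags, save_flags, live_lines, pack) are made up for illustration and do not come from the post above; only the deletedLines.toc idea does.

```python
import os
import pickle
import tempfile


def load_flags(toc_path):
    """Return the saved set of deleted line numbers (empty if nothing was saved yet)."""
    try:
        with open(toc_path, "rb") as fp:
            return pickle.load(fp)
    except FileNotFoundError:
        return set()


def save_flags(toc_path, deleted):
    """Persist the set of deleted line numbers."""
    with open(toc_path, "wb") as fp:
        pickle.dump(deleted, fp)


def live_lines(data_path, deleted):
    """Yield only the lines whose 1-based number is not flagged as deleted."""
    with open(data_path) as src:
        for number, line in enumerate(src, start=1):
            if number not in deleted:
                yield line


def pack(data_path, toc_path):
    """Rewrite the data file without the flagged lines, then clear the flags."""
    deleted = load_flags(toc_path)
    directory = os.path.dirname(os.path.abspath(data_path))
    # Write the surviving lines to a temporary file in the same directory...
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.writelines(live_lines(data_path, deleted))
    # ...and swap it in place of the original in one step.
    os.replace(tmp.name, data_path)
    # Line numbers have shifted, so the old flags no longer apply.
    save_flags(toc_path, set())
```

Day-to-day "deletes" only touch the small flag file; the expensive rewrite happens once, inside pack(). Note that the data file and the flag file are still updated in two separate steps, so the consistency caveat above applies to this sketch as well.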

-Larry
Dec 17 '07 #3

On Dec 17, 2007, at 5:34 AM, Horacius ReX wrote:
> I need to write a program which reads an external text file. Each time
> it reads, then it needs to delete some lines, for instance from second
> line to 55th line. The file is really big, so what do you think is the
> fastest method to delete specific lines in a text file ?

AFAIK, there really isn't much you can do to *speed* the reading and
writing of the large text file. But maybe you can avoid doing it too
much. If you must make many changes it might help to just keep a list
of lines to consider "deleted" -- and write the modified file out later.

hth,
Michael

---
"I use tuples simply because of their mellifluous appellation." --Neil
Cerutti

Dec 17 '07 #4
and regardless of the speed, what do you think would be the best
method to do this ?

Michael Bentley wrote:
> On Dec 17, 2007, at 5:34 AM, Horacius ReX wrote:
>> I need to write a program which reads an external text file. Each time
>> it reads, then it needs to delete some lines, for instance from second
>> line to 55th line. The file is really big, so what do you think is the
>> fastest method to delete specific lines in a text file ?
>
> AFAIK, there really isn't much you can do to *speed* the reading and
> writing of the large text file. But maybe you can avoid doing it too
> much. If you must make many changes it might help to just keep a list
> of lines to consider "deleted" -- and write the modified file out later.
>
> hth,
> Michael
>
> ---
> "I use tuples simply because of their mellifluous appellation." --Neil
> Cerutti
Dec 17 '07 #5
> I need to write a program which reads an external text file. Each time
> it reads, then it needs to delete some lines, for instance from second
> line to 55th line. The file is really big, so what do you think is the
> fastest method to delete specific lines in a text file ?

Generally, with files that are "really big", you either want to
edit them in place (which takes a database-type structure), or
you need to stream through the file a line/window at a time,
dumping the output to a temporary output file. The *nix tool for
this job is sed:

sed '2,55d' infile.txt > outfile.txt

(it doesn't get much more concise than this).

That's about the same as the following in Python

out = open('outfile.txt', 'w')
for i, line in enumerate(open('infile.txt')):
    # enumerate() counts from 0, so lines 2..55 are indexes 1..54
    if 1 <= i <= 54: continue
    out.write(line)
out.close()

If you want it "in place", sed will do the output file and
renaming for you with

sed -i '2,55d' file.txt

whereas in the Python variant, you'd have to then use the
os.rename call to move outfile.txt to infile.txt
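
For the "in place" Python variant, here is a rough sketch that streams to a temporary file and then moves it over the original. The function name delete_lines and its 1-based, inclusive arguments are assumptions made for this example; it uses os.replace (Python 3.3+) instead of os.rename because os.replace also overwrites an existing target on Windows.

```python
import os
import tempfile


def delete_lines(path, first, last):
    """Remove lines first..last (1-based, inclusive) from the file at path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives next to the original so the final move
    # stays on the same filesystem.
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for number, line in enumerate(src, start=1):
            if first <= number <= last:
                continue  # skip the lines being deleted
            tmp.write(line)
    # Both files are closed here; move the filtered copy over the original.
    os.replace(tmp.name, path)
```

Calling delete_lines('file.txt', 2, 55) would then do roughly what the sed -i command above does. One difference to be aware of: the result carries the temporary file's permissions, not those of the original file.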

The Python version is a bit more flexible, as you can add other
logic to change your bounds. Not that sed isn't flexible, but it
starts getting unreadable very quickly as the logic grows.

-tkc
Dec 17 '07 #6
Horacius ReX wrote:
> and regardless of the speed, what do you think would be the best
> method to do this ?

Without more information about the contents of the file and who's reading
it, we can't say more.

If the reader is not under your control and doesn't deal with deletion marks
or anything like that in the file, you can't do anything but really delete
the lines.

If you can control it, it depends on how you process the file - does it have
a fixed line length or not, and so forth. You need to use seek to position
the file pointer at the proper location in the file before writing a deletion
mark, but to do so you of course need to determine that location first - and
that will most probably need a two-pass approach.
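
The two-pass idea could be sketched like this, assuming you control the reader and it knows to ignore marked lines. The marker b"#DEL" and the helper names are hypothetical, chosen only for this example. The first pass records where each line starts and how long it is; the second pass seeks to those positions and overwrites the line with a marker padded to exactly the same length, so the file size never changes.

```python
MARK = b"#DEL"  # hypothetical deletion marker, not something from the thread


def line_spans(path):
    """First pass: return (byte offset, length without newline) for every line."""
    spans = []
    offset = 0
    with open(path, "rb") as fp:
        for raw in fp:
            body = raw.rstrip(b"\r\n")
            spans.append((offset, len(body)))
            offset += len(raw)
    return spans


def mark_deleted(path, spans, first, last):
    """Second pass: overwrite lines first..last (1-based, inclusive) with the marker."""
    targets = spans[first - 1:last]
    # Check everything up front so a short line cannot leave the file half-marked.
    if any(length < len(MARK) for _, length in targets):
        raise ValueError("a target line is too short to hold the marker")
    with open(path, "r+b") as fp:
        for offset, length in targets:
            fp.seek(offset)
            # Pad with spaces so exactly `length` bytes are written.
            fp.write(MARK.ljust(length))


def is_deleted(raw_line):
    """What a cooperating reader would check before using a line."""
    return raw_line.startswith(MARK)
```

With fixed-length lines the first pass is unnecessary, since each offset can be computed directly from the line number.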

Diez
Dec 17 '07 #7
On 12/17/07, Horacius ReX <ho**********@gmail.com> wrote:
> and regardless of the speed, what do you think would be the best
> method to do this ?

use sqlite
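
One way this suggestion might look with Python's standard sqlite3 module. The table layout (a `lines` table with `num` and `text` columns) and the function names are assumptions for illustration; nothing in the thread specifies a schema.

```python
import sqlite3


def load_lines(conn, path):
    """Copy each line of the text file into a table keyed by its 1-based line number."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines (num INTEGER PRIMARY KEY, text TEXT)"
    )
    with open(path) as src:
        conn.executemany(
            "INSERT INTO lines (num, text) VALUES (?, ?)",
            ((num, line.rstrip("\n")) for num, line in enumerate(src, start=1)),
        )
    conn.commit()


def delete_range(conn, first, last):
    """Delete lines first..last (inclusive) without rewriting the other rows."""
    conn.execute("DELETE FROM lines WHERE num BETWEEN ? AND ?", (first, last))
    conn.commit()


def remaining(conn):
    """Return the surviving lines in their original order."""
    return [text for (text,) in conn.execute("SELECT text FROM lines ORDER BY num")]
```

The import is a one-time cost; after that, something like delete_range(conn, 2, 55) is a single statement. The surviving rows keep their original numbers, so `num` stops being a running line count after the first delete.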

--
Vladimir Rusinov
GreenMice Solutions: Linux-based IT solutions
http://greenmice.info/
Dec 17 '07 #8

On Dec 17, 2007, at 6:25 AM, Horacius ReX wrote:
> and regardless of the speed, what do you think would be the best
> method to do this ?

The first thing I'd look into is reading the whole file into memory,
making all the deletions, and finally writing it out. But you said
the file is big, so here's a quick stab at it (with multiple read
passes and a single write):

import string
rm = set()    # numbers of the lines to drop (a set makes lookups fast)

# first pass through file -- mark lines starting with an uppercase letter
for line, text in enumerate(open('words')):
    if text[0] in string.ascii_uppercase:
        rm.add(line)

# second pass -- mark lines with 'e' for deletion
for line, text in enumerate(open('words')):
    if line in rm:
        print('skipping %s' % line)
        continue
    if 'e' in text:
        rm.add(line)

# now write the modified file
total = line + 1    # enumerate() counts from 0
print('Writing %d of %d lines' % (total - len(rm), total))
outFile = open('newWords', 'w')
for line, text in enumerate(open('words')):
    if line not in rm:
        outFile.write(text)
outFile.close()
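
For comparison, the option mentioned at the top of this post (read the whole file into memory, delete, write it out) might look like the sketch below when the file does fit in memory. The function name delete_in_memory and its 1-based, inclusive arguments are assumptions for this example.

```python
def delete_in_memory(path, first, last):
    """Read every line, drop lines first..last (1-based, inclusive), write the rest back."""
    with open(path) as src:
        lines = src.readlines()
    # Slices are 0-based and exclude the end index, so lines 2..55 are lines[1:55].
    del lines[first - 1:last]
    with open(path, "w") as dst:
        dst.writelines(lines)
    return len(lines)  # number of lines kept
```

This overwrites the original file directly, so a crash in the middle of the write can lose data; writing to a second file, as the code above does with newWords, avoids that.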

hth,
Michael

---
Simplicity is the ultimate sophistication. -Leonardo da Vinci

Dec 17 '07 #9
