deleting texts between patterns

hi
say i have a text file

line1
line2
line3
line4
line5
line6
abc
line8 <---to be delete
line9 <---to be delete
line10 <---to be delete
line11 <---to be delete
line12 <---to be delete
line13 <---to be delete
xyz
line15
line16
line17
line18

I wish to delete lines that are in between 'abc' and 'xyz' and print
the rest of the lines. Which is the best way to do it? Should i get
everything into a list, get the index of abc and xyz, then pop the
elements out? or any other better methods?
thanks

May 12 '06 #1

mickle...@hotmail.com wrote:
(snip)


In other words ...

lines = open('test.txt').readlines()
for line in lines[lines.index('abc\n') + 1:lines.index('xyz\n')]:
    lines.remove(line)
for line in lines:
    print line,

Regular expressions are better in this case

import re
pat = re.compile('abc\n.*?xyz\n', re.DOTALL)
print re.sub(pat, '', open('test.txt').read())

May 12 '06 #2
wrote:
(snip)


Something like this (untested code):

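def filtered(f, stop, restart):
    f = iter(f)
    # pass lines through up to and including the stop marker
    for line in f:
        yield line
        if line==stop:
            break
    # swallow lines until the restart marker, which is passed through
    for line in f:
        if line==restart:
            yield line
            break
    # pass the rest of the file through unchanged
    for line in f:
        yield line

for line in filtered(open('thefile'), "abc\n", "xyz\n"):
    print line,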
May 12 '06 #3

<mi*******@hotmail.com> wrote in message news:11**********************@i40g2000cwc.googlegroups.com...
(snip)


what's wrong with a simple

emit = True
for line in open("q.txt"):
    if line == "xyz\n":
        emit = True
    if emit:
        print line,
    if line == "abc\n":
        emit = False

loop ? (this is also easy to tweak for cases where you don't want to include
the patterns in the output).

to print to a file instead of stdout, just replace the print line with a f.write call.
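
A minimal sketch of that tweak in Python 3 syntax, not code from the post: the same flag loop wrapped in a generator, with a switch that decides whether the 'abc' and 'xyz' lines themselves are emitted. The names `drop_between` and `keep_markers` are made up for illustration.

```python
import sys


def drop_between(lines, start="abc\n", stop="xyz\n", keep_markers=True):
    """Yield lines, suppressing whatever sits between start and stop."""
    emit = True
    for line in lines:
        if line == stop:
            emit = True                  # output resumes at the closing marker
            if not keep_markers:
                continue                 # ...but the marker itself is dropped
        if emit and (keep_markers or line != start):
            yield line
        if line == start:
            emit = False                 # output pauses after the opening marker


sample = ["line6\n", "abc\n", "line8\n", "line9\n", "xyz\n", "line15\n"]

# Anything with a writelines method works here, so an open file object
# could stand in for sys.stdout.
sys.stdout.writelines(drop_between(sample, keep_markers=False))
```

Because the loop never holds more than one line, it should behave the same on an open file as on the small list used here.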

</F>

May 12 '06 #4
On 12/05/2006 6:11 PM, Ravi Teja wrote:
mickle...@hotmail.com wrote:
hi
say i have a text file

line1 [snip]
line6
abc
line8 <---to be delete [snip] line13 <---to be delete
xyz
line15 [snip] line18

I wish to delete lines that are in between 'abc' and 'xyz' and print
the rest of the lines. Which is the best way to do it? Should i get
everything into a list, get the index of abc and xyz, then pop the
elements out? or any other better methods?
thanks

> In other words ...
> lines = open('test.txt').readlines()
> for line in lines[lines.index('abc\n') + 1:lines.index('xyz\n')]:
>     lines.remove(line)


I don't think that's what you really meant.
>>> lines = ['blah', 'fubar', 'abc\n', 'blah', 'fubar', 'xyz\n', 'xyzzy']
>>> for line in lines[lines.index('abc\n') + 1:lines.index('xyz\n')]:
...     lines.remove(line)
...
>>> lines
['abc\n', 'blah', 'fubar', 'xyz\n', 'xyzzy']

Uh-oh.

Try this:
>>> lines = ['blah', 'fubar', 'abc\n', 'blah', 'fubar', 'xyz\n', 'xyzzy']
>>> del lines[lines.index('abc\n') + 1:lines.index('xyz\n')]
>>> lines
['blah', 'fubar', 'abc\n', 'xyz\n', 'xyzzy']

Of course wrapping it in try/except would be a good idea, not for the
slicing, which behaves itself and does nothing if the 'abc\n' appears
AFTER the 'xyz\n', but for the index() in case the sought markers aren't
there. Perhaps it might be a good idea even to do it carefully one piece
at a time: is the abc there? is the xyz there? is the xyz after the abc
-- then del lines[index1+1:index2].
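
A sketch of that careful, one-piece-at-a-time version, in Python 3 syntax. The helper name `delete_between` and its return-a-bool convention are assumptions made for the example, not anything from the thread.

```python
def delete_between(lines, start="abc\n", stop="xyz\n"):
    """Delete, in place, the items strictly between start and stop.

    Returns True if both markers were found in the right order,
    False (leaving the list untouched) otherwise.
    """
    try:
        first = lines.index(start)             # is the abc there?
        last = lines.index(stop, first + 1)    # is there an xyz after the abc?
    except ValueError:
        return False                           # a marker is missing; do nothing
    del lines[first + 1:last]
    return True


lines = ['blah', 'fubar', 'abc\n', 'blah', 'fubar', 'xyz\n', 'xyzzy']
if delete_between(lines):
    print(lines)
else:
    print("markers not found")
```

Searching for the stop marker only after the start marker is one possible policy; whether it is the right one depends on what the input is supposed to look like.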

I wonder what the OP wants to happen in a case like this:

guff1 xyz guff2 abc guff2 xyz guff3
or this:
guff1 abc guff2 abc guff2 xyz guff3

> for line in lines:
>     print line,
>
> Regular expressions are better in this case

Famous last words.

> import re
> pat = re.compile('abc\n.*?xyz\n', re.DOTALL)
> print re.sub(pat, '', open('test.txt').read())


I don't think you really meant that either.
>>> lines = ['blah', 'fubar', 'abc\n', 'blah', 'fubar', 'xyz\n', 'xyzzy']
>>> linestr = "".join(lines)
>>> linestr
'blahfubarabc\nblahfubarxyz\nxyzzy'
>>> import re
>>> pat = re.compile('abc\n.*?xyz\n', re.DOTALL)
>>> print re.sub(pat, '', linestr)
blahfubarxyzzy

Uh-oh.

Try this:
>>> pat = re.compile('(?<=abc\n).*?(?=xyz\n)', re.DOTALL)
>>> re.sub(pat, '', linestr)
'blahfubarabc\nxyz\nxyzzy'

.... and I can't imagine why you're using the confusing [IMHO]
undocumented [AFAICT] feature that the first arg of the module-level
functions like sub and friends can be a compiled regular expression
object. Why not use this:
>>> pat.sub('', linestr)
'blahfubarabc\nxyz\nxyzzy'

One-liner fanboys might prefer this:
>>> re.sub('(?s)(?<=abc\n).*?(?=xyz\n)', '', linestr)
'blahfubarabc\nxyz\nxyzzy'


HTH,
John
May 12 '06 #5
mi*******@hotmail.com wrote:
(snip)

I wish to delete lines that are in between 'abc' and 'xyz' and print
the rest of the lines. Which is the best way to do it? Should i get
everything into a list, get the index of abc and xyz, then pop the
elements out?
Would be somewhat inefficient IMHO - at least for big files, since it
implies reading the whole file in memory.
or any other better methods?


Don't know if it's better for your actual use case, but this avoids
reading up the whole file:

def skip(iterable, skipfrom, skipuntil):
    r""" example usage :
    >>> f = open("/path/to/my/file.txt")
    >>> for line in skip(f, 'abc\n', 'xyz\n'):
    ...     print line,
    >>> f.close()

    """
    skip = False
    for line in iterable:
        if skip:
            # inside the block: drop the line, and leave the block
            # once the closing marker has been seen
            if line == skipuntil:
                skip = False
            continue
        else:
            # outside the block: the opening marker starts skipping
            if line == skipfrom:
                skip = True
                continue
        yield line

def main():
    lines = """
line1
line2
line3
line4
line5
line6
abc
line8 <---to be delete
line9 <---to be delete
line10 <---to be delete
line11 <---to be delete
line12 <---to be delete
line13 <---to be delete
xyz
line15
line16
line17
line18
""".strip().splitlines()
    for line in skip(lines, 'abc', 'xyz'):
        print line

if __name__ == '__main__':
    main()

HTH

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
May 12 '06 #6
Fredrik Lundh wrote:
(snip)
to print to a file instead of stdout, just replace the print line with a f.write call.


Or redirect stdout to a file when calling the program !-)

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
May 12 '06 #7
bruno at modulix wrote:
mi*******@hotmail.com wrote:
(snip)
Don't know if it's better for your actual use case, but this avoids
reading up the whole file:

def skip(iterable, skipfrom, skipuntil):
    r""" example usage :
    >>> f = open("/path/to/my/file.txt")
    >>> for line in skip(f, 'abc\n', 'xyz\n'):
    ...     print line,
    >>> f.close()

    """

(snip code)

Forgot to say this will also skip markers. If you want to keep them, see
the effbot answer...

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
May 12 '06 #8
> I wish to delete lines that are in between 'abc' and
'xyz' and print the rest of the lines. Which is the best
way to do it?


While this *is* the python list, you don't specify whether
this is the end goal, or whether it's part of a larger
program. If it *is* the end goal (namely, you just want the
filtered output someplace), and you're not averse to using
other tools, you can do something like

sed -n -e'1,/abc/p' -e'/xyz/,$p' file.txt

which is pretty straight-forward. It translates to

-n        don't print each line by default
-e        execute the following item
1,/abc/   from line 1, through the line where you match "abc"
p         print each line
and also
-e        execute the following item
/xyz/,$   from the line matching "xyz" through the last line
p         print each line

It assumes that
1) there's only one /abc/ & /xyz/ in the file (otherwise, it
defaults to the first one it finds in each case)
2) that they're in that order (otherwise, you'll get 2x each
line, rather than 0x each line)

However, it's a oneliner here, and seems to be a bit more
complex in python, so if you don't need to integrate the
results into further down-stream python processing, this
might be a nice way to go. If you need the python, others
on the list have offered a panoply of good answers already.
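
For comparison, a rough Python 3 emulation of what those two sed ranges do, which makes assumption 2 easy to see: a line that falls inside both ranges is printed twice. The function name `sed_two_ranges` is invented here, and the sketch models only the two address ranges in the command above, not sed in general.

```python
import re


def sed_two_ranges(lines, first=r"abc", second=r"xyz"):
    """Emulate: sed -n -e '1,/abc/p' -e '/xyz/,$p' on a list of lines."""
    out = []
    in_first = True       # range 1 opens on line 1
    in_second = False     # range 2 opens on the first line matching `second`
    for number, line in enumerate(lines, start=1):
        if in_first:
            out.append(line)
            # the end address is only checked on lines after the one
            # that opened the range, i.e. from line 2 onwards
            if number > 1 and re.search(first, line):
                in_first = False
        if not in_second and re.search(second, line):
            in_second = True
        if in_second:
            out.append(line)  # range 2 runs through the last line
    return out


print(sed_two_ranges(["line6", "abc", "line8", "xyz", "line15"]))
print(sed_two_ranges(["line6", "xyz", "line8", "abc", "line15"]))
```

With the markers in the expected order the lines between them vanish; with the order reversed, everything from the "xyz" line through the "abc" line comes out twice.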

-tkc


May 12 '06 #9
On Fri, 12 May 2006 07:29:54 -0500,
Tim Chase <py*********@tim.thechases.com> wrote:
I wish to delete lines that are in between 'abc' and
'xyz' and print the rest of the lines. Which is the best
way to do it?
While this *is* the python list, you don't specify whether
this is the end goal, or whether it's part of a larger
program. If it *is* the end goal (namely, you just want the
filtered output someplace), and you're not averse to using
other tools, you can do something like

sed -n -e'1,/abc/p' -e'/xyz/,$p' file.txt


Or even

awk '/abc/,/xyz/' file.txt

Excluding the abc and xyz lines is left as an exercise to the
interested reader.

Regards,
Dan

--
Dan Sommers
<http://www.tombstonezero.net/dan/>
"I wish people would die in alphabetical order." -- My wife, the genealogist
May 12 '06 #10
Dan Sommers wrote:
Or even

awk '/abc/,/xyz/' file.txt

Excluding the abc and xyz lines is left as an exercise to the
interested reader.


Once again, us completely disinterested readers get the short end of the
stick. :)

--
Edward Elliott
UC Berkeley School of Law (Boalt Hall)
complangpython at eddeye dot net
May 12 '06 #11
>> I don't think that's what you really meant ^ 2

Right! That was very buggy. That's what I get for posting past 1 AM :-(.

May 12 '06 #12
Tim Chase <py*********@tim.thechases.com> writes:
I wish to delete lines that are in between 'abc' and
'xyz' and print the rest of the lines. Which is the best
way to do it?


sed -n -e'1,/abc/p' -e'/xyz/,$p' file.txt

which is pretty straight-forward.


While it looks neat, it will not work when /abc/ matches line 1.
Non-standard versions of sed, e.g., GNU, allow you to use 0,/abc/
to neatly step around this nuisance; but for standard sed you'll
need a more complicated sed script.
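
To see why line 1 is special, here is a small Python 3 model of the first address range only. It sketches the range rule (the end pattern is not tested on the line that opens the range), not sed as a whole; `first_range` and `from_zero` are names invented for the example, with `from_zero=True` standing in for GNU sed's 0,/abc/.

```python
import re


def first_range(lines, pattern=r"abc", from_zero=False):
    """Return the lines selected by 1,/pattern/ (or 0,/pattern/ if from_zero)."""
    selected = []
    for number, line in enumerate(lines, start=1):
        selected.append(line)
        # 1,/pattern/ cannot end on line 1 itself; 0,/pattern/ can
        may_end_here = from_zero or number > 1
        if may_end_here and re.search(pattern, line):
            break
    return selected


lines = ["abc", "line2", "line3", "xyz", "line5"]
print(first_range(lines))                   # range runs on past line 1
print(first_range(lines, from_zero=True))   # range ends on line 1
```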
--
John Savage (my news address is not valid for email)

May 20 '06 #13
John Machin <sj******@lexicon.net> writes:
Uh-oh.

Try this:
pat = re.compile('(?<=abc\n).*?(?=xyz\n)', re.DOTALL)
re.sub(pat, '', linestr) 'blahfubarabc\nxyz\nxyzzy'


This regexp still has a problem. It may remove the lines between two
lines like 'aaabc' and 'xxxyz' (and also removes the first two 'x's in
'xxxyz').

The following regexp works better:

pattern = re.compile('(?<=^abc\n).*?(?=^xyz\n)', re.DOTALL | re.MULTILINE)

>>> lines = '''line1
... abc
... line2
... xyz
... line3
... aaabc
... line4
... xxxyz
... line5'''
>>> pattern = re.compile('(?<=^abc\n).*?(?=^xyz\n)', re.DOTALL | re.MULTILINE)
>>> print pattern.sub('', lines)
line1
abc
xyz
line3
aaabc
line4
xxxyz
line5
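
The same comparison as a small Python 3 script, run on the sample above, with the unanchored and the anchored pattern side by side. The variable names `loose` and `anchored` are just labels chosen here.

```python
import re

text = "line1\nabc\nline2\nxyz\nline3\naaabc\nline4\nxxxyz\nline5"

# Unanchored: 'abc\n' and 'xyz\n' may match at the tail of a longer line.
loose = re.compile(r'(?<=abc\n).*?(?=xyz\n)', re.DOTALL)

# Anchored: with re.MULTILINE, ^ ties each marker to the start of a line.
anchored = re.compile(r'(?<=^abc\n).*?(?=^xyz\n)', re.DOTALL | re.MULTILINE)

loose_result = loose.sub('', text)
anchored_result = anchored.sub('', text)

print(repr(loose_result))      # also eats 'line4' and part of 'xxxyz'
print(repr(anchored_result))   # only 'line2' is removed
```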


- Baoqiu

--
Baoqiu Cui <cbaoqiu at yahoo.com>
Jun 4 '06 #14
On 5/06/2006 2:51 AM, Baoqiu Cui wrote:
John Machin <sj******@lexicon.net> writes:
Uh-oh.

Try this:
> pat = re.compile('(?<=abc\n).*?(?=xyz\n)', re.DOTALL)
> re.sub(pat, '', linestr)

'blahfubarabc\nxyz\nxyzzy'


This regexp still has a problem. It may remove the lines between two
lines like 'aaabc' and 'xxxyz' (and also removes the first two 'x's in
'xxxyz').

The following regexp works better:

pattern = re.compile('(?<=^abc\n).*?(?=^xyz\n)', re.DOTALL | re.MULTILINE)


You are quite correct. Your reply, and the rejoinder below, only add to
the proposition that regexes are not necessarily the best choice for
every text-processing job :-)

Just in case the last line is 'xyz' but is not terminated by '\n':

pattern = re.compile('(?<=^abc\n).*?(?=^xyz$)', re.DOTALL | re.MULTILINE)
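
A quick Python 3 check of what the trailing `$` buys, as a sketch on made-up input: the closing marker is the very last line and has no newline after it. The names `with_newline` and `with_dollar` are labels invented for the example.

```python
import re

flags = re.DOTALL | re.MULTILINE
with_newline = re.compile(r'(?<=^abc\n).*?(?=^xyz\n)', flags)
with_dollar = re.compile(r'(?<=^abc\n).*?(?=^xyz$)', flags)

# Last line is 'xyz' with no terminating newline.
unterminated = "line1\nabc\nline2\nxyz"

# A line that merely starts with 'xyz' should not count as the marker.
lookalike = "abc\nline2\nxyzzy\nline3\nxyz\n"

print(repr(with_newline.sub('', unterminated)))  # nothing is removed
print(repr(with_dollar.sub('', unterminated)))   # 'line2' is removed
print(repr(with_dollar.sub('', lookalike)))      # 'xyzzy' is not treated as the marker
```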

Cheers,
John
Jun 4 '06 #15
