How to delete lines between two words from text file?

P: 8
Hi,

I need a script that deletes all the lines between two particular words in a text file. The two words appear several times in the file.

I have code that works so far. The problem is that once it has looped through the text file for the first time, the end_word appears before the new start_word. So I have to tell the program to start searching for the end_word after the start_word.

Here is the code. I hope it makes things clearer:

    orig_file = open("*Strata1CombinedAll.txt", "r")
    lines = orig_file.readlines()

    import fnmatch

    def find(seq, pattern):
        pattern = pattern.lower()
        for i, n in enumerate(seq):
            if fnmatch.fnmatch(n.lower(), pattern):
                return i
                return -1

    def index(seq, pattern):
        result = find(seq, pattern)
        if result == -1:
            raise ValueError
        return result

    try:
        for item in lines:
            begin_line = index(lines, "*start_word*")
            end_line = index(lines, "*end_word*")
            del lines[begin_line:end_line]

    except:
        new_text_file = open("*test604", "w")
        new_text_file.writelines(lines)
        new_text_file.close()
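One way to express "start searching for the end_word after the start_word" is to give find() a starting offset. This is only a rough sketch, not the code from the thread: the start parameter, the delete_between() helper and the sample lines are made up for illustration. It keeps the end_word line, just as del lines[begin_line:end_line] does.

```python
import fnmatch

def find(seq, pattern, start=0):
    """Return the index of the first match at or after `start`, or -1."""
    pattern = pattern.lower()
    for i in range(start, len(seq)):
        if fnmatch.fnmatch(seq[i].lower(), pattern):
            return i
    return -1

def delete_between(lines, start_pattern, end_pattern):
    """Hypothetical helper: drop each start line up to (not including) its end line."""
    lines = list(lines)  # work on a copy, leave the caller's list alone
    pos = 0
    while True:
        begin = find(lines, start_pattern, pos)
        if begin == -1:
            break
        # Only look for the end word *after* the start word.
        end = find(lines, end_pattern, begin + 1)
        if end == -1:
            break
        del lines[begin:end]
        pos = begin + 1  # the end line now sits at index `begin`; continue after it
    return lines

sample = [
    "keep 1\n", "start_word here\n", "drop\n", "end_word here\n", "keep 2\n",
    "start_word again\n", "drop 2\n", "\n", "end_word again\n", "keep 3\n",
]
result = delete_between(sample, "*start_word*", "*end_word*")
print(result)
```

For the made-up sample this leaves the three "keep" lines plus the two end_word lines. If the end_word line should go too, del lines[begin:end + 1] would be the change to try.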
Dec 9 '10 #1

✓ answered by dwblas

4 Replies


Expert 100+
P: 624
In the function index(), result will never equal -1: in find(), the "return -1" line is indented under "return i", so it is never reached, and find() returns None when nothing matches. Also, in the try block you delete records from the same list that the for loop is iterating over, which leads to errors. To illustrate, compare the original length of the list with the number of records that actually print. You should copy the records you want to keep to a new list instead of deleting from the original list.
    # Deliberately buggy: shows what goes wrong when you delete while iterating
    test_data = [["abc"], ["def"], ["ghi"], ["abc"], ["xyz"], ["abc"], ["mno"], ["rst"]]
    ctr = 0
    for rec in test_data:
        ctr += 1
        print(ctr, rec)
        if ctr % 2:
            # the list shrinks under the loop, so some records are never visited
            del test_data[ctr]
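The "copy what you want to keep" advice could look something like this. It is a sketch only; the rule used here (drop every second record) is an assumption picked for illustration, not something the original loop is known to intend.

```python
test_data = [["abc"], ["def"], ["ghi"], ["abc"], ["xyz"], ["abc"], ["mno"], ["rst"]]

# Assumed rule for illustration: drop the records at indexes 1, 3, 5, 7.
# Building a new list means the loop always sees all 8 original records.
kept = [rec for i, rec in enumerate(test_data) if i % 2 == 0]

print(len(test_data), len(kept))
```

Because test_data is never modified, every record is visited exactly once, and the original list is still there afterwards if you need it.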
Dec 9 '10 #2

Expert 100+
P: 624
This removes all records between "abc" and "def". There should be a way to do this using groupby from itertools, but I don't have time to try it right now. Perhaps someone else will post it.

    test_data = ["abc", "def", "ghi", "xabcy", "xyz", "def", "mno", "rst"]

    start = False      # True while we are between "abc" and "def"
    saved_list = []

    for rec in test_data:
        if "abc" in rec.lower():
            start = True
        if not start:
            saved_list.append(rec)
            ## or in your case
            ##output_file.write(rec)
        if "def" in rec:
            start = False
    print(saved_list)
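For what it's worth, the groupby idea might be sketched as below. This is just one possible take, not a tested recipe from the thread: make_inside_key() is a made-up helper, and it leans on groupby calling the key function once per record, in order, so that the key can remember whether it is inside a block. Unlike the loop above, it lowercases the record for both words.

```python
from itertools import groupby

def make_inside_key(start_word, end_word):
    """Hypothetical helper: build a key function that is True inside a block."""
    state = {"inside": False}

    def key(rec):
        if start_word in rec.lower():
            state["inside"] = True
        inside = state["inside"]       # the end line itself still counts as inside
        if end_word in rec.lower():
            state["inside"] = False
        return inside

    return key

test_data = ["abc", "def", "ghi", "xabcy", "xyz", "def", "mno", "rst"]

# groupby yields runs of consecutive records with the same key;
# keep only the runs that are outside an "abc" ... "def" block.
saved_list = [
    rec
    for inside, group in groupby(test_data, make_inside_key("abc", "def"))
    if not inside
    for rec in group
]
print(saved_list)
```

A key function with hidden state is a bit unusual, so the plain loop is probably easier to read; this mainly shows that groupby can do the same job.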
Dec 9 '10 #3

P: 8
The last code would work. Thank you, dwblas!
I have it like this now:

    orig_file = open(input(), "r")
    lines = orig_file.readlines()

    start = False
    saved_list = []
    for rec in lines:
        if "count" in rec:
            start = True
        if not start:
            saved_list.append(rec)
        if "Volume" in rec:
            start = False

    new_text_file = open(input(), "w")
    new_text_file.writelines(saved_list)
    new_text_file.close()

The only problem left is that the last line should not be deleted. The last line always contains the word 'Volume'. The line before it is always a blank line, which should be deleted.
Any ideas how to get this to work?
Dec 10 '10 #4

P: 8
I figured it out. The following code does what I need:

    orig_file = open(input(), "r")
    lines = orig_file.readlines()
    orig_file.close()

    start = False
    saved_list = []
    for rec in lines:
        if "count" in rec:
            start = True
        # check for "Volume" before saving, so the "Volume" line itself is kept
        if "Volume" in rec:
            start = False
        if not start:
            saved_list.append(rec)

    new_text_file = open(input(), "w")
    new_text_file.writelines(saved_list)
    new_text_file.close()
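The same logic could also be wrapped up so it is reusable and does not need the whole file in memory. A rough sketch, assuming the words "count" and "Volume" from the code above; the names filter_blocks() and strip_blocks() and the path parameters are invented for illustration.

```python
def filter_blocks(records, start_word="count", end_word="Volume"):
    """Yield records, skipping from a start_word line up to (not including)
    the next end_word line."""
    inside = False
    for rec in records:
        if start_word in rec:
            inside = True
        if end_word in rec:   # checked before yielding, so this line is kept
            inside = False
        if not inside:
            yield rec

def strip_blocks(in_path, out_path, start_word="count", end_word="Volume"):
    """Hypothetical file wrapper: stream in_path to out_path line by line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_blocks(src, start_word, end_word))

demo = ["header\n", "count 1\n", "a\n", "\n", "Volume 1\n", "tail\n"]
print(list(filter_blocks(demo)))
```

With the made-up demo lines, the "count" line, the data line and the blank line are dropped, while the "Volume" line and everything outside the block are kept. The with statement also closes both files even if something goes wrong partway through.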
Dec 11 '10 #5
