473,378 Members | 1,413 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,378 software developers and data experts.

Text file manipulation.

Hi there, im trying to create a python program that can read a text file line by line and search for specified words/text/strings and remove them from the text file. Then finally save the modified text file to an output file. The only problem is, the text file contains code for a BACKSPACE typed in the text. e.g. "<BACKSPACE>" this needs to be removed which sounds quite simple, but often there are numbers involved. e.g. "<BACKSPACE: 4>" which would represent 4 backspaces, and so the string needs to be removed and 4 backspaces take place on the text before the code. e.g. "repatre<BACKSPACE: 4>resent" would become "represent" I know a search and delete script that can search for a word at the beggining of the line and then delete the whole line, but im not sure about this. Any ideas would be great.
Cheers,
Joe
Jan 22 '08 #1
5 4234
kudos
127 Expert 100+
hi,
start by searching for <BACKSPACE (for instance by using the find method). Note the index of it, then check what the value of the char after BACKSPACE (you handle it differently if it is & or :). substring the string and clue it together without the backspace, and go n characters back, given what the value is. I would recommend to string replace every <BACKSPACE> with <BACKSPACE: 1> (so you only need to handle one case)

Did this help, or was my post confusing?
-kudos

Hi there, im trying to create a python program that can read a text file line by line and search for specified words/text/strings and remove them from the text file. Then finally save the modified text file to an output file. The only problem is, the text file contains code for a BACKSPACE typed in the text. e.g. "<BACKSPACE>" this needs to be removed which sounds quite simple, but often there are numbers involved. e.g. "<BACKSPACE: 4>" which would represent 4 backspaces, and so the string needs to be removed and 4 backspaces take place on the text before the code. e.g. "repatre<BACKSPACE: 4>resent" would become "represent" I know a search and delete script that can search for a word at the beggining of the line and then delete the whole line, but im not sure about this. Any ideas would be great.
Cheers,
Joe
Jan 22 '08 #2
hi,
start by searching for <BACKSPACE (for instance by using the find method). Note the index of it, then check what the value of the char after BACKSPACE (you handle it differently if it is & or :). substring the string and clue it together without the backspace, and go n characters back, given what the value is. I would recommend to string replace every <BACKSPACE> with <BACKSPACE: 1> (so you only need to handle one case)

Did this help, or was my post confusing?
-kudos
Hi there,

Im a little confused. Im not sure I understand the string replace method you mentioned at the end of your post.
cheers,
Joe
Jan 23 '08 #3
Hi,
Here is a solution. I can write results to a txt file which is great. All I need to do now is read from a text which im guessing will just go in place of the 'test' section in the code and to be able to iterate this code so that I can run through a whole paragraph containing multiple <BACKSPACE> sections removing all (rather than getting to the first one and terminating.) Would you have any ideas looking at this to do such a thing? Any help would be great. Cheers. Joe

Expand|Select|Wrap|Line Numbers
  1. #! /usr/bin/python
  2.  
  3. import re
  4. # Global variable
  5.  
  6.  
  7. bs = re.compile('<BACKSPACE(:[ ]*[0-9]+)?>')
  8.  
  9.  
  10. #theInFile = open("test2.txt", "r")
  11.  
  12. theOutFile = open("backspace_out.txt", "w")
  13.  
  14.  
  15.  
  16. tests = ['re <BACKSPACE>present', 'This is bound to repatreaaaabbbbcccc <BACKSPACE: 17>resent']
  17.  
  18. def bs_remove(s):
  19.  
  20.     global bs
  21.  
  22.     for m in bs.finditer(s):
  23.  
  24.         if m.groups()[0] is None:
  25.  
  26.             return s[:m.start() - 1] + s[m.end():]
  27.  
  28.         else:
  29.  
  30.             return s[:m.start() - int(m.groups()[0][1:])] + s[m.end():]
  31.  
  32.  
  33.  
  34. for s in tests:
  35.  
  36.       theOutFile.write (bs_remove(s)+ " ")
Jan 23 '08 #4
bvdet
2,851 Expert Mod 2GB
Hi,
Here is a solution. I can write results to a txt file which is great. All I need to do now is read from a text which im guessing will just go in place of the 'test' section in the code and to be able to iterate this code so that I can run through a whole paragraph containing multiple <BACKSPACE> sections removing all (rather than getting to the first one and terminating.) Would you have any ideas looking at this to do such a thing? Any help would be great. Cheers. Joe

Expand|Select|Wrap|Line Numbers
  1. #! /usr/bin/python
  2.  
  3. import re
  4. # Global variable
  5.  
  6.  
  7. bs = re.compile('<BACKSPACE(:[ ]*[0-9]+)?>')
  8.  
  9.  
  10. #theInFile = open("test2.txt", "r")
  11.  
  12. theOutFile = open("backspace_out.txt", "w")
  13.  
  14.  
  15.  
  16. tests = ['re <BACKSPACE>present', 'This is bound to repatreaaaabbbbcccc <BACKSPACE: 17>resent']
  17.  
  18. def bs_remove(s):
  19.  
  20.     global bs
  21.  
  22.     for m in bs.finditer(s):
  23.  
  24.         if m.groups()[0] is None:
  25.  
  26.             return s[:m.start() - 1] + s[m.end():]
  27.  
  28.         else:
  29.  
  30.             return s[:m.start() - int(m.groups()[0][1:])] + s[m.end():]
  31.  
  32.  
  33.  
  34. for s in tests:
  35.  
  36.       theOutFile.write (bs_remove(s)+ " ")
To account for single or multiple occurrences of backspaces in each line:
Expand|Select|Wrap|Line Numbers
  1. # &lt;BACKSPACE&gt;         1 backspace
  2. # &lt;BACKSPACE: 4&gt;      4 backspaces
  3.  
  4. import re
  5.  
  6. def remove_bs(s):
  7.     patt = re.compile(r'&lt;BACKSPACE:? ?(\d+)?&gt;')
  8.     while True:
  9.         m = patt.search(s)
  10.         if m:
  11.             if m.group(1):
  12.                 start = m.start()-int(m.group(1))
  13.             else:
  14.                 start = m.start()-1
  15.             s = ''.join([s[:start], s[m.end():]])
  16.         else:
  17.             break
  18.     return s
  19.  
  20. def parse_file(fn):
  21.     return ''.join([remove_bs(line) for line in open(fn).readlines()])
  22.  
  23. fn = 'input.txt'
  24. fnOut = 'output.txt
  25. f = open(fnOut, 'w')
  26. f.write(parse_file(fn))
  27. f.close()
Jan 24 '08 #5
Hi there, im trying to create a python program that can read a text file line by line and search for specified words/text/strings and place them in another text file.
where line length should be 65 characters and number of lines per page are 70.
Feb 2 '08 #6

Sign in to post your reply or Sign up for a free account.

Similar topics

3
by: Anonymous | last post by:
I need to create an application that will do fairly simple text manipulation on 20,000 files in text format (html files). The files exist both on my Windows machine and on a FreeBSD server. I...
10
by: ross | last post by:
I want to do some tricky text file manipulation on many files, but have only a little programming knowledge. What are the ideal languages for the following examples? 1. Starting from a certain...
1
by: blangela | last post by:
Bjarne Stroustrup has a new text coming out called "Programming: Principles and Practice Using C++" (ISBN: 0321543726), due to be published in December of this year. Some of the features of this...
15
by: pakerly | last post by:
How would i do this, convert a test file to excel? Lets say my text file has fields like this: NUMBER NAME ADDRESS PHONE 11002 Test1 ...
1
by: nick777 | last post by:
Hope the Community can bear with me as I muddle with the vocabulary since I am not really sure if I am going about this the correct way. My question is as follows: If I had some sample data in...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.