
editing a text file

Hi there,

I am new to Python forums and to programming too.

I have a text file from which I need to extract a few words of data from the header (which is 3 lines long), then search the entire file (about 150 lines) for the keywords TEXT1, TEXT2, TEXT3 related to a particular ID, and get the sum of the WRITES for TEXT1, TEXT2, TEXT3.

ex: input file will be as below


Windows 2000 text text text text text text text
343434 text text text text text text code textplayback
text text text text text start check text..

text text text text text 343434, text ID: 504181, File: text1, Write_0=24
text text text text text 343434, text ID: 504182, File: text1, Write_0=20
text text text text text 343434, text ID: 504183, File: text1, Write_0=21
text text text text text 343434, text ID: 504181, File: text2, Write_0=29
text text text text text 343434, text ID: 504182, File: text2, Write_0=20
text text text text text 343434, text ID: 504183, File: text2, Write_0=23
text text text text text 343434, text ID: 504181, File: text3, Write_0=21
text text text text text 343434, text ID: 504182, File: text3, Write_0=24
text text text text text 343434, text ID: 504183, File: text3, Write_0=23

output should be like:

343434 textplayback, start check --> this is from the header
text ID: 504181: sum of (24+29+21), i.e. 74 ----> this is the result of the entire file search

Please help me in this regard
Thanks in advance for your help
Jan 22 '08 #1
Hi there,

I need help getting the needed contents out of the text file.

How can I modify the text file by filtering out the unwanted text?

ex: input file will be as below


Windows 2000 text text text text text text text
343434 text text text text text text code textplayback
text text text text text start check text..

text text text text text 343434, text ID: 504181, File: text1, Write_0=24
text text text text text 343434, text ID: 504182, File: text1, Write_0=20
text text text text text 343434, text ID: 504183, File: text1, Write_0=21
text text text text text 343434, text ID: 504181, File: text2, Write_0=29
text text text text text 343434, text ID: 504182, File: text2, Write_0=20
text text text text text 343434, text ID: 504183, File: text2, Write_0=23
text text text text text 343434, text ID: 504181, File: text3, Write_0=21
text text text text text 343434, text ID: 504182, File: text3, Write_0=24
text text text text text 343434, text ID: 504183, File: text3, Write_0=23

The output should be like this:

I need to get the 1st and 2nd lines of the header, skipping the third,

and also the sum of the write times for text ID 504181 and file text1 from the entire file.

How can I search for them? Please help me in this regard.
Thanks so much
Jan 24 '08 #2
dshimer
136 Expert 100+
I haven't had a lot of time to think through the entire process, but it seems like some basic concepts would be helpful. Here is a session I just ran through and commented that shows how some of the values could be accessed. You will find with Python that there are generally dozens of different ways to accomplish a given task, and there is always a more compact way (unless you get it from a couple of the real pros here). These are just ideas.

>>> f=open('/tmp/test.txt','r')
# Open the file in text, read mode

>>> data=f.readlines()
# I really like reading the whole file into a list in which each line is an item in the list.

>>> print data[2]
text text text text text start check text..
# Each list item can be treated individually, processed, printed out, or skipped.

>>> data[8]
'text text text text text 343434, text ID: 504182, File: text2, Write_0=20\n'
# This is just one of the data lines randomly selected and printed out.

>>> data[8].find('504182')
42
# If you just need to find a piece of text then this will return its position in the string, or -1 if not found.
# Note that not finding it does not return a conditional FALSE of 0, because 0 is a valid position.

>>> data[8].split(',')
['text text text text text 343434', ' text ID: 504182', ' File: text2', ' Write_0=20\n']
# If individual fields need to be tested you can split on the delimiter (comma in this case).

>>> data[8].split(',')[3]
' Write_0=20\n'
# So pulling out an individual field just requires its index value.

>>> data[8].split(',')[3].split('=')
[' Write_0', '20\n']
# If I want to break that field into its components I can split it again based on the field's delimiter (here the =)

>>> data[8].split(',')[3].split('=')[1]
'20\n'
# So by specifying the index value of the actual data I can get that number (but it's still a string).

>>> int(data[8].split(',')[3].split('=')[1])
20
# Which can be solved by the int() function.

>>> data[8].split(',')[1].split(':')[1]
' 504182'
# Using the same idea to pull out a key value.
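
The individual steps in the session above can be strung together into one small function. This is only a sketch, written for Python 3 (the session itself is Python 2), and it assumes every data line has exactly the four comma-separated fields shown in the sample. The names `SAMPLE_LINES` and `sum_writes` are made up for illustration.

```python
# A few lines copied from the sample data in the question (hypothetical subset).
SAMPLE_LINES = [
    "343434 text text text text text text code textplayback\n",
    "text text text text text 343434, text ID: 504181, File: text1, Write_0=24\n",
    "text text text text text 343434, text ID: 504182, File: text1, Write_0=20\n",
    "text text text text text 343434, text ID: 504181, File: text2, Write_0=29\n",
    "text text text text text 343434, text ID: 504181, File: text3, Write_0=21\n",
]


def sum_writes(lines, wanted_id):
    """Sum the Write_0 value of every data line whose ID field equals wanted_id."""
    total = 0
    for line in lines:
        fields = line.split(',')
        if len(fields) != 4:
            continue  # header or blank line: not a data record in this layout
        # ' text ID: 504181' -> '504181'
        record_id = fields[1].split(':')[1].strip()
        if record_id == wanted_id:
            # ' Write_0=24\n' -> '24\n'; int() ignores the surrounding whitespace
            total += int(fields[3].split('=')[1])
    return total


print(sum_writes(SAMPLE_LINES, '504181'))
```

With a real file, the list returned by `readlines()` could be passed in place of `SAMPLE_LINES`.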

Jan 24 '08 #3
bvdet
2,851 Expert Mod 2GB
bluemountain,

Hopefully dshimer's suggestions will enable you to find a solution. The following makes use of the re module.
import re

def parse_data(fn, textID, hdrkeys):
    lineList = [item.strip() for item in open(fn).readlines()]
    pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
    # check first three lines for header key words
    hdrList = pattH.findall(''.join(lineList[:3]))
    # create pattern to extract values
    pattD = re.compile(r'=(\d+)')
    valueList = []
    for item in lineList[3:]:
        # if textID is in list item, extract the value
        if textID in item:
            valueList.append(pattD.search(item).group(1))
    # Create output string by joining formatted data
    return '\n'.join([','.join(hdrList), 'textID: %s: sum of (%s) = %s' % \
                      (textID, ','.join(valueList), \
                       sum([int(i) for i in valueList]))
                      ])

fn = r'H:\TEMP\temsys\textID.txt'
hdrkeys = ['343434', 'textplayback', 'start check']
textID = '504181'
print parse_data(fn, textID, hdrkeys)
Output:
>>> 343434,textplayback,start check
textID: 504181: sum of (24,29,21) = 74
>>>
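
The second post also asked for the sum per ID and per file (for example ID 504181 and file text1). A variation on the regular-expression idea might look like the sketch below. It is written for Python 3, it is not the code from this thread, and the pattern `RECORD` and the function `totals_by_id_and_file` are illustrative names that assume the `ID: ..., File: ..., Write_0=...` layout of the sample data.

```python
import re
from collections import defaultdict

# Assumed record layout, taken from the sample lines in the question.
RECORD = re.compile(r'ID:\s*(\d+),\s*File:\s*(\w+),\s*Write_0=(\d+)')


def totals_by_id_and_file(lines):
    """Return {(id, file): summed Write_0} for every line matching RECORD."""
    totals = defaultdict(int)
    for line in lines:
        match = RECORD.search(line)
        if match is None:
            continue  # header, blank, or otherwise unrelated line
        record_id, file_name, value = match.groups()
        totals[(record_id, file_name)] += int(value)
    return dict(totals)


sample = [
    "Windows 2000 text text text text text text text",
    "343434 text text text text text text code textplayback",
    "text text text text text start check text..",
    "",
    "text text text text text 343434, text ID: 504181, File: text1, Write_0=24",
    "text text text text text 343434, text ID: 504182, File: text1, Write_0=20",
    "text text text text text 343434, text ID: 504181, File: text2, Write_0=29",
    "text text text text text 343434, text ID: 504181, File: text3, Write_0=21",
]

totals = totals_by_id_and_file(sample)
# One ID and one file:
print(totals[('504181', 'text1')])
# One ID across all files:
print(sum(v for (rid, _), v in totals.items() if rid == '504181'))
```

Lines that do not match the pattern are skipped rather than raising an error, which is a deliberate difference from calling `.group(1)` directly on the result of `search()`.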
Jan 24 '08 #4
>>> import re
>>> def parse_data(fn, TagID, hdrkeys):
... lineList = [item.strip() for item in open(fn).readlines()]
... pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
... hdrList = pattH.findall(''.join(lineList[:4]))
... pattD = re.compile(r'=(\d+)')
... valueList = []
... for item in lineList[4:]:
... if TagID in item:
... valueList.append(pattD.search(item).group(1))
... return '\n'.join([','.join(hdrList), 'TagID: %s: sum of (%s) = %s' % \
... (TagID, ','.join(valueList), \
... sum([int(i) for i in valueList]))
... ])
... fn = r'C:\log.txt'
File "<stdin>", line 14
fn = r'C:\log.txt'
^
SyntaxError: invalid syntax

At line 14 it reports invalid syntax, highlighting "fn".

I tried it as below:
... fn = r'H:\TEMP\temsys\LOG.TXT
File "<stdin>", line 14
fn = r'H:\TEMP\temsys\Log.txt
^
SyntaxError: invalid syntax
>>>

I have a question: where should I specify the input file name?
I am unable to figure it out; please correct me where I went wrong.
Jan 24 '08 #5
And also, thank you so much for your help, bvdet.
Jan 24 '08 #6
bvdet
2,851 Expert Mod 2GB
Please use code tags around your code. There may be an indentation problem or missing parentheses. The following code executed at the DOS command prompt as shown below:
import re

def parse_data(fn, textID, hdrkeys):
    lineList = [item.strip() for item in open(fn).readlines()]
    pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
    # check first three lines for header key words
    hdrList = pattH.findall(''.join(lineList[:3]))
    # create pattern to extract values
    pattD = re.compile(r'=(\d+)')
    valueList = []
    for item in lineList[3:]:
        # if textID is in list item, extract the value
        if textID in item:
            valueList.append(pattD.search(item).group(1))
    # Create output string by joining formatted data
    return '\n'.join([','.join(hdrList), 'textID: %s: sum of (%s) = %s' % \
                      (textID, ','.join(valueList), \
                       sum([int(i) for i in valueList]))
                      ])

if __name__ == '__main__':
    fn = r'H:\TEMP\temsys\textID.txt'
    hdrkeys = ['343434', 'textplayback', 'start check']
    textID = '504181'
    print parse_data(fn, textID, hdrkeys)
C:\Python23>python file_textID.py
343434,textplayback,start check
textID: 504181: sum of (24,29,21) = 74

It also works in Pythonwin.
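
One possible explanation for the SyntaxError in the pasted interactive session, which the thread does not confirm, is that the interactive prompt compiles input one statement at a time, so a `def` has to be closed with a blank line before the next top-level statement is typed. The sketch below (Python 3, with a shortened stand-in for the function body) compares the two compile modes; `SESSION_INPUT` and `compiles` are illustrative names.

```python
# A shortened stand-in for what was typed: a def followed directly,
# with no blank line in between, by a top-level assignment.
SESSION_INPUT = (
    "def parse_data():\n"
    "    return 'ok'\n"
    "fn = r'C:\\log.txt'\n"
)


def compiles(source, mode):
    """Return True if source compiles in the given compile() mode."""
    try:
        compile(source, '<stdin>', mode)
        return True
    except SyntaxError:
        return False


# 'single' is the mode used for one interactive statement;
# 'exec' is the mode used for a whole script file.
print(compiles(SESSION_INPUT, 'single'))
print(compiles(SESSION_INPUT, 'exec'))
```

If that is what happened, saving the code to a file and running it from the command prompt, as shown above, sidesteps the issue.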
Jan 24 '08 #7
Firstly, thanks for the fast info you provided.
This time I ran the whole test in the Python interactive shell instead of running it at the command prompt. It is pretty easy to run from there; no need to check the indentation all the time.

The code you provided is working, but I have a problem retrieving the sum:
it is giving me sum=0

import re

def parse_data(fn, TagID, hdrkeys):
    lineList = [item.strip() for item in open(fn).readlines()]
    pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
    # check first three lines for header key words
    hdrList = pattH.findall(''.join(lineList[:3]))
    # create pattern to extract values
    pattD = re.compile(r'=(\d+)')
    valueList = []
    for item in lineList[3:]:
        # if textID is in list item, extract the value
        if TagID in item:
            valueList.append(pattD.search(item).group(1))
    # Create output string by joining formatted data
    return '\n'.join([','.join(hdrList), 'TagID: %s: sum of (%s) = %s' % \
                      (TagID, ','.join(valueList), \
                       sum([int(i) for i in valueList]))
                      ])

if __name__ == '__main__':
    fn = r'c:\log1.txt'
    hdrkeys = ['53301', 'RS485', 'Write']
    TagID = '504181'
    print parse_data(fn, TagID, hdrkeys)

I don't know how to copy and paste it along with the line numbers.

But the above is the code I tried to run, and the result is:
IDLE 1.1.4
>>>

TagID: 504181: sum of () = 0
>>>

Please correct me

Thanks for all the help you are extending.
Jan 24 '08 #8
bvdet
2,851 Expert Mod 2GB
I added code tags for you above. Now isn't that better? There is nothing in valueList to sum. Maybe the TagID you are using is not in the file.
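
A quick way to check that suspicion is to list the lines that actually contain the ID before trying to sum anything. The following is a sketch for Python 3 only; `matching_lines`, `report`, and `SAMPLE` are hypothetical names, and the sample lines are copied from the question rather than from the poster's real log file.

```python
def matching_lines(lines, tag_id):
    """Return (line_number, text) pairs for every line that contains tag_id."""
    return [
        (number, line.rstrip('\n'))
        for number, line in enumerate(lines, start=1)
        if tag_id in line
    ]


def report(lines, tag_id):
    """Describe whether tag_id occurs in lines, and where it first occurs."""
    hits = matching_lines(lines, tag_id)
    if not hits:
        return 'no line contains %r; check the ID and the file path' % tag_id
    return '%d line(s) contain %r, first at line %d' % (len(hits), tag_id, hits[0][0])


SAMPLE = [
    "Windows 2000 text text text text text text text\n",
    "343434 text text text text text text code textplayback\n",
    "text text text text text start check text..\n",
    "\n",
    "text text text text text 343434, text ID: 504181, File: text1, Write_0=24\n",
    "text text text text text 343434, text ID: 504182, File: text1, Write_0=20\n",
]

print(report(SAMPLE, '504181'))
print(report(SAMPLE, '999999'))
```

Running the same check against the lines read from the real file would show whether the empty sum comes from the ID, the file contents, or the path.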
Jan 24 '08 #9
After adding the code tags, it is looking great. Please let me know how to add them. Now, coming to the code:

As you said, there is nothing in the value list. I provided the values that need to be searched for, but it throws an error message saying the data I provided in the value list is not defined. I tried defining it at the start of the script, but that didn't fetch me anything; still an error. Please help me.

Code:
import re
int = MIN1, MIN2, MIN3

def parse_data(fn, TagID, hdrkeys):
    lineList = [item.strip() for item in open(fn).readlines()]
    pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
    # check first three lines for header key words
    hdrList = pattH.findall(''.join(lineList[:3]))
    # create pattern to extract values
    pattD = re.compile(r'=(\d+)')
    valueList = []
    for item in lineList[3:]:
        # if textID is in list item, extract the value
        if TagID in item:
            valueList.append(pattD.search(item).group(1))
    # Create output string by joining formatted data
    return '\n'.join([','.join(hdrList), 'TagID: %s: sum of (%s) = %s' % \
                      (TagID, ','.join(MIN1,MIN2,MIN3), \
                       sum([int(i) for i in valueList]))
                      ])

if __name__ == '__main__':
    fn = r'c:\log1.txt'
    hdrkeys = ['53301', 'RS485', 'Write']
    TagID = '504181'
    print parse_data(fn, TagID, hdrkeys)

Error:

Traceback (most recent call last):
File "C:\Python24\file_tagid.py", line 2, in -toplevel-
int = MIN1, MIN2, MIN3
NameError: name 'MIN1_13_5' is not defined
>>>

I am pretty sure the tag ID is there in the text file.
Thanks
Jan 25 '08 #10
bvdet
2,851 Expert Mod 2GB
See reply guidelines for code tags. It looks like your problem happens with the assignment of int (int is a built-in function, and you should not use int as a variable name) to variables MIN1, MIN2, MIN3. Maybe you should post part of your data file.
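
To illustrate the point about `int`, here is a small Python 3 sketch. It assumes MIN1, MIN2, MIN3 were meant as text labels, which the thread does not confirm; unquoted, they are names that must already be defined, which is what the NameError reports. The names `shadowed_int_error`, `labels`, and `joined` are made up for the example.

```python
def shadowed_int_error():
    """Show what happens when a name called int hides the built-in int."""
    int = ('MIN1', 'MIN2', 'MIN3')  # local name 'int' now refers to a tuple
    try:
        int('24')  # a tuple cannot be called like a function
    except TypeError as exc:
        return type(exc).__name__


print(shadowed_int_error())

# Outside that function the built-in is untouched.
print(int('24'))

# Quoted, the labels are plain strings, and str.join takes ONE iterable
# of strings rather than several separate arguments.
labels = ['MIN1', 'MIN2', 'MIN3']
joined = ','.join(labels)
print(joined)
```

Picking a different variable name, such as `labels` here, keeps `int()` available for the sum inside `parse_data`.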
Jan 25 '08 #11
