editing a text file

Hi there,

I am new to the Python forums and to programming as well.

I have a text file from which I need to extract a few words of data from the header (which is 3 lines long), then search the entire file (consisting of 150 lines) for the keywords TEXT1, TEXT2 and TEXT3 related to a particular ID and get the sum of the WRITES for TEXT1, TEXT2 and TEXT3.

Ex: the input file will be as below

Windows 2000 text text text text text text text
343434 text text text text text text code textplayback
text text text text text start check text..

text text text text text 343434, text ID: 504181, File: text1, Write_0=24
text text text text text 343434, text ID: 504182, File: text1, Write_0=20
text text text text text 343434, text ID: 504183, File: text1, Write_0=21
text text text text text 343434, text ID: 504181, File: text2, Write_0=29
text text text text text 343434, text ID: 504182, File: text2, Write_0=20
text text text text text 343434, text ID: 504183, File: text2, Write_0=23
text text text text text 343434, text ID: 504181, File: text3, Write_0=21
text text text text text 343434, text ID: 504182, File: text3, Write_0=24
text text text text text 343434, text ID: 504183, File: text3, Write_0=23

output should be like:

343434 textplayback, start check --> this is from the header
text ID: 504181: sum of (24+21+29), i.e. 74 ----> this is the result of the entire file search

Please help me in this regard
Thanks in advance for your help

Jan 22 '08 #1


bluemountain

Hi there,

I need help getting the needed contents of the text file.

How can I modify the text file by filtering out the unwanted text?

Ex: the input file will be as below

Windows 2000 text text text text text text text
343434 text text text text text text code textplayback
text text text text text start check text..

text text text text text 343434, text ID: 504181, File: text1, Write_0=24
text text text text text 343434, text ID: 504182, File: text1, Write_0=20
text text text text text 343434, text ID: 504183, File: text1, Write_0=21
text text text text text 343434, text ID: 504181, File: text2, Write_0=29
text text text text text 343434, text ID: 504182, File: text2, Write_0=20
text text text text text 343434, text ID: 504183, File: text2, Write_0=23
text text text text text 343434, text ID: 504181, File: text3, Write_0=21
text text text text text 343434, text ID: 504182, File: text3, Write_0=24
text text text text text 343434, text ID: 504183, File: text3, Write_0=23

The output should be like this:

I need to get the 1st and 2nd lines of the header, leaving out the third,

and also the sum of the write times for text ID 504181 and file text1 across the entire file.

How can I search for them? Please help me in this regard.
Thanks so much

Jan 24 '08 #2

dshimer


I haven't had a lot of time to think through the entire process, but it seems like some basic concepts would be helpful. Here is a session I just ran through and commented that shows how some of the values could be accessed. You will find with Python that there are generally dozens of different ways to accomplish a given task, and there is always a more compact way (unless you get it from a couple of the real pros here). These are just ideas.


>>> f=open('/tmp/test.txt','r')
# Open the file in text, read mode.

>>> data=f.readlines()
# I really like reading the whole file into a list in which each line is an item in the list.

>>> print data[2]
text text text text text start check text..

# Each list item can be treated individually, processed, printed out, or skipped.

>>> data[8]
'text text text text text 343434, text ID: 504182, File: text2, Write_0=20\n'
# This is just one of the data lines randomly selected and printed out.

>>> data[8].find('504182')
42
# If you just need to find a piece of text then this will return its position in the string, or -1 if not found.
# Note that not finding it does not return a conditional FALSE of 0, because 0 is a valid position.

>>> data[8].split(',')
['text text text text text 343434', ' text ID: 504182', ' File: text2', ' Write_0=20\n']
# If individual fields need to be tested you can split on the delimiter (comma in this case).

>>> data[8].split(',')[3]
' Write_0=20\n'
# So pulling out an individual field just requires its index value.

>>> data[8].split(',')[3].split('=')
[' Write_0', '20\n']
# If I want to break that field into its components I can split it again based on the field's delimiter (here the =).

>>> data[8].split(',')[3].split('=')[1]
'20\n'
# So by specifying the index value of the actual data I can get that number (but it's still a string).

>>> int(data[8].split(',')[3].split('=')[1])
20
# Which can be solved by the int() function.

>>> data[8].split(',')[1].split(':')[1]
' 504182'
# Using the same idea to pull out a key value.
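
Putting the steps of the session above together, a minimal sketch might look like the following. It is written for Python 3 (the session above is Python 2), and the function name `sum_writes` and the in-memory `sample` list are illustrative assumptions rather than anything from the original posts.

```python
def sum_writes(lines, wanted_id):
    """Sum the Write_0 values on the data lines whose ID field equals wanted_id."""
    total = 0
    for line in lines:
        fields = line.split(',')
        if len(fields) < 4:
            continue  # header or blank line: not a four-field data record
        # fields[1] looks like ' text ID: 504182'
        id_value = fields[1].split(':')[1].strip()
        if id_value != wanted_id:
            continue
        # fields[3] looks like ' Write_0=20\n'; int() ignores surrounding whitespace
        total += int(fields[3].split('=')[1])
    return total


# Sample lines copied from the input file shown in the question.
sample = [
    "Windows 2000 text text text text text text text\n",
    "343434 text text text text text text code textplayback\n",
    "text text text text text start check text..\n",
    "\n",
    "text text text text text 343434, text ID: 504181, File: text1, Write_0=24\n",
    "text text text text text 343434, text ID: 504182, File: text1, Write_0=20\n",
    "text text text text text 343434, text ID: 504181, File: text2, Write_0=29\n",
    "text text text text text 343434, text ID: 504181, File: text3, Write_0=21\n",
]

print(sum_writes(sample, '504181'))  # 24 + 29 + 21 = 74
```

With a real file, the list returned by `readlines()` could be passed in place of `sample`.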

Jan 24 '08 #3

bvdet


bluemountain,

Hopefully dshimer's suggestions will enable you to find a solution. The following makes use of the re module.


import re

def parse_data(fn, textID, hdrkeys):
    lineList = [item.strip() for item in open(fn).readlines()]
    pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
    # check first three lines for header key words
    hdrList = pattH.findall(''.join(lineList[:3]))
    # create pattern to extract values
    pattD = re.compile(r'=(\d+)')
    valueList = []
    for item in lineList[3:]:
        # if textID is in list item, extract the value
        if textID in item:
            valueList.append(pattD.search(item).group(1))
    # Create output string by joining formatted data
    return '\n'.join([','.join(hdrList), 'textID: %s: sum of (%s) = %s' % \
                      (textID, ','.join(valueList), \
                       sum([int(i) for i in valueList]))
                      ])

fn = r'H:\TEMP\temsys\textID.txt'
hdrkeys = ['343434', 'textplayback', 'start check']
textID = '504181'
print parse_data(fn, textID, hdrkeys)

Output:


>>> 343434,textplayback,start check
textID: 504181: sum of (24,29,21) = 74
>>>
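
Post #2 also asks for the sum restricted to one ID and one file (ID 504181, file text1). As a rough sketch of how that could be done with the re module, the following Python 3 code groups the values by (ID, file) pair. The record layout in `RECORD` is an assumption inferred from the sample lines in this thread, and `sum_by_id_and_file` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed record layout, taken from the sample lines in the thread:
#   "... ID: 504181, File: text1, Write_0=24"
RECORD = re.compile(r'ID:\s*(\d+),\s*File:\s*(\w+),\s*Write_\d+=(\d+)')


def sum_by_id_and_file(lines):
    """Return {(id, file): total} for every line that matches RECORD."""
    totals = defaultdict(int)
    for line in lines:
        match = RECORD.search(line)
        if match is None:
            continue  # header, blank, or otherwise non-matching line
        tag_id, file_name, value = match.groups()
        totals[(tag_id, file_name)] += int(value)
    return dict(totals)


# Data lines copied from the question.
sample = [
    "text text text text text 343434, text ID: 504181, File: text1, Write_0=24",
    "text text text text text 343434, text ID: 504182, File: text1, Write_0=20",
    "text text text text text 343434, text ID: 504181, File: text2, Write_0=29",
    "text text text text text 343434, text ID: 504181, File: text3, Write_0=21",
]

totals = sum_by_id_and_file(sample)
print(totals[('504181', 'text1')])
# Total for one ID across all files:
print(sum(v for (tag_id, _), v in totals.items() if tag_id == '504181'))
```

Matching the ID as a captured field, rather than testing `textID in item`, avoids accidental hits when the same digits appear elsewhere on the line.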

Jan 24 '08 #4

bluemountain

>>> import re
>>> def parse_data(fn, TagID, hdrkeys):
... lineList = [item.strip() for item in open(fn).readlines()]
... pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
... hdrList = pattH.findall(''.join(lineList[:4]))
... pattD = re.compile(r'=(\d+)')
... valueList = []
... for item in lineList[4:]:
... if TagID in item:
... valueList.append(pattD.search(item).group(1))
... return '\n'.join([','.join(hdrList), 'TagID: %s: sum of (%s) = %s' % \
... (TagID, ','.join(valueList), \
... sum([int(i) for i in valueList]))
... ])
... fn = r'C:\log.txt'
File "<stdin>", line 14
fn = r'C:\log.txt'
^
SyntaxError: invalid syntax

At line 14 it reports invalid syntax, highlighting "fn".

I also tried the following:
... fn = r'H:\TEMP\temsys\LOG.TXT
File "<stdin>", line 14
fn = r'H:\TEMP\temsys\Log.txt
^
SyntaxError: invalid syntax
>>>

I have a question: where should I specify the input file name?
I am unable to figure it out; please correct me where I went wrong.

Jan 24 '08 #5

bluemountain

And also, thank you so much for your help, bvdet.

Jan 24 '08 #6

bvdet

Please use code tags around your code. There may be an indentation problem or missing parentheses. The following code executed at the DOS command prompt as shown below:


import re

def parse_data(fn, textID, hdrkeys):
    lineList = [item.strip() for item in open(fn).readlines()]
    pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
    # check first three lines for header key words
    hdrList = pattH.findall(''.join(lineList[:3]))
    # create pattern to extract values
    pattD = re.compile(r'=(\d+)')
    valueList = []
    for item in lineList[3:]:
        # if textID is in list item, extract the value
        if textID in item:
            valueList.append(pattD.search(item).group(1))
    # Create output string by joining formatted data
    return '\n'.join([','.join(hdrList), 'textID: %s: sum of (%s) = %s' % \
                      (textID, ','.join(valueList), \
                       sum([int(i) for i in valueList]))
                      ])

if __name__ == '__main__':
    fn = r'H:\TEMP\temsys\textID.txt'
    hdrkeys = ['343434', 'textplayback', 'start check']
    textID = '504181'
    print parse_data(fn, textID, hdrkeys)

C:\Python23>python file_textID.py
343434,textplayback,start check
textID: 504181: sum of (24,29,21) = 74

It also works in Pythonwin.
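
A likely cause of the SyntaxError in post #5 (an assumption, since the pasted session lost its indentation): at the interactive >>> prompt, a def block has to be closed with a blank line before the next top-level statement is typed. The Python 3 sketch below imitates that rule with the built-in compile() in 'single' mode, which accepts exactly one interactive statement; `compiles_interactively` is a hypothetical helper name.

```python
def compiles_interactively(source):
    """Return True if `source` is accepted as ONE interactive statement."""
    try:
        compile(source, '<stdin>', 'single')
        return True
    except SyntaxError:
        return False


# A complete function definition on its own is fine.
func_only = "def f():\n    return 1\n"

# The same definition followed directly by another top-level statement,
# which is what happens when 'fn = ...' is typed at the '...' prompt.
func_then_statement = "def f():\n    return 1\nfn = r'C:\\log.txt'\n"

print(compiles_interactively(func_only))
print(compiles_interactively(func_then_statement))
```

Saving the code to a .py file and running it from the command prompt, as shown above, sidesteps the issue because a script is compiled as a whole.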

Jan 24 '08 #7

bluemountain

Firstly, thanks for the fast info you provided.
This time I ran the whole test in the Python interactive shell instead of at the command prompt. It's pretty easy to run from there, with no need to check the indentation all the time.

The code you provided is working, but I am having a problem retrieving the sum:
it is giving me sum=0

import re

def parse_data(fn, TagID, hdrkeys):
lineList = [item.strip() for item in open(fn).readlines()]
pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
# check first three lines for header key words
hdrList = pattH.findall(''.join(lineList[:3]))
# create pattern to extract values
pattD = re.compile(r'=(\d+)')
valueList = []
for item in lineList[3:]:
# if textID is in list item, extract the value
if TagID in item:
valueList.append(pattD.search(item).group(1))
# Create output string by joining formatted data
return '\n'.join([','.join(hdrList), 'TagID: %s: sum of (%s) = %s' % \
(TagID, ','.join(valueList), \
sum([int(i) for i in valueList]))
])

if __name__ == '__main__':
fn = r'c:\log1.txt'
hdrkeys = ['53301', 'RS485', 'Write']
TagID = '504181'
print parse_data(fn, TagID, hdrkeys)

I don't know how to copy and paste it along with the line numbers.

But the above is the code I tried to run, and the result is:
IDLE 1.1.4
>>>

TagID: 504181: sum of () = 0
>>>

Please correct me

Thanks for all the help you were extending

Jan 24 '08 #8

bvdet


import re

def parse_data(fn, TagID, hdrkeys):
    lineList = [item.strip() for item in open(fn).readlines()]
    pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
    # check first three lines for header key words
    hdrList = pattH.findall(''.join(lineList[:3]))
    # create pattern to extract values
    pattD = re.compile(r'=(\d+)')
    valueList = []
    for item in lineList[3:]:
        # if textID is in list item, extract the value
        if TagID in item:
            valueList.append(pattD.search(item).group(1))
    # Create output string by joining formatted data
    return '\n'.join([','.join(hdrList), 'TagID: %s: sum of (%s) = %s' % \
                      (TagID, ','.join(valueList), \
                       sum([int(i) for i in valueList]))
                      ])

if __name__ == '__main__':
    fn = r'c:\log1.txt'
    hdrkeys = ['53301', 'RS485', 'Write']
    TagID = '504181'
    print parse_data(fn, TagID, hdrkeys)

I added code tags for you above. Now isn't that better? There is nothing in valueList to sum. Maybe the TagID you are using is not in the file.
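
One way to test that suggestion is to count how many data lines actually contain the ID, and how many of those also carry a value that the pattern can extract. A small Python 3 sketch follows; the helper name `diagnose` and the in-memory `sample` are hypothetical, and the pattern is the same one used for pattD.

```python
import re

VALUE = re.compile(r'=(\d+)')  # same pattern as pattD in parse_data


def diagnose(lines, tag_id, header_lines=3):
    """Return (lines containing tag_id, of those, lines with an '=<digits>' value)."""
    data = [line.strip() for line in lines[header_lines:]]
    with_id = [line for line in data if tag_id in line]
    with_value = [line for line in with_id if VALUE.search(line)]
    return len(with_id), len(with_value)


# Header and data lines copied from the question.
sample = [
    "Windows 2000 text text text text text text text\n",
    "343434 text text text text text text code textplayback\n",
    "text text text text text start check text..\n",
    "\n",
    "text text text text text 343434, text ID: 504181, File: text1, Write_0=24\n",
    "text text text text text 343434, text ID: 504182, File: text1, Write_0=20\n",
    "text text text text text 343434, text ID: 504181, File: text2, Write_0=29\n",
]

print(diagnose(sample, '504181'))
print(diagnose(sample, '999999'))
```

If the first count is 0, the ID string never occurs after the header, so the file contents or the path are worth checking. If the first count is larger than the second, some matching lines have no =digits value, and pattD.search(item) would return None for them.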

Jan 24 '08 #9

bluemountain

After adding the code tags, it looks great. Please let me know how to add them. Now, coming to the code:

As you said, there is nothing in the value list. I provided the value list that needs to be searched, but it throws an error message saying the data I provided in the value list is not defined. I tried defining it at the start of the script, but that didn't get me anything; still an error. Please help me.

Code:
import re
int = MIN1, MIN2, MIN3

def parse_data(fn, TagID, hdrkeys):
lineList = [item.strip() for item in open(fn).readlines()]
pattH = re.compile(r'%s' % ('|'.join(hdrkeys)))
# check first three lines for header key words
hdrList = pattH.findall(''.join(lineList[:3]))
# create pattern to extract values
pattD = re.compile(r'=(\d+)')
valueList = []
for item in lineList[3:]:
# if textID is in list item, extract the value
if TagID in item:
valueList.append(pattD.search(item).group(1))
# Create output string by joining formatted data
return '\n'.join([','.join(hdrList), 'TagID: %s: sum of (%s) = %s' % \
(TagID, ','.join(MIN1,MIN2,MIN3), \
sum([int(i) for i in valueList]))
])

if __name__ == '__main__':
fn = r'c:\log1.txt'
hdrkeys = ['53301', 'RS485', 'Write']
TagID = '504181'
print parse_data(fn, TagID, hdrkeys)

Error:

Traceback (most recent call last):
File "C:\Python24\file_tagid.py", line 2, in -toplevel-
int = MIN1, MIN2, MIN3
NameError: name 'MIN1_13_5' is not defined
>>>

I am pretty sure the tag ID is there in the text file.
Thanks

Jan 25 '08 #10

bvdet

See the reply guidelines for code tags. It looks like your problem occurs in the assignment to int on line 2: the names MIN1, MIN2 and MIN3 on the right-hand side are not defined, hence the NameError. In addition, int is a built-in function, and you should not use int as a variable name. Maybe you should post part of your data file.
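
To illustrate both points, here is a small Python 3 sketch. The labels 'MIN1', 'MIN2', 'MIN3' are treated as plain strings purely for illustration (what they stand for in the real log is not known from the thread), and `demo_shadowing` is a hypothetical helper.

```python
def demo_shadowing():
    """Show what happens to int(...) calls once the name int is rebound."""
    int = ('MIN1', 'MIN2', 'MIN3')  # rebinding int hides the built-in in this scope
    try:
        int('24')  # now tries to call a tuple
    except TypeError as exc:
        return str(exc)
    return 'no error'


print(demo_shadowing())

# A version that avoids both problems:
labels = ['MIN1', 'MIN2', 'MIN3']    # quoted, so they are strings, not undefined names
joined = ','.join(labels)            # str.join takes ONE iterable argument
values = ['24', '29', '21']          # strings as extracted by the regex
total = sum(int(v) for v in values)  # the built-in int is still intact here
print(joined, total)
```

Unquoted names such as MIN1 must be assigned before they are used; otherwise Python raises NameError, as in the traceback above.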

Jan 25 '08 #11
