Improving my text processing script

I am sure there is a better way of writing this, but how?

import re
f=file('tlst')
tlst=f.read().split('\n')
f.close()
f=file('plst')
sep=re.compile('Identifier "(.*?)"')
plst=[]
for elem in f.read().split('Identifier'):
    content='Identifier'+elem
    match=sep.search(content)
    if match:
        plst.append((match.group(1),content))
f.close()
flst=[]
for table in tlst:
    for prog,content in plst:
        if content.find(table)>0:
            flst.append('"%s","%s"'%(prog,table))
flst.sort()
for elem in flst:
    print elem

What would be the best way of writing this program? BTW, I test find>0
so that an empty line in tlst (table=='') does not match everything.
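
As a minimal sketch of why the find>0 test behaves that way (written for Python 3, unlike the Python 2 script above; the helper name uses_table and the sample string are made up for illustration): str.find returns -1 when nothing is found and 0 both for the empty string and for a real match at the very start of the text, so the >0 test filters out empty lines but would also drop a genuine match at index 0.

```python
def uses_table(content, table):
    # Same test as the original script: only a match at index 1 or later counts.
    return content.find(table) > 0

content = 'Identifier "Program1"\nValue "tablename2"'

print(content.find("tablename2"))   # positive index: found after the start
print(content.find("missing"))      # -1: not found
print(content.find(""))             # 0: the empty string is "found" at index 0
print(content.find("Identifier"))   # 0: a genuine match at the very start
```

In this script every block starts with the word Identifier, so a table name can never sit at index 0, which is probably why the shortcut works here.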

tlst is of the form:

tablename1
tablename2

....

plst is of the form:

Identifier "Program1"
Name "Random Stuff"
Value "tablename2 "
....other random properties
Name "More Random Stuff"
Identifier "Program 2"
Name "Yet more stuff"
Value "tablename2 "
....
I want to know which programs use the tables in tlst (and only those
tables).

Aug 31 '05 #1
Even though you are using re's to try to look for specific substrings
(which you sort of fake in by splitting on "Identifier", and then
prepending "Identifier" to every list element, so that the re will
match...), this program has quite a few holes.

What if the word "Identifier" is inside one of the quoted strings?
What if the actual value is "tablename10"? This will match your
"tablename1" string search, but it is certainly not what you want.
Did you know there are trailing blanks on your table names, which could
prevent any program name from matching?
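
To make the "tablename10" hole concrete, here is a minimal Python 3 sketch of a stricter match. The helper name mentions_table is hypothetical, and it assumes that a table name is made of letters, digits and underscores, so that any other character (a quote, a blank, a dot) ends the name. It uses only the standard re module, not pyparsing.

```python
import re

def mentions_table(text, table):
    """Return True if `table` occurs in `text` as a whole name.

    Assumption: name characters are letters, digits and underscores, so
    "tablename1" is not found inside "tablename10".
    """
    table = table.strip()      # ignore stray blanks around the name
    if not table:              # an empty name would otherwise match everything
        return False
    pattern = r"(?<!\w)" + re.escape(table) + r"(?!\w)"
    return re.search(pattern, text) is not None

print(mentions_table('Value "tablename10"', "tablename1"))   # False
print(mentions_table('Value "tablename1 "', "tablename1 "))  # True
```

Under the same assumption a qualified name such as "dba.tablename1" still counts as a use of tablename1, because the dot is not a name character.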

So here is an alternative approach using, as many have probably
predicted by now if they've spent any time on this list, the pyparsing
module. You may ask, "isn't a parser overkill for this problem?" and
the answer will likely be "probably", but in the case of pyparsing, I'd
answer "probably, but it is so easy, and takes care of so much junk
like dealing with quoted strings and intermixed data, so, who cares if
it's overkill?"

So here is the 20-line pyparsing solution. Insert it into your program
after you have read in tlst, and read in the input data using something
like data = file('plst').read(). (The first line strips the whitespace
from the ends of your table names.)

# strip trailing whitespace from the table names read into tlst
tlist = map(str.rstrip, tlst)

from pyparsing import quotedString,LineStart,LineEnd,removeQuotes
quotedString.setParseAction( removeQuotes )

identLine = (LineStart() + "Identifier" + quotedString +
             LineEnd()).setResultsName("identifier")
tableLine = (LineStart() + "Value" + quotedString +
             LineEnd()).setResultsName("tableref")

interestingLines = ( identLine | tableLine )
thisprog = ""
for toks,start,end in interestingLines.scanString( data ):
    toktype = toks.getName()
    if toktype == 'identifier':
        thisprog = toks[1]
    elif toktype == 'tableref':
        thistable = toks[1]
        if thistable in tlist:
            print '"%s","%s"' % (thisprog, thistable)
        else:
            print "Not", thisprog, "contains wrong table ("+thistable+")"

This program will print out:
"Program1","tablename2"
"Program 2","tablename2"
Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

Sep 1 '05 #2
Hello pruebauno,

> import re
> f=file('tlst')
> tlst=f.read().split('\n')
> f.close()

tlst = open("tlst").readlines()

> f=file('plst')
> sep=re.compile('Identifier "(.*?)"')
> plst=[]
> for elem in f.read().split('Identifier'):
>     content='Identifier'+elem
>     match=sep.search(content)
>     if match:
>         plst.append((match.group(1),content))
> f.close()

Look at re.findall, I think it'll be easier.

> flst=[]
> for table in tlst:
>     for prog,content in plst:
>         if content.find(table)>0:

if table in content:

>             flst.append('"%s","%s"'%(prog,table))
> flst.sort()
> for elem in flst:
>     print elem

print "\n".join(sorted(flst))
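
Put together, those three suggestions might look like the following Python 3 sketch. The function names read_names and report are made up for illustration, and the sample data is inlined instead of being read from the tlst and plst files. Note that readlines() keeps the trailing newline on every line, so the sketch strips each name and drops blank lines before using the in test; without that, an empty name would match every program.

```python
def read_names(lines):
    """Strip each line and drop blank ones (readlines() keeps the newline)."""
    return [ln.strip() for ln in lines if ln.strip()]

def report(tables, programs):
    """Build the sorted report; programs is a list of (name, content) pairs."""
    rows = ['"%s","%s"' % (prog, table)
            for table in tables
            for prog, content in programs
            if table in content]          # plain substring test
    return "\n".join(sorted(rows))

tables = read_names(["tablename1\n", "tablename2\n", "\n"])
programs = [("Program1", 'Identifier "Program1"\nValue "tablename2"\n')]
print(report(tables, programs))
```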

HTH.
--
------------------------------------------------------------------------
Miki Tebeka <mi*********@zoran.com>
http://tebeka.bizhat.com
The only difference between children and adults is the price of the toys


Sep 1 '05 #3
Paul McGuire wrote:
> match...), this program has quite a few holes.
>
> What if the word "Identifier" is inside one of the quoted strings?
> What if the actual value is "tablename10"? This will match your
> "tablename1" string search, but it is certainly not what you want.
> Did you know there are trailing blanks on your table names, which could
> prevent any program name from matching?

Good point. I did not think about that. I got lucky: none of the table
names had trailing blanks (Google Groups seems to add those), the word
Identifier is not used inside quoted strings anywhere, and I do not
have tablename10. But I do have "dba.tablename1", and that one has to
match tablename1 (and magically did).

> So here is an alternative approach using, as many have probably
> predicted by now if they've spent any time on this list, the pyparsing
> module. You may ask, "isn't a parser overkill for this problem?" and

You had to plug pyparsing! :-). Thanks for the info, I did not know
something like pyparsing existed. Thanks for the code too, because
looking at the module it was not totally obvious to me how to use it. I
tried to run it, though, and it is not working for me. The following
code runs but prints nothing at all:

import pyparsing as prs

f=file('tlst'); tlst=[ln.strip() for ln in f if ln]; f.close()
f=file('plst'); plst=f.read() ; f.close()

prs.quotedString.setParseAction(prs.removeQuotes)

identLine=(prs.LineStart()
           + 'Identifier'
           + prs.quotedString
           + prs.LineEnd()
          ).setResultsName('prog')

tableLine=(prs.LineStart()
           + 'Value'
           + prs.quotedString
           + prs.LineEnd()
          ).setResultsName('table')

interestingLines=(identLine | tableLine)

for toks,start,end in interestingLines.scanString(plst):
    print toks,start,end

Sep 1 '05 #4
Miki Tebeka wrote:
> Look at re.findall, I think it'll be easier.

Minor changes aside, the interesting thing, as you pointed out, would
be using re.findall. I could not figure out how to do that.
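
For reference, one way re.findall could be applied is sketched below in Python 3. This is a sketch under stated assumptions and not a method confirmed anywhere in the thread: the pattern names BLOCK and VALUE and the function programs_using are made up, every program block is assumed to start with an Identifier "..." entry, and a table counts as used only when a Value string equals the table name exactly after stripping blanks.

```python
import re

# One match per program: the quoted name, then everything up to the next
# Identifier entry (or the end of the text).
BLOCK = re.compile(r'Identifier "([^"\n]*)"(.*?)(?=Identifier "|\Z)', re.DOTALL)
VALUE = re.compile(r'Value "([^"\n]*)"')

def programs_using(text, tables):
    """Return sorted '"program","table"' rows for tables found in Value lines."""
    wanted = set(t.strip() for t in tables if t.strip())
    rows = []
    for prog, body in BLOCK.findall(text):      # list of (name, body) tuples
        for value in VALUE.findall(body):
            if value.strip() in wanted:
                rows.append('"%s","%s"' % (prog, value.strip()))
    return sorted(rows)

sample = ('Identifier "Program1"\nName "Random Stuff"\nValue "tablename2 "\n'
          'Identifier "Program 2"\nName "Yet more stuff"\nValue "tablename2 "\n')
print("\n".join(programs_using(sample, ["tablename1", "tablename2", ""])))
```

If qualified names such as "dba.tablename1" should also count, the exact comparison would have to be replaced by a looser test.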

Sep 1 '05 #5
pr*******@latinmail.com wrote:
> Paul McGuire wrote:
>> match...), this program has quite a few holes.
> tried run it though and it is not working for me. The following code
> runs but prints nothing at all:
>
> import pyparsing as prs

And this is the point where I have to post the real stuff, because your
code works with the example I posted and not with the real thing. The
identifier I am interested in is (if I understood the requirements
correctly) the one after the "title with the stars".

So here is the "real" data for plst, with some info replaced with z to
protect privacy:

*****************************************************************************
Identifier "zzz0main"
*****************************************************************************
Identifier "zz501"
Value "zzz_CLCL_zzzz, zzzzzz_ID"
Name "zzzzz"
Name "zzzzzz"
*****************************************************************************
Identifier "zzzz3main"
*****************************************************************************
Identifier "zzz505"
Value "dba.zzz_CKPY_zzzz_SUM"
Name "xxx_xxx_xxx_DT"
----------------------------------
Value "zzz_zzzz_zzz_zzz"
Name "zzz_zz_zzz"
----------------------------------
Value "zzz_zzz_zzz_HIST"
Name "zzz_zzz"
----------------------------------
Sep 1 '05 #6
Yes indeed, the real data often has surprising differences from the
simulations! :)

It turns out that pyparsing LineStart()'s are pretty fussy. Usually,
pyparsing is very forgiving about whitespace between expressions, but
it turns out that LineStart *must* be followed by the next expression,
with no leading whitespace.

Fortunately, your syntax is really quite forgiving, in that your
key-value pairs appear to always be an unquoted word (for the key) and
a quoted string (for the value). So you should be able to get this
working just by dropping the LineStart()'s from your expressions, that
is:

identLine=('Identifier'
           + prs.quotedString
           + prs.LineEnd()
          ).setResultsName('prog')
tableLine=('Value'
           + prs.quotedString
           + prs.LineEnd()
          ).setResultsName('table')

See if that works any better for you.
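
If you want to cross-check the results without pyparsing, the same "key followed by a quoted string" idea can be sketched with the standard re module in Python 3. This swaps in a plain regular expression for the pyparsing grammar; the names LINE and scan are made up, and the indented sample assumes that the real file has leading whitespace before the keys.

```python
import re

# A key (Identifier or Value) and its quoted string, alone on a line;
# leading and trailing blanks or tabs are tolerated.
LINE = re.compile(r'^[ \t]*(Identifier|Value)[ \t]+"([^"\n]*)"[ \t]*$',
                  re.MULTILINE)

def scan(text):
    """Yield (program, value) pairs, pairing each Value with the latest Identifier."""
    prog = ""
    for key, quoted in LINE.findall(text):
        if key == "Identifier":
            prog = quoted
        else:
            yield prog, quoted

sample = ('*****\n'
          '  Identifier "zz501"\n'
          '    Value "zzz_CLCL_zzzz, zzzzzz_ID"\n'
          '    Name "zzzzz"\n'
          '-----\n'
          '    Value "dba.x"\n')
for prog, value in scan(sample):
    print('"%s","%s"' % (prog, value))
```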

-- Paul

Sep 1 '05 #7
