remove strings from source

For a Python program I am writing, I need to remove all string
definitions from the source and substitute them with a placeholder.

To make it clearer:
line 45 sVar="this is the string assigned to sVar"
must be converted into:
line 45 sVar=s00001

Such a substitution is recorded in a file as:
s00001[line 45]="this is the string assigned to sVar"

For the curious:
I am trying to implement a variable cross-reference tool, and the
varying length of the string definitions (especially multi-line
ones) can cause display problems.

I need your help in correctly identifying the strings (also embedding
the r'xx..' or u'yy...' as part of the string definition). The problem
is mainly on the multi-line definitions or in cached strings
(embedding chr() definitions or escape sequences).
Jul 18 '05 #1
Approach this in a test-driven development way. Create sample input and
output files. Write a unit test something like this (below) and run
it. You'll either solve the problem yourself or ask more specific
questions. ;-)

Cheers,

// m

#!/usr/bin/env python
import unittest

def substitute(data):
    # As a first pass, just return the data itself--obviously, this
    # should fail.
    return data

class Test(unittest.TestCase):
    def test(self):
        data = open("input.txt").read()
        expected = open("expected.txt").read()
        actual = substitute(data)
        self.assertEqual(expected, actual)

if __name__ == '__main__':
    unittest.main()
Jul 18 '05 #2
qwweeeit wrote:
I need your help in correctly identifying the strings (also embedding
the r'xx..' or u'yy...' as part of the string definition). The problem
is mainly on the multi-line definitions or in cached strings
(embedding chr() definitions or escape sequences).


Have a look at tokenize.generate_tokens() in the standard library. That
ought to give you enough information to identify the strings reliably and
output modified source.
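
A minimal sketch of that suggestion for current Python 3, assuming the source is already held in a string. The helper name find_strings is hypothetical, not part of the standard library. It only lists the string tokens with their start and end positions; prefixes such as r'..' or u'..' and multi-line literals come back as a single token each.

```python
import io
import tokenize

def find_strings(source):
    """Return (start, end, text) for every string literal token in source.

    start and end are (row, column) pairs as reported by tokenize;
    rows are 1-based, columns are 0-based.
    """
    readline = io.StringIO(source).readline
    return [(tok.start, tok.end, tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.STRING]
```

Quotes inside comments are not reported, because the tokenizer returns the whole comment as a COMMENT token.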
Jul 18 '05 #3
Hello,
I have written a few Python parsers before.
Here is my attempt :)
# string_mapper.py
from __future__ import generators  # python 2.2
import keyword, os, sys, traceback
import cStringIO, token, tokenize

def StringNamer(num=0):
    '''This is a name creating generator'''
    while 1:
        num += 1
        stringname = 's'+str(num).zfill(6)
        yield stringname

class ReplaceParser(object):
    """
    filein = open('yourfilehere.py').read()
    replacer = ReplaceParser(filein, out=sys.stdout)
    replacer.format()
    replacer.StringMap

    """

    def __init__(self, raw, out=sys.stdout):
        ''' Store the source text.
        '''
        self.raw = raw.expandtabs().strip()
        self.out = out
        self.StringName = StringNamer()
        self.StringMap = {}

    def format(self):
        ''' Parse and send the source.
        '''
        self.lines = [0, 0]
        pos = 0
        self.temp = cStringIO.StringIO()
        while 1:
            pos = self.raw.find('\n', pos) + 1
            if not pos: break
            self.lines.append(pos)
        self.lines.append(len(self.raw))
        self.pos = 0
        text = cStringIO.StringIO(self.raw)
        try:
            tokenize.tokenize(text.readline, self)
        except tokenize.TokenError, ex:
            traceback.print_exc()

    def __call__(self, toktype, toktext, (srow,scol),
                 (erow,ecol), line):
        ''' Token handler.
        '''
        oldpos = self.pos
        newpos = self.lines[srow] + scol
        self.pos = newpos + len(toktext)
        if toktype in [token.NEWLINE, tokenize.NL]:
            self.out.write('\n')
            return
        if newpos > oldpos:
            self.out.write(self.raw[oldpos:newpos])
        if toktype in [token.INDENT, token.DEDENT]:
            self.pos = newpos
            return
        if (toktype == token.STRING):
            sname = self.StringName.next()
            self.StringMap[sname] = toktext
            toktext = sname
        self.out.write(toktext)
        self.out.flush()
        return
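
The class above is Python 2 code: the callback form of tokenize.tokenize(), cStringIO, tuple parameters and the generator's .next() method are not available in Python 3. For readers on Python 3, here is a rough sketch of the same idea. The function name replace_strings and its return value are assumptions made for illustration, not M.E.Farmer's code. It keeps the same placeholder format (s000001, s000002, ...) and uses the token positions to splice the original text, so comments and spacing are left as they were.

```python
import io
import tokenize

def replace_strings(source):
    """Return (new_source, string_map) with string literals replaced.

    Each literal becomes a placeholder such as s000001, and string_map
    maps every placeholder back to the literal's original text.
    """
    # offsets[row] is the index in source where 1-based line row starts
    offsets = [0, 0]
    pos = source.find("\n") + 1
    while pos:
        offsets.append(pos)
        pos = source.find("\n", pos) + 1

    pieces, string_map, last = [], {}, 0
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.STRING:
            continue
        start = offsets[tok.start[0]] + tok.start[1]
        end = offsets[tok.end[0]] + tok.end[1]
        name = "s%06d" % (len(string_map) + 1)
        string_map[name] = source[start:end]
        pieces.append(source[last:start])  # untouched text before the literal
        pieces.append(name)
        last = end
    pieces.append(source[last:])           # whatever follows the last literal
    return "".join(pieces), string_map
```

One caveat: on Python 3.12 and later, f-strings are tokenized as FSTRING_START / FSTRING_MIDDLE / FSTRING_END rather than STRING, so this sketch would leave them in place.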

hth,
M.E.Farmer

Jul 18 '05 #4
Thank you for your suggestion, but it is too complicated for me...
I decided to proceed in steps:
1. Take away all commented lines
2. Rebuild the multi-lines as single lines

I have already written the code, and now I can face the problem of
moving the string definitions into a database file...
Hopefully I will then build cross-reference tables of the variables.
My project is also to implement the code for building a function
tree...
Jul 18 '05 #5
qwweeeit wrote:
Thank you for your suggestion, but it is too complicated for me...
I decided to proceed in steps:
1. Take away all commented lines
2. Rebuild the multi-lines as single lines

Ummm,
OK, all I can say is: did you try this?
If not, save it as a module, then import it into the interpreter and
try it.
This is a dead simple module to do *exactly* what you asked for :)
Like I said, I have done this before, so I will restate: *I HAVE FAILED AT
THIS BEFORE, MANY TIMES*. Now I have a solution.
It writes to stdout by default but can write to a file-like object if you
give it one.
It already handles continued lines, so there is no need to futz around
with some other solution.
Here is an example:
Py> filein = """
.... class Stripper:
....     '''python comment and whitespace stripper
....     '''
....     def __init__(self, raw):
....         ''' Store the source text & set some flags.
....         '''
....         self.raw = raw
....
....     def format(self, out=sys.stdout, comments=0,
....                spaces=1, untabify=1,eol='unix'):
....         '''Parse and send the colored source.'''
....         # Store line offsets in self.lines
....         self.lines = [0, 0]
....         pos = 0
....         # Strips the first blank line if 1
....         self.lasttoken = 1
....         self.temp = StringIO.StringIO()
....         self.spaces = spaces
....         self.comments = comments
....
....         if untabify:
....             self.raw = self.raw.expandtabs()
....         self.raw = self.raw.rstrip()+' '
....         self.out = out
.... """
Py> replacer = ReplaceParser(filein, out=sys.stdout)
Py> replacer.format()
class Stripper:
    s000001
    def __init__(self, raw):
        s000002
        self.raw = raw

    def format(self, out=sys.stdout, comments=0,
               spaces=1, untabify=1,eol=s000003):
        s000004
        # Store line offsets in self.lines
        self.lines = [0, 0]
        pos = 0
        # Strips the first blank line if 1
        self.lasttoken = 1
        self.temp = StringIO.StringIO()
        self.spaces = spaces
        self.comments = comments

        if untabify:
            self.raw = self.raw.expandtabs()
        self.raw = self.raw.rstrip()+s000005
        self.out = out
Py> replacer.StringMap
{'s000004': "'''Parse and send the colored source.'''",
 's000005': "' '",
 's000001': "'''python comment and whitespace stripper :)\n    '''",
 's000002': "''' Store the source text & set some flags.\n        '''",
 's000003': "'unix'"}

You can also strip out comments with a few lines.
It can easily get single comments or doubles.
Add this to your __call__ method:
[snip]
            self.pos = newpos
            return
        # kills comments
        if (toktype == tokenize.COMMENT):
            return
        if (toktype == token.STRING):
            sname = self.StringName.next()
[snip]

If you insist on writing something go ahead.
Let me know what your solution is, I am curious.
M.E.Farmer

Jul 18 '05 #6
I owe you an answer about "my" solution for removing
literal strings...
I apologize for not following your suggestions, but I am just
learning Python, and your approach was too difficult for me!
I've already developed the cross-reference tool, and for that I
identified two types of literals (the standard one, which I named
s~ in general, and the multi-line or triple-quoted strings, which I
called m~).

You can see a step in my approach to the solution in an answer to
Fredrik Lundh
http://groups.google.it/groups?q=qww...gle.com&rnum=2

After that I almost completed the application, and rather than
explaining it, I can show you the result (a small extract).

052 PROGNAME: PROGNAME = sys.argv[0]
053 AUTHOR: AUTHOR = us~.encode(s~)
054 VERSION: VERSION = s~
056 URL_BASE: URL_BASE = s~
057 OUTPUT_HTML: OUTPUT_HTML = s~ etc...

The cross-references are mainly useful for variables, but I also use
them for Python reserved words (to learn the language) and for classes
and functions.
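
For readers who want to try something similar, a cross-reference table of this kind could be sketched with the tokenizer roughly as follows. This is not the tool described in this post; the function name cross_reference and the include_keywords flag are assumptions for illustration. It maps each name to the line numbers where it appears and can optionally include reserved words.

```python
import io
import keyword
import tokenize
from collections import defaultdict

def cross_reference(source, include_keywords=False):
    """Map each identifier in source to the sorted line numbers it appears on."""
    xref = defaultdict(set)
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.NAME:
            continue  # skip strings, comments, operators, numbers
        if keyword.iskeyword(tok.string) and not include_keywords:
            continue  # reserved words are reported only on request
        xref[tok.string].add(tok.start[0])
    return {name: sorted(rows) for name, rows in xref.items()}
```

Because string literals and comments arrive as their own token types, names that only occur inside them are not counted.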

For small applications there is no need for my tool, but with a
source of almost 1 MB... (like Pysol).
Excuse me if I don't go deeper into my solution for removing strings, but
it is so standard that there is nothing to learn...

Bye.
Jul 18 '05 #7
