Replacing words from strings except 'and' / 'or' / 'and not'

Hi there,

Background to this question:
I want to convert every word <word> in a string, except 'and' / 'or' /
'and not', into '*<word>*'.

Example:
I have the following string:
"test and testing and not perl or testit or example"

I want to convert this string to:
'*test*' and '*testing*' and not '*perl*' or '*testit*' or '*example*'
Any idea how to do this?

Thanks in advance,
Nico
Jul 18 '05 #1

import sets
KEYWORDS = sets.Set(['and', 'or', 'not'])

query = "test and testing and not perl or testit or example"

def decorate(w):
    if w in KEYWORDS:
        return w
    return "*%s*" % w

query = " ".join([decorate(w.strip()) for w in query.split()])

--
Regards,

Diez B. Roggisch
Jul 18 '05 #2
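
For readers on a current interpreter: the `sets` module used above was removed in Python 3, where the built-in `set` and `frozenset` types take its place. Below is a minimal sketch of the same split-and-join idea, assuming Python 3 and assuming the single quotes from the requested output should be part of the result. The name `decorate_query` is made up for this example and does not appear in the thread.

```python
# Sketch only: a Python 3 port of the split/join approach from the post above.
KEYWORDS = frozenset(["and", "or", "not"])

def decorate(word):
    """Return keywords unchanged; wrap any other word as '*word*'."""
    if word in KEYWORDS:
        return word
    return "'*%s*'" % word

def decorate_query(query):
    """Decorate every whitespace-separated word of the query."""
    # str.split() with no argument already strips surrounding whitespace.
    return " ".join(decorate(w) for w in query.split())

if __name__ == "__main__":
    print(decorate_query("test and testing and not perl or testit or example"))
```

Like the original, this treats a standalone "not" as a keyword wherever it appears, not only after "and".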
On Thu, 25 Nov 2004 15:43:53 +0100, Nico Grubert wrote:
Hi there,

Background of this question is:
I want to convert all words <word> except 'and' / 'or' / 'and not' from
a string into '*<word>*'.


You can pass a function to re.sub() as the replacement:

import re
ignore = ["and", "not", "or"]
test = "test and testing and not perl or testit or example"
def repl(match):
    # Called once per matched word; returns the text to substitute.
    word = match.group(1)
    if word in ignore:
        return word
    else:
        return "*%s*" % word
print(re.sub(r'(\w+)', repl, test))

Result: *test* and *testing* and not *perl* or *testit* or *example*

HTH,
Thomas
Jul 18 '05 #3
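
One practical difference from the split-based version is that re.sub() only rewrites the matched words and leaves everything between them untouched, so punctuation and spacing survive. A small sketch of that behaviour, assuming Python 3; `star` and `star_words` are illustrative names, not from the thread:

```python
import re

IGNORE = {"and", "or", "not"}

def star(match):
    """Replacement callback: keep keywords, wrap every other word in stars."""
    word = match.group(0)
    if word in IGNORE:
        return word
    return "*%s*" % word

def star_words(text):
    """Apply the callback to each run of word characters in text."""
    return re.sub(r"\w+", star, text)

if __name__ == "__main__":
    print(star_words("(perl or python), and not java"))
```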

Just a comment. The w.strip() call in the last line is superfluous in
this particular case. The items in the list resulting from the
query.split() call will be stripped already. Example:
>>> "a b c".split()
['a', 'b', 'c']
/Jean Bouwers
In article <co*************@news.t-online.com>, Diez B. Roggisch
<de*********@web.de> wrote:
import sets
KEYWORDS = sets.Set(['and', 'or', 'not'])

query = "test and testing and not perl or testit or example"

def decorate(w):
    if w in KEYWORDS:
        return w
    return "*%s*" % w

query = " ".join([decorate(w.strip()) for w in query.split()])

Jul 18 '05 #4
On Thu, 25 Nov 2004 15:43:53 +0100, Nico Grubert <ni*********@arcor.de>
wrote:
Example:
I have the following string: "test and testing and not perl or testit or
example"

I want to convert this string to:
'*test*' and '*testing*' and not '*perl*' or '*testit*' or '*example*'


A compact, though not very readable, solution:

foo = "test and testing and not perl or testit or example"

' '.join([
    ("'*"+w+"*'", w)[w in ('and', 'or')]
    for w in foo.split()
]).replace("and '*not*'", "and not")

--
Mitja
Jul 18 '05 #5
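
The one-liner above relies on indexing a two-element tuple with a boolean: False selects index 0 and True selects index 1. The following sketch, assuming Python 2.5 or later (where conditional expressions exist), puts that trick next to the equivalent `x if cond else y` form. The function names are invented for the illustration.

```python
def pick_by_index(word):
    # (value_if_false, value_if_true)[condition]; both values are always built.
    return ("'*" + word + "*'", word)[word in ("and", "or")]

def pick_by_conditional(word):
    # Same result; only the selected branch is evaluated.
    return word if word in ("and", "or") else "'*" + word + "*'"

def convert(text):
    joined = " ".join(pick_by_conditional(w) for w in text.split())
    # "not" was decorated like any other word; undo that when it follows "and".
    return joined.replace("and '*not*'", "and not")

if __name__ == "__main__":
    print(convert("test and testing and not perl or testit or example"))
```

As in the original, only a "not" that directly follows "and" is restored; "or not" would keep the decorated form.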
Diez B. Roggisch wrote:
import sets
KEYWORDS = sets.Set(['and', 'or', 'not'])

query = "test and testing and not perl or testit or example"

def decorate(w):
    if w in KEYWORDS:
        return w
    return "*%s*" % w

query = " ".join([decorate(w.strip()) for w in query.split()])


Is there a reason to use sets here? I think lists will do as well.

--
-------------------------------------------------------------------
Peter Maas, M+R Infosysteme, D-52070 Aachen, Tel +49-241-93878-0
E-mail 'cGV0ZXIubWFhc0BtcGx1c3IuZGU=\n'.decode('base64')
-------------------------------------------------------------------
Jul 18 '05 #6
Peter Maas wrote:
Diez B. Roggisch wrote:
import sets
KEYWORDS = sets.Set(['and', 'or', 'not'])

query = "test and testing and not perl or testit or example"

def decorate(w):
    if w in KEYWORDS:
        return w
    return "*%s*" % w

query = " ".join([decorate(w.strip()) for w in query.split()])


Is there a reason to use sets here? I think lists will do as well.


Sets represent the concept better, and large lists will significantly slow
down the code (linear vs constant time). Unfortunately, as 2.3's Set is
implemented in Python, you'll have to wait for the 2.4 set builtin to see
the effect for small lists/sets. In the meantime, from a performance point
of view, a dictionary fares best:

$cat contains.py
from sets import Set

# we need more items than in KEYWORDS above for Set
# to even meet the performance of list :-(
alist = dir([])
aset = Set(alist)
adict = dict.fromkeys(alist)

$timeit.py -s"from contains import alist, aset, adict" "'not' in alist"
100000 loops, best of 3: 2.21 usec per loop
$timeit.py -s"from contains import alist, aset, adict" "'not' in aset"
100000 loops, best of 3: 2.2 usec per loop
$timeit.py -s"from contains import alist, aset, adict" "'not' in adict"
1000000 loops, best of 3: 0.337 usec per loop

Peter

Jul 18 '05 #7
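
The same comparison can be run from inside a script through the timeit module rather than the command line. This is a rough sketch, assuming Python 3.5 or later (for the `globals` argument of `timeit.Timer`); `time_membership` is a hypothetical helper, and the numbers it produces depend entirely on the machine and interpreter.

```python
import timeit

def time_membership(probe="not", number=100000, repeat=3):
    """Return best-of-`repeat` seconds per lookup for a list, a set and a dict."""
    names = dir([])  # same sample data as contains.py above
    containers = {
        "list": names,
        "set": set(names),
        "dict": dict.fromkeys(names),
    }
    results = {}
    for label, container in containers.items():
        timer = timeit.Timer(
            "probe in container",
            globals={"probe": probe, "container": container},
        )
        # Take the fastest run, as timeit's command-line interface does.
        results[label] = min(timer.repeat(repeat=repeat, number=number)) / number
    return results

if __name__ == "__main__":
    for label, seconds in time_membership().items():
        print("%-4s %.3f usec per loop" % (label, seconds * 1e6))
```

Note that 'not' is not among the names in dir([]), so the list lookup has to scan to the end before giving up.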
Peter Maas wrote:
Diez B. Roggisch wrote:
import sets
KEYWORDS = sets.Set(['and', 'or', 'not'])
...
def decorate(w):
    if w in KEYWORDS:
        return w
    return "*%s*" % w

Is there a reason to use sets here? I think lists will do as well.


Sets are implemented using dictionaries, so the "if w in KEYWORDS"
part would be O(1) instead of O(n) as with lists...

(I.e. searching a list is a brute-force scan of every item, whereas
looking something up in a set is not.)

-Peter
Jul 18 '05 #8
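
The O(n) versus O(1) claim can be made visible without a stopwatch by counting equality tests. The sketch below is an illustration under assumptions: it targets CPython 3, and `CountingKey`, `count_comparisons` and `demo` are names invented here. The keys wrap small non-negative integers so that every key has a distinct hash.

```python
class CountingKey:
    """Wrap an int and count how often == is evaluated (illustration only)."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return isinstance(other, CountingKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)

def count_comparisons(container, probe):
    """Return (found, number of == calls) for `probe in container`."""
    CountingKey.comparisons = 0
    found = probe in container
    return found, CountingKey.comparisons

def demo(n=1000):
    keys = [CountingKey(i) for i in range(n)]
    missing = CountingKey(n)  # equal to none of the stored keys
    list_count = count_comparisons(keys, missing)[1]
    set_count = count_comparisons(set(keys), missing)[1]
    return list_count, set_count

if __name__ == "__main__":
    print("comparisons (list, set):", demo())
```

For a missing item the list has to compare against every element, while the set goes straight to the slot selected by the hash and has little or nothing to compare. For three short keywords that difference is tiny, which is what the timing discussion further down is about.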
> Is there a reason to use sets here? I think lists will do as well.


Sets are implemented using dictionaries, so the "if w in KEYWORDS"
part would be O(1) instead of O(n) as with lists...

(I.e. searching a list is a brute-force operation, whereas
sets are not.)


Jp> And yet... using sets here is slower in every possible case:
...
Jp> This is a pretty clear example of premature optimization.

I think the set concept is correct. The keywords of interest are best
thought of as an unordered collection. Lists imply some ordering (or at
least that potential). Premature optimization would have been realizing
that scanning a short list of strings was faster than testing for set
membership and choosing to use lists instead of sets.

Skip
Jul 18 '05 #9
Skip Montanaro <sk**@pobox.com> wrote in message news:<ma**************************************@python.org>...
> Is there a reason to use sets here? I think lists will do as well.
>>
>> Sets are implemented using dictionaries, so the "if w in KEYWORDS"
>> part would be O(1) instead of O(n) as with lists...
>>
>> (I.e. searching a list is a brute-force operation, whereas
>> sets are not.)


Jp> And yet... using sets here is slower in every possible case:
...
Jp> This is a pretty clear example of premature optimization.

I think the set concept is correct. The keywords of interest are best
thought of as an unordered collection. Lists imply some ordering (or at
least that potential). Premature optimization would have been realizing
that scanning a short list of strings was faster than testing for set
membership and choosing to use lists instead of sets.

Skip


Jp scores extra points for pre-maturity by not trying out version 2.4 and
by not reading the bit about sets now being built in and based on dicts,
dicts being one of the timbot's optimise-the-snot-out-of targets ...
Herewith some results from a box with a 1.4 GHz Athlon chip running
Windows 2000:

C:\junk>\python24\python \python24\lib\timeit.py -s "from sets import Set; x = Set(['and', 'or', 'not'])" "None in x"
1000000 loops, best of 3: 1.81 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "from sets import Set; x = Set(['and', 'or', 'not'])" "None in x"
1000000 loops, best of 3: 1.77 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = set(['and', 'or', 'not'])" "None in x"
1000000 loops, best of 3: 0.29 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = set(['and', 'or', 'not'])" "None in x"
1000000 loops, best of 3: 0.289 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = ['and', 'or', 'not']" "None in x"
1000000 loops, best of 3: 0.804 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = ['and', 'or', 'not']" "None in x"
1000000 loops, best of 3: 0.81 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "from sets import Set; x = Set(['and', 'or', 'not'])" "'and' in x"
1000000 loops, best of 3: 1.69 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = set(['and', 'or', 'not'])" "'and' in x"
1000000 loops, best of 3: 0.243 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = set(['and', 'or', 'not'])" "'and' in x"
1000000 loops, best of 3: 0.245 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = ['and', 'or', 'not']" "'and' in x"
1000000 loops, best of 3: 0.22 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = ['and', 'or', 'not']" "'and' in x"
1000000 loops, best of 3: 0.22 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = set(['and', 'or', 'not'])" "'not' in x"
1000000 loops, best of 3: 0.257 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = ['and', 'or', 'not']" "'not' in x"
1000000 loops, best of 3: 0.34 usec per loop

tee hee ...
Jul 18 '05 #10
