473,748 Members | 2,274 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Help: Creating condensed expressions

Here's the problem: Given a list of item names like:

apple1
apple2
apple3_SD
formA
formB
formC
kla_MM
kla_MB
kca_MM

which is a subset of a much larger list of items,
is there an efficient algorithm to create condensed forms that match
those items, and only those items? Such as:

apple[12]
apple3_SD
form[ABC]
kla_M[MB]
kca_MM

The condensed expression syntax only has [...] and * as operators. [...]
matches a set of individual characters, * matches any string.
I'd be satisfied with a solution that only uses the [...] syntax, since
I don't think it's possible to use * without potentially matching items
not explicitly in the given list.

I'm not sure what this condensed expression syntax is called (looks a
bit like shell name expansion syntax), and I'm not even sure there is an
efficient way to do what I'm asking. Any ideas would be appreciated.

Thanks,
-David

--
Presenting:
mediocre nebula.

Mar 24 '06 #1
7 1464
This is a first try, is something like this enough for you?
data = """apple1
apple2
apple3_SD
formA
formB
formC
kla_MM
kla_MB
kca_MM""".split ()

headtails = {}
for word in data:
head = word[:-1]
if head in headtails:
headtails[head].append(word[-1])
else:
headtails[head] = [word[-1]]
for head, tails in sorted(headtail s.iteritems()):
if len(tails) == 1:
print head + tails[0]
else:
print head + "[%s]" % "".join(tai ls)
Output:
apple[12]
apple3_SD
form[ABC]
kca_MM
kla_M[MB]

It looks only the last letters. It modifies the original order (it
sorts the sequence on the root-word).
If you don't need sorted results you can remove the sorted().

Bye,
bearophile

Mar 24 '06 #2
David Hirschfield a écrit :
Here's the problem: Given a list of item names like:

apple1
apple2
apple3_SD
formA
formB
formC
kla_MM
kla_MB
kca_MM

which is a subset of a much larger list of items,
is there an efficient algorithm to create condensed forms that match
those items, and only those items? Such as:

apple[12]
apple3_SD
form[ABC]
kla_M[MB]
kca_MM

The condensed expression syntax only has [...] and * as operators. [...]
matches a set of individual characters, * matches any string.
I'd be satisfied with a solution that only uses the [...] syntax, since
I don't think it's possible to use * without potentially matching items
not explicitly in the given list.

I'm not sure what this condensed expression syntax is called (looks a
bit like shell name expansion syntax),
Looks like a very restricted subset of regular expressions.
and I'm not even sure there is an
efficient way to do what I'm asking. Any ideas would be appreciated.

import re

lines = """
apple1
apple2
apple3_SD
formA
formB
formC
kla_MM
kla_MB
kca_MM
""".strip().spl it()

patterns = [
r'apple[12]',
r'apple3_SD',
r'form[ABC]',
r'kla_M[MB]',
r'kca_MM',
]

for pat in patterns:
for line in lines:
m = re.match(pat, line)
print "%s match %s : %s" % (pat, line, m and "Yes" or 'No')
HTH
Mar 24 '06 #3
Bruno Desthuilliers a écrit :
David Hirschfield a écrit :
Here's the problem: (snip)
is there an efficient algorithm to create condensed forms that match
those items,


Sorry, answered to fast... Well, at least you have a way to test the
algo !-)
Mar 24 '06 #4
Nevermind, I didn't understand the problem/question... Sorry.

Bye,
bearophile

Mar 24 '06 #5
be************@ lycos.com a écrit :
Nevermind, I didn't understand the problem/question... Sorry.


actually, I think *you* got it right.

Mar 24 '06 #6
David Hirschfield <da****@ilm.com > writes:
Here's the problem: Given a list of item names like: apple1
apple2
apple3_SD
formA
formB
formC
kla_MM
kla_MB
kca_MM which is a subset of a much larger list of items,
is there an efficient algorithm to create condensed forms that match
those items, and only those items? Such as: apple[12]
apple3_SD
form[ABC]
kla_M[MB]
kca_MM The condensed expression syntax only has [...] and * as operators. [...]
matches a set of individual characters, * matches any string.
I'd be satisfied with a solution that only uses the [...] syntax, since
I don't think it's possible to use * without potentially matching items
not explicitly in the given list. I'm not sure what this condensed expression syntax is called (looks a
bit like shell name expansion syntax), and I'm not even sure there is an
efficient way to do what I'm asking. Any ideas would be appreciated.


If you used regular expressions instead of globs then I think what you want is
to optimize the regular expression for:

'apple1|apple2| apple3_SD|formA |formB|formC|kl a_MM|kla_MB|kca _MM'

then spit the optimised finite automaton back out. Of course if you're just
searching Python might do a decent job of optimising it anyway.

Looking at:

http://en.wikipedia.org/wiki/Finite_...e#Optimization

they make it sound so easy!. There's also a whole bunch of tools on that
page. Maybe there's an optimiser you can use in one of them.

Failing that I guess you build a tree and try to merge nodes where they fork.
I suspect an optimal solution would be hard but if you knew your input did
have lots of redundancy a simple algorithm might work.

Or I could be talking crap as usual.

Eddie
Mar 24 '06 #7
be************@ lycos.com wrote:
Nevermind, I didn't understand the problem/question... Sorry.

Bye,
bearophile

Really? Your solution looks fine to me.

In any case, here's an alternative approach to the (based on the same
understanding of the problem as bearophile's, but with the additional
requirement that the input terms be sorted)
from itertools import groupby
from operator import itemgetter
src_iter = ((i[:-1],i[-1]) for i in src.splitlines( )) #src must be sorted
grouped = groupby(src_ite r, itemgetter(0))
stemmed = ((stem, "".join(i[1] for i in values)) for stem, values in grouped)
[(len(s[1])>1 and "%s[%s]" or "%s%s") % s for s in stemmed] ['apple[12]', 'apple3_SD', 'form[ABC]', 'kla_M[MB]', 'kca_MM']

Michael

Mar 24 '06 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

10
2277
by: Jacek Generowicz | last post by:
Where can I find concise, clear documentation describing what one has to do in order to enable Python's internal help to be able to provide descriptions of Python keywords ? I am in a situation where I have to give Python novices the ability to fix this for themselves easily. Failing "concise" and "clear", how about "complete and correct" ?
45
3046
by: Joh | last post by:
hello, i'm trying to understand how i could build following consecutive sets from a root one using generator : l = would like to produce : , , , ,
10
3824
by: Tjerk Wolterink | last post by:
The following code does not work: <xsl:variable name="width" select="form:image-width"/> <xsl:if test="$width>$max_image_width"> <xsl:variable name="width" select="$max_image_width"/> </xsl:if> <img class="admin" width="{$width}" src="{form:image-url}" /> The width attribute of img is always "", the $max_image_width variable is set to 200 and $width is initially set to the width of the image.
1
1962
by: Kum | last post by:
Hi, I need help in asp.net dynamic textbox controls validation. I am creating textbox controls dynamically on a asp.net webpage. Now after creating the textboxes on the page I want to validate these text boxes when the user submits or posts back to the server. My intention was to create a textboxvalidator function which takes the control name as argument and returns an true if
1
3717
by: Rahul | last post by:
Hi Everybody I have some problem in my script. please help me. This is script file. I have one *.inq file. I want run this script in XML files. But this script errors shows . If u want i am attach this script files and inq files. I cant understand this error. Please suggest me. You can talk with my yahoo id b_sahoo1@yahoo.com. Now i am online. Plz....Plz..Plz...
3
1246
by: Zach | last post by:
I'm writing an app which is going to rely extremely heavily on the usage of regular expressions. I'm reading the docs but having trouble wrapping my head around some of this since it's all fairly new to me. I have two questions, I'm hoping I can get answers to at least one :) Any help is better than no help: 1) I have many cases I am checking if a particular string matches against a particular regular expression. However, if the match...
23
2877
by: codefire | last post by:
Hi, I am trying to get a regexp to validate email addresses but can't get it quite right. The problem is I can't quite find the regexp to deal with ignoring the case james..kirk@fred.com, which is not valid. Here's my attempt, neither of my regexps work quite how I want: import os import re
11
3555
by: Andrus | last post by:
I created dynamic extension methods for <= and < SQL comparison operators: public static IQueryable<TLessThanOrEqual<T>(this IQueryable<Tsource, string property, object value); public static IQueryable<TLessThan<T>(this IQueryable<Tsource, string property, object value); For example var q = db.Customers.LessThanOrEqual( "City", "London" );
4
1389
by: =?Utf-8?B?anAybXNmdA==?= | last post by:
Prior to now, my RegEx expression was as follows: RegEx pattern = new Regex(@"\d{5} \d{4} \d{2}"); That has worked, but now I have been asked to enable the ability for our expressions to include an underscore ('_') at each of the character positions. (This is an SQL Query, and the underscore acts as a wildcard for individual characters) Is there a nice way to modify my RegEx expression to include the underscore
0
8831
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9552
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9376
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9326
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9249
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8245
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
6076
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4877
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
2787
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.