Here's the problem: Given a list of item names like:
apple1
apple2
apple3_SD
formA
formB
formC
kla_MM
kla_MB
kca_MM
which is a subset of a much larger list of items,
is there an efficient algorithm to create condensed forms that match
those items, and only those items? Such as:
apple[12]
apple3_SD
form[ABC]
kla_M[MB]
kca_MM
The condensed expression syntax only has [...] and * as operators. [...]
matches a set of individual characters, * matches any string.
I'd be satisfied with a solution that only uses the [...] syntax, since
I don't think it's possible to use * without potentially matching items
not explicitly in the given list.
I'm not sure what this condensed expression syntax is called (looks a
bit like shell name expansion syntax), and I'm not even sure there is an
efficient way to do what I'm asking. Any ideas would be appreciated.
Thanks,
-David
--
Presenting:
mediocre nebula. 7 1464
This is a first try, is something like this enough for you?
data = """apple1
apple2
apple3_SD
formA
formB
formC
kla_MM
kla_MB
kca_MM""".split ()
headtails = {}
for word in data:
head = word[:-1]
if head in headtails:
headtails[head].append(word[-1])
else:
headtails[head] = [word[-1]]
for head, tails in sorted(headtail s.iteritems()):
if len(tails) == 1:
print head + tails[0]
else:
print head + "[%s]" % "".join(tai ls)
Output:
apple[12]
apple3_SD
form[ABC]
kca_MM
kla_M[MB]
It looks only the last letters. It modifies the original order (it
sorts the sequence on the root-word).
If you don't need sorted results you can remove the sorted().
Bye,
bearophile
David Hirschfield a écrit : Here's the problem: Given a list of item names like:
apple1 apple2 apple3_SD formA formB formC kla_MM kla_MB kca_MM
which is a subset of a much larger list of items, is there an efficient algorithm to create condensed forms that match those items, and only those items? Such as:
apple[12] apple3_SD form[ABC] kla_M[MB] kca_MM
The condensed expression syntax only has [...] and * as operators. [...] matches a set of individual characters, * matches any string. I'd be satisfied with a solution that only uses the [...] syntax, since I don't think it's possible to use * without potentially matching items not explicitly in the given list.
I'm not sure what this condensed expression syntax is called (looks a bit like shell name expansion syntax),
Looks like a very restricted subset of regular expressions.
and I'm not even sure there is an efficient way to do what I'm asking. Any ideas would be appreciated.
import re
lines = """
apple1
apple2
apple3_SD
formA
formB
formC
kla_MM
kla_MB
kca_MM
""".strip().spl it()
patterns = [
r'apple[12]',
r'apple3_SD',
r'form[ABC]',
r'kla_M[MB]',
r'kca_MM',
]
for pat in patterns:
for line in lines:
m = re.match(pat, line)
print "%s match %s : %s" % (pat, line, m and "Yes" or 'No')
HTH
Bruno Desthuilliers a écrit : David Hirschfield a écrit :
Here's the problem:
(snip) is there an efficient algorithm to create condensed forms that match those items,
Sorry, answered to fast... Well, at least you have a way to test the
algo !-)
Nevermind, I didn't understand the problem/question... Sorry.
Bye,
bearophile be************@ lycos.com a écrit : Nevermind, I didn't understand the problem/question... Sorry.
actually, I think *you* got it right.
David Hirschfield <da****@ilm.com > writes: Here's the problem: Given a list of item names like:
apple1 apple2 apple3_SD formA formB formC kla_MM kla_MB kca_MM
which is a subset of a much larger list of items, is there an efficient algorithm to create condensed forms that match those items, and only those items? Such as:
apple[12] apple3_SD form[ABC] kla_M[MB] kca_MM
The condensed expression syntax only has [...] and * as operators. [...] matches a set of individual characters, * matches any string. I'd be satisfied with a solution that only uses the [...] syntax, since I don't think it's possible to use * without potentially matching items not explicitly in the given list.
I'm not sure what this condensed expression syntax is called (looks a bit like shell name expansion syntax), and I'm not even sure there is an efficient way to do what I'm asking. Any ideas would be appreciated.
If you used regular expressions instead of globs then I think what you want is
to optimize the regular expression for:
'apple1|apple2| apple3_SD|formA |formB|formC|kl a_MM|kla_MB|kca _MM'
then spit the optimised finite automaton back out. Of course if you're just
searching Python might do a decent job of optimising it anyway.
Looking at: http://en.wikipedia.org/wiki/Finite_...e#Optimization
they make it sound so easy!. There's also a whole bunch of tools on that
page. Maybe there's an optimiser you can use in one of them.
Failing that I guess you build a tree and try to merge nodes where they fork.
I suspect an optimal solution would be hard but if you knew your input did
have lots of redundancy a simple algorithm might work.
Or I could be talking crap as usual.
Eddie be************@ lycos.com wrote: Nevermind, I didn't understand the problem/question... Sorry.
Bye, bearophile
Really? Your solution looks fine to me.
In any case, here's an alternative approach to the (based on the same
understanding of the problem as bearophile's, but with the additional
requirement that the input terms be sorted) from itertools import groupby from operator import itemgetter src_iter = ((i[:-1],i[-1]) for i in src.splitlines( )) #src must be sorted grouped = groupby(src_ite r, itemgetter(0)) stemmed = ((stem, "".join(i[1] for i in values)) for stem, values in grouped) [(len(s[1])>1 and "%s[%s]" or "%s%s") % s for s in stemmed]
['apple[12]', 'apple3_SD', 'form[ABC]', 'kla_M[MB]', 'kca_MM']
Michael This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Jacek Generowicz |
last post by:
Where can I find concise, clear documentation describing what one has
to do in order to enable Python's internal help to be able to provide
descriptions of Python keywords ?
I am in a situation where I have to give Python novices the ability to
fix this for themselves easily.
Failing "concise" and "clear", how about "complete and correct" ?
|
by: Joh |
last post by:
hello,
i'm trying to understand how i could build following consecutive sets
from a root one using generator :
l =
would like to produce :
, , , ,
|
by: Tjerk Wolterink |
last post by:
The following code does not work:
<xsl:variable name="width" select="form:image-width"/>
<xsl:if test="$width>$max_image_width">
<xsl:variable name="width" select="$max_image_width"/>
</xsl:if>
<img class="admin" width="{$width}" src="{form:image-url}" />
The width attribute of img is always "", the $max_image_width variable is set to 200
and $width is initially set to the width of the image.
|
by: Kum |
last post by:
Hi,
I need help in asp.net dynamic textbox controls
validation.
I am creating textbox controls dynamically on a
asp.net webpage. Now after creating the textboxes on
the page I want to validate these text boxes when the
user submits or posts back to the server. My intention
was to create a textboxvalidator function which takes
the control name as argument and returns an true if
|
by: Rahul |
last post by:
Hi Everybody
I have some problem in my script. please help me. This is script file.
I have one *.inq file. I want run this script in XML files. But this
script errors shows . If u want i am attach this script
files and inq files. I cant understand this error. Please suggest me.
You can talk with my yahoo id b_sahoo1@yahoo.com. Now i am online.
Plz....Plz..Plz...
| |
by: Zach |
last post by:
I'm writing an app which is going to rely extremely heavily on the
usage of regular expressions. I'm reading the docs but having trouble
wrapping my head around some of this since it's all fairly new to me.
I have two questions, I'm hoping I can get answers to at least one :)
Any help is better than no help:
1) I have many cases I am checking if a particular string matches
against a particular regular expression. However, if the match...
|
by: codefire |
last post by:
Hi,
I am trying to get a regexp to validate email addresses but can't get
it quite right. The problem is I can't quite find the regexp to deal
with ignoring the case james..kirk@fred.com, which is not valid. Here's
my attempt, neither of my regexps work quite how I want:
import os
import re
|
by: Andrus |
last post by:
I created dynamic extension methods for <= and < SQL comparison operators:
public static IQueryable<TLessThanOrEqual<T>(this IQueryable<Tsource,
string property, object value);
public static IQueryable<TLessThan<T>(this IQueryable<Tsource, string
property, object value);
For example
var q = db.Customers.LessThanOrEqual( "City", "London" );
|
by: =?Utf-8?B?anAybXNmdA==?= |
last post by:
Prior to now, my RegEx expression was as follows:
RegEx pattern = new Regex(@"\d{5} \d{4} \d{2}");
That has worked, but now I have been asked to enable the ability for our
expressions to include an underscore ('_') at each of the character
positions. (This is an SQL Query, and the underscore acts as a wildcard for
individual characters)
Is there a nice way to modify my RegEx expression to include the underscore
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
| |
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
| |
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |