Bytes | Software Development & Data Engineering Community

best way to align words?

Hello,

I would like to write a piece of code that helps me align some sequences
of words and suggests the ordered common words they share.

s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()
....
alist = (s0, s1, s2)

result should be: ('example', 'of', 'i', 'would', 'like', 'to', 'have')

but I do not know how to start. Do you have a helpful
suggestion?
One problem I have is that when there are many different strings, my
results tend to be empty, while I would still like to get one, or maybe
all, of the best matches.

best.

Nov 30 '06 #1
Robert R. wrote:
> I would like to write a piece of code that helps me align some sequences
> of words and suggests the ordered common words they share.
> [...]
As far as I can see, you want the words that all three lists have in
common, right?

s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()

def findCommons(s0, s1, s2):
    # keep the words of s0, in order, that also occur in s1 and s2
    res = []
    for word in s0:
        if word in s1 and word in s2:
            res.append(word)
    return res

>>> print(findCommons(s0, s1, s2))
['example', 'of', 'i', 'would', 'like', 'to', 'have']
Nov 30 '06 #2
Robert R. wrote:
> Hello,
>
> I would like to write a piece of code that helps me align some sequences
> of words and suggests the ordered common words they share.
>
> One problem I have is that when there are many different strings, my
> results tend to be empty, while I would still like to get one, or maybe
> all, of the best matches.
"align"?
Anyway, for finding the commonest words, you'll be best off
counting how many times each word appears:

lst = ["foo bar baz", "qux foo foo kaka", "one foo and kaka times qux"]

count = {}
for line in lst:
    for word in line.split():
        count[word] = count.get(word, 0) + 1

Now you go for the ones with the highest count:

for (word, n) in sorted(count.items(), key=lambda x: x[1],
                        reverse=True):
    print(word, 'appears', n, 'times')

Untested. If you want to count the number of lines a word
appears in (as opposed to the number of times it appears at
all), add an extra condition before count[word] = ...
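A minimal sketch of that per-line variant, assuming the "extra condition" means skipping words already counted on the current line; the `seen` set is an illustrative helper, not something from the post:

```python
lst = ["foo bar baz", "qux foo foo kaka", "one foo and kaka times qux"]

count = {}
for line in lst:
    seen = set()  # words already counted for this line
    for word in line.split():
        if word not in seen:  # the extra condition: count each word once per line
            seen.add(word)
            count[word] = count.get(word, 0) + 1

for (word, n) in sorted(count.items(), key=lambda x: x[1], reverse=True):
    print(word, 'appears in', n, 'lines')
```

Here "foo" is counted 3 times (once per line) instead of 4.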
Nov 30 '06 #3
> I would like to write a piece of code that helps me align some sequences
> of words and suggests the ordered common words they share.
I'm not sure what you want, but in case you are a guy who knows how the
quicksort and Dijkstra algorithms work :) and wants to find out more:

There are many algorithms out there, covered in a university "Text
algorithms" course. The first one does not directly solve your problem:
"edit distance" (Levenshtein distance)
http://en.wikipedia.org/wiki/Levenshtein_distance
I mention it here only because it is simple and shows the basic idea of
dynamic programming
http://en.wikipedia.org/wiki/Dynamic_programming
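For reference, a minimal sketch of the Levenshtein distance computed with dynamic programming. It is a generic textbook version, not code from the linked pages, and it works on word lists as well as on strings:

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or word lists)."""
    # prev[j] = distance between the first i-1 items of a and the first j items of b
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a[i-1]
                         cur[j - 1] + 1,      # insert b[j-1]
                         prev[j - 1] + cost)  # substitute, or keep a match
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("i would like to have".split(),
                    "i would still like to have".split()))  # 1
```

Only the previous row of the table is kept, since each cell depends only on its left, upper and upper-left neighbours.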

If you scroll down you'll see "Longest common subsequence problem" with an
implementation in Python for 2 sequences. If you don't understand how it
works, just look into the "edit distance" idea and you will see that it is
exactly the same algorithm with changed rules.
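A minimal sketch of the same table-filling idea for the longest common subsequence of two word lists (again a generic version, not the Wikipedia code). Compared with edit distance only the rules change: a match extends the subsequence by one word, otherwise the better of skipping a word in either list is kept:

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b as a list."""
    n, m = len(a), len(b)
    # table[i][j] = length of an LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # walk forwards through the table to recover one LCS
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result

s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()
print(lcs(s0, s1))  # ['example', 'of', 'i', 'would', 'like', 'to', 'have']
```

For more than two sentences, folding `lcs` over them (e.g. `lcs(lcs(s0, s1), s2)`) gives words common to all of them in order, but that is only a heuristic: the pairwise result is not guaranteed to be the longest common subsequence of the whole set.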

Oleg

Nov 30 '06 #4
Robert R.:
> I would like to write a piece of code that helps me align some sequences
> of words and suggests the ordered common words they share. [...]
> One problem I have is that when there are many different strings, my
> results tend to be empty, while I would still like to get one, or maybe
> all, of the best matches.
This is my first attempt at a solution; surely there are faster, shorter,
better solutions...
from collections import defaultdict
from itertools import chain
from graph import Graph
# http://sourceforge.net/projects/pynetwork/

def commonOrdered(*strings):
    # lowercase each string and keep only purely alphabetic tokens
    lists = [[w for w in string.lower().split() if w.isalpha()]
             for string in strings]

    # total number of occurrences of each word over all strings
    freqs = defaultdict(int)
    for w in chain(*lists):
        freqs[w] += 1

    # add each word list to the graph as a path
    g = Graph()
    for words in lists:
        g.addPath(words)

    # keep the topologically sorted words whose count equals the number of strings
    len_strings = len(strings)
    return [w for w in g.toposort() if freqs[w] == len_strings]

s0 = "this is an example of a thing i would like to have"
s1 = "another example of something else i would like to have"
s2 = 'and this is another " example " but of something ; now i would still like to have'

print(commonOrdered(s0, s1, s2))

It creates a graph from the paths of words, then sorts the graph
topologically, then keeps only the words of the sorted order that are
present in all the original strings.
With a bit of work, the code can also be used if the strings contain
words like "example" instead of " example ".
An xtoposort method can be added to the Graph class too...
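For readers without the pynetwork package, a minimal sketch of the same idea using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It assumes that `addPath` adds an edge between each pair of consecutive words, as the description suggests, and it swaps the frequency count for a set intersection so that a word repeated inside one string is not mistaken for a word shared by all of them. The function name `common_ordered` is illustrative. As the follow-up in this thread points out, a topological sort needs a consistent word order; here an inconsistent order shows up as `graphlib.CycleError` (a word repeated within one string, e.g. "the cat and the dog", also creates a cycle):

```python
from graphlib import TopologicalSorter, CycleError  # standard library, Python 3.9+

def common_ordered(*strings):
    # same cleaning as above: lowercase, keep purely alphabetic tokens
    lists = [[w for w in s.lower().split() if w.isalpha()] for s in strings]
    # words present in every string (set intersection instead of raw counts)
    common = set(lists[0]).intersection(*lists[1:])
    ts = TopologicalSorter()
    for words in lists:
        for w in words:
            ts.add(w)  # register every word, even in one-word strings
        for prev, nxt in zip(words, words[1:]):
            ts.add(nxt, prev)  # path edge: prev must come before nxt
    # static_order() raises CycleError if the strings disagree on word order
    return [w for w in ts.static_order() if w in common]

s0 = "this is an example of a thing i would like to have"
s1 = "another example of something else i would like to have"
s2 = 'and this is another " example " but of something ; now i would still like to have'
print(common_ordered(s0, s1, s2))  # ['example', 'of', 'i', 'would', 'like', 'to', 'have']

try:
    common_ordered("hello there dudes", "dudes hello there")
except CycleError:
    print("no consistent word order")
```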

Bye,
bearophile

Nov 30 '06 #5
> This is my first attempt at a solution; surely there are faster, shorter,
> better solutions...
> It creates a graph from the paths of words, then sorts the graph
> topologically,
Besides possible inefficiencies, this "solution" breaks if the words aren't
in the same order in every string: then the graph has cycles and the
topological sort can't work...
I'll have to think about better solutions, if possible.

Sorry,
bye,
bearophile

Dec 1 '06 #6
Robert R. wrote:
> I would like to write a piece of code that helps me align some sequences
> of words and suggests the ordered common words they share.
> [...]
Your requirements are a little vague... how are these three strings handled?

s1 = "hello there dudes"
s2 = "dudes hello there"
s3 = "there dudes hello"

They all share the 3 words, but in what order do you want them back?

Here is a simplistic approach using sets that results in a list of the words
that are in all strings, ordered (arbitrarily) by their order in the first
string (it also doesn't worry about matches, or lack of them, due to
punctuation and case and crap like that):
>>> from functools import reduce
>>> strList = []
>>> strList.append('this is an example of a thing i would like to have')
>>> strList.append('another example of something else i would like to have')
>>> strList.append('and this is another " example " but of something ; now i would still like to have')
>>> [word for word in strList[0].split() if word in reduce(lambda x, y:
...     x.intersection(y), [set(s.split()) for s in strList])]
['example', 'of', 'i', 'would', 'like', 'to', 'have']

But you still have issues with multiple matches and how they are handled,
etc...

noah
Dec 1 '06 #7
Noah Rawlins wrote:
> >>> strList = []
> >>> strList.append('this is an example of a thing i would like to have')
> >>> strList.append('another example of something else i would like to have')
> >>> strList.append('and this is another " example " but of something ; now i would still like to have')
> >>> [word for word in strList[0].split() if word in reduce(lambda x, y:
> ...     x.intersection(y), [set(s.split()) for s in strList])]
> ['example', 'of', 'i', 'would', 'like', 'to', 'have']
I think that ends up doing the set reduction over and over, once for every
word in the first string, so you probably want to move it outside the
list comprehension.
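Following that suggestion, a minimal sketch with the reduction hoisted out of the comprehension, so the set of shared words is built only once (the name `common` is illustrative):

```python
from functools import reduce

strList = [
    'this is an example of a thing i would like to have',
    'another example of something else i would like to have',
    'and this is another " example " but of something ; now i would still like to have',
]

# build the set of words shared by all strings once, outside the comprehension
common = reduce(lambda x, y: x.intersection(y), [set(s.split()) for s in strList])
result = [word for word in strList[0].split() if word in common]
print(result)  # ['example', 'of', 'i', 'would', 'like', 'to', 'have']
```

Set membership tests are then constant time on average, so the comprehension is linear in the length of the first string.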

noah
Dec 1 '06 #8

Hello,

Thanks for all your replies, I'm now looking into dynamic programming...

Sorry for forgetting to say that I wanted the words to be ordered, so
that:

s1 = "hello there dudes"
s2 = "dudes hello there"
s3 = "there dudes hello"

will not return anything, even though they share all three words.

Bearophile, your graph solution looks interesting, although I still
do not understand how it works; but yes, there is definitely something
in drawing paths between words.

I have tried SequenceMatcher from difflib on combinations of all the
sentences, as I need to process many more than the 3 of my first
example.
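For anyone taking the same route, a minimal sketch of running `difflib.SequenceMatcher` on word lists and folding the pairwise result over all sentences. The helper name `common_blocks` is hypothetical and this is not necessarily what the poster did. Note that SequenceMatcher does not compute a true longest common subsequence: it takes the longest contiguous matching block and recurses on both sides, so its result can be shorter than the LCS, and folding pairwise is a heuristic:

```python
from difflib import SequenceMatcher
from functools import reduce

def common_blocks(a, b):
    """Items of a that SequenceMatcher matches against b, in order."""
    # autojunk=False disables the "popular item" heuristic that would
    # otherwise kick in for sequences of 200 or more items
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    words = []
    for block in matcher.get_matching_blocks():
        # a[block.a:block.a + block.size] == b[block.b:block.b + block.size];
        # the final block is a dummy with size 0
        words.extend(a[block.a:block.a + block.size])
    return words

sentences = [
    "this is an example of a thing i would like to have".split(),
    "another example of something else i would like to have".split(),
    'and this is another " example " but of something ; now i would still like to have'.split(),
]

# fold the pairwise result over all sentences
print(reduce(common_blocks, sentences))
# ['example', 'of', 'i', 'would', 'like', 'to', 'have']
```

As an example of the difference from LCS: for "1 2 3 4 M M M" against "M M M 1 x 2 x 3 x 4" this returns ['M', 'M', 'M'], while the longest common subsequence is 1 2 3 4.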

best.

Dec 2 '06 #9
> Thanks for all your replies, I'm now looking into dynamic programming...
I'd better warn you before you go further.
"Notice that LCS is often defined to be finding all common
subsequences of a maximum length. This problem inherently has higher
complexity, as the number of such subsequences is exponential in the
worst case"

And with many sentences even the basic dynamic programming gets expensive:
if you have 10 sentences with 5 words each, you are looking at something
on the order of 5^10 (almost 10 million) in both time and space. There
are certainly better dynamic-programming algorithms, but you should review
your needs: how many sentences and how many words you have.
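To make the blow-up concrete, a minimal sketch (not code from the thread) of the naive exact LCS for any number of word lists, using memoised recursion over tuples of positions; the name `multi_lcs` is hypothetical. Every combination of positions is a possible state, so the number of states is the product of `len(s) + 1` over all sentences, which grows exponentially with the number of sentences. It is fine for a handful of short sentences, not for hundreds:

```python
from functools import lru_cache

def multi_lcs(*seqs):
    """Exact longest common subsequence of any number of sequences.

    Naive dynamic programming: one memoised state per tuple of positions,
    i.e. prod(len(s) + 1) states, exponential in the number of sequences.
    """
    if not seqs:
        return []
    seqs = [tuple(s) for s in seqs]

    @lru_cache(maxsize=None)
    def best(idx):
        # idx[t] = number of items of seqs[t] already consumed
        if any(i == len(s) for i, s in zip(idx, seqs)):
            return ()
        heads = [s[i] for i, s in zip(idx, seqs)]
        if all(h == heads[0] for h in heads):
            # every sequence has the same next item: take it and advance all
            return (heads[0],) + best(tuple(i + 1 for i in idx))
        # otherwise skip the next item of one sequence, trying each in turn
        candidates = (best(idx[:t] + (idx[t] + 1,) + idx[t + 1:])
                      for t in range(len(seqs)))
        return max(candidates, key=len)

    return list(best((0,) * len(seqs)))

s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()
print(multi_lcs(s0, s1, s2))  # ['example', 'of', 'i', 'would', 'like', 'to', 'have']
```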

There may be an easier way than dynamic programming.

Oleg

Dec 2 '06 #10
