473,387 Members | 1,899 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

need an idea, recognize sequence, fsm or genetic ?

Joh
hello,

here is my trouble, i would to like to write a program which could
help me to detect sequence of consecutive words in list in a very
efficient way. (i need to do it upon large amount of text, and for now
i'm looking for a good start point)

i have many "sentences" of variables sizes, for example one could be :
sentence1 = ['w43', 'w12', 'w41', 'w85', 'w4', 'w74', 'w24', 'w1',
'w6', 'w41', 'w85', 'w7', 'w23', 'w43']
where wx can be seen as a word.

then i had sequence of words which i need to recognize, for example :
sequences = [['w41', 'w85', 'w4'], ['w41', 'w85', 'w7'],] etc.

and need something like span position where sequences are present in
sentence:

{
['w41', 'w85', 'w4']: [[2, 4], ],
['w41', 'w85', 'w7']: [[9, 11], ]
}

for now i'm doing it by using many nested 'for loop', and it is too
slow, as i search for each sequence one by one, and then restart from
beginning, and so on, i'm wondering if this can not be done by using
nested dictionnary or something like that, maybe is there an
improvement to imagine ?
(i'm wondering if i should not find a way to not re-start from
beginning when i had found ['w41', 'w85', ] for the second time and
thus just only look after last token 'w4' or 'w7')

also i think this is related to some kind of finite state automata,
but i do not know how to build automatically massive parallel FSM (i
can have many many sequences and huge amount of sentences)

i was also wondering if some genetics algorithm used to align sequence
of protein could not help me, i had once read something about that ,
but still can not where to start. also as far as i remember such
algorithms take into account that 'sequence' could not be consecutive,
what i need is consecutive.

maybe one of you could help ? give a start point or even an idea that
could help me to design this piece of code ?
Jul 18 '05 #1
4 1916
joh12005,

your problem sounds as it may be solvable with mxtexttools.

they claim to be very fast in analyzing text and splitting into python
tuples. I never fully understood how to deal with them, sorry.

so just google for it and try it.

David Merz wrote "text processing in python" and dealt with this tools.

Sun,

Harald
Jul 18 '05 #2
Joh <jo******@yahoo.fr> wrote:

also i think this is related to some kind of finite state automata,
but i do not know how to build automatically massive parallel FSM (i
can have many many sequences and huge amount of sentences)


here's a naive but clean way to do it

Write a finite state machine class that scans the sentence for a single
sequence. Then instantiate one of them per sequence, and run the
following loop (pseudocode)

for word, index in enumerate sentence
for fsm in sequences
fsm.advance(word)
if fsm.state == END
results[fsm] += (index - fsm.sequence_length, index)
fsm.state == START

A possible optimisation is to combine all your sequences into a single
trie; however, this will also require backtracking so I'm not sure how
much faster it'd be in practice.

martin
Jul 18 '05 #3
Joh wrote:
here is my trouble, i would to like to write a program which could
help me to detect sequence of consecutive words in list in a very
efficient way.


These repeats are called "tandem arrays." Algorithms for quickly
finding them (and many other useful patterns) are given in

Gusfield, Dan. 1997. Algorithms on strings,
trees and sequences. Cambridge: Cambridge
University Press.

Gusfield has a web site
http://wwwcsif.cs.ucdavis.edu/~gusfield/
with downloadable software. I don't recall whether his web site talks
about tandem arrays, but I'm sure you could google the term...

Mike Maxwell
Jul 18 '05 #4
Mike Maxwell wrote:
Joh wrote:
here is my trouble, i would to like to write a program which could
help me to detect sequence of consecutive words in list in a very
efficient way.


These repeats are called "tandem arrays." Algorithms for quickly
finding them (and many other useful patterns) are given in


Apologies, I just re-read your original post, and (if I understand what
you are looking for), it is not tandem arrays. Nevertheless, Gusfield's
book is on a whole host of text search problems, and may still give you
leads (or even answers).

Mike Maxwell
Jul 18 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: Jelle Feringa / EZCT Architecture & Design Researc | last post by:
Could anyone recommend me a genetic algorithm package? So far I have found a few, such as GAS, pyGP, Genetic, and of course scipy.ga My problem is that most of the development of these packages...
4
by: John | last post by:
I am trying to use the xml desinger in .net 1.1 to create the follow schema. where USAddress is derived from Address But when I generate the xml file, the .net xml editor says "The active schema...
23
by: Bruno R. Dias | last post by:
Perhaps it would be interesting to program a virtual machine simulating an ancient computer (such as the pdp-7). Then, it would be rather interesting to code for it (porting gcc to it maybe?). I...
2
by: Xiao Jianfeng | last post by:
Hi all, I am looking for a genetic algorithms package for Python. I have googled the web before posting and found some links. The link of pygene(http://www.freenet.org.nz/python/pygene) cannot...
5
by: aaragon | last post by:
Hello everybody, I appreciate your taking the time to take a look at this example. I need some help to start the design of an application. To that purpose I'm using policy-based design. The...
2
by: Anders B | last post by:
I want to make a program that reads the content of a LUA array save file.. More precicely a save file from a World of Warcraft plugin called CharacterProfiler, which dumps alot of information about...
2
by: =?Utf-8?B?TWlndWVsIElzaWRvcm8=?= | last post by:
Hi, I've built an ASP.NET 1.1 web service and an ASP.NET 1.1 application that calls it is throwing the following exception: System.Web.Services.Protocols.SoapException: Server did not...
1
by: DumRat | last post by:
Hi, I wanted to create a genetic algorithm for this question : Suppose you are out shopping. You have to buy things from different n number of stores. And have to bring home the goods...
16
by: sessmurda | last post by:
I am writing a fairly simple script that is supposed to print out lines from a txt file and group them based on where I want them. The general format of the files I am using the script on is as...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.