multi regexp analyzer ? or how to do...

Hello,

Here is a problem I have. I would like to solve it with Python,
even though I still have no clue how to do it.

I have many small "text" files, so to speed up processing I copy
them into one huge file, adding some kind of XML separator:

<file name="...">
[content]
</file>

The content is tab-separated data (columns); the data are strings.
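A minimal sketch of how the combined file could be walked one "file" at a time, assuming the opening and closing tags each sit on their own line exactly as shown above. The name `iter_files` is hypothetical, not something from the original setup.

```python
import re

# Assumption: the opening tag is alone on its line, as in the layout above.
FILE_OPEN = re.compile(r'<file name="([^"]*)">')

def iter_files(lines):
    """Yield (name, rows) per <file> block; each row is a list of columns."""
    name, rows = None, []
    for line in lines:
        line = line.rstrip("\n")
        m = FILE_OPEN.match(line)
        if m:
            # Start of a new embedded file.
            name, rows = m.group(1), []
        elif line == "</file>":
            if name is not None:
                yield name, rows
            name = None
        elif name is not None:
            # Content line: split the tab-separated columns.
            rows.append(line.split("\t"))
```

Passing an open file object as `lines` keeps memory use proportional to one embedded file rather than the whole huge file.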

Now here comes the tricky part for me:

I would like to be able to create some kind of matching rules using
regular expressions. A rule should match data on one line (the smallest
data unit for me) or on a set of lines, for example:

if on this line the first column matches this regexp and the second
column matches too,
and on the following line the third column matches
-> trigger something

So, here is what I tried:

- having all the rules,
- build some kind of analyzer for each rule,
- keep the size L of the longest one,
- then read each line of the huge file one by one,
- inside a "file", create all the subsets of length <= L
- for each analyzer see if it matches any of the subsets
- if it occurs...

My trouble is here:

"for each analyzer see if it matches any of the subsets"

It is really too slow. I have a great many rules, and since it is a
"for loop inside a for loop", with a further "for loop over the subset's
lines" inside each rule, I need to speed it up. Do you have any ideas?

I am thinking of having only "rules for one line" and keeping track of
whether a rule is an "ending one" (to trigger something) or a "must
continue" one, but this is still unclear to me for now...

Another great thing would be some sort of dict with regexp
keys...

(Actually, it would also be great if I could use some kind of regexp
operator to say that one can skip the content of 0 to n lines before
matching, as if in the example I had changed "following..." to
"skip at least 2 lines and match the third column on the next line".
It would be great, but I still have really no idea how to even think
about that.)
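Skipping lines could be handled by letting each partial match also remember how many lines it has skipped so far. The sketch below is one possible shape, not a definitive design: each step is assumed to be a tuple `(min_skip, max_skip, {column_index: pattern})`, the skip bounds of the first step are ignored, and `scan_with_gaps` is a hypothetical name.

```python
import re

def matches(conds, row):
    # Every listed column must exist on this line and match its regexp.
    return all(col < len(row) and re.search(pat, row[col])
               for col, pat in conds.items())

def scan_with_gaps(rows, rules):
    """rules: {name: [(min_skip, max_skip, {col: pattern}), ...]}."""
    active = set()  # (name, next_step, lines_skipped_so_far)
    for i, row in enumerate(rows):
        fired, nxt = set(), set()
        for name, pos, skipped in active | {(n, 0, 0) for n in rules}:
            lo, hi, conds = rules[name][pos]
            # Option 1: this line satisfies the step (enough lines skipped).
            if (pos == 0 or skipped >= lo) and matches(conds, row):
                if pos + 1 == len(rules[name]):
                    fired.add(name)
                else:
                    nxt.add((name, pos + 1, 0))
            # Option 2: treat this line as skipped, if the budget allows.
            if pos > 0 and skipped < hi:
                nxt.add((name, pos, skipped + 1))
        active = nxt
        for name in sorted(fired):
            yield name, i
```

Setting both bounds to 0 gives the plain "following line" behaviour, and states are kept in a set so that identical partial matches reached by different paths are only tracked once.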

Many thanks to anybody who can help,

best

Jul 19 '05 #1


I'd propose a pyparsing implementation, but you don't give us many
specifics. Is there any chance you could post some sample data, and
one or two of the regexps you are using for matching?

-- Paul

Jul 19 '05 #2
