Matching preceding-sibling axis is SLOWWWW!

Bill Cohagan
#1: Nov 12 '05
I have an XSLT transformation that involves a template with

match =
"w:p[ancestor::w:body][preceding-sibling::*[1][self::aml:annotation/@w:type
= 'Word.Bookmark.Start']]"

What I'm matching on is a w:p element that is:

1.) Within a w:body element (at some level).
2.) Immediately preceded by an aml:annotation element (whose @w:type is
'Word.Bookmark.Start').

This would seem to be a fairly simple match and, in fact, when I run it on a
fairly small XML input file it finishes quickly. However, when I run it on a
large (>6MB) XML file it appears to hang -- at least it has so far never
finished and I've given it an hour or so to try on a 1GB RAM, 2GHz machine.

This performance problem would seem to indicate that MSXML (the .Net
Framework 1.1 version) doesn't do much/any optimization on axis references.
I'm guessing that it is synthesizing a node set of the entire
preceding-sibling axis (which may be thousands of elements), then indexing
into it using [1]!

I'm off to finding a workaround, perhaps using xsl:key to speed things up,
but of course xsl:key has its own set of MSXML problems. Does anyone have
any suggestions how I might avoid this problem?

Thanks in advance,
Bill

PS - BTW, I'm trying to process a rather large Word 2003 doc via its WordML
representation.



Oleg Tkachenko [MVP]
#2: Nov 12 '05

Bill Cohagan wrote:
> I have an XSLT transformation that involves a template with
>
> match =
> "w:p[ancestor::w:body][preceding-sibling::*[1][self::aml:annotation/@w:type
> = 'Word.Bookmark.Start']]"

That's a really bad match pattern. Patterns are meant to be simple because
of the XSLT processing model. Instead, move the selection logic to the
apply-templates step:

<xsl:apply-templates
select="w:p[ancestor::w:body][preceding-sibling::*[1][self::aml:annotation/@w:type
= 'Word.Bookmark.Start']]"/>
...

<xsl:template match="w:p">

To avoid clashes you can use modes:

<xsl:apply-templates
select="w:p[ancestor::w:body][preceding-sibling::*[1][self::aml:annotation/@w:type
= 'Word.Bookmark.Start']]" mode="bookmark"/>
...

<xsl:template match="w:p" mode="bookmark">

> I'm off to finding a workaround, perhaps using xsl:key to speed things up,
> but of course xsl:key has its own set of MSXML problems. Does anyone have
> any suggestions how I might avoid this problem?

Never heard of any problems with keys in MSXML. Yes, keys are usually the
preferred way (they allow direct access to nodes instead of tree
traversal, at the price of building the index).
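
To illustrate the key approach for this particular pattern, here is a minimal, untested sketch. It assumes the usual Word 2003 namespace URIs for the w and aml prefixes; the key name bookmark-before and the output element bookmarked-paragraph are made up for the example. The key indexes each bookmark-start annotation under the generated id of the element that immediately follows it, so the match pattern needs only a key lookup and never walks the preceding-sibling axis:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:aml="http://schemas.microsoft.com/aml/2001/core">

  <!-- Hypothetical key: every bookmark-start annotation is stored under the
       generated id of its immediately following sibling element. -->
  <xsl:key name="bookmark-before"
           match="aml:annotation[@w:type = 'Word.Bookmark.Start']"
           use="generate-id(following-sibling::*[1])"/>

  <!-- A w:p inside w:body matches when at least one bookmark-start
       annotation is indexed under the paragraph's own generated id. -->
  <xsl:template
      match="w:p[ancestor::w:body][key('bookmark-before', generate-id())]">
    <!-- Placeholder output; the real template body goes here. -->
    <bookmarked-paragraph>
      <xsl:value-of select="."/>
    </bookmarked-paragraph>
  </xsl:template>

</xsl:stylesheet>
```

Whether this actually avoids the slowdown depends on how the processor builds the key and evaluates following-sibling::*[1], so it should be measured against the real document.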

--
Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com
Bill Cohagan
#4: Nov 12 '05

Oleg
Thanks for the response. I don't know why you consider it a "bad pattern"
as that's what XSL is all about. The XSLT processing model doesn't
discourage nontrivial match patterns although it's certainly possible that
the MS implementation does. Certainly it's a nontrivial pattern as are many
others in my application; however it is the only one that causes the .Net
MSXML to hang/nonterminate.

As a sanity check I ran this particular template against Instant Saxon and
it completes in less than a minute -- so clearly it's not a limitation of
the language, but of the implementation. I also moved the predicates into
the template as you suggested and it still hangs in .Net.

Regards,
Bill

PS - I've encountered several xsl:key errors for which hot fixes were later
supplied. The latest problem (with KB article pending) has to do with using
current() within a predicate of a match pattern in an xsl:key. If you'd like
I'll let you know when it gets posted. The workaround for that one requires
the use of C# script in the template rather than using xsl:key. MS seems to
be not anxious to spend resources on fixing their XSL tools, although
they've been helpful in finding workarounds.

Oleg Tkachenko [MVP]
#6: Nov 12 '05

Bill Cohagan wrote:
> Thanks for the response. I don't know why you consider it a "bad pattern"
> as that's what XSL is all about. The XSLT processing model doesn't
> discourage nontrivial match patterns although it's certainly possible that
> the MS implementation does.

Well, it doesn't say so directly, but from the description of the
processing model ("A node is processed by finding all the template rules
with patterns that match the node, and choosing the best amongst them")
it's easy to see that complex patterns are a way to slow down the
transformation. It's a perfectly valid pattern, but quite inefficient.
Why not apply templates only to the nodes you are interested in
processing?

> Certainly it's a nontrivial pattern as are many
> others in my application; however it is the only one that causes the .Net
> MSXML to hang/nonterminate.

.NET or MSXML? That's definitely a bug anyway; please provide a repro.

> As a sanity check I ran this particular template against Instant Saxon and
> it completes in less than a minute -- so clearly it's not a limitation of
> the language, but of the implementation.

Which version of .NET are you using? Try the latest one; there were
solid improvements in the recent release.

> I also moved the predicates into
> the template as you suggested and it still hangs in .Net.

So that's not the reason. It must be something else.

> If you'd like
> I'll let you know when it gets posted.

Yeah, that would be interesting.

--
Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com
Bill Cohagan
#7: Nov 12 '05

Oleg
See responses below

Thanks,
Bill
"Oleg Tkachenko [MVP]" <oleg@NO!SPAM!PLEASEtkachenko.com> wrote in message
news:uAWLq6nXEHA.1652@TK2MSFTNGP09.phx.gbl...
> Bill Cohagan wrote:
>
> > Thanks for the response. I don't know why you consider it a "bad pattern"
> > as that's what XSL is all about. The XSLT processing model doesn't
> > discourage nontrivial match patterns although it's certainly possible that
> > the MS implementation does.
>
> Well, it doesn't say it directly, but from the description of the
> processing model "A node is processed by finding all the template rules
> with patterns that match the node, and choosing the best amongst them"
> it's easy to see that having complex patterns is the way to slow down
> the transformation. That's pretty valid pattern though, but quite
> ineffective. Why not apply templates to only nodes you are interested to
> process?

[Bill] I *am* applying templates to the nodes I want to process. Please
clarify what you mean by this.

> > Certainly it's a nontrivial pattern as are many
> > others in my application; however it is the only one that causes the .Net
> > MSXML to hang/nonterminate.
>
> .NET or MSXML? That's definitely a bug anyway, provide a repro please.

[Bill] I'm using the .Net Framework 1.1, so whatever the XML/XSL engine is
in that release. I say "hang" because it's never finished, although
certainly it might if I give it a few days (or weeks or ...). I can email
you a repro if you'd like. What email address should I use?

> > As a sanity check I ran this particular template against Instant Saxon and
> > it completes in less than a minute -- so clearly it's not a limitation of
> > the language, but of the implementation.
>
> Which version of .NET you are using? Try the latest one - there were
> solid improvements in the recent release.
>
> > I also moved the predicates into
> > the template as you suggested and it still hangs in .Net.
>
> So that's not the reason. It should be something else.

[Bill] Perhaps it is. As time allows I'll try to isolate the problem by
trimming the input. Perhaps it's another bug rather than just a performance
issue.

> > If you'd like
> > I'll let you know when it gets posted.
>
> Yeah, that would be interesting.

[Bill] Will do. Meanwhile if you google Cohagan XSL Key (looking at groups)
you should see some of the past issues I've encountered.

> --
> Oleg Tkachenko [XML MVP]
> http://blog.tkachenko.com


Oleg Tkachenko [MVP]
#8: Nov 12 '05

Bill Cohagan wrote:
> [Bill] I *am* applying templates to the nodes I want to process. Please
> clarify what you mean by this.

Well, I meant moving complexity to <xsl:apply-templates> to simplify
template matching:

<xsl:apply-templates
select="w:p[ancestor::w:body][preceding-sibling::*[1][self::aml:annotation/@w:type
= 'Word.Bookmark.Start']]"/>
...

<xsl:template match="w:p">

instead of

<xsl:apply-templates/>

...

<xsl:template match =
"w:p[ancestor::w:body][preceding-sibling::*[1][self::aml:annotation/@w:type
= 'Word.Bookmark.Start']]">

In the former case the selection is evaluated only once, while in the
latter case it is evaluated for each node being matched during the
transformation (potentially hundreds or thousands of times).
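
A further variation on the same idea, offered as a sketch rather than a tested fix: start the selection from the bookmark annotations and step forward to the paragraph, so that no preceding-sibling step is evaluated at all. It selects the same w:p elements as the original pattern, because a paragraph is the first following sibling element of an annotation exactly when that annotation is its first preceding sibling element. The namespace URIs are assumed to be the usual Word 2003 ones, and the output element names are hypothetical:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:aml="http://schemas.microsoft.com/aml/2001/core">

  <xsl:template match="/">
    <!-- Hypothetical wrapper element for the result. -->
    <bookmarked-paragraphs>
      <!-- Walk forward from each bookmark-start annotation inside w:body
           and keep the next sibling element only if it is a w:p. -->
      <xsl:apply-templates
          select="//w:body//aml:annotation[@w:type = 'Word.Bookmark.Start']
                  /following-sibling::*[1][self::w:p]"
          mode="bookmark"/>
    </bookmarked-paragraphs>
  </xsl:template>

  <!-- Simple pattern; the mode keeps it apart from other w:p templates. -->
  <xsl:template match="w:p" mode="bookmark">
    <bookmarked-paragraph>
      <xsl:value-of select="."/>
    </bookmarked-paragraph>
  </xsl:template>

</xsl:stylesheet>
```

This only fits when the bookmarked paragraphs can be processed on their own, outside the normal document-order traversal; if they must be handled in place, the select expression would have to be adapted to the current context node.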

> [Bill] I'm using the .Net Framework 1.1, so whatever the XML/XSL engine is
> in that release. I say "hang" because it's never finished, although
> certainly it might if I give it a few days (or weeks or ...). I can email
> you a repro if you'd like. What email address should I use?

Just remove NO!SPAM!PLEASE from this post address.

--
Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com