Barry Kelly wrote:
Frans Bouma [C# MVP] wrote:
A generic
expression parser is actually not that useful. All it can do is tell
you what kind of node is found at a given position: it doesn't do
anything for you.
Uh, that's exactly what an expression parser is useful for: turning
text into a tree. And if the tree matches what the C# Linq expression
parsing does, then the language is the same as the one defined by C#.
Though, then you're re-doing what's already in C#. The thing is,
creating the tree isn't trivial either. It's not a simple parser which
builds a tree, as the query very likely refers to elements used IN the
code.
If this quest for a parser is to get dynamic queries working (queries
built up at runtime, for example in a search form where, based on the
user's selections, various query fragments are added), there are
workarounds for this using C# without the necessity of looking at the
expression tree.
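
One such workaround can be sketched as follows: each optional search criterion appends its own Where clause to an IQueryable, so the query is composed at runtime while every fragment is still checked at compile time. The Customer class, its properties and the CustomerSearch helper are hypothetical names used only for illustration; they are not from this thread.

```csharp
using System;
using System.Linq;

// Hypothetical entity, for illustration only.
public class Customer
{
    public string Country { get; set; }
    public int OrderCount { get; set; }
}

public static class CustomerSearch
{
    // Appends a Where clause only for the criteria the user filled in.
    // Nothing executes until the returned query is enumerated.
    public static IQueryable<Customer> Build(
        IQueryable<Customer> source, string country, int? minOrders)
    {
        IQueryable<Customer> query = source;
        if (!string.IsNullOrEmpty(country))
        {
            query = query.Where(c => c.Country == country);
        }
        if (minOrders.HasValue)
        {
            int threshold = minOrders.Value;
            query = query.Where(c => c.OrderCount >= threshold);
        }
        return query;
    }
}
```

Against a Linq provider the combined fragments end up in one expression tree, which the provider translates as a whole.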
The tough element is to DO something useful AFTER a
given node or treebranch has been examined and understood.
Uh, no. The whole point of the discussion is the turning of text into
an Expression tree. Once you've done that, you're in the same
position as you would have been with writing a C# method taking an
Expression<Func<>>. You can call Compile() on it if you like, or
otherwise hand it off to your database engine - and do this at run
time, not compile time, as is the constraint with text in C# (and see
later for a counterpoint to your "100 lines" dismissal).
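
For readers less familiar with the starting point of this argument, here is a minimal sketch of what "a C# method taking an Expression<Func<>>" gives you. The class and member names are made up for illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // Assigning a lambda to Expression<Func<...>> makes the C# compiler
    // emit code that builds an expression tree instead of a delegate.
    public static readonly Expression<Func<int, bool>> IsLarge = x => x > 5;

    // The tree can be inspected as text...
    public static string Describe()
    {
        return IsLarge.ToString();
    }

    // ...or turned into an executable delegate at run time.
    public static bool Evaluate(int value)
    {
        Func<int, bool> compiled = IsLarge.Compile();
        return compiled(value);
    }
}
```

The tree here still comes from C# source at compile time; the discussion is about obtaining the same kind of object from text that only becomes available at run time.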
You can generate C# code using a template, compile it and run it in
less than 100 lines. This means that if you formulate your queries as
C# code and store them as text in a file, you can load them one by one
and generate C# code at runtime at startup which wraps these queries in
a method, so you get back an IQueryable and can use them at runtime.
No need for a parser and this isn't slow either.
The interesting bit (for the poster, as far as I can see) is the bit
*before* this, while you seem fixated on the bit after this. Both are
interesting; but you seem to be dismissing one out of hand.
Well, I've written a couple of parsers in the last few years (one
lr(n) parser generator and a couple of ll(n) ones) and although it's
interesting to write a parser from a geek standpoint, I also learned
it's a waste of time if you can avoid it and leverage code which
is already there. Furthermore, the reason why expression trees aren't
serializable is IMHO also the reason why it's very hard to build a
USABLE expression tree from an external source which has to be working
with elements in the code you're executing.
IMHO the thing is: Linq queries aren't a separate entity in the code
which embeds them: they are connected with the code which embeds them.
This means that although you can separate them out in theory, in
practice this won't lead to anything, simply because the interesting
thing isn't the expression nor the tree, it's the consumer of the
expression tree. That consumer can't work with a tree which is created
in a void, it has to work with a tree which connects to the code
executing or at least be able to. (filter on a value of a variable for
example).
The question asked, whether there is a parser for a linq expression in
string format, is nice, but what I wonder is: what's the bottom-line
reason why this parser is absolutely needed? These strings don't fall
out of the sky and with a tree alone you don't get very far so there's
apparently an expression tree consuming object which has to be fed by
queries which can't be created by code... I then would like to know
what that scenario is. I saw people mentioning 'configuration', but I
don't understand what that has to do with things. Unless people want to
write out their linq queries in string form in a file (then you really
must hate your life), I fail to see why it can be of any value (and
even then).
If you then need to build queries at runtime, you indeed are out of
luck UNLESS you build the expression tree manually.
I don't know how you reach this conclusion. A parser builds the
expression tree automatically - that's the whole point, so that you
don't need to put together your expression tree manually to build a
query at runtime.
Why would you want to build a query at runtime which can't be created
through code? It's not the parser -> parse tree conversion, it's the
parser -> parse tree -> Linq expression tree conversion that's the
trouble: the linq expression tree has to refer to elements in the code,
or at least has to be able to. They can't serialize it to disk in any
format because of this, so re-building it at runtime would be a problem
as well.
My point is twofold:
- just because something is cool doesn't make it worth looking into.
This is a general thing, but it blurs what's really important.
- the long road from expression tree to string to expression tree is
IMHO odd and not doable, as it IS serialization/deserialization of the
tree, which isn't possible due to the nature of the tree. It's also
IMHO silly, as you should simply create general code which builds the
expression tree with Linq based on the input. That would save you from
having to write a parser and would also spare you the misery of tying
the expression tree to the embedding code.
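
To illustrate the second point, "general code which builds the expression tree based on the input" could look roughly like the sketch below. It uses the factory methods on System.Linq.Expressions.Expression; the Product class and the FilterFactory helper are hypothetical, and only a single equality filter is handled.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity, for illustration only.
public class Product
{
    public int CategoryId { get; set; }
}

public static class FilterFactory
{
    // Builds the tree for: row => row.<propertyName> == <value>
    // from a property name and a value supplied at runtime.
    public static Expression<Func<T, bool>> PropertyEquals<T>(
        string propertyName, object value)
    {
        ParameterExpression row = Expression.Parameter(typeof(T), "row");
        MemberExpression property = Expression.Property(row, propertyName);
        // Convert the input to the property's type so both sides match.
        object typedValue = Convert.ChangeType(value, property.Type);
        ConstantExpression constant =
            Expression.Constant(typedValue, property.Type);
        BinaryExpression body = Expression.Equal(property, constant);
        return Expression.Lambda<Func<T, bool>>(body, row);
    }
}
```

A call such as FilterFactory.PropertyEquals<Product>("CategoryId", 3) returns a tree that can be passed to Queryable.Where or compiled with Compile(), with no parser involved.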
But maybe I miss a very obvious use-case of string-based queries
here... String based queries in general suck big time (checking of
validity, maintainability), unless they're checked at compile time, and
any way to avoid them is preferred. The only way it works is when the
queries are checked at compile time so you can maintain them properly
and won't run into surprises at runtime.
That's also why I don't understand why you want to use strings
which are output from the expression tree in the first place:
you already have the tree!
I think you've got this backward; someone simply pointed out that
ToString() on an expression tree printed out text in what appeared to
be a canonical language for .NET expression trees. The natural
question then is: is there a parser for this language in the .NET 3.5
framework? (There doesn't appear to be.)
It isn't that someone is going to try and round-trip their tree
through ToString() and back again with a parser. That would be
completely useless, and attacking that is attacking a complete straw
man. The reference to ToString() is simply pointing out that there is
a language implicitly defined by the result of this method, so the
natural question is - where is the parser for this language?
Err, why would someone need a parser for that if it ISN'T about
roundtripping? My whole understanding of the need for the parser is that
it's for roundtripping the ToString() output back to an expression tree,
which, as you say, is useless.
If it's about 'I need to write my queries as strings', it's also
useless: queries which aren't checked at compile time (or at least
partly checked) are time consuming and hard to maintain.
Sure, if it was possible to have a DSL in some string format which is
embeddable at runtime into the running IL and could work together with
that IL, why not. Unfortunately, that's not the case, there's no
context switcher available for you, so both sides (C# and DSL) have to
know of one another and how to interact with the other. I don't think
that's doable with a simple parser which builds an expression tree from
a query string, because you don't have a symbol table at hand of the C#
code creating the tree.
Not only that, going from strings to expression tree
with a parser effectively re-implements what's already in the C#
compiler.
Yes, but the C# compiler is rather heavyweight and clumsy, lacking
context (definitions that exist at runtime and not necessarily in any
static assembly) and nice error reporting, and resulting in compiled
assemblies on disk - not nearly as light-weight as things like
Expression trees and DynamicMethods. Calling out to C# will end up in
producing an assembly, which you're either going to have to load (and
never release) in the current AppDomain, or awkwardly and indirectly
load at arms-length in another AppDomain, all to get a simple
Expression tree that should be available with a simple 'Expression
Parse(string text);' method.
For a single string I wouldn't go for this but then I also wouldn't
write a complete parser for this. For a lot of strings (think iBatis
with linq queries) you can do this rather easily with an in-memory
assembly compiled at runtime. Sure that's part of your appdomain but
the queries are needed at runtime in your app so keeping the assembly
around isn't a big deal.
I really fail to see why the effort is needed or even
'powerful' or useful.
Please give me a use-case scenario, as things
from configuration is too vague: what are you going to do with the
data from the config file? Why is the strings -> expression tree
the only real solution to your situation?
I once worked on a system that used a lightweight databinding language
for evaluating GUI properties, business properties and data-level
constraints (among other things), where the source texts of this
language were stored in a metadata system. Each individual piece of
source code averaged no more than 30 characters long or so.
Collectively, there were more than 10,000 snippets of this source code
in a medium-size financial services (insurance) data entry
application. The system compiled these snippets at runtime, and
could load new metadata as the system was running, and transfer over
to it, without rebooting. Metadata could churn indefinitely.
Using the C# compiler for this would have been deeply painful and
slow.
The C# compiler is just used to compile the code. In the end IMHO it
doesn't make a difference as the text has to be parsed anyway, which
IMHO is done pretty quickly by the C# compiler.
That aside, I think the real problem here is that people are trying to
embed a DSL into C# at runtime. Isn't that what's going on
here?
In _THAT_ particular situation, in theory I'm all for it. The sad
thing is though, there's no context switching done for you when you
move from C# space to the DSL space and back (the DSL here being the
expression tree as string).
Your example is also about a DSL which is embedded in C# code and likely
a separate governing system is used to make the switch between C# and
DSL and back so you can refer to elements in the C# code in your DSL
and get results back from your DSL in C# code.
So to make this work for expression strings, this governing system has
to be written as well, and therefore your C# code has to be prepared
for it, as it is IMHO a tough struggle to get random expression strings
from disk and make them able to refer to the elements in the
embedding code and vice versa. Because _THAT_'s what's the problem.
Parsing a string isn't that hard, it's what you want to do with the
tree you get after the parse, as it's IMHO tough to do anything with it
if it can't refer to anything on the outside. Some queries fit that
description, but often they do not.
Besides that, generating a piece of C# with the linq queries,
compiling it and running it at runtime takes 100 lines of code max.
10000 snippets of code = 10000 assemblies that need loading. Sure,
there are workarounds via AppDomains and bunching texts etc., but
then you've moved into far more complex territory than what ought to
be a simple one-liner - 'Expression Parse(string)'.
Well, reading the 10000 snippets is likely to take some time, but you
have to do that anyway. You need 2 steps if you want to do this with
templates: first load the template and compile it into an assembly (or
do this once and re-use the assembly); then run the template, which
consumes the snippets and generates C# code in memory, and compile that
C# code. You then have an assembly with classes with methods you can
call to get expression trees from. Complex? Not really, it's very easy
to get this running. Most of the code in these kinds of code generators
is about formatting output. That's not an issue here, so the code you
have to write is very simple.
It's to illustrate that there is a way to do this already, and you can
use C# or VB.NET for the strings. The fun part of this approach is also
that you can make the template (which is also written in C#) be more
clever and for example emit methods which take parameters so you CAN
link embedding code with the linq queries (filter all orders on the id
of the customer which is selected in the grid).
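
As a rough illustration of what such an emitted method might look like (the Order class, the GeneratedQueries class and the method name are assumptions made for this sketch, not output of any real template):

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity, for illustration only.
public class Order
{
    public int CustomerId { get; set; }
}

public static class GeneratedQueries
{
    // The parameter is the link to the embedding code: the caller passes
    // in the id of the customer currently selected in the grid, and the
    // C# compiler captures it inside the expression tree.
    public static Expression<Func<Order, bool>> OrdersOfCustomer(int customerId)
    {
        return o => o.CustomerId == customerId;
    }
}
```

The calling code would use it along the lines of orders.Where(GeneratedQueries.OrdersOfCustomer(selectedId)), where orders is an IQueryable<Order> and selectedId comes from the UI.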
A one-liner 'Expression.Parse(string)' looks great, but it's not going
to work. Where do you fix up the tree so it refers to objects,
properties and variables currently in scope?
It doesn't sound as though anybody is aware of anything... maybe
I'll see what I can cobble together with expression trees. Pity.
There are some blank spots here and there, but the overall system
isn't that obscure.
What bugs me is that I still have no clue what you're trying to
achieve.
Hopefully I've helped - the need is very, very clear to me. But then I
write compilers for a living.
As a person who writes query consuming code for a living, I do
understand the necessity for dynamic queries built at runtime, but
using strings for that isn't the answer, as it's unmaintainable and
error-prone.
FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website:
http://www.llblgen.com
My .NET blog:
http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------