Lennart wrote:
[snip]
i.e. a line that ends with stmt terminator. To me it looks like a bug
in the lexical parser, but I'm not sure how it is supposed to work.
I agree this definitely looks like a bug. However, the CLP parser is a
strange beastie to say the least. Before I veer off on a long tangent
about just how weird the CLP's parser is, here's a suggestion for the
OP:
While I doubt you'll be able to work around this "properly" in the CLP,
you shouldn't have any trouble with statements including multi-line
strings with semi-colons at the end in other interfaces, e.g. ODBC,
JDBC, CLI, etc. especially as these interfaces can only execute a
single statement at a time and have no concept of a statement
terminator (the terminator is simply the end of the string containing
the statement).
For example:
$ db2 "CREATE TABLE TEST (S VARCHAR(1000) NOT NULL)"
DB20000I The SQL command completed successfully.
$ python
Python 2.5.1 (r251:54863, Oct 5 2007, 13:50:07)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright" , "credits" or "license" for more information.
>>import DB2
conn = DB2.connect('SA MPLE')
cur = conn.cursor()
cur.execute(" ""INSERT INTO TEST (S) VALUES ('function
fakeFunction() {
.... doSomething1();
.... doSomething2();
.... }')""")
1
>>conn.commit ()
cur.close()
conn.close( )
^D
$ db2 "select * from test2"
S
----------------------------------------------------------------...
function fakeFunction() {
doSomething1();
doSomething2();
}
1 record(s) selected.
So ... the weirdness of the CLP's parser ...
A while back I wrote a parser for DB2's SQL dialect, i.e. just the
statements listed under the SQL reference in the InfoCenter. This went
reasonably smoothly, with no surprises (except the sheer size of the
damned thing). Then I wanted to add the CLP commands to the parser (the
statements listed under the Commands topic of the InfoCenter before
they got moved somewhere in 9.5). This turned out to be nightmarishly
difficult as the CLP commands have fundamentally different syntax.
Consider the EXPORT command. You can use any of the following
legitimately with it:
EXPORT TO filename.ixf OF IXF ...
EXPORT TO 'filename.ixf' OF IXF ...
EXPORT TO "filename.i xf" OF IXF ...
EXPORT TO C:\filename.ixf OF IXF ...
So, what's the syntax? In SQL terms, the filename can be a string
literal, a quoted identifier, an unquoted identifier, or a sequence of
unquoted identifiers, operators (.), and characters that are generally
illegal in SQL (: and \). If you think that's weird, try the following:
$ db2 -t "EXPORT TO ;.IXF OF IXF SELECT * FROM SOMETABLE;"
It shouldn't work. The EXPORT command should be terminated at the first
semi-colon. But it /does/ work (and has for some time now - since v7 at
least, I think). You can get a hint as to how the CLP is parsing stuff
by breaking the statement above (remove the TO)
$ db2 -t "EXPORT ;.IXF OF IXF..."
SQL0104N An unexpected token ";.IXF" was found following "EXPORT".
Expected tokens may include: "TO". SQLSTATE=42601
So ;.IXF is considered a single token. For another hint, stick a space
in the filename and quote it:
$ db2 -t "EXPORT '; .IXF' OF IXF..."
SQL0104N An unexpected token "'; .IXF'" was found following "EXPORT".
Expected tokens may include: "TO". SQLSTATE=42601
So '; .IXF' is also a single token. It looks like the CLP tokenizer is
just doing whitespace splitting, with some extra logic to handle quoted
stuff. I wouldn't be too surprised if the CLP used a regex similar to
the following to tokenize its input:
("([^"]|"")*"|'([^']|'')*'|[^ ]+)
Given this, how does it recognize a statement terminator? Are
terminators at the end of a token are counted?
$ db2 -t "EXPORT TO IXF; OF IXF SELECT * FROM TEST;"
No - that works just fine (exports to a file called "IXF;"). Are
terminators at the end of a statement counted? (i.e. where a parser
would "expect" to find them):
$ db2 -t "EXPORT TO TEST.IXF OF IXF SELECT * FROM TEST;IMPORT FROM
TEST.IXF OF IXF REPLACE INTO TEST;"
No - this fails complaining about the table name ("TEST;IMPOR T") being
invalid. Given the ambiguities of SELECT in SQL, I tried reversing the
statments above, and again it complains (in a slightly different
manner) that TEST;EXPORT is an invalid table name. Adding spaces after
the terminators makes no difference either (well, it changes the errors
but they still aren't correctly recognized as terminators).
Basically, terminators are recognized if and only if they occur at the
end of a line, so there's some line-wise logic in there too. I'm quite
surprised that multi-line strings containing semi-colon at the end of a
line within the string worked in v7. Of all the ways I can imagine the
CLP's parser being written (given the behaviour above), a change that
broke this functionality is not a subtle or simple one - I /think/ it
would have to involve the removal of quite a bit of code.
At the end of the day, I would have much preferred it if the CLP had a
"real" parser, even if that meant that things like filenames (or
usernames or passwords) had to be specified as string literals. The
language would be more strict, but at least edge cases like multi-line
strings with terminators, or filenames containing terminators, or
multiple statements on a line, would behave predictably. Unfortunately
I suspect the demands of backward compatibility mean it's unlike to
happen.
Still, I'd also recommend opening a PMR for this issue. It should be
made to work (even if it does mean the CLP's "parser" gets even more
weird ;-).
Cheers,
Dave.