470,644 Members | 1,202 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 470,644 developers. It's quick & easy.

Developing a Wrapper

I am trying to write a Wrapper for our web wrapping engine, Cameleon.
Currently Cameleon is able to answer certain SQL queries but with a
restriction that all SQL queries must have a predicate. This is very
similar to the BLAST wrapper. For example, Cameleon can answer queries
like this by accessing Yahoo's IBM finance webpage

Select Headlines, LastTrade From yahoo where Ticker='IBM'

However, this query is meaningless if the predicate "Ticker" is
absent. Cameleon works by constructing the SQL into an URL and feed it
to the cameleon engine in the form of

http://interchange.mit.edu/cameleon_...r=&debug=false

Notice the SQL query is attached at the end of the URL. I have learned
to reconstruct some SQL queries from the request object that DB2 passed
to the wrapper.

In the current stage, my Cameleon wrapper can answer simple queries
like the above. However, I have trouble answering complex queries.
Below are some examples.

A query that works
- select yahoo.lasttrade from ("select companyticker from companytable
where industry='Biotechnology' and companyticker = 'ACAD'") AS x, yahoo
where yahoo.ticker = x.companyticker"

This query is answerable by the wrapper because DB2 decomposes the
query into two query requests objects and they are
1. select companyticker from companytable where
industry='Biotechnology' and companyticker ='ABGX'
2. select lasttrade from yahoo where ticker ='ABGX'

These individual queries can be answered by the wrapper.

A query that doesn't work
However with a slight modification(replacing the = operator with <),
the query is no longer answerable.
∑ select yahoo.lasttrade from ("select companyticker from
companytable where industry='Biotechnology' and companyticker <
'ACAD'") AS x, yahoo where yahoo.ticker = x.companyticker"

In this case, the query is also decomposed into two request objects.
1. select companyticker from companytable where
industry='Biotechnology' and companyticker < 'ACAD'
2. select yahoo.lasttrade from yahoo where yahoo.ticker =
The request object dump for this query fragment is:

..SELECT (LASTTRADE), (TICKER).FROM LYNNWU .YAHOO.WHERE ([<operator
kind>=[<column kind>TICKER][<unbound kind>]] ).

According to the request object, yahoo.ticker is an unbounded operator.
Obviously this query is not really a valid SQL so the wrapper can not
answer it...
My question is
Why is DB2 giving me an strange request object that can not be
reconstructed into a functional SQL? How can the wrapper be expected to
answer this kind of request? In the simple fileWrapper (given as a
sample), the wrapper simply ignored all the predicates request and
returned all the data. But in my case, I need the predicate to get any
result. How do I deal with this situation? I would like to answer such
query if possible.
2nd problem
How to handle AND/OR
∑ select Headlines, LastTrade From Yahoo where Ticker='JDSU' OR
Ticker='IBM'

The request object dump is
..SELECT (LASTTRADE), (HEADLINES) , (TICKER).FROM LYNNWU
..YAHOO.WHERE ([<operator kind>SINLIST[<column kind>TICKER][<constant
kind>JDSU][<constant kind>IBM ]]).

Cameleon engine can, however, answer parts of the query.
- select Headlines, LastTrade From Yahoo where Ticker='JDSU'
- select Headlines, LastTrade From Yahoo where Ticker='IBM
Because our engine can not handle multiple predicates such as IN, AND
and OR, I would like DB2 to send two separate request instead of using
SINLIST. Is that possible? That way I could handle the two sub queries
individually?

Feb 1 '06 #1
22 2331
li*****@gmail.com wrote:
A query that doesn't work
However with a slight modification(replacing the = operator with <),
the query is no longer answerable.
· select yahoo.lasttrade from ("select companyticker from
companytable where industry='Biotechnology' and companyticker <
'ACAD'") AS x, yahoo where yahoo.ticker = x.companyticker"

In this case, the query is also decomposed into two request objects.
1. select companyticker from companytable where
industry='Biotechnology' and companyticker < 'ACAD'
2. select yahoo.lasttrade from yahoo where yahoo.ticker =
The request object dump for this query fragment is:

.SELECT (LASTTRADE), (TICKER).FROM LYNNWU .YAHOO.WHERE ([<operator
kind>=[<column kind>TICKER][<unbound kind>]] ).

According to the request object, yahoo.ticker is an unbounded operator.
Obviously this query is not really a valid SQL so the wrapper can not
answer it...
I can't say why you have the "<unbound kind>" there and or if this is wrong.
But I would suspect that there is something more in the request object,
which should give you the information about the comparison.
My question is
Why is DB2 giving me an strange request object that can not be
reconstructed into a functional SQL? How can the wrapper be expected to
answer this kind of request? In the simple fileWrapper (given as a
sample), the wrapper simply ignored all the predicates request and
returned all the data. But in my case, I need the predicate to get any
result. How do I deal with this situation? I would like to answer such
query if possible.
Any operator that you ignore is compensated by DB2. Thus, if you ignore the
part with the "<unbound kind>", DB2 will do the comparison and filtering
for you. You just have to return the more complex result set.
2nd problem
How to handle AND/OR
· select Headlines, LastTrade From Yahoo where Ticker='JDSU' OR
Ticker='IBM'

The request object dump is
.SELECT (LASTTRADE), (HEADLINES) , (TICKER).FROM LYNNWU
.YAHOO.WHERE ([<operator kind>SINLIST[<column kind>TICKER][<constant
kind>JDSU][<constant kind>IBM ]]).

Cameleon engine can, however, answer parts of the query.
- select Headlines, LastTrade From Yahoo where Ticker='JDSU'
- select Headlines, LastTrade From Yahoo where Ticker='IBM
Because our engine can not handle multiple predicates such as IN, AND
and OR, I would like DB2 to send two separate request instead of using
SINLIST. Is that possible? That way I could handle the two sub queries
individually?


Then just don't put the OR (or whatever you can't handle) in the response
object. DB2 will see that you can't deal with it and break up the plan or
compensate the not-available functionality.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 1 '06 #2
According to the request-reply-compensate protocol, we are allowed to
accept or reject predicates. In my example, predicates are joined by
OR, and DB2 convert the two predicates into a single predicate using
SINLIST.

WHERE ([<operator kind>SINLIST[<column kind>TICKER][<constant
kind>JDSU][<constant kind>IBM ]]).


Based on my understanding, I have to either reject the entire
predicate(the entire SINLIST). But I want is to able to separate the
single predicate and accept a part of the predicate. Specifically
WHERE ticker = 'ibm'
WHERE ticker = 'jdsu'

Is there anyway to do this?

Feb 2 '06 #3
li*****@gmail.com wrote:
According to the request-reply-compensate protocol, we are allowed to
accept or reject predicates. In my example, predicates are joined by
OR, and DB2 convert the two predicates into a single predicate using
SINLIST.

WHERE ([<operator kind>SINLIST[<column kind>TICKER][<constant
kind>JDSU][<constant kind>IBM ]]).

The thing is that DB2 does not simply pass-through the statement as you
typed it in on the command line (or whereever). Instead, it combines the
multiple comparisons that are part of the OR-predicate into a single
predicate that conforms to:

ticker IN ( 'JDSU', 'IBM' )

That's what the SINLIST is all about.
Just to explain this, ff you have a predicate like

col1 = 'ABC' OR col2 = 'DEF'

then the request object will be (the indentation indicates the
tree-structure of the predicates):

Predicates: 1
-------------------------------------------
Expression 1
---------------
kind: OPERATOR
token: OR
---------------
Expression 1
---------------
kind: OPERATOR
token: =
---------------
Expression 1
---------------
kind: COLUMN
column name: COL1
---------------
Expression 2
---------------
kind: CONSTANT
data: ABC
---------------
Expression 2
---------------
kind: OPERATOR
token: =
---------------
Expression 1
---------------
kind: COLUMN
column name: COL2
---------------
Expression 2
---------------
kind: CONSTANT
data: DEF

Based on my understanding, I have to either reject the entire
predicate(the entire SINLIST). But I want is to able to separate the
single predicate and accept a part of the predicate. Specifically
WHERE ticker = 'ibm'
WHERE ticker = 'jdsu'

Is there anyway to do this?


I'm not quite sure I understand what you want to have, i.e. what is to be
sent to the remote data source. Because, if you only send the "ticker =
'JDSU'" on, then you won't get the results for "ticker = 'IBM'". So your
result would be incomplete. So what do you want to see in fenced
server/query object as the statement to be processed/sent on to the data
source?

Note that you could always reject the complete SINLIST predicate but store
the information about the OR-predicate in the response-object as part of
the execution descriptor. Then you have the predicate available during
run-time and can do whatever you want, for example send to separate
requests to the data source and return all rows.

I haven't tried this yet, but you might be able to add just one of the
predicates into the reply object. I don't know what DB2 will do then.
I'll give this a try tomorrow.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 2 '06 #4
Hi Knut,

I am also working with Lin on this.
Is there anyway to do this?


I'm not quite sure I understand what you want to have, i.e. what is to be
sent to the remote data source. Because, if you only send the "ticker =
'JDSU'" on, then you won't get the results for "ticker = 'IBM'". So your
result would be incomplete. So what do you want to see in fenced
server/query object as the statement to be processed/sent on to the data
source?


The remote data source is incapable of answering a query like
'select...where ticker="IBM" or ticker="JDSU"'
but can answer
'select ...where ticker="IBM"'
'select ...where ticker="JDSU"' one by one (even in parallel).

We would like DB2 to compensate for this limitation. Do you know if
this is possible?

If it is not possible, does that mean that DB2 expects remote data
sources to be able to answer OR type of queries at minimum.

Thanks.

Feb 3 '06 #5
Hi Knut,

Continuing on the first question regarding <unbounded kind> problem
encountered in this request object:

..SELECT (LASTTRADE), (TICKER).FROM LYNNWU .YAHOO.WHERE ([<operator
kind>=[<column kind>TICKER][<unbound kind>]] ).

The above dump actually came from the printRequest method in the
FileWrapper sample. So I assume it is printing everything knows about
the request object.... so do you have a suggestion what other
information the request object could contain?

Also you suggest that I ignore the <unkind bound> part, but I still
won't be able to answer this query from our native engine. This is
because the native source can not answer any part of the request
because the request is incomplete (our native source requires a
predicate and the predicate here is incomplete) So in this case, should
I just return a null reply, indicating that it can not answer any part
of the request? How do you handle this situation that no replies can be
generated?

Feb 3 '06 #6
Hi Knut,

I tried to return a null reply to the request
..SELECT (LASTTRADE), (TICKER).FROM LYNNWU .YAHOO.WHERE ([<operator
kind>=[<column kind>TICKER][<unbound kind>]] ).

and I got this error
SQL0901N The SQL statement failed because of a non-severe system
error.
Subsequent SQL statements can be processed. (Reason
"sqlno_crule_save_plans
[100]:rc( 0) ".) SQLSTATE=58004

It seems like that DB2 can not return any alternative plan, and
therefore I still need to process this request as much as I could... I
read in the API that RemoteQuery.getInputData() method returns a
reference to a RuntimeDataList that describes the type and location of
values to be bound to parameter markers (if any) in the SQL statement.
I was wondering if this is indeed the object that contains the
parameters to the unbounded parameter TICKER, since according to the
API, if a RequestExp object is an unbound parameter, its value is
unknown until the run-time phase when the federated server passes the
value to the wrapper... So I am assuming that the values returned from
the first request:

"select companyticker from companytable where
industry='Biotechnology' and companyticker < 'ACAD" ---(say this
returns 'ABCD')

will be used to bound the unbounded parameter TICKER at the runtime?

I tried to get values from RemoteQuery.getInputData() object but it is
null so far for this query. If I were able to retrieve the values from
the first request (value 'ABCD) then i can construct a query that will
answer the request

..SELECT (LASTTRADE), (TICKER).FROM LYNNWU .YAHOO.WHERE ([<operator
kind>=[<column kind>TICKER][<unbound kind>]] )

by modifying the query to be

..SELECT (LASTTRADE), (TICKER).FROM LYNNWU .YAHOO.WHERE ([<operator
kind>=[<column kind>TICKER][<constant>'ABCD']])

Do you know a way to get value('ABCD') of the unbounded parameter
(TICKER) at runtime when i am executing the query in the subclass of
RemoteQuery.java? Or is this at all possible?

Thank you so much for all your help

Feb 3 '06 #7
aykut firat wrote:
Hi Knut,

I am also working with Lin on this.
> Is there anyway to do this?
I'm not quite sure I understand what you want to have, i.e. what is to be
sent to the remote data source. Because, if you only send the "ticker =
'JDSU'" on, then you won't get the results for "ticker = 'IBM'". So your
result would be incomplete. So what do you want to see in fenced
server/query object as the statement to be processed/sent on to the data
source?


The remote data source is incapable of answering a query like
'select...where ticker="IBM" or ticker="JDSU"'
but can answer
'select ...where ticker="IBM"'
'select ...where ticker="JDSU"' one by one (even in parallel).

We would like DB2 to compensate for this limitation. Do you know if
this is possible?


Yes, of course it is.

(1) You reject the SINLIST predicate completely. That will increase the
amount of data shuffled around between DB2 and the data source - this is
probably something you want to avoid.

(2) You accept the SINLIST predicate as it is. Then you use the execution
descriptor to pass whatever information you need during run-time to the
remote-query object.
The remote-query object will prepare the two queries and send them to the
data source. The results from both are collected (in parallel or
sequentially) and returned to DB2. Pretty much as if the following
statement would be processed:

SELECT ... FROM ... WHERE ticker = 'IBM'
UNION [ ALL ]
SELECT ... FROM ... WHERE ticker = 'JDSU'

(3) As in (2), but you don't accept the predicate. Then DB2 will filter all
rows returned by you again - for whatever that might be good for.
If it is not possible, does that mean that DB2 expects remote data
sources to be able to answer OR type of queries at minimum.


The expectations on the remote data source is pretty much just that it can
return some data. Everything else is a bonus.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 3 '06 #8
li*****@gmail.com wrote:
Hi Knut,

I tried to return a null reply to the request
.SELECT (LASTTRADE), (TICKER).FROM LYNNWU .YAHOO.WHERE ([<operator
kind>=[<column kind>TICKER][<unbound kind>]] ).

and I got this error
SQL0901N The SQL statement failed because of a non-severe system
error.
Subsequent SQL statements can be processed. (Reason
"sqlno_crule_save_plans
[100]:rc( 0) ".) SQLSTATE=58004

It seems like that DB2 can not return any alternative plan, and
therefore I still need to process this request as much as I could... I
read in the API that RemoteQuery.getInputData() method returns a
reference to a RuntimeDataList that describes the type and location of
values to be bound to parameter markers (if any) in the SQL statement.
I was wondering if this is indeed the object that contains the
parameters to the unbounded parameter TICKER, since according to the
API, if a RequestExp object is an unbound parameter, its value is
unknown until the run-time phase when the federated server passes the
value to the wrapper... So I am assuming that the values returned from
the first request:

"select companyticker from companytable where
industry='Biotechnology' and companyticker < 'ACAD" ---(say this
returns 'ABCD')

will be used to bound the unbounded parameter TICKER at the runtime?
Correct. DB2 will most probably do a nested loop join. The outer loop is
for "select companyticker from companytable where industry='Biotechnology'
and companyticker < 'ACAD'". The results returned by that are used
row-by-row to bind the unbound-parameter for the comparison operator above.

So you could simply accept the predicate as it is and during run-time you
can complete the statement by "binding" the missing value and then pass the
request on to the data source.

Note that "TICKER" is _not_ the unbound parameter. Instead, "TICKER" is the
first parameter for the "=" operator and TICKER is of type column. The
second parameter for the "=" operator is the one that is not bound during
compile-time and will only by available at run-time.
I tried to get values from RemoteQuery.getInputData() object but it is
null so far for this query. If I were able to retrieve the values from
the first request (value 'ABCD) then i can construct a query that will
answer the request

.SELECT (LASTTRADE), (TICKER).FROM LYNNWU .YAHOO.WHERE ([<operator
kind>=[<column kind>TICKER][<unbound kind>]] )

by modifying the query to be

.SELECT (LASTTRADE), (TICKER).FROM LYNNWU .YAHOO.WHERE ([<operator
kind>=[<column kind>TICKER][<constant>'ABCD']])
Yes, but that modification must be done at run-time and not at compile-time.
Do you know a way to get value('ABCD') of the unbounded parameter
(TICKER) at runtime when i am executing the query in the subclass of
RemoteQuery.java? Or is this at all possible?


At run-time, DB2 will actually have your wrapper create two separate
RemoteQuery objects. Both will be used in parallel. The first is for the
first query. It returns a row and then the second RemoteQuery object is
told to get its rows and the value for the unbound parameter is provided
now. This allows you to complete the SQL statement that your data source
has to handle.

Once all rows are returned by the 2nd RemoteQuery object, DB2 will get back
to the 1st object, request the next row and then talk again to the 2nd
RemoteQuery object to fetch all rows with the new value for the "unbound"
parameter. And so on and so forth...

This is just a traditional nested loop join.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 3 '06 #9
Knut Stolze wrote:
li*****@gmail.com wrote:
Hi Knut,

I tried to return a null reply to the request
.SELECT (LASTTRADE), (TICKER).FROM LYNNWU .YAHOO.WHERE ([<operator
kind>=[<column kind>TICKER][<unbound kind>]] ).

You said that your data source requires you to provide a predicate, is that
right? If so, you could possible do a "table scan" with predicates like "1
= 1" or "column = column"? Just an idea for the situation that there is no
predicate that you could push down.
Once all rows are returned by the 2nd RemoteQuery object, DB2 will get
back to the 1st object, request the next row and then talk again to the
2nd RemoteQuery object to fetch all rows with the new value for the
"unbound" parameter. And so on and so forth...


It might be that DB2 does not reuse the 2nd RemoteQuery object and
constructs a new one for each nested scan. I do not (yet) know the details
in this respect.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 3 '06 #10
> (2) You accept the SINLIST predicate as it is. Then you use the execution
descriptor to pass whatever information you need during run-time to the
remote-query object.
The remote-query object will prepare the two queries and send them to the
data source. The results from both are collected (in parallel or
sequentially) and returned to DB2. Pretty much as if the following
statement would be processed:

SELECT ... FROM ... WHERE ticker = 'IBM'
UNION [ ALL ]
SELECT ... FROM ... WHERE ticker = 'JDSU'


I am a bit confused over how remote-query object prepare the two
queries; do you mean it will create it them automatically? From what I
understand, remote-query uses execution descriptor to issue requests to
the native data source... In my case I would store the head expression,
nicknames and the SINLIST(IBM, JDSU) in my execution descritpor. I am
assuming at the runtime, I can use only one element in the SINLIST to
query the native data source depending on which query was asked. But in
the remote-query object, I don't see a way to see what the actual query
is.... Is there a way to see the query in the remote-query object?

Feb 3 '06 #11
li*****@gmail.com wrote:
(2) You accept the SINLIST predicate as it is. Then you use the
execution descriptor to pass whatever information you need during
run-time to the remote-query object.
The remote-query object will prepare the two queries and send them to the
data source. The results from both are collected (in parallel or
sequentially) and returned to DB2. Pretty much as if the following
statement would be processed:

SELECT ... FROM ... WHERE ticker = 'IBM'
UNION [ ALL ]
SELECT ... FROM ... WHERE ticker = 'JDSU'
I am a bit confused over how remote-query object prepare the two
queries; do you mean it will create it them automatically? From what I
understand, remote-query uses execution descriptor to issue requests to
the native data source... In my case I would store the head expression,
nicknames and the SINLIST(IBM, JDSU) in my execution descritpor.


You are correct.
I am
assuming at the runtime, I can use only one element in the SINLIST to
query the native data source depending on which query was asked.
This restriction comes from your data source not being able to handle
multiple predicates combined with OR. At least that's how I understand it.
But in
the remote-query object, I don't see a way to see what the actual query
is.... Is there a way to see the query in the remote-query object?


What do you need the "actual query" for? You put all information that might
will need during execution time (data source, predicates, head expressions,
whatever, ...) in the exec descriptor. This descriptor is passed from
compile-time to run-time. With that, you have all this stuff available at
run-time and could construct the two queries to be sent to the remote data
source. You don't need to know what the original query was.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 3 '06 #12
Hi Knut,

Is the following an option:

(4) You reject the SINLIST predicate completely, but respond with two
replies: 1) 'select ...where ticker="IBM"'
2) 'select ...where ticker="JDSU"'

Feb 9 '06 #13
aykut firat wrote:
Hi Knut,

Is the following an option:

(4) You reject the SINLIST predicate completely, but respond with two
replies: 1) 'select ...where ticker="IBM"'
2) 'select ...where ticker="JDSU"'


For each request, you can only return a single reply object. The interface
does not allow anything else.

What I don't understand is what you want to do. If you reject the OR
predicate (= SINLIST), then DB2 will do the filtering for you. Regardless
of that, you can of course also do some or all of the filtering at the
foreign server.

My recommendation is still that you accept the SINLIST and at run-time you
(or more) will send two-queries yourself to the remote data source. Then
you will first fetch the rows of one query, followed by the rows of the 2nd
query.

The only thing you will loose is the capability to run the two queries in
parallel or rather to return both result sets in parallel.

What you could do, however, have the user rephrase such queries to use a
UNION operator. Then you get DB2 to generate two different requests and,
thus, have both remote queries be executed in parallel.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 9 '06 #14
Hi Knut,

Thank you for your reply. I read the following at
http://publib.boulder.ibm.com/infoce...d/t0009614.htm

"For a single query, the optimizer typically generates many requests
for each wrapper, each request representing a different fragment of the
original query. For each such request, the wrapper generates zero, one,
or more replies. Each reply represents a different accepted fragment.
An accepted fragment is a fragment the wrapper or data source can
evaluate itself. Each reply contains the associated cost and
cardinality estimates for the accepted fragment."

that is why I thought we could return multiple replies. Furthermore,
there is an example shown with tax and rate, which also indicates that
we can return multiple replies. Am I wrong?

I guess, even if we can do that, we cannot return a predicate that has
NOT been passed to us. Right?

The reason why I am focusing on this issue is to understand to what
extent DB2 really compensates the planning for us. Even though, in this
case the planning is trivial--namely just UNION two queries--we want to
maximally avoid planning done within the wrapper. All we want to put in
the wrapper is what portion of the sent query we can answer. But as in
this case, the expressiveness of the protocol is not rich enough in
some cases.

Feb 9 '06 #15
aykut firat wrote:
Hi Knut,

Thank you for your reply. I read the following at
http://publib.boulder.ibm.com/infoce...d/t0009614.htm
"For a single query, the optimizer typically generates many requests
for each wrapper, each request representing a different fragment of the
original query. For each such request, the wrapper generates zero, one,
or more replies. Each reply represents a different accepted fragment.
An accepted fragment is a fragment the wrapper or data source can
evaluate itself. Each reply contains the associated cost and
cardinality estimates for the accepted fragment."

that is why I thought we could return multiple replies. Furthermore,
there is an example shown with tax and rate, which also indicates that
we can return multiple replies. Am I wrong?
You are correct (and I learned something ;-)) You can "chain" several
replies together by using the Reply::set_next_reply() method. That way you
can provide several alternatives to DB2 where each alternative describes
the specific sub-fragment that the wrapper can handle.
I guess, even if we can do that, we cannot return a predicate that has
NOT been passed to us. Right?
Correct.
The reason why I am focusing on this issue is to understand to what
extent DB2 really compensates the planning for us. Even though, in this
case the planning is trivial--namely just UNION two queries--we want to
maximally avoid planning done within the wrapper.
DB2 takes care of all the planning. You just have to tell DB2 which parts
of a query you can handle in the wrapper/remote data source. And if DB2
won't rewrite the query explicitly to a UNION (ALL), you can't change that.
All we want to put in
the wrapper is what portion of the sent query we can answer.
OK.
But as in
this case, the expressiveness of the protocol is not rich enough in
some cases.


Why do you think so? I still don't understand what you are missing.

Either you can handle OR-predicates or not, And DB2 doesn't care if this is
handled by the remote data source or in the wrapper. If you don't handle
it, DB2 will compensate for it, i.e. just do a table scan and filter
itself. If you can handle it, DB2 takes your results. The "in-between"
would be to tell DB2 you don't handle it but still do some or all of it.
Whether that makes sense or not might heavily depend on the wrapper.

The whole planning is done during query compilation/optimization. The DB2
optimizer might actually rewrite the query to a UNION and ask the wrapper
again for replies to certain (new) query fragments. You can have a greater
influence if you return cost-informations in the reply objects.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 10 '06 #16
Hi Knut,

Thanks again for your detailed reply.

"Either you can handle OR-predicates or not, And DB2 doesn't care if
this is
handled by the remote data source or in the wrapper. If you don't
handle
it, DB2 will compensate for it, i.e. just do a table scan and filter
itself. If you can handle it, DB2 takes your results. The
"in-between"
would be to tell DB2 you don't handle it but still do some or all of
it.
Whether that makes sense or not might heavily depend on the wrapper. "

The problem is we REQUIRE a predicate like Ticker='IBM' to process the
query. Since the query goes to a web source like Yahoo Finance, which
has millions of web pages, we cannot do a table scan and reply by
rejecting the predicate(s). When the request is a SINLIST, we have to
be intelligent enough to recognize that with some tweaking the request
can be handled -- not with the abilities of the remote data source, but
our custom kind of mini planner and executioner within the wrapper--.
We were hoping that the wrapper would need not be inteligent at all,
but it seems like it may have to be sophisticated in some situations.
This specific example we posed is of course simple, but I'll let you
know if we encounter more tricky cases.

Thanks again for your comments, they have been very helpful.

Regards.

Feb 10 '06 #17
aykut firat wrote:
Hi Knut,

Thanks again for your detailed reply.

"Either you can handle OR-predicates or not, And DB2 doesn't care if
this is
handled by the remote data source or in the wrapper. If you don't
handle
it, DB2 will compensate for it, i.e. just do a table scan and filter
itself. If you can handle it, DB2 takes your results. The
"in-between"
would be to tell DB2 you don't handle it but still do some or all of
it.
Whether that makes sense or not might heavily depend on the wrapper. "

The problem is we REQUIRE a predicate like Ticker='IBM' to process the
query.
Can you use a predicate like "ticker = ticker"?
Since the query goes to a web source like Yahoo Finance, which
has millions of web pages, we cannot do a table scan and reply by
rejecting the predicate(s).
That's understandable. Nevertheless, you should be aware that DB2 might not
give you a predicate at all that you could push down. For those cases you
will have to figure something out anyways.
When the request is a SINLIST, we have to
be intelligent enough to recognize that with some tweaking the request
can be handled


I would not say there is "tweaking" involved but rather "you have to figure
out how to process such a predicate". Some predicates are simpler than
others.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 13 '06 #18
At run-time, DB2 will actually have your wrapper create two separate
RemoteQuery objects. Both will be used in parallel. The first is for the
first query. It returns a row and then the second RemoteQuery object is
told to get its rows and the value for the unbound parameter is provided
now. This allows you to complete the SQL statement that your data source
has to handle.

Once all rows are returned by the 2nd RemoteQuery object, DB2 will get back
to the 1st object, request the next row and then talk again to the 2nd
RemoteQuery object to fetch all rows with the new value for the "unbound"
parameter. And so on and so forth...

This is just a traditional nested loop join.

Hi Knut,

I understand the logic behind the nested loop. However I am having
technical difficulties in getting the rows from the first request
object which is needed to bind the unbounded parameters in the 2nd
request object. Looking at the API for RemoteQuery, i see a method
public final RuntimeDataList getInputData(). Is this the method to get
the rows from the first query? So far, I have been getting null values
from this method. I was wondering if there is a flag that can indicate
that the data is ready for the 2nd query to process?

Thank you so much for all your help

Feb 21 '06 #19
li*****@gmail.com wrote:
I understand the logic behind the nested loop. However I am having
technical difficulties in getting the rows from the first request
object which is needed to bind the unbounded parameters in the 2nd
request object. Looking at the API for RemoteQuery, i see a method
public final RuntimeDataList getInputData(). Is this the method to get
the rows from the first query?
I would think so, yes.
So far, I have been getting null values
from this method.
You get null values in Java or SQL NULLs?
I was wondering if there is a flag that can indicate
that the data is ready for the 2nd query to process?


The DB2 query gateway won't call the remote query object before that.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 21 '06 #20

Knut Stolze wrote:
li*****@gmail.com wrote:
I understand the logic behind the nested loop. However I am having
technical difficulties in getting the rows from the first request
object which is needed to bind the unbounded parameters in the 2nd
request object. Looking at the API for RemoteQuery, i see a method
public final RuntimeDataList getInputData(). Is this the method to get
the rows from the first query?


I would think so, yes.
So far, I have been getting null values
from this method.


You get null values in Java or SQL NULLs?


I tried to use getInputData() method within CameleonQuery.open() which
implements RemoteQuery.ope()n And I have been getting JAVA null values
for the request that requires unbounded parameters: SELECT
(LASTTRADE), (TICKER) FROM LYNNWU .YAHOO
WHERE ([<operator kind>=[<column kind>TICKER][<unbound kind>]

The query i was trying to execute is

select yahoo.lasttrade from ("select companyticker from
companytable where industry='Biotechnology' and companyticker <
'ACAD'") AS x, y
ahoo where yahoo.ticker = x.companyticker"

and DB2 processed the request
SELECT COMPANYTICKER, INDUSTRY FROM COMPANYTABLE WHERE INDUSTRY =
Biotechnology
before trying to answer the 2nd request: SELECT LASTTRADE, TICKER FROM
YAHOO WHERE TICKER =

I was wondering if there is a flag that can indicate
that the data is ready for the 2nd query to process?


The DB2 query gateway won't call the remote query object before that.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany


Feb 21 '06 #21
li*****@gmail.com wrote:

Knut Stolze wrote:
li*****@gmail.com wrote:
> I understand the logic behind the nested loop. However I am having
> technical difficulties in getting the rows from the first request
> object which is needed to bind the unbounded parameters in the 2nd
> request object. Looking at the API for RemoteQuery, i see a method
> public final RuntimeDataList getInputData(). Is this the method to get
> the rows from the first query?


I would think so, yes.
> So far, I have been getting null values
> from this method.


You get null values in Java or SQL NULLs?


I tried to use getInputData() method within CameleonQuery.open() which
implements RemoteQuery.open() And I have been getting JAVA null values
for the request that requires unbounded parameters: SELECT
(LASTTRADE), (TICKER) FROM LYNNWU .YAHOO
WHERE ([<operator kind>=[<column kind>TICKER][<unbound kind>]


That's exactly how I would do it as well.

If you are getting null pointers (or whatever Java likes to name its
pointers), I would probably recommend that you open a PMR to get this
investigated.

You could also collect a db2 trace and send this to me directly. I could
have a look, but I don't know if I would be able to find anything helpful.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 21 '06 #22
li*****@gmail.com wrote:

Knut Stolze wrote:
li*****@gmail.com wrote:
> I understand the logic behind the nested loop. However I am having
> technical difficulties in getting the rows from the first request
> object which is needed to bind the unbounded parameters in the 2nd
> request object. Looking at the API for RemoteQuery, i see a method
> public final RuntimeDataList getInputData(). Is this the method to get
> the rows from the first query?


I would think so, yes.
> So far, I have been getting null values
> from this method.


You get null values in Java or SQL NULLs?


I tried to use getInputData() method within CameleonQuery.open() which
implements RemoteQuery.ope()n And I have been getting JAVA null values
for the request that requires unbounded parameters: SELECT
(LASTTRADE), (TICKER) FROM LYNNWU .YAHOO
WHERE ([<operator kind>=[<column kind>TICKER][<unbound kind>]


I found something but I'm not sure if (a) my interpretation is correct and
(b) it is applicable to your case.

The Request_Exp::unbound kind is used if the table/quantifier accessed is
not in the FROM clause of the query fragment. You have the following
query:

SELECT yahoo.lasttrade
FROM ( SELECT companyticker
FROM companytable
WHERE industry='Biotechnology' AND
companyticker < 'ACAD'" ) AS x,
yahoo
WHERE yahoo.ticker = x.companyticker

So my guess is that DB2 does not (yet) figure out that the query could be
rewritten like this:

SELECT yahoo.lasttrade
FROM yahoo
WHERE yahoo.ticker IN
( SELECT companyticker
FROM companytable
WHERE industry='Biotechnology' AND
companyticker < 'ACAD'" )

Now, maybe DB2 does figure this out but still does not push-down the
correlation between the local table and the foreign data source to the
wrapper, i.e. companyticker in your case.

Btw, I haven't found nothing that deals with unbound parameters in the DRDA,
Oracle, or SQL Server wrappers. That means either I missed it or there is
nothing there.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Feb 24 '06 #23

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

4 posts views Thread by Edvard Majakari | last post: by
12 posts views Thread by Egil M?ller | last post: by
4 posts views Thread by Stephen | last post: by
9 posts views Thread by Julien Biezemans | last post: by
16 posts views Thread by utab | last post: by
1 post views Thread by Korara | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.