On Jul 2, 7:11 pm, bstjean <bstj...@yahoo.comwrote:
Quote:
Hi everyone,
>
I am trying to find an efficient way to perform a special query. Let
me explain what I want.
>
Let's say we are looking for all description that match "this is the
target". In fact, I want to find records that match those 4 words in
this sequence disregarding the number of spaces (I mean spaces, tabs,
Cr, Lf, etc) between them.
>
This has to be done without REGEX (would be too easy!). Besides
throwing a bunch of REPLACE(REPLACE(REPLACE())) to strip separators,
anyone has a better idea on how to do that in SQL, only in SQL with no
UDF, just plain DB2 SQL ???
>
To make it more clear, here's a more detailed example.
>
If the searched string is "this is the target", I would expect results
like :
>
"this[9 spaces]is[1 tab]the[1 carriage return][1 line feed]target"
"this[3 tabs][1 space]is[2 carrage returns]the[2 spaces]target"
"this[1 space]is[16 spaces]the[2 tabs][11 spaces][1 carriage return][1
tab]target"
"this[80 spaces]is[4 line feeds]the[5 spaces]target"
"this[1 space]is[1 space]the[1 space]target"
>
Well, you get the picture. For practical reasons, we can assume that
no word is going to be separated by more than 80 separator
characters. Separators are : space, tab, line feed, carriage return.
The number of separators bewteen each word can be anything <= 80.
>
As I said, all this has to be done in "plain" SQL, i.e. no UDF, no
REGEX, just plain basic SQL.
>
thank you
I assume you have good reasons for the restrictions you add ;-) The
following will be terribly inefficient and is only a sketch, you will
have to fill in the details your self. Oh, and btw, I'll use a sql
function but it can be expanded inline. Given this function
(originally posted by Knut):
CREATE FUNCTION elements ( string varchar(100) )
RETURNS TABLE ( ordinal INTEGER, index INTEGER )
LANGUAGE SQL
DETERMINISTIC
NO EXTERNAL ACTION
CONTAINS SQL
RETURN
WITH t(ordinal, index) AS
( VALUES ( 0, 0 )
UNION ALL
SELECT ordinal+1, COALESCE(NULLIF(
LOCATE(' ', string, index+1), 0), LENGTH(string)
+1)
FROM t
-- to prevent a warning condition for infinite recursion
WHERE ordinal < 500 AND
LOCATE(' ', string, index+1) <0 )
SELECT ordinal, index
FROM t
@
As you can see it only handles spaces, but it can be expanded to
handle more tokens (add union blocks). Now we can produce the
following table:
[lelle@53dbd181 src]$ db2 "select index from table ( elements
( 'this is the target ')) x where index 0"
INDEX
-----------
5
6
7
8
11
12
13
17
18
25
10 record(s) selected.
Finding intervals is relatively easy:
with T(n) as (
select index from table ( elements ( 'this is the target '))
x where index 0
) select lb.min_n, min(ub.max_n) max_n from (
select n as min_n from T T1
where not exists (
select 1 from T T2 where T1.n = T2.n + 1
)
) lb, (
select n as max_n from T T3
where not exists (
select 1 from T T4 where T3.n = T4.n - 1
)
) ub
where ub.max_n >= lb.min_n
group by lb.min_n
MIN_N 2
----------- -----------
5 8
11 13
17 18
25 25
use substr and some arithmetic for the rest.
/Lennart