By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
458,131 Members | 1,282 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 458,131 IT Pros & Developers. It's quick & easy.

thoughts on regular expression matching

P: n/a
I'm playing with the ICU library (slow weekend :-) and it's quite simple
to expose the regular expression functionality for Unicode databases.

Presuming that the majority is not on unicode, would a developerWorks
article which supports ASCII only for non unicode DBs be useful?
(Otherwise I'll have to do some serious digging for translation tables)

Here is an example of what I've got:

-- validate Canadian postal code:
db2 => values matches('L0H1H0', '[A-Z][0-9][A-Z][0-9][A-Z][0-9]');

1
------
1

1 record(s) selected.

Other functions possible are listen here:
http://icu.sourceforge.net/userguide/regexp.html

Input welcome
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Jun 18 '06 #1
Share this Question
Share on Google+
20 Replies


P: n/a
Serge Rielau wrote:
I'm playing with the ICU library (slow weekend :-) and it's quite simple
to expose the regular expression functionality for Unicode databases.

Presuming that the majority is not on unicode, would a developerWorks
article which supports ASCII only for non unicode DBs be useful?
(Otherwise I'll have to do some serious digging for translation tables)

Here is an example of what I've got:

-- validate Canadian postal code:
db2 => values matches('L0H1H0', '[A-Z][0-9][A-Z][0-9][A-Z][0-9]');

1
------
1

1 record(s) selected.

Other functions possible are listen here:
http://icu.sourceforge.net/userguide/regexp.html

Input welcome
Serge

It would indeed. I miss RE pattern matching in DB2. I've been using
v2/sed/awk/perl for too long. I know it is too much to ask for, but do
the functions exposed include anything like PHP's ereg_replace?
Jun 19 '06 #2

P: n/a
Hello.

I think the easiest way to get regex functionality in DB2 is using some
java package (or using built-in java 1.4 functionality).
You have to write fairly simple java UDFs using this package to get
match, group, replace regex functionality.
I have been doing so since DB2 v6.1.
Packages:
First of all it was IBM's regex4j.jar (no replace functionality), now
it is package from http://jregex.sourceforge.net.

If someone needs samples, write at
mark(dot)b(at)mail(dot)ru

Sincerely,
Mark B.
It would indeed. I miss RE pattern matching in DB2. I've been using
v2/sed/awk/perl for too long. I know it is too much to ask for, but do
the functions exposed include anything like PHP's ereg_replace?


Jun 19 '06 #3

P: n/a
Serge Rielau wrote:
I'm playing with the ICU library (slow weekend :-) and it's quite simple
to expose the regular expression functionality for Unicode databases.

Presuming that the majority is not on unicode, would a developerWorks
article which supports ASCII only for non unicode DBs be useful?
(Otherwise I'll have to do some serious digging for translation tables)

Here is an example of what I've got:

-- validate Canadian postal code:
db2 => values matches('L0H1H0', '[A-Z][0-9][A-Z][0-9][A-Z][0-9]');

1
------
1

1 record(s) selected.

Other functions possible are listen here:
http://icu.sourceforge.net/userguide/regexp.html


There is a regex-article here: http://tinyurl.com/pyzto

It builds on C function. Carrying them to Java and Unicode may be a good
idea, though.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Jun 19 '06 #4

P: n/a
Serge Rielau wrote:
I'm playing with the ICU library (slow weekend :-) and it's quite simple
to expose the regular expression functionality for Unicode databases.

Presuming that the majority is not on unicode, would a developerWorks
article which supports ASCII only for non unicode DBs be useful?
(Otherwise I'll have to do some serious digging for translation tables)

Here is an example of what I've got:

-- validate Canadian postal code:
db2 => values matches('L0H1H0', '[A-Z][0-9][A-Z][0-9][A-Z][0-9]');

1
------
1

1 record(s) selected.

Other functions possible are listen here:
http://icu.sourceforge.net/userguide/regexp.html

Input welcome
Serge


Hi!

It would be usefull, but I'd VERY MUCH like to see that in UNICODE, since
all our databases user UTF-8.

Best regards,
Kovi

--
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
| Gregor Kovac | Gr**********@mikropis.si |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| In A World Without Fences Who Needs Gates? |
| Experience Linux. |
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Jun 19 '06 #5

P: n/a
Gregor Kovač wrote:
Serge Rielau wrote:
I'm playing with the ICU library (slow weekend :-) and it's quite simple
to expose the regular expression functionality for Unicode databases.

Presuming that the majority is not on unicode, would a developerWorks
article which supports ASCII only for non unicode DBs be useful?
(Otherwise I'll have to do some serious digging for translation tables)

Here is an example of what I've got:

-- validate Canadian postal code:
db2 => values matches('L0H1H0', '[A-Z][0-9][A-Z][0-9][A-Z][0-9]');

1
------
1

1 record(s) selected.

Other functions possible are listen here:
http://icu.sourceforge.net/userguide/regexp.html

Input welcome
Serge


Hi!

It would be usefull, but I'd VERY MUCH like to see that in UNICODE, since
all our databases user UTF-8.

Best regards,
Kovi

That's the trivial part. :-)

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Jun 19 '06 #6

P: n/a
Knut Stolze wrote:
Serge Rielau wrote:
I'm playing with the ICU library (slow weekend :-) and it's quite simple
to expose the regular expression functionality for Unicode databases.

Presuming that the majority is not on unicode, would a developerWorks
article which supports ASCII only for non unicode DBs be useful?
(Otherwise I'll have to do some serious digging for translation tables)

Here is an example of what I've got:

-- validate Canadian postal code:
db2 => values matches('L0H1H0', '[A-Z][0-9][A-Z][0-9][A-Z][0-9]');

1
------
1

1 record(s) selected.

Other functions possible are listen here:
http://icu.sourceforge.net/userguide/regexp.html


There is a regex-article here: http://tinyurl.com/pyzto

It builds on C function. Carrying them to Java and Unicode may be a good
idea, though.

Fascinating, I had no clue this article was out there.
Now it talks about "PERL" semantics. Are there differences?
Does PCRE support Unicode?
Is there binary code available somewhere for PCRE? I can't find it at
least...

Cheers
Serge


--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Jun 19 '06 #7

P: n/a
Serge Rielau wrote:

I wrote this article 3 years ago. So please don't ask me any details
Now it talks about "PERL" semantics. Are there differences?
Perl has some additional metacharacters (compared to traditional regexps).
For example, \s, \w, ... are there. At a first glance, it is very similar
to ICU. There is at least one difference, though: \b matches in Perl the
"beginning/end of a word". Maybe there are some more such things...
Does PCRE support Unicode?
Yes, it does.
The man page states:
The current implementation of PCRE (release 6.x) corresponds
approximately with Perl 5.8, including support for UTF-8
encoded strings and Unicode general category properties. However,
this support has to be explicitly enabled; it is not the default.
Is there binary code available somewhere for PCRE? I can't find it at
least...


There is something for Windows. I don't know about any precompiled packages
for the UNIX platforms (except the RPMs delivered for the various Linux
distributions).

p.s: http://www.pcre.org/

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Jun 19 '06 #8

P: n/a
I could use installation instructions.

I've tried to use the Java oro library but no matter what I do DB2
can't find the jars with the UDFs I put in it's jar directory.

I studied several articles on IBMs site but nothing worked for me.

I couldn't even create a UDF to return an integer value, say 5.

Jun 20 '06 #9

P: n/a
Hello.

Jar directory???
If you need some jar (zip) for your UDF you have to register this jar
by calling SP
sqlj.install_jar(...);
If you bundle your own classes into jar you have to do the same with
this jar.
If you do not bundle your own classes into jar you have to place your
class file into
sqllib/function directory.
And don't forget that you have to place your class into the same
directory structure inside sqllib/function directory according the
package name of your java class.

Sincerely,
Mark B.
I could use installation instructions.

I've tried to use the Java oro library but no matter what I do DB2
can't find the jars with the UDFs I put in it's jar directory.

I studied several articles on IBMs site but nothing worked for me.

I couldn't even create a UDF to return an integer value, say 5.


Jun 20 '06 #10

P: n/a
> If you need some jar (zip) for your UDF you have to register this jar by calling SP sqlj.install_jar(...);

Uh huh. "..." That's not much help.

....

I'm not certain what it means to have EXECUTE privilige on
SQLJ.INSTALL_JAR but I can run java programs.

I tried various:

call sqlj.install_jar('file://W:/Program
Files/IBM/SQLLIB/FUNCTION/",UdfUtils.jar',0);

And used this SQL to register RegEx as:

CREATE FUNCTION RegExp(pattern VARCHAR(2048), string CLOB(10M))
RETURNS INTEGER
FENCED
VARIANT
NO SQL
LANGUAGE JAVA
PARAMETER STYLE JAVA
EXTERNAL NAME 'com.rh.bb.udf.utils.RegExp!Match'

If you bundle your own classes into jar you have to do the same with this jar.
I named the jar file UdfUtils.jar .

I placed it in:

W:\Program Files\IBM\SQLLIB\FUNCTION/UdfUtils.jar

And FYI :

jar tvf UdfUtils.jar
0 Tue Mar 28 01:39:38 PST 2006 META-INF/
55 Tue Mar 28 01:39:38 PST 2006 META-INF/MANIFEST.MF
0 Tue Mar 28 01:39:38 PST 2006 com/
0 Tue Mar 28 01:39:38 PST 2006 com/rh/
0 Tue Mar 28 01:39:38 PST 2006 com/rh/bb/
0 Tue Mar 28 01:39:38 PST 2006 com/rh/bb/udf/
0 Tue Mar 28 01:39:38 PST 2006 com/rh/bb/udf/utils/
703 Tue Mar 28 01:39:38 PST 2006 com/rh/bb/udf/utils/RegExp.class

Here is RegExp.java:

package com.rh.bb.udf.utils;

import java.sql.Clob;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

import COM.ibm.db2.app.UDF;

public class RegExp extends UDF {

public static int Match(java.lang.String pattern, java.sql.Clob
clob)
{
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(
clob.getSubString(Long.parseLong("1"), (int)clob.length()) );
if ( m.matches() ) {
return 1;
}
return 0;
}
}

I also couldn't get the simpler testUDF method to work. (It's not in
the jar shown above but once it was.) Here it is:

package com.rh.bb.udf.utils;

import java.sql.*;

import java.lang.*;

import COM.ibm.db2.app.*;

class DB2Appl1 extends UDF {

public static int testUDF(int value1, int value2)
{
try
{
return(value1 & value2);
}
catch (Exception e)
{
return 0;
}
}
}

upon executing RegEx(table.someclob) , or testUDF, nothing not a zero,
not a one is returned, and there are no error messages.

I couldn't find any log files that had any traces suggesting errors.
If you do not bundle your own classes into jar you have to place your
class file into sqllib/function directory.
And don't forget that you have to place your class into the same
directory structure inside sqllib/function directory according the
package name of your java class.


Didn't have to do that.

Jun 21 '06 #11

P: n/a
Ok, lets start from simple example.
testUDF returns sum to 2 arguments.
If you extends UDF class in your java UDF, you have to do these things:
1.
---
package com.rh.bb.udf.utils;
import java.sql.*;
import COM.ibm.db2.app.*;
class DB2Appl1 extends UDF
{
public void testUDF(int value1, int value2, int result)
{
try
{
set(3, value1 + value2);
}
catch (Exception e)
{
set(3, 0);
}
}
}
---
Place compiled class file into sqllib/function/com/rh/bb/udf/utils
directory.
2.
---
CREATE FUNCTION testUDF(int a1, int a2)
RETURNS int
EXTERNAL NAME 'com.rh.bb.udf.utils.DB2Appl1!testUDF'
LANGUAGE java
PARAMETER STYLE db2general
DETERMINISTIC
FENCED
NO SQL
NO EXTERNAL ACTION;
---
3.
---
select testUDF(1, 2) from sysibm.sysdummy1;
---

Is it works?

Jun 22 '06 #12

P: n/a
Thanks. Here are results so far:

1.

I modified testUDF() so it would compile. I removed the code in the
catch clause since set throws an Exception :

package com.rh.bb.udf.utils;
import java.sql.*;
import COM.ibm.db2.app.*;
class DB2Appl1 extends UDF
{
public void testUDF(int value1, int value2, int result)
{
try
{
set(3, value1 + value2);
}
catch (Exception e)
{
// do nothing
}
}
}
BTW, is it OK to change test testUDF to ?

public void testUDF(int value1, int value2, int result) throws
Exception {
set(3, value1 + value2);
}
Place compiled class file into sqllib/function/com/rh/bb/udf/utils directory.
Got it.

Do I need to restart the database? That wasn't in the instructions.
I executed the following script (under DB2-8.2 ) which you were kind
enough to write:
---
CREATE FUNCTION testUDF(int a1, int a2)
RETURNS int
EXTERNAL NAME 'com.rh.bb.udf.utils.DB2Appl1!testUDF'
LANGUAGE java
PARAMETER STYLE db2general
DETERMINISTIC
FENCED
NO SQL
NO EXTERNAL ACTION;
This results in the following SQL error:
SQL0204N "A1" is an undefined name. LINE NUMBER1. SQLSTATE=42704
The Java method testUDF has three arguments while the SQL testUDF
function has 2 arguments. Is that intended?

---
3.
---
select testUDF(1, 2) from sysibm.sysdummy1;
---

Is it works?


I didn't get that far.

Jun 22 '06 #13

P: n/a
See answers below.

BTW, is it OK to change test testUDF to ?

public void testUDF(int value1, int value2, int result) throws
Exception {
set(3, value1 + value2);
}

Yes.

Place compiled class file into sqllib/function/com/rh/bb/udf/utils directory.
Got it.

Do I need to restart the database? That wasn't in the instructions.


No. Do this:
call sqlj.refresh_classes();

I executed the following script (under DB2-8.2 ) which you were kind
enough to write:
---
CREATE FUNCTION testUDF(int a1, int a2)
RETURNS int
EXTERNAL NAME 'com.rh.bb.udf.utils.DB2Appl1!testUDF'
LANGUAGE java
PARAMETER STYLE db2general
DETERMINISTIC
FENCED
NO SQL
NO EXTERNAL ACTION;


This results in the following SQL error:
SQL0204N "A1" is an undefined name. LINE NUMBER1. SQLSTATE=42704


Excuse me.
Should be:
CREATE FUNCTION testUDF(a1 int, a2 int)
....

Sincerely,
Mark B.

Jun 23 '06 #14

P: n/a
Steps 1,2,3, and 4 completed.

I'm using AQT as my database interface tool. (I wouldn't guess that
should matter.)

Executing:

select testUDF(1, 2) from sysibm.sysdummy1

results in a column without any content.

Executing:

select 7,testUDF(1, 2) from sysibm.sysdummy1

results in two columns without any content, not even the 7.

Jun 23 '06 #15

P: n/a
I haven't ever heard about this tool.
Try this select from db2 command line processor.
Does it work from CLP?
Steps 1,2,3, and 4 completed.

I'm using AQT as my database interface tool. (I wouldn't guess that
should matter.)

Executing:

select testUDF(1, 2) from sysibm.sysdummy1

results in a column without any content.

Executing:

select 7,testUDF(1, 2) from sysibm.sysdummy1

results in two columns without any content, not even the 7.


Jun 28 '06 #16

P: n/a
Whoo hooo! We're cooking with gasoline!

I set things up so I could excute the function from a SQL statement
embedded in a Java program. Just using AQT wasn't doing the job.

At first I got a bunch of SQLCODE -4304 errors.

The trick was decyphering the error message in the format
<name>:<spec-name>:<class>:<reason-code>.

I fixed a few class name type-os (reason-code 1 was saying the class
wasn't in my CLASSPATH).

I also placed a jar file having DB2App1.class in my NT CLASSPATH into
SQLLIB/java/UdfUtils.jar, storing it in the same directory as other IBM
jar files.

So testUDF is good to go in both Java JDBC and from the AQT GUI.

Question 1:

I created two version of testUDF.

public void testUDF(int value1, int value2, int result)
public void testUDF(int value1, int value2)

Off hand I'm not certain which method ran. What was the significance of
creating a version with a third argument "int result" ?

Question 2:

Where can I find the API (javadocs) to COM.ibm.db2.app.*;

Question 3:

Does set only work with integers?

Question 4:

Do you have any examples that are more complicated than this simple
one?

What do you want to try next?

Jun 29 '06 #17

P: n/a
Here is another example.

This one comes from:
http://www.dbforums.com/archive/index.php/t-447057.html

In this case:

package com.rh.bb.udf.utils;

import java.sql.*;
import COM.ibm.db2.app.*;

class DB2Appl1 extends UDF
{

public static String AllTrim (String s, String c) throws Exception
{
int len = s.length();
int st = 0;
char[] val=s.toCharArray();
char fch=c.charAt(0);
while ((st < len) && (val[st] ==fch)) {st++;}
while ((st < len) && (val[len - 1] == fch)) {len--;}
return s.substring(st, len);
}
}

CREATE FUNCTION AllTrim (VARCHAR(4000), VARCHAR(1)) RETURNS
VARCHAR(4000)
FENCED
LANGUAGE JAVA
PARAMETER STYLE JAVA
EXTERNAL NAME 'com.rh.bb.udf.utils.DB2Appl1!AllTrim'
NO EXTERNAL ACTION DETERMINISTIC NO SQL;
call sqlj.refresh_classes();

select AllTrim('aaaabb','a') from sysibm.sysdummy1;

Here's the error :

bad SQL grammar [select AllTrim('aaabbb';,'a') DD from
sysibm.sysdummy1]; nested exception is com.ibm.db2.jcc.b.SqlException:
DB2 SQL error: SQLCODE: -4306, SQLSTATE: 42724, SQLERRMC:
EMMMCL01.ALLTRIM;SQL060628174547500;AllTrim;(Ljava/lang/String;Ljava/l"
type="org.springframework.jdbc.BadSqlGrammarExcept ion
I gather that the function signature in the CREATE statement doesn't
match the Java version. Now what?

Thanks.

Jun 29 '06 #18

P: n/a
gi*******************@yahoo.com wrote:
Whoo hooo! We're cooking with gasoline!

I set things up so I could excute the function from a SQL statement
embedded in a Java program. Just using AQT wasn't doing the job.

At first I got a bunch of SQLCODE -4304 errors.

The trick was decyphering the error message in the format
<name>:<spec-name>:<class>:<reason-code>.

I fixed a few class name type-os (reason-code 1 was saying the class
wasn't in my CLASSPATH).

I also placed a jar file having DB2App1.class in my NT CLASSPATH into
SQLLIB/java/UdfUtils.jar, storing it in the same directory as other IBM
jar files.

So testUDF is good to go in both Java JDBC and from the AQT GUI.

Question 1:

I created two version of testUDF.

public void testUDF(int value1, int value2, int result)
public void testUDF(int value1, int value2)

Off hand I'm not certain which method ran. What was the significance of
creating a version with a third argument "int result" ?

Question 2:

Where can I find the API (javadocs) to COM.ibm.db2.app.*;

Question 3:

Does set only work with integers?

Question 4:

Do you have any examples that are more complicated than this simple
one?

What do you want to try next?

have you looked at sqllib/samples?

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Jun 29 '06 #19

P: n/a
Use
---
public class DB2Appl1
---
instead of
---
class DB2Appl1 extends UDF
-----------------------------
Remember these things for java UDFs:
1. If you use PAREMETER STYLE JAVA you:
-have to declare your methods as static
-have to use return clause in java method
-don't have to extend UDF class
2. if you use PAREMETER STYLE DB2GENERAL you:
-have to extend UDF class
-have to declare your method as void
-have to use last paremeter of your method for returning a value and
set them by "set" method implemented in UDF class
-aren't allowed to declare this last parameter in your CREATE FUNCTION
statement
-----------------------------

class DB2Appl1 extends UDF
{

public static String AllTrim (String s, String c) throws Exception
{
int len = s.length();
int st = 0;
char[] val=s.toCharArray();
char fch=c.charAt(0);
while ((st < len) && (val[st] ==fch)) {st++;}
while ((st < len) && (val[len - 1] == fch)) {len--;}
return s.substring(st, len);
}
}

CREATE FUNCTION AllTrim (VARCHAR(4000), VARCHAR(1)) RETURNS
VARCHAR(4000)
FENCED
LANGUAGE JAVA
PARAMETER STYLE JAVA
EXTERNAL NAME 'com.rh.bb.udf.utils.DB2Appl1!AllTrim'
NO EXTERNAL ACTION DETERMINISTIC NO SQL;
call sqlj.refresh_classes();

select AllTrim('aaaabb','a') from sysibm.sysdummy1;

Here's the error :

bad SQL grammar [select AllTrim('aaabbb';,'a') DD from
sysibm.sysdummy1]; nested exception is com.ibm.db2.jcc.b.SqlException:
DB2 SQL error: SQLCODE: -4306, SQLSTATE: 42724, SQLERRMC:
EMMMCL01.ALLTRIM;SQL060628174547500;AllTrim;(Ljava/lang/String;Ljava/l"
type="org.springframework.jdbc.BadSqlGrammarExcept ion
I gather that the function signature in the CREATE statement doesn't
match the Java version. Now what?

Thanks.


Jun 29 '06 #20

P: n/a
Assuming DB2 on Linux is compiled with gcc, only Linux users are so
lucky to have a free DB2 compliant C compiler.

Assuming you have compatible C compiler, the libraries you'd build
using C probably won't be cross platform complatible.

I'd recommend passing over the C RegEx implementations they aren't
likely to be helpful to anyone aside from hobbyists..

Also, note that JRegex does not have any DB2 specific methods. JRegex
is a Perl 5 compatible regular expression package.

My 2 cents.

Jun 29 '06 #21

This discussion thread is closed

Replies have been disabled for this discussion.