473,782 Members | 2,479 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

parsing a comma-separatinf list

AK
Once upon a time there was a table:

CREATE TABLE VENDOR(VENDOR_I D INT, NAME VARCHAR(50), STATE CHAR(2))@

in a while the developers realized that a vendor may be present in
xseveral states, so the structure was changed to

CREATE TABLE VENDOR(VENDOR_I D INT, NAME VARCHAR(50), STATE_LIST
VARCHAR(150))@

STATE)LIST could contain a list of comma-separated values like this:
CA,WA,OR
Not the best approach, but out here in the field things like this
happen all the time. Everybody were absolutely happy until somebody
required a query like this:
SELECT * FROM VENDOR WHERE STATE IN ('IL', 'WI','MI')
SELECT COUNT(*), STATE FROM VENDOR GROUP BY STATE
and so on
Given a dayabase structure developed by a skilled professional, that
would be a piece of cake, but not this time.

Moving to a better structure

CREATE TABLE VENDOR(VENDOR_I D INT, NAME VARCHAR(50))@
CREATE TABLE VENDOR_IN_STATE (VENDOR_ID INT, STATE CHAR(2))@

nobody wanted to change the front end application.
As a workaround, I created a view and INSTEAD OF triggers that worked
like this:

INSERT INTO VENDOR_IN_STATE _VIEW VALUES(1, 'ABC INC.', 'AK,AL,IL')@

SELECT * FROM VENDOR
VENDOR_ID NAME
----------- --------------------------------------------------
1 ABC INC.
1 record(s) selected.

SELECT * FROM VENDOR_IN_STATE
VENDOR_ID STATE
----------- -----
1 AK
1 AL
1 IL
3 record(s) selected.

I used recursion to define both the view and the triggers:

CREATE FUNCTION PARSE_LIST(C_LI ST VARCHAR(100))
RETURNS TABLE(TOKEN VARCHAR(100))
SPECIFIC PARSE_LIST
RETURN
WITH PARSED_LIST(STE P, TOKEN, REMAINDER,LEFTM OST_COMMA)
AS(
VALUES(0, '',C_LIST,3)
UNION ALL
SELECT STEP+1 AS STEP,
CASE WHEN LEFTMOST_COMMA> 0 THEN
CAST(LEFT(REMAI NDER,LEFTMOST_C OMMA-1) AS CHAR(2))
ELSE
REMAINDER
END AS TOKEN,
CASE WHEN LEFTMOST_COMMA> 0 THEN
CAST(SUBSTR(REM AINDER,LEFTMOST _COMMA+1) AS VARCHAR(100))
ELSE
NULL
END AS REMAINDER,
LOCATE(',',SUBS TR(REMAINDER,LO CATE(',',REMAIN DER)+1)) AS
LEFTMOST_COMMA
FROM PARSED_LIST
WHERE REMAINDER IS NOT NULL
)
SELECT TOKEN FROM PARSED_LIST WHERE STEP>0

I wrapped the recursive query in UDF so that it could be reused:

It is very easy to invoke this table UDF:

SELECT * FROM TABLE(PARSE_LIS T('AK,AR,IL,OH' )) AS PARSE_LIST
TOKEN
------
AK
AR
IL

OH

I user the UDF in an INSTEAD OF trigger

CREATE TRIGGER VENDOR_IN_STATE _I
INSTEAD OF INSERT
ON VENDOR_IN_STATE _VIEW
REFERENCING NEW AS N
FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
DECLARE VENDOR_FOUND INT;
SET VENDOR_FOUND=(S ELECT COUNT(*) FROM VENDOR WHERE NAME=N.NAME);
IF NOT(VENDOR_FOUN D>0)
THEN
INSERT INTO VENDOR(VENDOR_I D, NAME) VALUES (N.VENDOR_ID, N.NAME);
END IF;
INSERT INTO VENDOR_IN_STATE (VENDOR_ID, STATE)
SELECT N.VENDOR_ID, PARSE_LIST.TOKE N FROM
TABLE(PARSE_LIS T(N.STATE_LIST) ) AS PARSE_LIST;
END @

I would really appreciate any feedback.
Are there any simpler approaches?
Nov 12 '05 #1
15 5740
AK <ak************ @yahoo.com> wrote:
Once upon a time there was a table:

CREATE TABLE VENDOR(VENDOR_I D INT, NAME VARCHAR(50), STATE CHAR(2))@

in a while the developers realized that a vendor may be present in
xseveral states, so the structure was changed to [...] I would really appreciate any feedback.
Are there any simpler approaches?


That looks pretty much straight forward to me and similar to what I wrote
once:
http://www-106.ibm.com/developerwork...03stolze1.html

The only alternative that might be a bit faster would be to use an external
table function, written in C or Java. That way, you could avoid the
recursion in SQL. I really don't know if performance would improve or not,
however.

--
Knut Stolze
Information Integration
IBM Germany / University of Jena
Nov 12 '05 #2
why not keep the table as is (i.e. with comma separated states) and
create a view which will have the names of the states in different
rows. should be easier as u wont need a instead of trigger.

view can be created like

select vendor_id,subst r(state,locate( ',',state,a * 3)+1,2) from
vendor,table(va lues(1),(2),(3) ,(4),(5)) as vendor_ids(a) where a <=
length(state)/3

Knut Stolze <st****@de.ibm. com> wrote in message news:<bs******* ***@fsuj29.rz.u ni-jena.de>...
AK <ak************ @yahoo.com> wrote:
Once upon a time there was a table:

CREATE TABLE VENDOR(VENDOR_I D INT, NAME VARCHAR(50), STATE CHAR(2))@

in a while the developers realized that a vendor may be present in
xseveral states, so the structure was changed to

[...]
I would really appreciate any feedback.
Are there any simpler approaches?


That looks pretty much straight forward to me and similar to what I wrote
once:
http://www-106.ibm.com/developerwork...03stolze1.html

The only alternative that might be a bit faster would be to use an external
table function, written in C or Java. That way, you could avoid the
recursion in SQL. I really don't know if performance would improve or not,
however.

Nov 12 '05 #3
subaga <su******@yahoo .com> wrote:
why not keep the table as is (i.e. with comma separated states) and
create a view which will have the names of the states in different
rows. should be easier as u wont need a instead of trigger.

view can be created like

select vendor_id,subst r(state,locate( ',',state,a * 3)+1,2) from
vendor,table(va lues(1),(2),(3) ,(4),(5)) as vendor_ids(a) where a <=
length(state)/3


That only works if:
(1) the states are always exactly 2 characters long - if they are not, then
your calculation doesn't work.
(2) you have at most 5 states - ok, you could open this up by adding more
rows to "vendor_ids "; depending on the actual data, you might need several
1000 rows (VARCHARs can be up to 32K long!), and that might not be simpler
then.

--
Knut Stolze
Information Integration
IBM Germany / University of Jena
Nov 12 '05 #4
AK
Knut,

thanks for answering! I loved your article!
Nov 12 '05 #5
AK
su******@yahoo. com (subaga) wrote in message news:<f6******* *************** **@posting.goog le.com>...
why not keep the table as is (i.e. with comma separated states) and
create a view which will have the names of the states in different
rows. should be easier as u wont need a instead of trigger.

view can be created like

select vendor_id,subst r(state,locate( ',',state,a * 3)+1,2) from
vendor,table(va lues(1),(2),(3) ,(4),(5)) as vendor_ids(a) where a <=
length(state)/3


In addition to what Knut has already writtenm,

if there is no table for states, htere are no indexes and statistics
on it, all the queries involving states run much slower. Also there
are quite a few books on the advantages of normalization, such as
written by Joe Celko
Nov 12 '05 #6
>> in a while the developers realized that a vendor may be present in
several states, so the structure was changed to :

CREATE TABLE Vendors
(vendor_id INTEGER NOT NULL PRIMARY KEY,
vendor_name VARCHAR(35) NOT NULL,
state_list VARCHAR(150)NOT NULL);

STATE_LIST could contain a list of comma-separated values <<

Hopefully, someone took the idiot out to a wall, read him the part in
the most basic RDBMS book they could find about First Normal Form and
shot him. I like stories with happy endings.

Passing a list of parameters to a stored procedure can be done by
putting them into a string with a separator. I like to use the
traditional comma. Let's assume that you have a whole table full of
such parameter lists:

CREATE TABLE InputStrings
(keycol CHAR(10) NOT NULL PRIMARY KEY,
input_string VARCHAR(255) NOT NULL);

INSERT INTO InputStrings VALUES ('first', '12,34,567,896' );
INSERT INTO InputStrings VALUES ('second', '312,534,997,89 6');
...

It makes life easier if the lists in the input strings start and end
with a comma. You will also need a table called Sequence, which is a
set of integers from 1 to (n).

CREATE VIEW Breakdown (keycol, parm_nbr, parameter)
AS
SELECT keycol, COUNT(S2.seq),
CAST (SUBSTRING (',' + I1.input_string + ',', MAX(S1.seq + 1),
(S2.seq - MAX(S1.seq + 1)))
AS INTEGER)
FROM InputStrings AS I1, Sequence AS S1, Sequence AS S2
WHERE SUBSTRING (',' + I1.input_string + ',' , S1.seq, 1) = ','
AND SUBSTRING (',' + I1.input_string + ',' , S2.seq, 1) = ','
AND S1.seq < S2.seq
AND S2.seq <= LEN (I1.input_strin g) + 2
GROUP BY I1.keycol, I1.input_string , S2.seq;

The S1 and S2 copies of Sequence are used to locate bracketing pairs
of commas, and the entire set of substrings located between them is
extracts and cast as integers in one non-procedural step. The trick
is to be sure that the left hand comma of the bracketing pair is the
closest one to the second comma. The place column tells you the
relative position of the value in the input string.

YOu can put this into a VIEW until you can fix the schema and perform
the execution.
Nov 12 '05 #7
--CELKO-- wrote:
in a while the developers realized that a vendor may be present in


several states, so the structure was changed to :

CREATE TABLE Vendors
(vendor_id INTEGER NOT NULL PRIMARY KEY,
vendor_name VARCHAR(35) NOT NULL,
state_list VARCHAR(150)NOT NULL);

STATE_LIST could contain a list of comma-separated values <<

Hopefully, someone took the idiot out to a wall, read him the part in
the most basic RDBMS book they could find about First Normal Form and
shot him. I like stories with happy endings.

Passing a list of parameters to a stored procedure can be done by
putting them into a string with a separator. I like to use the
traditional comma. Let's assume that you have a whole table full of
such parameter lists:

CREATE TABLE InputStrings
(keycol CHAR(10) NOT NULL PRIMARY KEY,
input_string VARCHAR(255) NOT NULL);

INSERT INTO InputStrings VALUES ('first', '12,34,567,896' );
INSERT INTO InputStrings VALUES ('second', '312,534,997,89 6');
...

It makes life easier if the lists in the input strings start and end
with a comma. You will also need a table called Sequence, which is a
set of integers from 1 to (n).

CREATE VIEW Breakdown (keycol, parm_nbr, parameter)
AS
SELECT keycol, COUNT(S2.seq),
CAST (SUBSTRING (',' + I1.input_string + ',', MAX(S1.seq + 1),
(S2.seq - MAX(S1.seq + 1)))
AS INTEGER)
FROM InputStrings AS I1, Sequence AS S1, Sequence AS S2
WHERE SUBSTRING (',' + I1.input_string + ',' , S1.seq, 1) = ','
AND SUBSTRING (',' + I1.input_string + ',' , S2.seq, 1) = ','
AND S1.seq < S2.seq
AND S2.seq <= LEN (I1.input_strin g) + 2
GROUP BY I1.keycol, I1.input_string , S2.seq;

The S1 and S2 copies of Sequence are used to locate bracketing pairs
of commas, and the entire set of substrings located between them is
extracts and cast as integers in one non-procedural step. The trick
is to be sure that the left hand comma of the bracketing pair is the
closest one to the second comma. The place column tells you the
relative position of the value in the input string.

YOu can put this into a VIEW until you can fix the schema and perform
the execution.


With the exception that I prefer vertical bars, or other characters that
couldn't legitimately be in the data stream I agree. I think commas have
too many common uses to be of value as a separator.
--
Daniel Morgan
http://www.outreach.washington.edu/e...ad/oad_crs.asp
http://www.outreach.washington.edu/e...oa/aoa_crs.asp
da******@x.wash ington.edu
(replace 'x' with a 'u' to reply)

Nov 12 '05 #8
AK
Joe,

I loved the idea to use a sequence table, many thanks!
I would say that there is no need for aggregation either.
Given this table expression:

(SELECT
ROWNUMBER() OVER(ORDER BY S1.SEQ) AS SEPARATOR_NUM,
S1.SEQ AS SEPARATOR_POS
FROM SEQUENCE_TAB S1
WHERE
SUBSTRING (',' + I1.input_string + ',' , S1.seq, 1) = ','
) AS SEPARATORS

then we could self-join it like this:

SELECT
(S1.SEPARATOR_P OS + 1) AS TOKEN_FROM
(S2.SEPARATOR_P OS - S1.SEPARATOR_PO S - 1) AS TOKEN.LENGTH
FROM SEPATATORS S1
JOIN SEPARATORS S2
ON (S1.SEPARATOR_N UM + 1)=S2.SEPARATOR _NUM

What do you think?
Nov 12 '05 #9
>> What do you think? <<

Neat but too proprietary for my taste. Remember, I am the guy who
pushes pure SQL-92 code.
Nov 12 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
5704
by: felciano | last post by:
Hi -- I am trying to use the csv module to parse a column of values containing comma-delimited values with unusual escaping: AAA, BBB, CCC (some text, right here), DDD I want this to come back as:
2
17947
by: GIMME | last post by:
I can't figure an expression needed to parse a string. This problem arrises from parsing Excel csv files ... The expression must parse a string based upon comma delimiters, but if a comma appears in double quotes it should not be used for parsing. For example in the simple case we'd have : $a='hello,brave,world';
8
1723
by: Darius Fatakia | last post by:
Hello, I have a file that I have opened for reading and this file contains lines with several different types of constraint information. For example, here are a few lines: length(0) = 10 Duration of task 0 is 10. needs(16,1) Operation 16 uses resource 1. before(49,9) Operation 49 must be before operation 9.
26
6878
by: SL33PY | last post by:
Hi, I'm having a problem parsing strings (comming from a flat text input file) to doubles. the code: currentImportDetail.Result = CType(line.Substring(7, 8).Trim(" "), System.Double) What is in my Watch:
2
2771
by: Trip | last post by:
Hello all, Is there anyway to store an array to MS Access 2002 without parsing the entire array row-by-row. For example, Oracle allows you to store BLOBs (binary large objects). I would like to be able to say in VB/VBA With rstName .AddNew !fieldName = ArrayName .Update
15
9560
by: VMI | last post by:
I'm parsing a comma-delimited record but I want it to do something if some of the string is between "". How can I do this? With the Excel import it does it correct. I'm using String.Split(). Basically, this is what I want to do: Use string.Split() on the whole string UNLESS the string is in between double-quotes. The part of the string in-between the "" will be ignored by String.Split Thanks.
4
1392
by: igotyourdotnet | last post by:
I have a question. I'm reading a CSV file that is uploading to my SQL db, I'm parsing out the file line by line. I'm getting the values and putting them into an arrayList seperate by commas. The problem I'm having is that one of the data values has commas in it so its blowing up on the other fields. How can I remove the commas from my string if they exist? example: Getting this BMW, Used, 325C, $19,252.00, Smith
4
1814
by: william | last post by:
Hello, I've imported an excel spreadsheet with a Name column which is formatted as Last, First, MI. Some examples I have in the Name column: Smith, Ellen P. Jones, Mary Jane Blackman-Pearson, Betsy D. Wright, George
9
1766
by: Jasper | last post by:
Hi, I have multiple data files which need parsing in realtime so high performance is *crucial*. I dont have a format definition, but from what I can see there is a hierarchy of data. Each data field is named thus <"name":(the <are mine). The data can be quoted text or unquoted text or a composite hierarcy field. Each name/data pair is terminated by a comma unless it is the last in the
4
1651
mickey0
by: mickey0 | last post by:
hello, I have to parsing a string like this: char * = "10 20 30 40"; and put its number into a vector<double> I thought to use strchr() and atof() and it's seems ok but to say the truth the line could be at times a little different: char * = "10 20 30 40 "; char * = "10,20,30,40"; char * = "10, 20,30, 40 "; char * = "10 20,30 40 "; all these above are valid line; instead this isn't': char * = "10 ...
0
9641
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9480
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10313
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10146
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10080
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8968
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
6735
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5378
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
2
3643
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.