473,555 Members | 2,307 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

functional dependencies from column correlation


(querying here and c.d.t didn't turn up anything, so......)

let's say you've got an existing table in an existing db. said
table has a unique index of 12 columns, 3 "data" columns, and
millions of rows. this all goes over the wire to a browser client,
since no one knows what to decompose the table into (a priori or
analytically).

the examples one sees are stone obvious: teachers and classrooms and
the like. here, the data is crufty and pre-dates most working with
it.

taking an empirical approach to finding FDs seems to amount to
finding the columns with high correlation in the data, then beating
on the SMEs until they make sense of it.

are there any obvious tools (O/S or not) that do this?? or must we
all roll our own?

thx,
btdb

Nov 12 '05 #1
7 1962
BobTheDatabaseB oy wrote:
(querying here and c.d.t didn't turn up anything, so......)

let's say you've got an existing table in an existing db. said
table has a unique index of 12 columns, 3 "data" columns, and
millions of rows. this all goes over the wire to a browser client,
since no one knows what to decompose the table into (a priori or
analytically).

the examples one sees are stone obvious: teachers and classrooms and
the like. here, the data is crufty and pre-dates most working with
it.

taking an empirical approach to finding FDs seems to amount to
finding the columns with high correlation in the data, then beating
on the SMEs until they make sense of it.

are there any obvious tools (O/S or not) that do this?? or must we
all roll our own?

thx,
btdb

I ran this through google: inferring functional dependencies db2
results in a lot of stuff.
These one may be a good start:
www.cs.yorku.ca/~jarek/papers/icde01/paper.ps
http://www.cs.yorku.ca/~jarek/papers/sigmod01/paper.ps
http://www.sigmod.org/sigmod/record/...jeanmpetit.pdf
....
Also presumably this is datamining turf, so hunting in that area may
yield something.

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 12 '05 #2
> are there any obvious tools (O/S or not) that do this?? or must
we all roll our own?


The tool that I hear about most often is 'evoke'
http://www.evokesoft.com/
But there are others in the data profiling category that can probably
help here.

Personally, I find that the productivity loss experienced in doing it
by hand is typically less than the productivity loss of getting a
purchase through procurement in most large organizations. ;-)

Also, there's so much that a profiling tool can't determine: missing
data, pending requirements, how the data is interpreted by other
systems, etc, etc. Still, if I had a large enough undocumented and
misunderstood system, I'd definitely look for any help possible.

Serge, DB2 Cube Views can describe functional dependency constraints.
Any chance that these constraints will find their way into udb?

ken

Nov 12 '05 #3

mea culpa, mea culpa, mea maxima culpa.

inferring i didn't use, deriving, and the like. i'll
surely follow the bread crumbs.

thx,
btdb

Serge Rielau wrote:
BobTheDatabaseB oy wrote:
(querying here and c.d.t didn't turn up anything, so......)

let's say you've got an existing table in an existing db. said
table has a unique index of 12 columns, 3 "data" columns, and
millions of rows. this all goes over the wire to a browser client,
since no one knows what to decompose the table into (a priori or
analytically).

the examples one sees are stone obvious: teachers and classrooms and the like. here, the data is crufty and pre-dates most working with
it.

taking an empirical approach to finding FDs seems to amount to
finding the columns with high correlation in the data, then beating
on the SMEs until they make sense of it.

are there any obvious tools (O/S or not) that do this?? or must we
all roll our own?

thx,
btdb

I ran this through google: inferring functional dependencies db2
results in a lot of stuff.
These one may be a good start:
www.cs.yorku.ca/~jarek/papers/icde01/paper.ps
http://www.cs.yorku.ca/~jarek/papers/sigmod01/paper.ps
http://www.sigmod.org/sigmod/record/...jeanmpetit.pdf
...
Also presumably this is datamining turf, so hunting in that area may
yield something.

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab


Nov 12 '05 #4
kenfar wrote:
are there any obvious tools (O/S or not) that do this?? or must
we all roll our own?

The tool that I hear about most often is 'evoke'
http://www.evokesoft.com/
But there are others in the data profiling category that can probably
help here.

Personally, I find that the productivity loss experienced in doing it
by hand is typically less than the productivity loss of getting a
purchase through procurement in most large organizations. ;-)

Also, there's so much that a profiling tool can't determine: missing
data, pending requirements, how the data is interpreted by other
systems, etc, etc. Still, if I had a large enough undocumented and
misunderstood system, I'd definitely look for any help possible.

Serge, DB2 Cube Views can describe functional dependency constraints.
Any chance that these constraints will find their way into udb?

ken

Probability 100%.
Looking into my crystal ball I see them in... DB2 V8.2

check-condition:

|--+- search-condition ----------+--------------------------------|
'- | functional-dependency | -'
Check out CREATE TABLE

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 12 '05 #5
Serge wrote:
Probability 100%.
Looking into my crystal ball I see them in... DB2 V8.2


Cool! Though I don't know how I missed that. Must have been all the
cough-syrup i was on last month. :-)

This will be very fun to play with.

Ken

Nov 12 '05 #6
kenfar wrote:
Serge wrote:
Probability 100%.
Looking into my crystal ball I see them in... DB2 V8.2

Cool! Though I don't know how I missed that. Must have been all the
cough-syrup i was on last month. :-)

This will be very fun to play with.

Ken

It's a new toy for the optimizer. I'm not sure when it looks at it.

Cheers
Serge

--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 12 '05 #7
> It's a new toy for the optimizer. I'm not sure when it looks at it.

I believe that it should look at it when performing large queries
against fact tables - to better determine if db2 can rewrite the query
to hit a summary table instead. At least according to this:
http://publib.boulder.ibm.com/infoce...lap/coptfd.htm

If I was running cube views that I suppose it would also assist in the
construction of these views.

But it also means that I can now generate, from the catalog tables, a
new type of automated tests for data validation. Given a couple of
dozen dimensions, each typically a 2-3 level deep hierarchy, that's a
lot of testing.

ken

Nov 12 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
2765
by: Prince Kumar | last post by:
I have the following query, which is takling couple of minutes to return 500+ rows! I have all the proper indexes and statistics is upto date. I correlation after the OR clause is the culprit. I have tried a few ways to get the sum and then join with the table pr. But am not getting the exact result. Can someone show me the way to...
0
3150
by: Mechain Marc | last post by:
I would like to know (if possible, there is no explanation in the = documentation) the exact meanning of the column "time" in the "show = processlist" command. Why is it sometime so hight ? When is it reinitialize (and why) ? Is there any correlation between the time column and the variables "wait = timeout" and "interactive timeout" ?
3
8101
by: Dan Hartshorn | last post by:
Anybody know if there is a system function that can be used from a stored procedure that determines if a given primary key has existing dependencies? I want to make a check for this and if there are none, I will delete the record. If there are, I will change a field called bitStatus from 1 to 0. Enterprise Mgr. does something like this under...
5
11491
by: malcolm | last post by:
Example, suppose you have these 2 tables (NOTE: My example is totally different, but I'm simply trying to setup the a simpler version, so excuse the bad design; not the point here) CarsSold { CarsSoldID int (primary key) MonthID int DealershipID int NumberCarsSold int
1
5892
by: Ben | last post by:
I have written a procedure which calls the CORREL function of Excel to run correlation analysis on two arrays, then populate a table with the resulting correlation coefficient. This process loops through several records and recordsets. The procedure works well for awhile (sometimes upwards of 10,000 times!), but then mysteriously begins...
4
2856
by: John Fabiani | last post by:
From the 7.4 docs: A column can be referenced in the form correlation.columnname correlation is the name of a table (possibly qualified with a schema name),or an alias for a table defined by means of a FROM clause, or one of the key words NEW or OLD. (NEW and OLD can only appear in rewrite rules, while other correlation names can be...
1
11462
by: Gary Wessle | last post by:
Hi I wrote a code to calculate the correlation between 2 sequence of double numbers presented as vectors. did I go about this the right way? thanks #include <vector> using std::vector;
18
22376
by: robert | last post by:
Is there a ready made function in numpy/scipy to compute the correlation y=mx+o of an X and Y fast: m, m-err, o, o-err, r-coef,r-coef-err ? Or a formula to to compute the 3 error ranges? -robert PS: numpy.corrcoef computes only the bare coeff:
3
10321
by: Vivek | last post by:
Hi, Can someone give an example of a query which produces the following warning: SQL0012W Correlation without qualification has occurred for the column "<column>".
0
7622
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
8060
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
1
7588
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...
1
5452
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
5171
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3596
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
1
2037
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
1159
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
865
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.