473,385 Members | 1,846 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

Query help: need at least one row for each value in each of two columns

The purpose, as you can probably guess, is to produce a set of sample
documents from a large document run. The data row has a CLUB column and an
IFC column; I want a set of samples that contains at least one of each CLUB
and at least one of each IFC, but no more than necessary.

Example schema and data:

CREATE TABLE mDATA (
ID INTEGER,
CLUB CHAR(7),
IFC CHAR(4)
);

INSERT INTO mDATA (ID,CLUB,IFC) values (6401715,'AARPRAT','IC17')
INSERT INTO mDATA (ID,CLUB,IFC) values (1058337,'AARPRAT','IC17')
INSERT INTO mDATA (ID,CLUB,IFC) values (459443,'AARPPRT','IC25')
INSERT INTO mDATA (ID,CLUB,IFC) values (4018210,'AARPBAS','IG21')
INSERT INTO mDATA (ID,CLUB,IFC) values (2430656,'AARPBAS','IG21')
INSERT INTO mDATA (ID,CLUB,IFC) values (6802081,'AARPPRD','IG29')
INSERT INTO mDATA (ID,CLUB,IFC) values (4236511,'AARPPRD','IG29')
INSERT INTO mDATA (ID,CLUB,IFC) values (2162104,'AARPBAS','IG21')
INSERT INTO mDATA (ID,CLUB,IFC) values (2073679,'AARPPRD','IG29')
INSERT INTO mDATA (ID,CLUB,IFC) values (8148891,'AARPBAS','IG21')
INSERT INTO mDATA (ID,CLUB,IFC) values (1868445,'AARPBAS','IG21')
INSERT INTO mDATA (ID,CLUB,IFC) values (6749213,'AARPBAS','IG21')
INSERT INTO mDATA (ID,CLUB,IFC) values (8363621,'AARPPUP','IG29')

Nov 12 '05 #1
4 1570
This will produce a sample containing at least one row of each value in the
two columns. It isn't guaranteed to give the *minimum* subset but it should
contain at most C + I - 1 rows, where C is the number of distinct Clubs and
I the number of distinct IFCs.

SELECT id,club,ifc
FROM mdata AS M
WHERE id
IN
(SELECT MIN(id)
FROM mdata
GROUP BY club
UNION ALL
SELECT MIN(id)
FROM mdata AS M2
GROUP BY ifc
HAVING NOT EXISTS
(SELECT *
FROM mdata
WHERE id IN
(SELECT MIN(id)
FROM mdata
GROUP BY club)
AND ifc = M2.ifc))

I'm assuming that ID is unique and that there are no NULLs in any of the
columns.

I'm not sure it's possible to find the minimum subset every time unless you
use an iterative solution.

--
David Portas
SQL Server MVP
--
Nov 12 '05 #2
On Mon, 10 May 2004 17:29:20 +0100, David Portas wrote:
This will produce a sample containing at least one row of each value in the
two columns. It isn't guaranteed to give the *minimum* subset but it should
contain at most C + I - 1 rows, where C is the number of distinct Clubs and
I the number of distinct IFCs.
[snip]
I'm not sure it's possible to find the minimum subset every time unless you
use an iterative solution.


Thanks. Can you give me an idea as to what an iterative solution might look
like? I played around with the 6-step system below, and found that the
number of rows resulting depended on both the order of the insert queries
AND on whether I used MAX() or MIN(). I deduce that this so far is very
dependent on the exact data involved.

That said, the resulting row count only varied from 403 to 410, for a set
where C=325, I=117... so I'm bettering the worst-case (441 by your formula)
but not by a whole lot. Do you think it could theoretically be better?

(table name changed from mData to mmm; hope that's not confusing)

/* first step: unique clubs */
SELECT Min(mmm.RandomID) AS RandomID, mmm.CLUB, Min(mmm.IFC) AS IFC
INTO AAA
FROM mmm
GROUP BY mmm.CLUB
HAVING (((Count(*))=1));

/* second step: unique IFCs where club not already there*/
INSERT INTO AAA ( RandomID, CLUB, IFC )
SELECT Min(mmm.RandomID) AS RandomID, Min(mmm.CLUB) AS CLUB, mmm.IFC AS IFC
FROM mmm
GROUP BY mmm.IFC
HAVING (((Min(mmm.CLUB)) Not In (select club from aaa)) AND
((Count(*))=1));

/* intermezzo: views for missing IFCs, missing Clubs */
CREATE VIEW A_MissingClubs AS
SELECT mmm.CLUB
FROM mmm LEFT JOIN AAA ON mmm.CLUB = AAA.CLUB
WHERE (((AAA.CLUB) Is Null))
GROUP BY mmm.CLUB;

CREATE VIEW A_MissingIFCs AS
SELECT mmm.IFC
FROM mmm LEFT JOIN AAA ON mmm.IFC = AAA.IFC
WHERE (((AAA.IFC) Is Null))
GROUP BY mmm.IFC;

/* third step: distinct missing clubs that are also missing IFCs */
INSERT INTO AAA ( RandomID, CLUB, IFC )
SELECT Min(mmm.RandomID) AS MinOfRandomID, mmm.CLUB, Min(mmm.IFC) AS IFC
FROM (mmm INNER JOIN A_MissingClubs ON mmm.CLUB = A_MissingClubs.CLUB)
INNER JOIN A_MissingIFCs ON mmm.IFC = A_MissingIFCs.IFC
GROUP BY mmm.CLUB;

/* fourth step: distinct missing IFCs that are also missing clubs */
INSERT INTO AAA ( RandomID, CLUB, IFC )
SELECT Min(mmm.RandomID) AS MinOfRandomID, Min(mmm.CLUB) AS CLUB, mmm.IFC
AS IFC
FROM (mmm INNER JOIN A_MissingClubs ON mmm.CLUB = A_MissingClubs.CLUB)
INNER JOIN A_MissingIFCs ON mmm.IFC = A_MissingIFCs.IFC
GROUP BY mmm.IFC;

/* fifth step: remaining missing IFCs */
INSERT INTO AAA ( RandomID, CLUB, IFC )
SELECT Min(mmm.RandomID) AS MinOfRandomID, Min(mmm.CLUB) AS CLUB, mmm.IFC
AS IFC
FROM mmm INNER JOIN A_MissingIFCs ON mmm.IFC = A_MissingIFCs.IFC
GROUP BY mmm.IFC;

/* sixth step: remaining missing CLUBs */
INSERT INTO AAA ( RandomID, CLUB, IFC )
SELECT Min(mmm.RandomID) AS MinOfRandomID, mmm.CLUB, Min(mmm.IFC) AS IFC
FROM mmm INNER JOIN A_MissingClubs ON mmm.CLUB = A_MissingClubs.CLUB
GROUP BY mmm.CLUB;
Nov 12 '05 #3
I assume that you meant to have a real table, with keys and
constraints instead of what you posted, so let's fix it.

CREATE TABLE Mdata
(m_id INTEGER NOT NULL PRIMARY KEY,
club CHAR(7) NOT NULL,
ifc CHAR(4) NOT NULL);

CREATE TABLE Samples
(m_id INTEGER NOT NULL PRIMARY KEY,
club CHAR(7) NOT NULL,
ifc CHAR(4) NOT NULL);

DELETE FROM Mdata;
INSERT INTO Mdata VALUES (6401715, 'aarprat', 'ic17');
INSERT INTO Mdata VALUES (1058337, 'aarprat', 'ic17');
INSERT INTO Mdata VALUES (459443, 'aarpprt', 'ic25');
INSERT INTO Mdata VALUES (4018210, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (2430656, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (6802081, 'aarpprd', 'ig29');
INSERT INTO Mdata VALUES (4236511, 'aarpprd', 'ig29');
INSERT INTO Mdata VALUES (2162104, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (2073679, 'aarpprd', 'ig29');
INSERT INTO Mdata VALUES (8148891, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (1868445, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (6749213, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (8363621, 'aarppup', 'ig29');
INSERT INTO Mdata VALUES (9999, 'aarppup', 'ic17'); -- redudant
I want a set of samples that contains at least one of each CLUB

and at least one of each IFC, but no more than necessary. <<

There can be more than one minimal solution, but if we go with the
minimum m_id number, that will give us a start which we can prune.
INSERT INTO Samples
SELECT MIN(m_id), club, ifc
FROM Mdata
GROUP BY club, ifc;

=============================
9999 aarppup ic17 <== redundant row
1058337 aarprat ic17 <== ifc
459443 aarpprt ic25
1868445 aarpbas ig21
2073679 aarpprd ig29
8363621 aarppup ig29 <== club

a VALUES (9999, 'aarppup', 'ic17'); -- redudant

We can find the candidate rows for redundancy removal:

SELECT MIN(m_id), ifc FROM Sample
GROUP BY ifc
HAVING COUNT(*) > 1

and likewise extra clubs:

SELECT MIN(m_id), club FROM Sample
GROUP BY club
HAVING COUNT(*) > 1;

Put it together:

SELECT m_id
FROM (SELECT MIN(m_id) FROM Sample GROUP BY ifc
HAVING COUNT(*) > 1
UNION ALL
SELECT MIN(m_id)FROM Sample GROUP BY club
HAVING COUNT(*) > 1) AS R(m_id)
GROUP BY R.m_id
HAVING COUNT(*) > 1;

and the final nightmare query:

BEGIN
INSERT INTO Sample
SELECT MIN(m_id) AS m_id, club, ifc
FROM Mdata
GROUP BY club, ifc;
SELECT MIN(m_id) AS m_id, club, ifc
FROM Mdata
GROUP BY club, ifc
HAVING MIN(m_id)
NOT IN
(SELECT R.m_id
FROM (SELECT MIN(m_id) FROM Sample GROUP BY ifc
HAVING COUNT(*) > 1
UNION ALL
SELECT MIN(m_id)FROM Sample GROUP BY club
HAVING COUNT(*) > 1) AS R(m_id)
GROUP BY R.m_id
HAVING COUNT(*) > 1);
END;

Warning: I have not tested this for all cases!! If we had INTERSECT
and some other SQL-92 operators, this could be more compact. I also
have no proof this is minimal, either.
Nov 12 '05 #4
On 11 May 2004 13:52:25 -0700, --CELKO-- wrote:
Warning: I have not tested this for all cases!! If we had INTERSECT
and some other SQL-92 operators, this could be more compact. I also
have no proof this is minimal, either.


Thanks for the help and the effort, but no, it's not minimal; not even
close. Using the 6-step method I outlined in my message Monday
(news:1f*****************************@40tude.net) on my real run of 52776
invoices, I ended up with a set of 403 samples. Your first INSERT step
gave 648 samples, which the "nightmare" query reduced to 629 samples.
However, in order to get it to run on my MS SQL 6.5 server, I had to modify
it a bit and split it up; I may have made an error. Here was my final
query batch; is it equivalent to what you were writing?

SELECT m_id into #celko
FROM (SELECT MIN(Sample.m_id) "m_id" FROM Sample GROUP BY ifc
HAVING COUNT(*) > 1
UNION ALL
SELECT MIN(m_id)FROM Sample GROUP BY club
HAVING COUNT(*) > 1) "Z" group by m_id having count(*) > 1
go
SELECT MIN(m_id) AS m_id, club, ifc
FROM Mdata
GROUP BY club, ifc
HAVING MIN(m_id)
NOT IN (select m_id from #celko)
Interestingly, David Portas's solution produced a result of 405 samples,
which was almost as good as my 6-step solution.

The fact that my final app needs to run with an MS Access backend (please
control your gagging :) ) rather than a SQL Server backend will probably
make me end up with my 6-step solution, unless anyone comes with something
both better and simpler. :) But I thank you both for all your time and
suggestions; this has been educational.
Nov 12 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: netpurpose | last post by:
I need to extract data from this table to find the lowest prices of each product as of today. The product will be listed/grouped by the name only, discarding the product code - I use...
8
by: Lauren Quantrell | last post by:
In VBA, I constructed the following to update all records in tblmyTable with each records in tblmyTableTEMP having the same UniqueID: UPDATE tblMyTable RIGHT JOIN tblMyTableTEMP ON...
5
by: Ross Presser | last post by:
The purpose, as you can probably guess, is to produce a set of sample documents from a large document run. The data row has a CLUB column and an IFC column; I want a set of samples that contains...
11
by: Eugenio | last post by:
Excuse me in advance fo my little English. I've got this stored procedure **************************************************************************** ********** declare @Azienda as...
5
by: Rated R1 | last post by:
I wrote this before in the NGs, so I am going to paste the responses that I got and see if someone can please help me. Email me and we can set something up as Id even be willing to pay for your...
7
by: serge | last post by:
How can I run a single SP by asking multiple sales question either by using the logical operator AND for all the questions; or using the logical operator OR for all the questions. So it's always...
24
by: clare at snyder.on.ca | last post by:
I have a SQL query I need to design to select name and email addresses for policies that are due and not renewed in a given time period. The problem is, the database keeps the information for every...
5
by: Wired Hosting News | last post by:
I tried to be breif and give a scenario so as not to be overlooked because it was soooo long. Let me give you real world. I am a manufacturer of goods and produce 11 items that are distributed...
0
by: RCapps | last post by:
When running the below SQL Query I keep getting the following error: Server: Msg 4924, Level 16, State 1, Line 1 ALTER TABLE DROP COLUMN failed because column 'ContractDef' does not exist in table...
4
by: Haas C | last post by:
Hi all, I have a table with two columns, labeled Year and Loss. In the Year field, I have the numbers 1 to 10,000, each which can or cannot repeat. In the Loss column, i have numbers...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.