I assume that you meant to have a real table, with keys and
constraints instead of what you posted, so let's fix it.
CREATE TABLE Mdata
(m_id INTEGER NOT NULL PRIMARY KEY,
club CHAR(7) NOT NULL,
ifc CHAR(4) NOT NULL);
CREATE TABLE Samples
(m_id INTEGER NOT NULL PRIMARY KEY,
club CHAR(7) NOT NULL,
ifc CHAR(4) NOT NULL);
DELETE FROM Mdata;
INSERT INTO Mdata VALUES (6401715, 'aarprat', 'ic17');
INSERT INTO Mdata VALUES (1058337, 'aarprat', 'ic17');
INSERT INTO Mdata VALUES (459443, 'aarpprt', 'ic25');
INSERT INTO Mdata VALUES (4018210, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (2430656, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (6802081, 'aarpprd', 'ig29');
INSERT INTO Mdata VALUES (4236511, 'aarpprd', 'ig29');
INSERT INTO Mdata VALUES (2162104, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (2073679, 'aarpprd', 'ig29');
INSERT INTO Mdata VALUES (8148891, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (1868445, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (6749213, 'aarpbas', 'ig21');
INSERT INTO Mdata VALUES (8363621, 'aarppup', 'ig29');
INSERT INTO Mdata VALUES (9999, 'aarppup', 'ic17'); -- redudant
I want a set of samples that contains at least one of each CLUB
and at least one of each IFC, but no more than necessary. <<
There can be more than one minimal solution, but if we go with the
minimum m_id number, that will give us a start which we can prune.
INSERT INTO Samples
SELECT MIN(m_id), club, ifc
FROM Mdata
GROUP BY club, ifc;
=============================
9999 aarppup ic17 <== redundant row
1058337 aarprat ic17 <== ifc
459443 aarpprt ic25
1868445 aarpbas ig21
2073679 aarpprd ig29
8363621 aarppup ig29 <== club
a VALUES (9999, 'aarppup', 'ic17'); -- redudant
We can find the candidate rows for redundancy removal:
SELECT MIN(m_id), ifc FROM Sample
GROUP BY ifc
HAVING COUNT(*) > 1
and likewise extra clubs:
SELECT MIN(m_id), club FROM Sample
GROUP BY club
HAVING COUNT(*) > 1;
Put it together:
SELECT m_id
FROM (SELECT MIN(m_id) FROM Sample GROUP BY ifc
HAVING COUNT(*) > 1
UNION ALL
SELECT MIN(m_id)FROM Sample GROUP BY club
HAVING COUNT(*) > 1) AS R(m_id)
GROUP BY R.m_id
HAVING COUNT(*) > 1;
and the final nightmare query:
BEGIN
INSERT INTO Sample
SELECT MIN(m_id) AS m_id, club, ifc
FROM Mdata
GROUP BY club, ifc;
SELECT MIN(m_id) AS m_id, club, ifc
FROM Mdata
GROUP BY club, ifc
HAVING MIN(m_id)
NOT IN
(SELECT R.m_id
FROM (SELECT MIN(m_id) FROM Sample GROUP BY ifc
HAVING COUNT(*) > 1
UNION ALL
SELECT MIN(m_id)FROM Sample GROUP BY club
HAVING COUNT(*) > 1) AS R(m_id)
GROUP BY R.m_id
HAVING COUNT(*) > 1);
END;
Warning: I have not tested this for all cases!! If we had INTERSECT
and some other SQL-92 operators, this could be more compact. I also
have no proof this is minimal, either.