Steve Atkins wrote:
On Mon, Sep 20, 2004 at 02:27:41PM +0000, Matthew Wilson wrote:
I'm a noob SQL user, crossing over from SAS. I have a table with about
200k rows and one of the columns is empssn, which holds the employee
social security number. The same empssn may appear in lots of different
rows. I want to get a list of the 40 top empssns, sorted by the number
of times they appear in the table. I also want a list of the very rarest
empssns (ones that only appear once or twice).
Can anyone help me with this? BTW, this isn't a homework problem.
select empssn, count(*) from table
group by empssn
order by count(*) desc limit 40;
and
select empssn, count(*) from table
group by empssn
having count(*) < 3;
may be close to what you're looking for.
Cheers,
Steve
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
to get the 10 rarest empssns, it should be fine to do:
SELECT empssn, count(*) from table
GROUP BY empssn
ORDER BY count(*) ASC LIMIT 10;
the only thing here change from the previous post is the ASC (ascending)
or DESC (descending) sort order.
the having count(*) < 3 is good but will only return rows if there are
empssn that only occur less then 3 times in the table.
however with a table with 200 000 records there is a chance/risk that
not any empssn occurs less then 3 times, in such a case the query with
the HAVING clause will return zero rows whereas the LIMIT 10 still will
give the 10 least frequent.