problem with grouping (?)

2 New Member

Hello to all,

I am quite new to sql and I don't understand why a query has a certain output.I apologise for the long post but I'd rather describe exactly what I am doing in, hoping that someone can spot the problem that I fail to.
I have created and populated two tables in my database that look like:

Table: binding_sites

id start_bindsite end_bindsite
1 10212250 10212257
2 10105334 10105368
3 101556063 101556073

Table:genome

gene_start gene_end gene_name
10212248 10312253 gene1
10212258 10322250 gene2
101556070 102556060 gene3

What I want is to find which gene from table genome is closer to each of the sites 1, 2, 3 in table binding_sites.

I have created a nested query that calculates the linear distances between the gene_start and both the start_bindsite and end_bindsite and then with the outer select I wish to choose the minimum distance. The problem is that when I group by the start_bindsite field the output is indeed for each binding site the minimum distance but the gene_name is always gene1 , whereas for the last two binding sites it is gene2 and gene3 respectively that should be presented. And when grouping by other fields or combination of other fields doesn't seem to help.

The query:

Expand|Select|Wrap|Line Numbers

 select   temp.start, temp.end, temp.gene_name, min(temp.distance) as minimum_distance

          from (select  start_bindsite, end_bindsite, gene_name, abs(gene_start-start_bindsite) as distance

                   from genome, binding_sites

                   group by  gene_name, start_bindsite

                   union 

                  select  start_bindsite, end_bindsite, gene_name, abs(gene_start-end_bindsite) as distance

                  from genome, binding_sites 

                  group by  gene_name, start_bindsite) as temp

        group by  temp.start

And the output
+-----------+-----------+-----------+------------------+
| start | end | gene_name | minimum_distance |
+------------+-----------+-----------+------------------+
| 10105334 | 10105368 | gene1 | 106880 |
| 10212250 | 10212257 | gene1 | 1 |
| 101556063 | 101556073 | gene1 | 3 |
+-----------+-----------+------------+------------------+
3 rows in set (0.00 sec)

Does anyone have any idea what am I doing wrong here? Any hint or suggestion would be hugely appreciated!!
Thanks for your time in advance,

Eva

Jul 12 '08 #1

Subscribe Reply

1471

coolsti

310

Contributor

Without really getting into your particular query, I suspect the problem you have is similar to the problem statement of this thread:

http://bytes.com/forum/thread819577.html

and that the solution to your problem lies in my answer to this thread.

The problem is that your select query calculates the minimum distance as an output variable, and then you join your tables and do your group by. Your output before doing the group by then has the columns you have asked to be selected including this minimum distance. But when you then do the group by "id", MySQL has no way of knowing which of the rows for each "id" should be selected, and so in general will take the first row for each "id", in which case your gene name is wrong, although the minimum distance value is correct.

If I am correct in the above, there are two solutions to the problem. If you expect your output query to always only contain one row, you can do it in an easy way. Drop the "group by" clause, and just select what you wish, calculate your distance and give it some alias name, let us say "distance", and then add the clause "order by distance limit 1". This will not attempt to group by anything, but will order the rows in ascending order by the distance, and since you want the minimum distance, you limit the output to only the first row.

The above only works if your output will contain only one row. That is, you are limiting your query to handle only one "id" at a time. If you wish to have multiple rows in your output, say the row that corresponds to the minimum distance for many values of "id", then you need to use the technique I suggest in the thread I mention above.

Good luck!

Jul 13 '08 #2

evablue

New Member

Hello again,

Thank you so much for your reply, it was really helpgul.
I solved the problem after all by applying an inner join to the table described as temp and the table occuring from my whole previous query (only this time I had the same fields to the select and the group by ;-) ).
Thanks again.

Cheers,

Eva

Jul 13 '08 #3

Similar topics

display problem when query is empty!?

by: Jerome | last post by:

Hi, I've got the following problem: I want my ASP page to display a certain number (based on a COUNT query), it works fine if the result is at least 1! If there are no records to be counted...