459,707 Members | 1,605 Online
Need help? Post your question and get tips & solutions from a community of 459,707 IT Pros & Developers. It's quick & easy.

# How to find max value to find partially duplicate rows

 P: 33 I have a table1 id col1 col2 1 NC_001A>T 5 2 NC_001C>G 4 3 NC_001G>C 3 4 NC_001_98_G>C 1 Now, I want to have max col2 value for partially duplicate col1 value for example if I take partial value for col 1 values NC_001A>T, NC_001C>G and NC_OO1G>C, I can write following sql query Expand|Select|Wrap|Line Numbers  select col1, MAX (col2) from table1 where col1 = ('NC_001')group by col1  Result: id col1 col2 1 NC_001A>T 5 But I am not sure how to find max value for all similar partically duplicate rows i.e NC_002, NC_003 and so on in the table Thanking in advance. Sep 12 '12 #1

I found a smarter way to do this. The below query fetches the row from col1 value where no other row exists with the same col1 value and a greater col2 values for that.

Expand|Select|Wrap|Line Numbers
1.
2. SELECT *
3. FROM table1 as t1
4. LEFT OUTER JOIN table1 AS t2
5. ON (left(t1.col1,LEN(t1.col1) - 3)= left(t2.col1,LEN(t2.col1) - 3) AND t1.col2 < t2.col2)
6. WHERE t2.col1 IS Null
7.
8.

13 Replies

 Expert Mod 10K+ P: 12,430 First off, you have to define "partially duplicate" since that can mean a hundred different things. Sep 12 '12 #2

 P: 33 Partially duplicate values are found in similar pattern in col1. for example consider NC_001A>T, NC_001C>G are one pair partially similar NC_002A>T, NC_002C_G are another pair partially similar they all differ by having expression such as C>G or G>T Sep 12 '12 #3

 Expert Mod 5K+ P: 5,397 ChinTan I think that you're going to have to start by defining what is the same about the data and what makes it the largest. -z Sep 12 '12 #4

 Expert Mod 10K+ P: 12,430 That still doesn't explain exactly what you mean by "partially duplicate". In your first post, is that fourth item (NC_001_98_G>C) a partial duplicate? Are you defining partial duplicate as the first 6 characters? Which would mean the fourth item is a duplicate. Or do you mean that if you strip off the last 3 characters, if the remainder matches, then it's a duplicate? In which case the fourth item is not a duplicate? Or do you have some other definition of duplicate? Sep 13 '12 #5

 P: 33 Sorry I was not clear but please exclude row 4 NC_001_98_G>C 1. Let just concentrate on values which are same when stripping of last three character (i.e A>T or it could be c>G or G>T etc..) "Or do you mean that if you strip off the last 3 characters, if the remainder matches, then it's a duplicate?" Yes, I want to write a sql query that matches these values and find max 'col2' value for them. Is that clear? Sep 13 '12 #6

 Expert Mod 10K+ P: 12,430 What you can do then is use a combination of the left function with the len function to strip off the last 3 characters. Use that in place of your col1 in your previous query and that will give you what you want. Sep 13 '12 #7

 P: 33 Expand|Select|Wrap|Line Numbers select left(col1,LEN(col1) - 3), MAX (col2) from table1 group by col1   The above the query results in: NC_001 5 which is max from 1 NC_001A>T 5 2 NC_001C>G 4 3 NC_001G>C 3 now how can I modify query to produce result NC_001A>T 5 which is max from Sep 13 '12 #8

 P: 33 Oh I am sorry the query failed. Can you correct me what I am doing wrong Expand|Select|Wrap|Line Numbers select left(col1,LEN(col1) - 3), MAX (col2) from table1 group by col1 I have tried to use where clause but it failed Sep 13 '12 #9

 Expert Mod 10K+ P: 12,430 There's no where clause in your query. You can get the original value by joining it back to the table on the partial key and value. Sep 13 '12 #10

 P: 33 can you modify query to give correct result Expand|Select|Wrap|Line Numbers select left(col1,LEN(col1) - 3) as , MAX (col2) from table1 group by col1   Sep 13 '12 #11

 Expert Mod 10K+ P: 12,430 You need to group by your new calculated field. You also need to give your calculated field an alias. Sep 13 '12 #12

 P: 33 I found a smarter way to do this. The below query fetches the row from col1 value where no other row exists with the same col1 value and a greater col2 values for that. Expand|Select|Wrap|Line Numbers   SELECT * FROM table1 as t1 LEFT OUTER JOIN table1 AS t2 ON (left(t1.col1,LEN(t1.col1) - 3)= left(t2.col1,LEN(t2.col1) - 3) AND t1.col2 < t2.col2) WHERE t2.col1 IS Null     Sep 17 '12 #13

 Expert 2.5K+ P: 2,878 Make sure t2.col1 does not allow NULL. Also, if this will be executed against a large table, use NOT EXISTS instead. Good Luck!!! ~~ CK Sep 17 '12 #14