By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
429,190 Members | 2,204 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 429,190 IT Pros & Developers. It's quick & easy.

How to find max value to find partially duplicate rows

P: 33
I have a table1

id col1 col2
1 NC_001A>T 5
2 NC_001C>G 4
3 NC_001G>C 3
4 NC_001_98_G>C 1

Now, I want to have max col2 value for partially duplicate col1 value

for example if I take partial value for col 1 values NC_001A>T, NC_001C>G and NC_OO1G>C, I can write following sql query

Expand|Select|Wrap|Line Numbers
  1.  select col1, MAX (col2) from table1 where col1 = ('NC_001')group by col1 
Result:
id col1 col2
1 NC_001A>T 5



But I am not sure how to find max value for all similar partically duplicate rows i.e NC_002, NC_003 and so on in the table

Thanking in advance.
Sep 12 '12 #1

✓ answered by chintan85

I found a smarter way to do this. The below query fetches the row from col1 value where no other row exists with the same col1 value and a greater col2 values for that.


Expand|Select|Wrap|Line Numbers
  1.  
  2. SELECT *
  3. FROM table1 as t1
  4. LEFT OUTER JOIN table1 AS t2
  5. ON (left(t1.col1,LEN(t1.col1) - 3)= left(t2.col1,LEN(t2.col1) - 3) AND t1.col2 < t2.col2)
  6. WHERE t2.col1 IS Null
  7.  
  8.  

Share this Question
Share on Google+
13 Replies


Rabbit
Expert Mod 10K+
P: 12,359
First off, you have to define "partially duplicate" since that can mean a hundred different things.
Sep 12 '12 #2

P: 33
Partially duplicate values are found in similar pattern in col1.

for example consider
NC_001A>T, NC_001C>G are one pair partially similar
NC_002A>T, NC_002C_G are another pair partially similar

they all differ by having expression such as C>G or G>T
Sep 12 '12 #3

zmbd
Expert Mod 5K+
P: 5,397
ChinTan
I think that you're going to have to start by defining what is the same about the data and what makes it the largest.

-z
Sep 12 '12 #4

Rabbit
Expert Mod 10K+
P: 12,359
That still doesn't explain exactly what you mean by "partially duplicate". In your first post, is that fourth item (NC_001_98_G>C) a partial duplicate? Are you defining partial duplicate as the first 6 characters? Which would mean the fourth item is a duplicate. Or do you mean that if you strip off the last 3 characters, if the remainder matches, then it's a duplicate? In which case the fourth item is not a duplicate? Or do you have some other definition of duplicate?
Sep 13 '12 #5

P: 33
Sorry I was not clear but please exclude row 4 NC_001_98_G>C 1.
Let just concentrate on values which are same when stripping of last three character (i.e A>T or it could be c>G or G>T etc..)

"Or do you mean that if you strip off the last 3 characters, if the remainder matches, then it's a duplicate?"

Yes, I want to write a sql query that matches these values and find max 'col2' value for them.

Is that clear?
Sep 13 '12 #6

Rabbit
Expert Mod 10K+
P: 12,359
What you can do then is use a combination of the left function with the len function to strip off the last 3 characters. Use that in place of your col1 in your previous query and that will give you what you want.
Sep 13 '12 #7

P: 33
Expand|Select|Wrap|Line Numbers
  1. select left(col1,LEN(col1) - 3), MAX (col2) from table1 group by col1
  2.  
The above the query results in:
NC_001 5

which is max from
1 NC_001A>T 5
2 NC_001C>G 4
3 NC_001G>C 3


now how can I modify query to produce result
NC_001A>T 5

which is max from
Sep 13 '12 #8

P: 33
Oh I am sorry the query failed. Can you correct me what I am doing wrong

Expand|Select|Wrap|Line Numbers
  1. select left(col1,LEN(col1) - 3), MAX (col2) from table1 group by col1
I have tried to use where clause but it failed
Sep 13 '12 #9

Rabbit
Expert Mod 10K+
P: 12,359
There's no where clause in your query.

You can get the original value by joining it back to the table on the partial key and value.
Sep 13 '12 #10

P: 33
can you modify query to give correct result
Expand|Select|Wrap|Line Numbers
  1. select left(col1,LEN(col1) - 3) as , MAX (col2) from table1 group by col1
  2.  
Sep 13 '12 #11

Rabbit
Expert Mod 10K+
P: 12,359
You need to group by your new calculated field. You also need to give your calculated field an alias.
Sep 13 '12 #12

P: 33
I found a smarter way to do this. The below query fetches the row from col1 value where no other row exists with the same col1 value and a greater col2 values for that.


Expand|Select|Wrap|Line Numbers
  1.  
  2. SELECT *
  3. FROM table1 as t1
  4. LEFT OUTER JOIN table1 AS t2
  5. ON (left(t1.col1,LEN(t1.col1) - 3)= left(t2.col1,LEN(t2.col1) - 3) AND t1.col2 < t2.col2)
  6. WHERE t2.col1 IS Null
  7.  
  8.  
Sep 17 '12 #13

ck9663
Expert 2.5K+
P: 2,878
Make sure t2.col1 does not allow NULL. Also, if this will be executed against a large table, use NOT EXISTS instead.

Good Luck!!!


~~ CK
Sep 17 '12 #14

Post your reply

Sign in to post your reply or Sign up for a free account.