473,406 Members | 2,451 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,406 software developers and data experts.

How to find max value to find partially duplicate rows

I have a table1

id col1 col2
1 NC_001A>T 5
2 NC_001C>G 4
3 NC_001G>C 3
4 NC_001_98_G>C 1

Now, I want to have max col2 value for partially duplicate col1 value

for example if I take partial value for col 1 values NC_001A>T, NC_001C>G and NC_OO1G>C, I can write following sql query

Expand|Select|Wrap|Line Numbers
  1.  select col1, MAX (col2) from table1 where col1 = ('NC_001')group by col1 
Result:
id col1 col2
1 NC_001A>T 5



But I am not sure how to find max value for all similar partically duplicate rows i.e NC_002, NC_003 and so on in the table

Thanking in advance.
Sep 12 '12 #1

✓ answered by chintan85

I found a smarter way to do this. The below query fetches the row from col1 value where no other row exists with the same col1 value and a greater col2 values for that.


Expand|Select|Wrap|Line Numbers
  1.  
  2. SELECT *
  3. FROM table1 as t1
  4. LEFT OUTER JOIN table1 AS t2
  5. ON (left(t1.col1,LEN(t1.col1) - 3)= left(t2.col1,LEN(t2.col1) - 3) AND t1.col2 < t2.col2)
  6. WHERE t2.col1 IS Null
  7.  
  8.  

13 6280
Rabbit
12,516 Expert Mod 8TB
First off, you have to define "partially duplicate" since that can mean a hundred different things.
Sep 12 '12 #2
Partially duplicate values are found in similar pattern in col1.

for example consider
NC_001A>T, NC_001C>G are one pair partially similar
NC_002A>T, NC_002C_G are another pair partially similar

they all differ by having expression such as C>G or G>T
Sep 12 '12 #3
zmbd
5,501 Expert Mod 4TB
ChinTan
I think that you're going to have to start by defining what is the same about the data and what makes it the largest.

-z
Sep 12 '12 #4
Rabbit
12,516 Expert Mod 8TB
That still doesn't explain exactly what you mean by "partially duplicate". In your first post, is that fourth item (NC_001_98_G>C) a partial duplicate? Are you defining partial duplicate as the first 6 characters? Which would mean the fourth item is a duplicate. Or do you mean that if you strip off the last 3 characters, if the remainder matches, then it's a duplicate? In which case the fourth item is not a duplicate? Or do you have some other definition of duplicate?
Sep 13 '12 #5
Sorry I was not clear but please exclude row 4 NC_001_98_G>C 1.
Let just concentrate on values which are same when stripping of last three character (i.e A>T or it could be c>G or G>T etc..)

"Or do you mean that if you strip off the last 3 characters, if the remainder matches, then it's a duplicate?"

Yes, I want to write a sql query that matches these values and find max 'col2' value for them.

Is that clear?
Sep 13 '12 #6
Rabbit
12,516 Expert Mod 8TB
What you can do then is use a combination of the left function with the len function to strip off the last 3 characters. Use that in place of your col1 in your previous query and that will give you what you want.
Sep 13 '12 #7
Expand|Select|Wrap|Line Numbers
  1. select left(col1,LEN(col1) - 3), MAX (col2) from table1 group by col1
  2.  
The above the query results in:
NC_001 5

which is max from
1 NC_001A>T 5
2 NC_001C>G 4
3 NC_001G>C 3


now how can I modify query to produce result
NC_001A>T 5

which is max from
Sep 13 '12 #8
Oh I am sorry the query failed. Can you correct me what I am doing wrong

Expand|Select|Wrap|Line Numbers
  1. select left(col1,LEN(col1) - 3), MAX (col2) from table1 group by col1
I have tried to use where clause but it failed
Sep 13 '12 #9
Rabbit
12,516 Expert Mod 8TB
There's no where clause in your query.

You can get the original value by joining it back to the table on the partial key and value.
Sep 13 '12 #10
can you modify query to give correct result
Expand|Select|Wrap|Line Numbers
  1. select left(col1,LEN(col1) - 3) as , MAX (col2) from table1 group by col1
  2.  
Sep 13 '12 #11
Rabbit
12,516 Expert Mod 8TB
You need to group by your new calculated field. You also need to give your calculated field an alias.
Sep 13 '12 #12
I found a smarter way to do this. The below query fetches the row from col1 value where no other row exists with the same col1 value and a greater col2 values for that.


Expand|Select|Wrap|Line Numbers
  1.  
  2. SELECT *
  3. FROM table1 as t1
  4. LEFT OUTER JOIN table1 AS t2
  5. ON (left(t1.col1,LEN(t1.col1) - 3)= left(t2.col1,LEN(t2.col1) - 3) AND t1.col2 < t2.col2)
  6. WHERE t2.col1 IS Null
  7.  
  8.  
Sep 17 '12 #13
ck9663
2,878 Expert 2GB
Make sure t2.col1 does not allow NULL. Also, if this will be executed against a large table, use NOT EXISTS instead.

Good Luck!!!


~~ CK
Sep 17 '12 #14

Sign in to post your reply or Sign up for a free account.

Similar topics

4
by: Russell | last post by:
I'm using MySQL 4.1.1 I've inherited a database which has some (almost) duplicate rows. The databse is like this. userID userPosition userDepartment
1
by: g_chime | last post by:
How can I make MySQL reject duplicate rows when inserting? The table is quite large: 100+ rows and millions of rows. MySQL doesn't allow me to create a UNIQUE INDEX for all the rows (16 rows...
3
by: dan graziano | last post by:
Hi, How do you suggest is the best way to check for duplicate rows in an access table. And once one knows if there are duplicates, to remove all but one. In my access table, there are 5...
1
by: TaeHo Yoo | last post by:
I have a table that has more than 1 milion rows so practically it is impossible to remove all duplicate rows by hand. Could you help me to remove those duplicate rows at all? This table doesn't...
1
by: Asha | last post by:
greetings, does anyone have any idea how to delete duplicate rows in a dataset gracefully (the fast and easy way)
6
by: Bob Stearns | last post by:
I am getting duplicate rows back from a select distinct statement of the form: SELECT DISTINCT 'jhough', '000111', t0.bhid FROM (SELECT lots of good stuff) t0 LEFT OUTER JOIN another_table ...
7
by: Jon Maz | last post by:
Hi, I have a MySql problem I hope someone can help me with. I'm trying to run an update on a linking table, the update is running into a Primary Key constraint violation, and in my workaround...
1
by: hung tran | last post by:
Hi, Here is the code to eliminate duplicate rows, but what if I want to keep them and elimiante all other the unique ? public void RemoveDuplicateRows(DataTable dTable,string colName) {...
5
jamesd0142
by: jamesd0142 | last post by:
My manager and I where looking at some complex code to eliminate duplicate records in a database table. then it hit me how its done easily... so i thought i'd share it... In English:...
1
by: Phil Latio | last post by:
I have a number of spreadsheets, each with between 1000-6000 rows (each row is a property) and they all need to be combined into a single database. Each spreadsheet contains slightly different...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.