Cron job to remove logically redundant entries in PostgreSQL

I need to delete records from a PostgreSQL table that has more than 200 million rows. The table does not have a primary key.

Sample content of the table (named Bookmark) is shown below:

    systemId  filename       mindatetime             maxdatetime
    70277     monitor_1.dat  2019-04-21 08:00:00 AM  2019-04-21 03:10:00 PM
    10006     monitor_2.dat  2019-04-25 10:00:00 AM  2019-04-25 11:30:00 AM
    10006     monitor_3.dat  2019-04-28 08:00:00 AM  2019-04-28 10:00:00 AM
    10006     monitor_3.dat  2019-04-28 09:00:00 AM  2019-04-28 11:00:00 AM
    10006     monitor_3.dat  2019-04-28 07:00:00 AM  2019-04-28 04:00:00 PM
    8368      monitor_1.dat  2019-05-21 11:00:00 AM  2019-05-21 11:30:00 AM
    8368      monitor_7.dat  2019-05-21 06:00:00 AM  2019-05-21 11:00:00 AM
    8368      monitor_5.dat  2019-05-23 08:00:00 AM  2019-05-23 10:00:00 AM

The cron job should run on a schedule and delete records that are logically redundant.

To explain, take systemId 10006 with filename 'monitor_3.dat'. It has three entries whose min and max timestamps all fall on the same day.

Logically, the entries running from 08:00 AM to 10:00 AM and from 09:00 AM to 11:00 AM can be deleted, because both intervals are covered by the third entry, which runs from 07:00 AM to 04:00 PM.

In general, the job should scan the entire table, find every entry whose interval falls entirely within another entry's interval for the same systemId and filename, and delete it.
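The rule described above can be pinned down with a sort-and-sweep pass over each (systemId, filename) group. This is a minimal sketch under stated assumptions, not the poster's job: it assumes an entry is redundant only when a single other entry with the same systemId and filename covers its whole interval, and that only one copy of exactly identical entries is kept. The names `dt`, `SAMPLE` and `redundant_indices` are hypothetical, and the times are the sample rows converted to 24-hour format.

```python
from datetime import datetime
from itertools import groupby


def dt(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' (24-hour clock) into a datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Hypothetical fixture: the question's sample rows, converted to 24-hour time.
SAMPLE = [
    (70277, "monitor_1.dat", dt("2019-04-21 08:00:00"), dt("2019-04-21 15:10:00")),
    (10006, "monitor_2.dat", dt("2019-04-25 10:00:00"), dt("2019-04-25 11:30:00")),
    (10006, "monitor_3.dat", dt("2019-04-28 08:00:00"), dt("2019-04-28 10:00:00")),
    (10006, "monitor_3.dat", dt("2019-04-28 09:00:00"), dt("2019-04-28 11:00:00")),
    (10006, "monitor_3.dat", dt("2019-04-28 07:00:00"), dt("2019-04-28 16:00:00")),
    (8368, "monitor_1.dat", dt("2019-05-21 11:00:00"), dt("2019-05-21 11:30:00")),
    (8368, "monitor_7.dat", dt("2019-05-21 06:00:00"), dt("2019-05-21 11:00:00")),
    (8368, "monitor_5.dat", dt("2019-05-23 08:00:00"), dt("2019-05-23 10:00:00")),
]


def redundant_indices(rows):
    """Return indices of rows whose [min, max] interval lies inside another
    row's interval with the same (systemId, filename).

    Assumption: only single-entry containment counts, and of several
    identical rows exactly one is kept."""
    # Sort by end descending, then (stable sort) by group and start ascending,
    # so every row that could contain row i is visited before row i.
    order = sorted(range(len(rows)), key=lambda i: rows[i][3], reverse=True)
    order.sort(key=lambda i: (rows[i][0], rows[i][1], rows[i][2]))

    redundant = set()
    for _, group in groupby(order, key=lambda i: (rows[i][0], rows[i][1])):
        furthest_end = None  # latest end seen so far in this group
        for i in group:
            if furthest_end is not None and rows[i][3] <= furthest_end:
                # An earlier row starts no later and ends no earlier.
                redundant.add(i)
            else:
                furthest_end = rows[i][3]
    return redundant


drop = redundant_indices(SAMPLE)
kept = [row for i, row in enumerate(SAMPLE) if i not in drop]
```

On the sample this drops the 08:00 to 10:00 and 09:00 to 11:00 rows for monitor_3.dat and keeps the other six, matching the expected output. Sorting makes the check O(n log n) instead of comparing every pair, but at 200 million rows the same rule would more realistically run inside the database.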

For the sample data, the resulting table should be:

    systemId  filename       mindatetime             maxdatetime
    70277     monitor_1.dat  2019-04-21 08:00:00 AM  2019-04-21 03:10:00 PM
    10006     monitor_2.dat  2019-04-25 10:00:00 AM  2019-04-25 11:30:00 AM
    10006     monitor_3.dat  2019-04-28 07:00:00 AM  2019-04-28 04:00:00 PM
    8368      monitor_1.dat  2019-05-21 11:00:00 AM  2019-05-21 11:30:00 AM
    8368      monitor_7.dat  2019-05-21 06:00:00 AM  2019-05-21 11:00:00 AM
    8368      monitor_5.dat  2019-05-23 08:00:00 AM  2019-05-23 10:00:00 AM

The table is more than 20 GB on disk, so I have been exploring writing an SQL procedure or job to do this, but I haven't made much progress. Any ideas or suggestions for handling this scenario?
Jun 14 '19 #1
Rabbit
What happens if two entries only partially overlap? What happens if an entry is fully covered, but only by two different entries together? What happens if two entries have exactly the same start and end? You need to define the requirements clearly, otherwise you're going to run into trouble down the road.
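These three questions can be made concrete with a small classifier. The sketch below is illustrative only. The labels and the function names `relation` and `covered_by_union` are hypothetical. Intervals that merely touch (one ends exactly when the other starts) are counted as a partial overlap, and that choice would itself need to be confirmed by the requirements.

```python
from datetime import datetime


def relation(a, b):
    """Classify how interval a = (start, end) relates to interval b.

    Hypothetical labels; touching endpoints count as 'partial'."""
    (a0, a1), (b0, b1) = a, b
    if a == b:
        return "identical"  # same start and end: keep only one copy?
    if b0 <= a0 and a1 <= b1:
        return "inside"     # a is fully covered by b alone
    if a1 < b0 or b1 < a0:
        return "disjoint"
    if a0 <= b0 and b1 <= a1:
        return "contains"   # b is fully covered by a
    return "partial"        # overlap, but neither covers the other


def covered_by_union(a, others):
    """True if the union of `others` covers all of a, even when no single
    interval does (the 'two different entries' case)."""
    start, end = a
    reach = None  # furthest point covered contiguously from `start`
    for s, e in sorted(others):
        if e < start:
            continue                # ends before a begins; irrelevant
        if reach is None:
            if s > start:
                return False        # nothing covers a's start
            reach = e
        elif s > reach:
            break                   # gap inside a
        else:
            reach = max(reach, e)
        if reach >= end:
            return True
    return False
```

Under the rule in the question, only `inside` rows (plus all but one copy of `identical` rows) would be deleted. A row for which `covered_by_union` is true while no single interval reports `inside` is exactly the second case, which the requirements would need to decide.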

Whatever the case may be, the answer will probably be to join the table to itself to find overlapping entries. How you formulate that join will depend on what you need to happen in the scenarios above.
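The self-join can be tried out with Python's built-in `sqlite3` module before touching the real table. This sketch substitutes SQLite for PostgreSQL, and the lower-case table and column names, `SAMPLE`, `purge_redundant` and the batch size are all assumptions. It also assumes the single-entry containment rule from the question. Because the table has no primary key, SQLite's implicit `rowid` breaks ties so that exactly one of several identical rows survives. Deleting in small, separately committed batches is one way to keep each transaction short on a large table.

```python
import sqlite3

# Hypothetical schema mirroring the Bookmark table (no primary key).
SCHEMA = """CREATE TABLE bookmark (
    system_id   INTEGER,
    filename    TEXT,
    mindatetime TEXT,  -- ISO 8601 text, so string order is time order
    maxdatetime TEXT
)"""

# Delete up to :batch rows that another row of the same group covers.
# The rowid comparison keeps exactly one of several identical rows.
DELETE_BATCH = """
DELETE FROM bookmark WHERE rowid IN (
    SELECT r.rowid FROM bookmark AS r
    WHERE EXISTS (
        SELECT 1 FROM bookmark AS c
        WHERE c.system_id = r.system_id
          AND c.filename = r.filename
          AND c.mindatetime <= r.mindatetime
          AND c.maxdatetime >= r.maxdatetime
          AND (c.mindatetime < r.mindatetime
               OR c.maxdatetime > r.maxdatetime
               OR c.rowid < r.rowid)
    )
    LIMIT :batch
)"""

# Hypothetical fixture: the question's sample rows in 24-hour time.
SAMPLE = [
    (70277, "monitor_1.dat", "2019-04-21 08:00:00", "2019-04-21 15:10:00"),
    (10006, "monitor_2.dat", "2019-04-25 10:00:00", "2019-04-25 11:30:00"),
    (10006, "monitor_3.dat", "2019-04-28 08:00:00", "2019-04-28 10:00:00"),
    (10006, "monitor_3.dat", "2019-04-28 09:00:00", "2019-04-28 11:00:00"),
    (10006, "monitor_3.dat", "2019-04-28 07:00:00", "2019-04-28 16:00:00"),
    (8368, "monitor_1.dat", "2019-05-21 11:00:00", "2019-05-21 11:30:00"),
    (8368, "monitor_7.dat", "2019-05-21 06:00:00", "2019-05-21 11:00:00"),
    (8368, "monitor_5.dat", "2019-05-23 08:00:00", "2019-05-23 10:00:00"),
]


def purge_redundant(conn, batch=10_000):
    """Run small deletes, committing each, until no redundant row is left.
    Returns the total number of rows deleted."""
    total = 0
    while True:
        cur = conn.execute(DELETE_BATCH, {"batch": batch})
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO bookmark VALUES (?, ?, ?, ?)", SAMPLE)
    print(purge_redundant(conn), "rows deleted")  # prints: 2 rows deleted
```

The `EXISTS` predicate should carry over to PostgreSQL. There, the system column `ctid` could play the role of `rowid` as a tie-breaker. It identifies a row's physical location and can change when the row is updated, so it should only be relied on within a single statement. On 200 million rows, an index on (systemId, filename, mindatetime) would probably be needed for the self-join to finish in reasonable time.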
Jun 14 '19 #2
