Cron job to remove logically redundant entries in Postgres SQL

I need to delete records from a PostgreSQL table that holds more than 200 million records. The table has no primary key.

The sample table (Bookmark is the name of table) content is as below:

systemId  filename       mindatetime             maxdatetime
70277     monitor_1.dat  2019-04-21 08:00:00 AM  2019-04-21 03:10:00 PM
10006     monitor_2.dat  2019-04-25 10:00:00 AM  2019-04-25 11:30:00 AM
10006     monitor_3.dat  2019-04-28 08:00:00 AM  2019-04-28 10:00:00 AM
10006     monitor_3.dat  2019-04-28 09:00:00 AM  2019-04-28 11:00:00 AM
10006     monitor_3.dat  2019-04-28 07:00:00 AM  2019-04-28 04:00:00 PM
8368      monitor_1.dat  2019-05-21 11:00:00 AM  2019-05-21 11:30:00 AM
8368      monitor_7.dat  2019-05-21 06:00:00 AM  2019-05-21 11:00:00 AM
8368      monitor_5.dat  2019-05-23 08:00:00 AM  2019-05-23 10:00:00 AM

The cron job should run on a schedule and delete the records that are logically redundant.

To explain this, take systemId '10006' with filename 'monitor_3.dat', which has three entries whose min and max timestamps fall on the same day.

Logically, we can delete the entries covering 08:00:00 AM to 10:00:00 AM and 09:00:00 AM to 11:00:00 AM, because both intervals are covered by the other entry, whose mindatetime is 07:00:00 AM and maxdatetime is 04:00:00 PM.

Those entries fall within that interval, so the job should identify all such entries across the entire table and delete them.
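
The rule described above can be sketched in a few lines of Python, applied to the sample rows. This is only an illustration of the containment test, not something to run against 200 million rows: it compares every row with every other row. The helper names `parse` and `is_redundant` are hypothetical, and the sketch assumes that exact duplicate rows are left alone.

```python
from datetime import datetime

FMT = "%Y-%m-%d %I:%M:%S %p"  # 12-hour format used in the sample data

# Sample rows from the post: (systemId, filename, mindatetime, maxdatetime).
ROWS = [
    (70277, "monitor_1.dat", "2019-04-21 08:00:00 AM", "2019-04-21 03:10:00 PM"),
    (10006, "monitor_2.dat", "2019-04-25 10:00:00 AM", "2019-04-25 11:30:00 AM"),
    (10006, "monitor_3.dat", "2019-04-28 08:00:00 AM", "2019-04-28 10:00:00 AM"),
    (10006, "monitor_3.dat", "2019-04-28 09:00:00 AM", "2019-04-28 11:00:00 AM"),
    (10006, "monitor_3.dat", "2019-04-28 07:00:00 AM", "2019-04-28 04:00:00 PM"),
    (8368, "monitor_1.dat", "2019-05-21 11:00:00 AM", "2019-05-21 11:30:00 AM"),
    (8368, "monitor_7.dat", "2019-05-21 06:00:00 AM", "2019-05-21 11:00:00 AM"),
    (8368, "monitor_5.dat", "2019-05-23 08:00:00 AM", "2019-05-23 10:00:00 AM"),
]


def parse(row):
    """Convert the two timestamp strings of a row into datetime objects."""
    system_id, filename, lo, hi = row
    return system_id, filename, datetime.strptime(lo, FMT), datetime.strptime(hi, FMT)


def is_redundant(row, others):
    """True if another row with the same systemId and filename covers this interval."""
    sid, name, lo, hi = row
    return any(
        o_sid == sid and o_name == name
        and o_lo <= lo and hi <= o_hi
        and (o_lo, o_hi) != (lo, hi)  # assumption: identical intervals are not deleted
        for o_sid, o_name, o_lo, o_hi in others
    )


parsed = [parse(r) for r in ROWS]
kept = [r for r in parsed if not is_redundant(r, parsed)]
for sid, name, lo, hi in kept:
    print(sid, name, lo.strftime(FMT), hi.strftime(FMT))
```

Only the two shorter 'monitor_3.dat' intervals are dropped; every other row has no covering partner with the same systemId and filename.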

My resultant output table content in this case should be:

systemId  filename       mindatetime             maxdatetime
70277     monitor_1.dat  2019-04-21 08:00:00 AM  2019-04-21 03:10:00 PM
10006     monitor_2.dat  2019-04-25 10:00:00 AM  2019-04-25 11:30:00 AM
10006     monitor_3.dat  2019-04-28 07:00:00 AM  2019-04-28 04:00:00 PM
8368      monitor_1.dat  2019-05-21 11:00:00 AM  2019-05-21 11:30:00 AM
8368      monitor_7.dat  2019-05-21 06:00:00 AM  2019-05-21 11:00:00 AM
8368      monitor_5.dat  2019-05-23 08:00:00 AM  2019-05-23 10:00:00 AM

The table is more than 20 GB on disk, so I was exploring writing an SQL procedure or job to do this, but I have not made much progress. Any ideas or suggestions for handling this complex scenario?
Jun 14 '19 #1
Rabbit
What happens if two entries only partially overlap? What happens if an entry is fully covered, but only by two different entries taken together? What happens if two entries overlap and have the same start and end? You need to define the requirements clearly; otherwise you're going to run into trouble down the road.
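
To make these cases concrete, here is a small Python sketch that names how one interval relates to another. The function `relation` is hypothetical, intervals are treated as closed, and hours are written as plain integers for brevity. The first two calls correspond to pairs in the sample data; the rest are made-up values.

```python
def relation(a, b):
    """Describe how closed interval a = (start, end) relates to interval b."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    if a == b:
        return "identical"
    if b_lo <= a_lo and a_hi <= b_hi:
        return "contained"        # a lies entirely inside b
    if a_lo <= b_lo and b_hi <= a_hi:
        return "contains"         # b lies entirely inside a
    if a_hi < b_lo or b_hi < a_lo:
        return "disjoint"
    return "partial overlap"


print(relation((8, 10), (7, 16)))   # contained (08:00-10:00 inside 07:00-16:00)
print(relation((8, 10), (9, 11)))   # partial overlap (two of the sample rows)
print(relation((8, 10), (8, 10)))   # identical

# Covered only by two entries together: neither (7, 10) nor (9, 13)
# contains (8, 12) alone, so a pairwise containment test keeps it.
print(relation((8, 12), (7, 10)))   # partial overlap
print(relation((8, 12), (9, 13)))   # partial overlap
```

Each of these outcomes needs an explicit decision (delete, merge, or keep) before the cleanup query is written.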

Whatever the case may be, the answer will probably be to join the table to itself to find overlapping entries. How you formulate that join will depend on what you need to happen in the scenarios above.
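
A minimal sketch of such a self-join follows, using Python's built-in sqlite3 module as a runnable stand-in for PostgreSQL. The assumptions: column names are lowercased, timestamps are stored as 24-hour ISO text (in PostgreSQL they would presumably be timestamp columns), and only strictly covered entries count as redundant, so exact duplicates and partial overlaps are left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookmark "
    "(systemid INTEGER, filename TEXT, mindatetime TEXT, maxdatetime TEXT)"
)
conn.executemany("INSERT INTO bookmark VALUES (?, ?, ?, ?)", [
    (70277, "monitor_1.dat", "2019-04-21 08:00:00", "2019-04-21 15:10:00"),
    (10006, "monitor_2.dat", "2019-04-25 10:00:00", "2019-04-25 11:30:00"),
    (10006, "monitor_3.dat", "2019-04-28 08:00:00", "2019-04-28 10:00:00"),
    (10006, "monitor_3.dat", "2019-04-28 09:00:00", "2019-04-28 11:00:00"),
    (10006, "monitor_3.dat", "2019-04-28 07:00:00", "2019-04-28 16:00:00"),
    (8368, "monitor_1.dat", "2019-05-21 11:00:00", "2019-05-21 11:30:00"),
    (8368, "monitor_7.dat", "2019-05-21 06:00:00", "2019-05-21 11:00:00"),
    (8368, "monitor_5.dat", "2019-05-23 08:00:00", "2019-05-23 10:00:00"),
])

# Self-join: a is the candidate, b is an entry with the same key that
# covers a and is strictly wider on at least one side.
COVER = """
    b.systemid = a.systemid AND b.filename = a.filename
    AND b.mindatetime <= a.mindatetime AND b.maxdatetime >= a.maxdatetime
    AND (b.mindatetime < a.mindatetime OR b.maxdatetime > a.maxdatetime)
"""
covered = conn.execute(
    "SELECT DISTINCT a.systemid, a.filename, a.mindatetime, a.maxdatetime "
    f"FROM bookmark AS a JOIN bookmark AS b ON {COVER} "
    "ORDER BY a.mindatetime"
).fetchall()

# The same condition as a correlated EXISTS, used to delete the covered rows.
conn.execute(
    "DELETE FROM bookmark WHERE EXISTS ("
    "  SELECT 1 FROM bookmark AS b"
    "  WHERE b.systemid = bookmark.systemid AND b.filename = bookmark.filename"
    "    AND b.mindatetime <= bookmark.mindatetime"
    "    AND b.maxdatetime >= bookmark.maxdatetime"
    "    AND (b.mindatetime < bookmark.mindatetime"
    "         OR b.maxdatetime > bookmark.maxdatetime))"
)
remaining = conn.execute("SELECT COUNT(*) FROM bookmark").fetchone()[0]
print(covered)
print(remaining)
```

On a table of the size described, a query like this would likely need an index on (systemid, filename) and some way of working in batches; both are suggestions to evaluate, not something tested here.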
Jun 14 '19 #2
