large crontab database design - PostgreSQL Database

David Garamond

I was thinking on how one would design an optimal (performance-wise)
database of large number of schedules with crontab-like semantic. There
will potentially be hundreds of thousands or even millions of "crontab"
entries, and there will be a query run once every minute to determine
which entries are to be run for that minute. The primary goal is, of
course, for the query to run as fast as possible.

My first stab is, straightforward enough, like below:

create table schedule (
id int not null primary key,
command text
);

create table schedule_detail(
schedule_id int not null references schedule(id),
minute smallint not null,
hour smallint not null,
day smallint not null,
month smallint not null,
year smallint not null,
dayofweek smallint not null
);

create index idx_schedule_detail_schedule_id on
schedule_detail(schedule_id);
create index idx_schedule_detail_minute on
schedule_detail(minute);
create index idx_schedule_detail_hour on
schedule_detail(hour);
create index idx_schedule_detail_day on
schedule_detail(day);
create index idx_schedule_detail_month on
schedule_detail(month);
create index idx_schedule_detail_year on
schedule_detail(year);
create index idx_schedule_detail_dayofweek on
schedule_detail(dayofweek);

A "multiple items" syntax like '1,2,3' in a crontab time field will be
presented with several records in the schedule_detail table. An "every
N" syntax like '*/5' will be represented by a negative number '-5' in
the database field.

For example, this crontab entry:

# every three hours except on weekends (sat/sun)
0 */3 * * 1,2,3,4,5 CHECK-THE-COPIER-MACHINE

will translate to these records:

insert into schedule values (1, 'CHECK-THE-COPIER-MACHINE');
insert into schedule_detail values (1, 0, -3, -1, -1, -1, 1);
insert into schedule_detail values (1, 0, -3, -1, -1, -1, 2);
insert into schedule_detail values (1, 0, -3, -1, -1, -1, 3);
insert into schedule_detail values (1, 0, -3, -1, -1, -1, 4);
insert into schedule_detail values (1, 0, -3, -1, -1, -1, 5);

The query will then be:

select distinct schedule_id,command from schedule_detail
left join schedule on schedule_id=schedule.id
where
(year=extract(year from current_date) or
(year<0 and cast(extract(year from current_date) as smallint)
% year = 0)) and
(month=extract(month from current_date) or
(month<0 and cast(extract(month from current_date) as smallint)
% month = 0)) and
(day=extract(month from current_date) or
(day<0 and cast(extract(day from current_date) as smallint)
% day = 0)) and
(hour=extract(hour from current_time) or
(hour<0 and cast(extract(hour from current_time) as smallint)
% hour = 0)) and
(minute=extract(minute from current_time) or
(minute<0 and cast(extract(minute from current_time) as smallint)
% month = 0));

Several questions:

1. Any idea for a better design? The selectivity of each field is not so
great (only 60 different values for minute, 24 for day, etc).

2. What if we want to add support for things like "every 3 hours after
1:00AM" (1:00, 4:00, 7:00, and so on) or "beginning from 24 Jan 2004 and
every 12 days after that." We won't be able to use the negative number
and modulo trick, or even the crontab-like database fields for this.

3. I think I have a race condition in my query (because I call the
current_date & current_time function several times), but I don't know
how to fix this without resorting to procedure/function.

--
dave

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #1

Subscribe Post Reply

8403

Csaba Nagy

David,

I have another solution working (not using data base persistence, but
the idea is the same).
We have the next run time saved along with the original cron-tab string,
which is left as it is (not broken down to any details). The system
polls for the earliest "next runtime", and executes the associated taks
if the time is not in the future, otherwise does nothing.
Then find the next run time for the just executed task based on it's
time table, and update it in the schedule table.
Of course in a database version this would involve the locking of the
executed task's row, so no other machine/thread will execute it at the
same time (I assume it is about some high concurrency system). I'm not
sure what kind of race conditions can this cause...

Just my 2c,
Csaba.
On Tue, 2004-02-03 at 15:52, David Garamond wrote:

I was thinking on how one would design an optimal (performance-wise)
database of large number of schedules with crontab-like semantic. There
will potentially be hundreds of thousands or even millions of "crontab"
entries, and there will be a query run once every minute to determine
which entries are to be run for that minute. The primary goal is, of
course, for the query to run as fast as possible.

My first stab is, straightforward enough, like below:

create table schedule (
id int not null primary key,
command text
);

create table schedule_detail(
schedule_id int not null references schedule(id),
minute smallint not null,
hour smallint not null,
day smallint not null,
month smallint not null,
year smallint not null,
dayofweek smallint not null
);

create index idx_schedule_detail_schedule_id on
schedule_detail(schedule_id);
create index idx_schedule_detail_minute on
schedule_detail(minute);
create index idx_schedule_detail_hour on
schedule_detail(hour);
create index idx_schedule_detail_day on
schedule_detail(day);
create index idx_schedule_detail_month on
schedule_detail(month);
create index idx_schedule_detail_year on
schedule_detail(year);
create index idx_schedule_detail_dayofweek on
schedule_detail(dayofweek);

A "multiple items" syntax like '1,2,3' in a crontab time field will be
presented with several records in the schedule_detail table. An "every
N" syntax like '*/5' will be represented by a negative number '-5' in
the database field.

For example, this crontab entry:

# every three hours except on weekends (sat/sun)
0 */3 * * 1,2,3,4,5 CHECK-THE-COPIER-MACHINE

will translate to these records:

insert into schedule values (1, 'CHECK-THE-COPIER-MACHINE');
insert into schedule_detail values (1, 0, -3, -1, -1, -1, 1);
insert into schedule_detail values (1, 0, -3, -1, -1, -1, 2);
insert into schedule_detail values (1, 0, -3, -1, -1, -1, 3);
insert into schedule_detail values (1, 0, -3, -1, -1, -1, 4);
insert into schedule_detail values (1, 0, -3, -1, -1, -1, 5);

The query will then be:

select distinct schedule_id,command from schedule_detail
left join schedule on schedule_id=schedule.id
where
(year=extract(year from current_date) or
(year<0 and cast(extract(year from current_date) as smallint)
% year = 0)) and
(month=extract(month from current_date) or
(month<0 and cast(extract(month from current_date) as smallint)
% month = 0)) and
(day=extract(month from current_date) or
(day<0 and cast(extract(day from current_date) as smallint)
% day = 0)) and
(hour=extract(hour from current_time) or
(hour<0 and cast(extract(hour from current_time) as smallint)
% hour = 0)) and
(minute=extract(minute from current_time) or
(minute<0 and cast(extract(minute from current_time) as smallint)
% month = 0));

Several questions:

1. Any idea for a better design? The selectivity of each field is not so
great (only 60 different values for minute, 24 for day, etc).

2. What if we want to add support for things like "every 3 hours after
1:00AM" (1:00, 4:00, 7:00, and so on) or "beginning from 24 Jan 2004 and
every 12 days after that." We won't be able to use the negative number
and modulo trick, or even the crontab-like database fields for this.

3. I think I have a race condition in my query (because I call the
current_date & current_time function several times), but I don't know
how to fix this without resorting to procedure/function.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 22 '05 #2

David Garamond

Csaba,

Csaba Nagy wrote:

We have the next run time saved along with the original cron-tab string,
which is left as it is (not broken down to any details). The system
polls for the earliest "next runtime", and executes the associated taks
if the time is not in the future, otherwise does nothing.

A nice approach, thanks!

--
dave
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 22 '05 #3

Similar topics

crontab manipulation from php

by: Arkascha | last post by:

Maybe someone can give me a short help with this... I got a server application in a LAMP environment doing batch tasks. A cronjob serves as a regular trigger, a metronom. The trigged...

PHP

crontab help

by: David Bruno | last post by:

I set up a crontab using cpanel8 for the first time. It's just to run a very simple php script that sends an email and has one include() function. The first try at the crontab produced a...

PHP

how to run crontab from php skript as a different user

by: Frank R. Suchy | last post by:

Hi, I want a php-script to maintain (some of) "my" cron jobs. Therefore it has to modify some crontab.txt (no problem) and has to execute crontab. But since php runs as the apache-user it...

PHP

Is there a "Large Scale Python Software Design" ?

by: Andrea Griffini | last post by:

I did it. I proposed python as the main language for our next CAD/CAM software because I think that it has all the potential needed for it. I'm not sure yet if the decision will get through, but...

Python

appended crontab entries with py script

by: rbt | last post by:

How can I safely append a crontab entry to a crontab file progammatically with Python? I need to handle crontabs that currently have entries and crontabs that are empty. Also, I'd like this to...

Python

Working with Large vs Small databases

by: Salad | last post by:

Every now and then I see ads that state something like "Experience with Large Databases ...multi-gig...blah-de-blah" And I have to laugh. What's the difference between a large or small database? ...

Microsoft Access / VBA

Execute crontab as user Joe using php...

by: Chr1s | last post by:

How do I execute a crontab command using php (not CLI) as user Joe? I don't have any problem getting output from commands such as 'ls' using passthru and exec but I am stumped with crontab. ...

PHP

still get a <defunct> using a python script in the crontab

by: martijn | last post by:

H! I have made a program that is checking if a program is running or not. If the program is not running then it must start the program again. in the /etc/crontab: * * * * ...

Python

crontab library

by: Martin Marcher | last post by:

Hello, is anyone aware of a crontab library. Possibly even more complete, something that will let me create/manipulate/delete crontab entries in a nice way and install the new crontab...

Python

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General