public inbox for pve-devel@lists.proxmox.com
From: Dominik Csapak <d.csapak@proxmox.com>
To: pve-devel@lists.proxmox.com
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: [pve-devel] [PATCH manager v2 1/6] replace systemd timer with pvescheduler daemon
Date: Mon,  8 Nov 2021 14:07:53 +0100	[thread overview]
Message-ID: <20211108130758.160914-3-d.csapak@proxmox.com> (raw)
In-Reply-To: <20211108130758.160914-1-d.csapak@proxmox.com>

From: Thomas Lamprecht <t.lamprecht@proxmox.com>

The whole thing is already prepared for this; the systemd timer was
just a fixed periodic timer with a one-minute frequency, and AFAIK it
was only introduced on the assumption that this approach would use
less memory.

But logging 4+ lines every time the timer fires, even if it does
nothing, and that 24/7, is not too cheap and a bit annoying.

So, as a first step, add a simple daemon which forks off a child for
running jobs once a minute.
This could still be made a bit more intelligent, i.e., check whether
we have jobs to run before forking - as forking is not the cheapest
syscall. Further, we could adapt the sleep interval to the next time
we actually need to run a job (and send a SIGUSR to the daemon if a
job's interval changes such that the next run moves closer); see the
sketch below.
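
A rough sketch of such an adaptive sleep (get_next_job_time() is a
hypothetical helper, not part of this patch):

    # sleep until the earliest pending job is due, clamped to [1, 60]s
    my $next = get_next_job_time(); # epoch time of the next pending job
    my $sleep_time = $next - time();
    $sleep_time = 60 if $sleep_time > 60; # cap at the current fixed interval
    $sleep_time = 1 if $sleep_time < 1;
    sleep($sleep_time);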

At start, we try to sync to minute-change boundaries; this emulates
the systemd.timer behaviour we had until now. Users can also
configure jobs with minute precision, so they probably expect those
to start really close to a minute-change event.
This could be adapted to resync while running, to factor in time
drift. But as long as enough CPU cycles are available, we run at
correct monotonic intervals, so this isn't a must, IMO.
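
For illustration, the sync in the $get_sleep_time helper below boils
down to this (e.g., waking up at hh:mm:37 means sleeping 23s):

    my ($current_seconds) = localtime; # seconds past the minute, e.g., 37
    my $time = 60 - $current_seconds;  # 23s until the next minute change
    # keep the default full 60s instead if less than 5s remain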

Another improvement could be locking a bit more fine-grained, i.e.,
not on a per-all-local-job-runs basis, but on a per-job (per-guest?)
basis, which would avoid temporary starvation of small high-frequency
jobs by big, less frequent ones; see the sketch below.
We argued that it's the user's fault if such situations arise, but
they can evolve over time without anyone noticing, especially in more
complex setups.
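
A minimal sketch of what that per-job locking could look like (the
lock path, the %$jobs_to_run hash and the run_single_job() helper are
hypothetical; PVE::Tools::lock_file() runs the passed code with the
file locked and sets $@ on error):

    use PVE::Tools;

    for my $id (sort keys %$jobs_to_run) {
        # lock only this job's file, so one long-running job cannot
        # starve unrelated, high-frequency ones
        PVE::Tools::lock_file("/var/lock/pvescheduler/$id.lck", 10, sub {
            run_single_job($id);
        });
        warn "job $id: $@" if $@;
    }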

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 PVE/Service/Makefile          |   2 +-
 PVE/Service/pvescheduler.pm   | 118 ++++++++++++++++++++++++++++++++++
 bin/Makefile                  |   6 +-
 bin/pvescheduler              |  28 ++++++++
 debian/postinst               |   3 +-
 services/Makefile             |   3 +-
 services/pvescheduler.service |  16 +++++
 services/pvesr.service        |   8 ---
 services/pvesr.timer          |  12 ----
 9 files changed, 171 insertions(+), 25 deletions(-)
 create mode 100755 PVE/Service/pvescheduler.pm
 create mode 100755 bin/pvescheduler
 create mode 100644 services/pvescheduler.service
 delete mode 100644 services/pvesr.service
 delete mode 100644 services/pvesr.timer

diff --git a/PVE/Service/Makefile b/PVE/Service/Makefile
index fc1cdb14..f66ceb81 100644
--- a/PVE/Service/Makefile
+++ b/PVE/Service/Makefile
@@ -1,6 +1,6 @@
 include ../../defines.mk
 
-SOURCES=pvestatd.pm pveproxy.pm pvedaemon.pm spiceproxy.pm
+SOURCES=pvestatd.pm pveproxy.pm pvedaemon.pm spiceproxy.pm pvescheduler.pm
 
 all:
 
diff --git a/PVE/Service/pvescheduler.pm b/PVE/Service/pvescheduler.pm
new file mode 100755
index 00000000..96d84e20
--- /dev/null
+++ b/PVE/Service/pvescheduler.pm
@@ -0,0 +1,118 @@
+package PVE::Service::pvescheduler;
+
+use strict;
+use warnings;
+
+use POSIX qw(WNOHANG);
+use PVE::SafeSyslog;
+use PVE::API2::Replication;
+
+use PVE::Daemon;
+use base qw(PVE::Daemon);
+
+my $cmdline = [$0, @ARGV];
+my %daemon_options = (stop_wait_time => 180, max_workers => 0);
+my $daemon = __PACKAGE__->new('pvescheduler', $cmdline, %daemon_options);
+
+my $finish_jobs = sub {
+    my ($self) = @_;
+    foreach my $cpid (keys %{$self->{jobs}}) {
+	my $waitpid = waitpid($cpid, WNOHANG);
+	if (defined($waitpid) && ($waitpid == $cpid)) {
+	    delete ($self->{jobs}->{$cpid});
+	}
+    }
+};
+
+my $get_sleep_time = sub {
+    my ($calculate_offset) = @_;
+    my $time = 60;
+
+    if ($calculate_offset) {
+	# try to run near minute boundaries, makes more sense to the user as he
+	# configures jobs with minute precision
+	my ($current_seconds) = localtime;
+	$time = (60 - $current_seconds) if (60 - $current_seconds >= 5);
+    }
+
+    return $time;
+};
+
+sub run {
+    my ($self) = @_;
+
+    my $jobs = {};
+    $self->{jobs} = $jobs;
+
+    my $old_sig_chld = $SIG{CHLD};
+    local $SIG{CHLD} = sub {
+	local ($@, $!, $?); # do not overwrite error vars
+	$finish_jobs->($self);
+	$old_sig_chld->(@_) if $old_sig_chld;
+    };
+
+    my $run_jobs = sub {
+	my $child = fork();
+	if (!defined($child)) {
+	    die "fork failed: $!\n";
+	} elsif ($child == 0) {
+	    $self->after_fork_cleanup();
+	    PVE::API2::Replication::run_jobs(undef, sub {}, 0, 1);
+	    POSIX::_exit(0);
+	}
+
+	$jobs->{$child} = 1;
+    };
+
+    PVE::Jobs::setup_dirs();
+
+    for (my $count = 1000;;$count++) {
+	last if $self->{shutdown_request};
+
+	$run_jobs->();
+
+	my $sleep_time;
+	if ($count >= 1000) {
+	    $sleep_time = $get_sleep_time->(1);
+	    $count = 0;
+	} else {
+	    $sleep_time = $get_sleep_time->(0);
+	}
+
+	my $slept = 0; # SIGCHLD interrupts sleep, so we need to keep track
+	while ($slept < $sleep_time) {
+	    last if $self->{shutdown_request};
+	    $slept += sleep($sleep_time - $slept);
+	}
+    }
+
+    # jobs have a lock timeout of 60s, wait a bit more for graceful termination
+    my $timeout = 0;
+    while (keys %$jobs > 0 && $timeout < 75) {
+	kill 'TERM', keys %$jobs;
+	$timeout += sleep(5);
+    }
+    # ensure the rest gets stopped
+    kill 'KILL', keys %$jobs if (keys %$jobs > 0);
+}
+
+sub shutdown {
+    my ($self) = @_;
+
+    syslog('info', 'got shutdown request, signal running jobs to stop');
+
+    kill 'TERM', keys %{$self->{jobs}};
+    $self->{shutdown_request} = 1;
+}
+
+$daemon->register_start_command();
+$daemon->register_stop_command();
+$daemon->register_status_command();
+
+our $cmddef = {
+    start => [ __PACKAGE__, 'start', []],
+    stop => [ __PACKAGE__, 'stop', []],
+    status => [ __PACKAGE__, 'status', [], undef, sub { print shift . "\n";} ],
+};
+
+1;
diff --git a/bin/Makefile b/bin/Makefile
index df30b27c..12cb4671 100644
--- a/bin/Makefile
+++ b/bin/Makefile
@@ -6,7 +6,7 @@ export NOVIEW=1
 PERL_DOC_INC_DIRS=..
 include /usr/share/pve-doc-generator/pve-doc-generator.mk
 
-SERVICES = pvestatd pveproxy pvedaemon spiceproxy
+SERVICES = pvestatd pveproxy pvedaemon spiceproxy pvescheduler
 CLITOOLS = vzdump pvesubscription pveceph pveam pvesr pvenode pvesh pve6to7
 
 SCRIPTS =  			\
@@ -52,6 +52,10 @@ pve6to7.1:
 	printf ".TH PVE6TO7 1\n.SH NAME\npve6to7 \- Proxmox VE upgrade checker script for 6.4 to 7.x\n" > $@
 	printf ".SH SYNOPSIS\npve6to7 [--full]\n" >> $@
 
+pvescheduler.8:
+	# FIXME: add to doc-generator
+	echo ".TH pvescheduler 8" > $@
+
 pveversion.1.pod: pveversion
 pveupgrade.1.pod: pveupgrade
 pvereport.1.pod: pvereport
diff --git a/bin/pvescheduler b/bin/pvescheduler
new file mode 100755
index 00000000..5f2bde52
--- /dev/null
+++ b/bin/pvescheduler
@@ -0,0 +1,28 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use PVE::Service::pvescheduler;
+
+use PVE::RPCEnvironment;
+use PVE::SafeSyslog;
+
+$SIG{'__WARN__'} = sub {
+    my $err = $@;
+    my $t = $_[0];
+    chomp $t;
+    print STDERR "$t\n";
+    syslog('warning', "%s", $t);
+    $@ = $err;
+};
+
+my $prepare = sub {
+    my $rpcenv = PVE::RPCEnvironment->init('priv');
+
+    $rpcenv->init_request();
+    $rpcenv->set_language($ENV{LANG});
+    $rpcenv->set_user('root@pam');
+};
+
+PVE::Service::pvescheduler->run_cli_handler(prepare => $prepare);
diff --git a/debian/postinst b/debian/postinst
index c4df64b5..7be1aa3d 100755
--- a/debian/postinst
+++ b/debian/postinst
@@ -79,6 +79,7 @@ case "$1" in
     deb-systemd-invoke reload-or-try-restart pvestatd.service
     deb-systemd-invoke reload-or-try-restart pveproxy.service
     deb-systemd-invoke reload-or-try-restart spiceproxy.service
+    deb-systemd-invoke reload-or-try-restart pvescheduler.service
 
     exit 0;;
 
@@ -102,7 +103,7 @@ case "$1" in
 
     # same as dh_systemd_enable (code copied)
 
-    UNITS="pvedaemon.service pveproxy.service spiceproxy.service pvestatd.service pvebanner.service pvesr.timer pve-daily-update.timer"
+    UNITS="pvedaemon.service pveproxy.service spiceproxy.service pvestatd.service pvebanner.service pvescheduler.service pve-daily-update.timer"
     NO_RESTART_UNITS="pvenetcommit.service pve-guests.service"
 
     for unit in ${UNITS} ${NO_RESTART_UNITS}; do
diff --git a/services/Makefile b/services/Makefile
index a3d04a2f..b46c7119 100644
--- a/services/Makefile
+++ b/services/Makefile
@@ -13,8 +13,7 @@ SERVICES=			\
 	pve-storage.target	\
 	pve-daily-update.service\
 	pve-daily-update.timer	\
-	pvesr.service		\
-	pvesr.timer
+	pvescheduler.service
 
 .PHONY: install
 install: ${SERVICES}
diff --git a/services/pvescheduler.service b/services/pvescheduler.service
new file mode 100644
index 00000000..11769e80
--- /dev/null
+++ b/services/pvescheduler.service
@@ -0,0 +1,16 @@
+[Unit]
+Description=Proxmox VE scheduler
+ConditionPathExists=/usr/bin/pvescheduler
+Wants=pve-cluster.service
+After=pve-cluster.service
+After=pve-storage.target
+
+[Service]
+ExecStart=/usr/bin/pvescheduler start
+ExecStop=/usr/bin/pvescheduler stop
+PIDFile=/var/run/pvescheduler.pid
+KillMode=process
+Type=forking
+
+[Install]
+WantedBy=multi-user.target
diff --git a/services/pvesr.service b/services/pvesr.service
deleted file mode 100644
index cbaed1ca..00000000
--- a/services/pvesr.service
+++ /dev/null
@@ -1,8 +0,0 @@
-[Unit]
-Description=Proxmox VE replication runner
-ConditionPathExists=/usr/bin/pvesr
-After=pve-cluster.service
-
-[Service]
-Type=oneshot
-ExecStart=/usr/bin/pvesr run --mail 1
diff --git a/services/pvesr.timer b/services/pvesr.timer
deleted file mode 100644
index 01d7b9c7..00000000
--- a/services/pvesr.timer
+++ /dev/null
@@ -1,12 +0,0 @@
-[Unit]
-Description=Proxmox VE replication runner
-
-[Timer]
-AccuracySec=1
-RemainAfterElapse=no
-
-[Timer]
-OnCalendar=minutely
-
-[Install]
-WantedBy=timers.target
\ No newline at end of file
-- 
2.30.2


Thread overview: 16+ messages
2021-11-08 13:07 [pve-devel] [PATCH cluster/manager v2] add scheduling daemon for pvesr + vzdump (and more) Dominik Csapak
2021-11-08 13:07 ` [pve-devel] [PATCH cluster v2 1/1] add 'jobs.cfg' to observed files Dominik Csapak
2021-11-09 17:18   ` [pve-devel] applied: " Thomas Lamprecht
2021-11-08 13:07 ` Dominik Csapak [this message]
2021-11-08 13:07 ` [pve-devel] [PATCH manager v2 2/6] add PVE/Jobs to handle VZDump jobs Dominik Csapak
2021-11-08 13:07 ` [pve-devel] [PATCH manager v2 3/6] pvescheduler: run jobs from jobs.cfg Dominik Csapak
2021-11-08 13:07 ` [pve-devel] [PATCH manager v2 4/6] api/backup: refactor string for all days Dominik Csapak
2021-11-08 13:07 ` [pve-devel] [PATCH manager v2 5/6] api/backup: handle new vzdump jobs Dominik Csapak
2021-11-08 13:07 ` [pve-devel] [PATCH manager v2 6/6] ui: dc/backup: show id+schedule instead of dow+starttime Dominik Csapak
2021-11-09 14:17 ` [pve-devel] [PATCH cluster/manager v2] add scheduling daemon for pvesr + vzdump (and more) Dylan Whyte
2021-11-10  9:38   ` Dominik Csapak
2021-11-10 11:05     ` Thomas Lamprecht
2021-11-09 16:55 ` Aaron Lauterer
2021-11-10 20:42 ` [pve-devel] applied-series: " Thomas Lamprecht
2021-11-11 11:35 ` [pve-devel] " Fabian Ebner
2021-11-11 11:46   ` Dominik Csapak
