public inbox for pve-devel@lists.proxmox.com
From: Dominik Csapak <d.csapak@proxmox.com>
To: Fabian Ebner <f.ebner@proxmox.com>,
	pve-devel@lists.proxmox.com,
	Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [pve-devel] [PATCH manager 1/7] replace systemd timer with pvescheduler daemon
Date: Tue, 2 Nov 2021 10:26:42 +0100	[thread overview]
Message-ID: <87e84f43-9d3e-ce92-fb2d-8c39a5a23dcc@proxmox.com> (raw)
In-Reply-To: <24333729-1c38-ebaa-7dad-73ca338fd087@proxmox.com>

On 10/29/21 14:05, Fabian Ebner wrote:
> Am 07.10.21 um 10:27 schrieb Dominik Csapak:
>> From: Thomas Lamprecht <t.lamprecht@proxmox.com>
>>
>> The whole thing is already prepared for this, the systemd timer was
>> just a fixed periodic timer with a frequency of one minute. And we
>> just introduced it as the assumption was made that less memory usage
>> would be generated with this approach, AFAIK.
>>
>> But logging 4+ lines just because the timer was started, even if
>> it does nothing, 24/7, is not too cheap and a bit annoying.
>>
>> So in a first step add a simple daemon, which forks off a child for
>> running jobs once a minute.
>> This could still be made a bit more intelligent, i.e., look if we
>> have jobs to run before forking - as forking is not the cheapest
>> syscall. Further, we could adapt the sleep interval to the next time
>> we actually need to run a job (and send a SIGUSR to the daemon if
>> a job interval changes such that the interval gets narrower).
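
The fork-avoidance idea above could be sketched roughly as follows. This is an illustration only: the `next_run` field and the job-list shape are assumptions made for the example, not existing PVE data structures.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: check whether any job is actually due before paying for a fork.
# The job list shape ({ next_run => epoch_seconds }) is made up here.
sub any_job_due {
    my ($jobs, $now) = @_;
    for my $job (@$jobs) {
        return 1 if defined($job->{next_run}) && $job->{next_run} <= $now;
    }
    return 0;
}
```

The main loop would then only call its fork-and-run closure when `any_job_due($jobs_cfg, time())` is true.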
>>
>> We try to sync running to minute-change boundaries at start; this
>> emulates the systemd.timer behaviour we had until now. Also, users can
>> configure jobs with minute precision, so they probably expect that
>> those also start really close to a minute-change event.
>> Could be adapted to resync while running, to factor in time drift.
>> But, as long as enough CPU cycles are available, we run in correct
>> monotonic intervals, so this isn't a must, IMO.
>>
>> Another improvement could be locking a bit more fine-grained, i.e.,
>> not on a per-all-local-job-runs basis, but on a per-job (per-guest?)
>> basis, which would reduce temporary starvation of small,
>> high-frequency jobs by big, less periodic jobs.
>> We argued that it's the user's fault if such situations arise, but they
>> can evolve over time without anyone noticing, especially in complex
>> setups.
>>
>> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>>   PVE/Service/Makefile          |   2 +-
>>   PVE/Service/pvescheduler.pm   | 102 ++++++++++++++++++++++++++++++++++
>>   bin/Makefile                  |   6 +-
>>   bin/pvescheduler              |  28 ++++++++++
>>   debian/postinst               |   3 +-
>>   services/Makefile             |   3 +-
>>   services/pvescheduler.service |  16 ++++++
>>   services/pvesr.service        |   8 ---
>>   services/pvesr.timer          |  12 ----
>>   9 files changed, 155 insertions(+), 25 deletions(-)
>>   create mode 100644 PVE/Service/pvescheduler.pm
> 
> Just noting: All the other .pm-files in PVE/Service have the executable 
> flag set.
> 
>>   create mode 100755 bin/pvescheduler
>>   create mode 100644 services/pvescheduler.service
>>   delete mode 100644 services/pvesr.service
>>   delete mode 100644 services/pvesr.timer
>>
>> diff --git a/PVE/Service/Makefile b/PVE/Service/Makefile
>> index fc1cdb14..f66ceb81 100644
>> --- a/PVE/Service/Makefile
>> +++ b/PVE/Service/Makefile
>> @@ -1,6 +1,6 @@
>>   include ../../defines.mk
>> -SOURCES=pvestatd.pm pveproxy.pm pvedaemon.pm spiceproxy.pm
>> +SOURCES=pvestatd.pm pveproxy.pm pvedaemon.pm spiceproxy.pm pvescheduler.pm
>>   all:
>> diff --git a/PVE/Service/pvescheduler.pm b/PVE/Service/pvescheduler.pm
>> new file mode 100644
>> index 00000000..ce55c45a
>> --- /dev/null
>> +++ b/PVE/Service/pvescheduler.pm
>> @@ -0,0 +1,102 @@
>> +package PVE::Service::pvescheduler;
>> +
>> +use strict;
>> +use warnings;
>> +
>> +use POSIX qw(WNOHANG);
>> +use PVE::SafeSyslog;
>> +use PVE::API2::Replication;
>> +
>> +use PVE::Daemon;
>> +use base qw(PVE::Daemon);
>> +
>> +my $cmdline = [$0, @ARGV];
>> +my %daemon_options = (stop_wait_time => 180, max_workers => 0);
>> +my $daemon = __PACKAGE__->new('pvescheduler', $cmdline, %daemon_options);
>> +
>> +my $finish_jobs = sub {
>> +    my ($self) = @_;
>> +    foreach my $cpid (keys %{$self->{jobs}}) {
>> +    my $waitpid = waitpid($cpid, WNOHANG);
>> +    if (defined($waitpid) && ($waitpid == $cpid)) {
>> +        delete ($self->{jobs}->{$cpid});
>> +    }
>> +    }
>> +};
>> +
>> +sub run {
>> +    my ($self) = @_;
>> +
>> +    my $jobs= {};
>> +    $self->{jobs} = $jobs;
>> +
>> +    my $old_sig_chld = $SIG{CHLD};
>> +    local $SIG{CHLD} = sub {
>> +    local ($@, $!, $?); # do not overwrite error vars
>> +    $finish_jobs->($self);
>> +    $old_sig_chld->(@_) if $old_sig_chld;
>> +    };
>> +
>> +    my $logfunc = sub { syslog('info', $_[0]) };
>> +
>> +    my $run_jobs = sub {
>> +    my $child = fork();
>> +    if (!defined($child)) {
>> +        die "fork failed: $!\n";
>> +    } elsif ($child == 0) {
>> +        $self->after_fork_cleanup();
>> +        PVE::API2::Replication::run_jobs(undef, $logfunc, 0, 1);
> 
> Like this, the whole replication log ends up in syslog.

Yes, you're right, we probably don't want that in the syslog
(same for vzdump jobs later).
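
One option would be a `$logfunc` that appends to a dedicated log file instead of syslog. A minimal sketch, with the caveat that the file path and the whole approach are illustrations here, not an existing PVE convention:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

# Sketch: build a $logfunc closure that appends lines to a log file,
# keeping syslog free for the daemon's own messages.
sub make_file_logfunc {
    my ($path) = @_;
    open(my $fh, '>>', $path) or die "cannot open $path - $!\n";
    $fh->autoflush(1); # children may _exit() abruptly, don't lose buffered lines
    return sub {
        my ($line) = @_;
        print $fh scalar(localtime()) . ": $line\n";
    };
}
```

The worker could then pass e.g. `make_file_logfunc('/var/log/pve/replication.log')` (a made-up path) instead of the syslog closure to `PVE::API2::Replication::run_jobs`.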

> 
>> +        POSIX::_exit(0);
>> +    }
>> +
>> +    $jobs->{$child} = 1;
>> +    };
>> +
>> +    # try to run near minute boundaries, makes more sense to the user as he
>> +    # configures jobs with minute precision
>> +    my ($current_seconds) = localtime;
>> +    sleep(60 - $current_seconds) if (60 - $current_seconds >= 5);
>> +
>> +    for (;;) {
>> +    last if $self->{shutdown_request};
>> +
>> +    $run_jobs->();
>> +
>> +    my $sleep_time = 60;
>> +    my $slept = 0; # SIGCHLD interrupts sleep, so we need to keep track
>> +    while ($slept < $sleep_time) {
>> +        last if $self->{shutdown_request};
>> +        $slept += sleep($sleep_time - $slept);
>> +    }
>> +    }
> 
> Won't the loop drift away from the minute boundary over time? The rest 
> of the loop takes time to execute, which pushes in the positive 
> direction, while the fact that the accuracy when interrupted is only 
> whole seconds pushes in the negative direction. But we can't really rely 
> on those to cancel each other out.
> 

Yes, IMHO the fix would be to pull the correction we have before
the loop into the loop (maybe only every X iterations, since the drift
from the few forks + sleeps should not amount to much per iteration).
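
Pulling that correction into the loop could look roughly like this sketch (not the actual patch code): recompute the sleep time from the wall clock each iteration, so per-iteration overhead (fork, signal handling, whole-second sleep granularity) cannot accumulate into drift away from the minute boundary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: derive the next sleep duration from the current time instead
# of using a fixed 60s.
sub seconds_to_next_minute {
    my ($now) = @_; # epoch seconds, passed in to keep this testable
    my $remaining = 60 - ($now % 60);
    # if we woke up right before the boundary, target the following one,
    # mirroring the ">= 5" guard the patch uses before entering its loop
    $remaining += 60 if $remaining < 5;
    return $remaining;
}
```

In the main loop, `my $sleep_time = seconds_to_next_minute(time());` would replace the fixed `my $sleep_time = 60;`; the shutdown-aware inner sleep loop can stay as is.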

>> +
>> +    # jobs have a lock timeout of 60s, wait a bit more for graceful termination
>> +    my $timeout = 0;
>> +    while (keys %$jobs > 0 && $timeout < 75) {
>> +    kill 'TERM', keys %$jobs;
>> +    $timeout += sleep(5);
>> +    }
>> +    # ensure the rest gets stopped
>> +    kill 'KILL', keys %$jobs if (keys %$jobs > 0);
>> +}
>> +
>> +sub shutdown {
>> +    my ($self) = @_;
>> +
>> +    syslog('info', 'got shutdown request, signal running jobs to stop');
>> +
>> +    kill 'TERM', keys %{$self->{jobs}};
>> +    $self->{shutdown_request} = 1;
>> +}
>> +
>> +$daemon->register_start_command();
>> +$daemon->register_stop_command();
>> +$daemon->register_status_command();
>> +
>> +our $cmddef = {
>> +    start => [ __PACKAGE__, 'start', []],
>> +    stop => [ __PACKAGE__, 'stop', []],
>> +    status => [ __PACKAGE__, 'status', [], undef, sub { print shift . "\n";} ],
>> +};
>> +
>> +1;
>> diff --git a/bin/Makefile b/bin/Makefile
>> index df30b27c..12cb4671 100644
>> --- a/bin/Makefile
>> +++ b/bin/Makefile
>> @@ -6,7 +6,7 @@ export NOVIEW=1
>>   PERL_DOC_INC_DIRS=..
>>   include /usr/share/pve-doc-generator/pve-doc-generator.mk
>> -SERVICES = pvestatd pveproxy pvedaemon spiceproxy
>> +SERVICES = pvestatd pveproxy pvedaemon spiceproxy pvescheduler
>>   CLITOOLS = vzdump pvesubscription pveceph pveam pvesr pvenode pvesh pve6to7
>>   SCRIPTS =              \
>> @@ -52,6 +52,10 @@ pve6to7.1:
>>       printf ".TH PVE6TO7 1\n.SH NAME\npve6to7 \- Proxmox VE upgrade checker script for 6.4 to 7.x\n" > $@
>>       printf ".SH SYNOPSIS\npve6to7 [--full]\n" >> $@
>> +pvescheduler.8:
>> +    # FIXME: add to doc-generator
>> +    echo ".TH pvescheduler 8" > $@
>> +
>>   pveversion.1.pod: pveversion
>>   pveupgrade.1.pod: pveupgrade
>>   pvereport.1.pod: pvereport
>> diff --git a/bin/pvescheduler b/bin/pvescheduler
>> new file mode 100755
>> index 00000000..5f2bde52
>> --- /dev/null
>> +++ b/bin/pvescheduler
>> @@ -0,0 +1,28 @@
>> +#!/usr/bin/perl
>> +
>> +use strict;
>> +use warnings;
>> +
>> +use PVE::Service::pvescheduler;
>> +
>> +use PVE::RPCEnvironment;
>> +use PVE::SafeSyslog;
>> +
>> +$SIG{'__WARN__'} = sub {
>> +    my $err = $@;
>> +    my $t = $_[0];
>> +    chomp $t;
>> +    print STDERR "$t\n";
>> +    syslog('warning', "%s", $t);
>> +    $@ = $err;
>> +};
>> +
>> +my $prepare = sub {
>> +    my $rpcenv = PVE::RPCEnvironment->init('priv');
>> +
>> +    $rpcenv->init_request();
>> +    $rpcenv->set_language($ENV{LANG});
>> +    $rpcenv->set_user('root@pam');
>> +};
>> +
>> +PVE::Service::pvescheduler->run_cli_handler(prepare => $prepare);
>> diff --git a/debian/postinst b/debian/postinst
>> index aed4da3f..f6407178 100755
>> --- a/debian/postinst
>> +++ b/debian/postinst
>> @@ -79,6 +79,7 @@ case "$1" in
>>       deb-systemd-invoke reload-or-try-restart pvestatd.service
>>       deb-systemd-invoke reload-or-try-restart pveproxy.service
>>       deb-systemd-invoke reload-or-try-restart spiceproxy.service
>> +    deb-systemd-invoke reload-or-try-restart pvescheduler.service
>>       exit 0;;
>> @@ -102,7 +103,7 @@ case "$1" in
>>       # same as dh_systemd_enable (code copied)
>> -    UNITS="pvedaemon.service pveproxy.service spiceproxy.service pvestatd.service pvebanner.service pvesr.timer pve-daily-update.timer"
>> +    UNITS="pvedaemon.service pveproxy.service spiceproxy.service pvestatd.service pvebanner.service pvescheduler.service pve-daily-update.timer"
>>       NO_RESTART_UNITS="pvenetcommit.service pve-guests.service"
>>       for unit in ${UNITS} ${NO_RESTART_UNITS}; do
>> diff --git a/services/Makefile b/services/Makefile
>> index a3d04a2f..b46c7119 100644
>> --- a/services/Makefile
>> +++ b/services/Makefile
>> @@ -13,8 +13,7 @@ SERVICES=            \
>>       pve-storage.target    \
>>       pve-daily-update.service\
>>       pve-daily-update.timer    \
>> -    pvesr.service        \
>> -    pvesr.timer
>> +    pvescheduler.service
>>   .PHONY: install
>>   install: ${SERVICES}
>> diff --git a/services/pvescheduler.service b/services/pvescheduler.service
>> new file mode 100644
>> index 00000000..11769e80
>> --- /dev/null
>> +++ b/services/pvescheduler.service
>> @@ -0,0 +1,16 @@
>> +[Unit]
>> +Description=Proxmox VE scheduler
>> +ConditionPathExists=/usr/bin/pvescheduler
>> +Wants=pve-cluster.service
>> +After=pve-cluster.service
>> +After=pve-storage.target
>> +
>> +[Service]
>> +ExecStart=/usr/bin/pvescheduler start
>> +ExecStop=/usr/bin/pvescheduler stop
>> +PIDFile=/var/run/pvescheduler.pid
>> +KillMode=process
>> +Type=forking
>> +
>> +[Install]
>> +WantedBy=multi-user.target
>> diff --git a/services/pvesr.service b/services/pvesr.service
>> deleted file mode 100644
>> index cbaed1ca..00000000
>> --- a/services/pvesr.service
>> +++ /dev/null
>> @@ -1,8 +0,0 @@
>> -[Unit]
>> -Description=Proxmox VE replication runner
>> -ConditionPathExists=/usr/bin/pvesr
>> -After=pve-cluster.service
>> -
>> -[Service]
>> -Type=oneshot
>> -ExecStart=/usr/bin/pvesr run --mail 1
>> diff --git a/services/pvesr.timer b/services/pvesr.timer
>> deleted file mode 100644
>> index 01d7b9c7..00000000
>> --- a/services/pvesr.timer
>> +++ /dev/null
>> @@ -1,12 +0,0 @@
>> -[Unit]
>> -Description=Proxmox VE replication runner
>> -
>> -[Timer]
>> -AccuracySec=1
>> -RemainAfterElapse=no
>> -
>> -[Timer]
>> -OnCalendar=minutely
>> -
>> -[Install]
>> -WantedBy=timers.target
>> \ No newline at end of file
>>






Thread overview: 17+ messages
2021-10-07  8:27 [pve-devel] [PATCH cluster/manager] add scheduling daemon for pvesr + vzdump (and more) Dominik Csapak
2021-10-07  8:27 ` [pve-devel] [PATCH cluster 1/1] add 'jobs.cfg' to observed files Dominik Csapak
2021-10-07  8:27 ` [pve-devel] [PATCH manager 1/7] replace systemd timer with pvescheduler daemon Dominik Csapak
2021-10-29 12:05   ` Fabian Ebner
2021-11-02  9:26     ` Dominik Csapak [this message]
2021-10-07  8:27 ` [pve-devel] [PATCH manager 2/7] postinst: use reload-or-restart instead of reload-or-try-restart Dominik Csapak
2021-10-07  8:38   ` [pve-devel] applied: " Thomas Lamprecht
2021-10-07  8:27 ` [pve-devel] [PATCH manager 3/7] api/backup: refactor string for all days Dominik Csapak
2021-10-07  8:27 ` [pve-devel] [PATCH manager 4/7] add PVE/Jobs to handle VZDump jobs Dominik Csapak
2021-11-02 13:52   ` Fabian Ebner
2021-11-02 14:33     ` Dominik Csapak
2021-11-03  7:37       ` Fabian Ebner
2021-10-07  8:27 ` [pve-devel] [PATCH manager 5/7] pvescheduler: run jobs from jobs.cfg Dominik Csapak
2021-10-07  8:27 ` [pve-devel] [PATCH manager 6/7] api/backup: handle new vzdump jobs Dominik Csapak
2021-11-03  9:05   ` Fabian Ebner
2021-10-07  8:27 ` [pve-devel] [PATCH manager 7/7] ui: dc/backup: show id+schedule instead of dow+starttime Dominik Csapak
2021-11-03  9:21   ` Fabian Ebner
