From: Dominik Csapak
To: Fabian Ebner, pve-devel@lists.proxmox.com, Thomas Lamprecht
Date: Tue, 2 Nov 2021 10:26:42 +0100
Message-ID: <87e84f43-9d3e-ce92-fb2d-8c39a5a23dcc@proxmox.com>
In-Reply-To: <24333729-1c38-ebaa-7dad-73ca338fd087@proxmox.com>
Subject: Re: [pve-devel] [PATCH manager 1/7] replace systemd timer with pvescheduler daemon

On 10/29/21 14:05, Fabian Ebner wrote:
> On 07.10.21 at 10:27, Dominik Csapak wrote:
>> From: Thomas Lamprecht
>>
>> The whole thing is already prepared for this; the systemd timer was
>> just a fixed periodic timer with a frequency of one minute. And we
>> only introduced it because the assumption was made that this
>> approach would use less memory, AFAIK.
>>
>> But logging 4+ lines just to say that the timer was started, even if
>> it does nothing, and that 24/7, is not too cheap and a bit annoying.
>>
>> So, as a first step, add a simple daemon which forks off a child for
>> running jobs once a minute.
>> This could still be made a bit more intelligent, i.e., check whether
>> we have jobs to run before forking - as forking is not the cheapest
>> syscall.
>> Further, we could adapt the sleep interval to the next time we
>> actually need to run a job (and send a SIGUSR to the daemon if a
>> job's interval changes such that it becomes narrower).
>>
>> We try to sync running to minute-change boundaries at start; this
>> emulates the systemd.timer behaviour we had until now. Users can
>> also configure jobs with minute precision, so they probably expect
>> those to start really close to a minute-change event.
>> This could be adapted to resync while running, to factor in time
>> drift. But as long as enough CPU cycles are available, we run in
>> correct monotonic intervals, so this isn't a must, IMO.
>>
>> Another improvement could be more fine-grained locking, i.e., not on
>> a per-all-local-job-runs basis, but on a per-job (per-guest?) basis,
>> which would reduce the temporary starvation of small high-frequency
>> jobs by big, less periodic ones.
>> We argued that it's the user's fault if such situations arise, but
>> they can evolve over time without anyone noticing, especially in
>> more complex setups.
>>
>> Signed-off-by: Thomas Lamprecht
>> Signed-off-by: Dominik Csapak
>> ---
>>   PVE/Service/Makefile          |   2 +-
>>   PVE/Service/pvescheduler.pm   | 102 ++++++++++++++++++++++++++++++++++
>>   bin/Makefile                  |   6 +-
>>   bin/pvescheduler              |  28 ++++++++++
>>   debian/postinst               |   3 +-
>>   services/Makefile             |   3 +-
>>   services/pvescheduler.service |  16 ++++++
>>   services/pvesr.service        |   8 ---
>>   services/pvesr.timer          |  12 ----
>>   9 files changed, 155 insertions(+), 25 deletions(-)
>>   create mode 100644 PVE/Service/pvescheduler.pm
>
> Just noting: all the other .pm files in PVE/Service have the
> executable flag set.
>
>>   create mode 100755 bin/pvescheduler
>>   create mode 100644 services/pvescheduler.service
>>   delete mode 100644 services/pvesr.service
>>   delete mode 100644 services/pvesr.timer
>>
>> diff --git a/PVE/Service/Makefile b/PVE/Service/Makefile
>> index fc1cdb14..f66ceb81 100644
>> --- a/PVE/Service/Makefile
>> +++ b/PVE/Service/Makefile
>> @@ -1,6 +1,6 @@
>>   include ../../defines.mk
>> -SOURCES=pvestatd.pm pveproxy.pm pvedaemon.pm spiceproxy.pm
>> +SOURCES=pvestatd.pm pveproxy.pm pvedaemon.pm spiceproxy.pm pvescheduler.pm
>>   all:
>> diff --git a/PVE/Service/pvescheduler.pm b/PVE/Service/pvescheduler.pm
>> new file mode 100644
>> index 00000000..ce55c45a
>> --- /dev/null
>> +++ b/PVE/Service/pvescheduler.pm
>> @@ -0,0 +1,102 @@
>> +package PVE::Service::pvescheduler;
>> +
>> +use strict;
>> +use warnings;
>> +
>> +use POSIX qw(WNOHANG);
>> +use PVE::SafeSyslog;
>> +use PVE::API2::Replication;
>> +
>> +use PVE::Daemon;
>> +use base qw(PVE::Daemon);
>> +
>> +my $cmdline = [$0, @ARGV];
>> +my %daemon_options = (stop_wait_time => 180, max_workers => 0);
>> +my $daemon = __PACKAGE__->new('pvescheduler', $cmdline, %daemon_options);
>> +
>> +my $finish_jobs = sub {
>> +    my ($self) = @_;
>> +    foreach my $cpid (keys %{$self->{jobs}}) {
>> +        my $waitpid = waitpid($cpid, WNOHANG);
>> +        if (defined($waitpid) && ($waitpid == $cpid)) {
>> +            delete ($self->{jobs}->{$cpid});
>> +        }
>> +    }
>> +};
>> +
>> +sub run {
>> +    my ($self) = @_;
>> +
>> +    my $jobs = {};
>> +    $self->{jobs} = $jobs;
>> +
>> +    my $old_sig_chld = $SIG{CHLD};
>> +    local $SIG{CHLD} = sub {
>> +        local ($@, $!, $?); # do not overwrite error vars
>> +        $finish_jobs->($self);
>> +        $old_sig_chld->(@_) if $old_sig_chld;
>> +    };
>> +
>> +    my $logfunc = sub { syslog('info', $_[0]) };
>> +
>> +    my $run_jobs = sub {
>> +        my $child = fork();
>> +        if (!defined($child)) {
>> +            die "fork failed: $!\n";
>> +        } elsif ($child == 0) {
>> +            $self->after_fork_cleanup();
>> +            PVE::API2::Replication::run_jobs(undef, $logfunc, 0, 1);
>
> Like this, the whole replication log ends up in syslog.

yes, you're right, we probably don't want that in the syslog
(same for vzdump jobs later)
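a simple option could be to only forward error lines, e.g. (rough
sketch; the filter regex is made up for illustration, a real fix
probably wants a proper per-job log target instead):

    my $logfunc = sub {
        my ($msg) = @_;
        # drop the regular per-job progress output, only forward lines
        # that look like errors to the syslog
        syslog('warning', "%s", $msg) if $msg =~ /(error|fail)/i;
    };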
>
>> +            POSIX::_exit(0);
>> +        }
>> +
>> +        $jobs->{$child} = 1;
>> +    };
>> +
>> +    # try to run near minute boundaries, makes more sense to the user as he
>> +    # configures jobs with minute precision
>> +    my ($current_seconds) = localtime;
>> +    sleep(60 - $current_seconds) if (60 - $current_seconds >= 5);
>> +
>> +    for (;;) {
>> +        last if $self->{shutdown_request};
>> +
>> +        $run_jobs->();
>> +
>> +        my $sleep_time = 60;
>> +        my $slept = 0; # SIGCHLD interrupts sleep, so we need to keep track
>> +        while ($slept < $sleep_time) {
>> +            last if $self->{shutdown_request};
>> +            $slept += sleep($sleep_time - $slept);
>> +        }
>> +    }
>
> Won't the loop drift away from the minute boundary over time? The rest
> of the loop takes time to execute, which pushes in the positive
> direction, while the fact that the accuracy when interrupted is only
> whole seconds pushes in the negative direction. But we can't really
> rely on those to cancel each other out.

yes, imho the fix would be to pull the correction we have from before
the loop to inside the loop (maybe only every X loops, since the drift
from the few forks + sleep should not amount to much per iteration)
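i.e. something like this (untested sketch, resyncing on every
iteration for simplicity):

    for (;;) {
        last if $self->{shutdown_request};

        $run_jobs->();

        # recompute the distance to the next minute boundary each time,
        # so neither the fork overhead nor the whole-second sleep
        # granularity can accumulate into a drift
        my ($current_seconds) = localtime;
        my $sleep_time = 60 - $current_seconds;
        $sleep_time += 60 if $sleep_time < 5; # too close, skip to the next boundary

        my $slept = 0; # SIGCHLD interrupts sleep, so we need to keep track
        while ($slept < $sleep_time) {
            last if $self->{shutdown_request};
            $slept += sleep($sleep_time - $slept);
        }
    }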
>> +
>> +    # jobs have a lock timeout of 60s, wait a bit more for graceful termination
>> +    my $timeout = 0;
>> +    while (keys %$jobs > 0 && $timeout < 75) {
>> +        kill 'TERM', keys %$jobs;
>> +        $timeout += sleep(5);
>> +    }
>> +    # ensure the rest gets stopped
>> +    kill 'KILL', keys %$jobs if (keys %$jobs > 0);
>> +}
>> +
>> +sub shutdown {
>> +    my ($self) = @_;
>> +
>> +    syslog('info', 'got shutdown request, signal running jobs to stop');
>> +
>> +    kill 'TERM', keys %{$self->{jobs}};
>> +    $self->{shutdown_request} = 1;
>> +}
>> +
>> +$daemon->register_start_command();
>> +$daemon->register_stop_command();
>> +$daemon->register_status_command();
>> +
>> +our $cmddef = {
>> +    start => [ __PACKAGE__, 'start', []],
>> +    stop => [ __PACKAGE__, 'stop', []],
>> +    status => [ __PACKAGE__, 'status', [], undef, sub { print shift . "\n";} ],
>> +};
>> +
>> +1;
>> diff --git a/bin/Makefile b/bin/Makefile
>> index df30b27c..12cb4671 100644
>> --- a/bin/Makefile
>> +++ b/bin/Makefile
>> @@ -6,7 +6,7 @@ export NOVIEW=1
>>   PERL_DOC_INC_DIRS=..
>>   include /usr/share/pve-doc-generator/pve-doc-generator.mk
>> -SERVICES = pvestatd pveproxy pvedaemon spiceproxy
>> +SERVICES = pvestatd pveproxy pvedaemon spiceproxy pvescheduler
>>   CLITOOLS = vzdump pvesubscription pveceph pveam pvesr pvenode pvesh pve6to7
>>   SCRIPTS =              \
>> @@ -52,6 +52,10 @@ pve6to7.1:
>>       printf ".TH PVE6TO7 1\n.SH NAME\npve6to7 \- Proxmox VE upgrade checker script for 6.4 to 7.x\n" > $@
>>       printf ".SH SYNOPSIS\npve6to7 [--full]\n" >> $@
>> +pvescheduler.8:
>> +    # FIXME: add to doc-generator
>> +    echo ".TH pvescheduler 8" > $@
>> +
>>   pveversion.1.pod: pveversion
>>   pveupgrade.1.pod: pveupgrade
>>   pvereport.1.pod: pvereport
>> diff --git a/bin/pvescheduler b/bin/pvescheduler
>> new file mode 100755
>> index 00000000..5f2bde52
>> --- /dev/null
>> +++ b/bin/pvescheduler
>> @@ -0,0 +1,28 @@
>> +#!/usr/bin/perl
>> +
>> +use strict;
>> +use warnings;
>> +
>> +use PVE::Service::pvescheduler;
>> +
>> +use PVE::RPCEnvironment;
>> +use PVE::SafeSyslog;
>> +
>> +$SIG{'__WARN__'} = sub {
>> +    my $err = $@;
>> +    my $t = $_[0];
>> +    chomp $t;
>> +    print STDERR "$t\n";
>> +    syslog('warning', "%s", $t);
>> +    $@ = $err;
>> +};
>> +
>> +my $prepare = sub {
>> +    my $rpcenv = PVE::RPCEnvironment->init('priv');
>> +
>> +    $rpcenv->init_request();
>> +    $rpcenv->set_language($ENV{LANG});
>> +    $rpcenv->set_user('root@pam');
>> +};
>> +
>> +PVE::Service::pvescheduler->run_cli_handler(prepare => $prepare);
>> diff --git a/debian/postinst b/debian/postinst
>> index aed4da3f..f6407178 100755
>> --- a/debian/postinst
>> +++ b/debian/postinst
>> @@ -79,6 +79,7 @@ case "$1" in
>>       deb-systemd-invoke reload-or-try-restart pvestatd.service
>>       deb-systemd-invoke reload-or-try-restart pveproxy.service
>>       deb-systemd-invoke reload-or-try-restart spiceproxy.service
>> +    deb-systemd-invoke reload-or-try-restart pvescheduler.service
>>       exit 0;;
>> @@ -102,7 +103,7 @@ case "$1" in
>>       # same as dh_systemd_enable (code copied)
>> -    UNITS="pvedaemon.service pveproxy.service spiceproxy.service pvestatd.service pvebanner.service pvesr.timer pve-daily-update.timer"
>> +    UNITS="pvedaemon.service pveproxy.service spiceproxy.service pvestatd.service pvebanner.service pvescheduler.service pve-daily-update.timer"
>>       NO_RESTART_UNITS="pvenetcommit.service pve-guests.service"
>>       for unit in ${UNITS} ${NO_RESTART_UNITS}; do
>> diff --git a/services/Makefile b/services/Makefile
>> index a3d04a2f..b46c7119 100644
>> --- a/services/Makefile
>> +++ b/services/Makefile
>> @@ -13,8 +13,7 @@ SERVICES=            \
>>       pve-storage.target    \
>>       pve-daily-update.service\
>>       pve-daily-update.timer    \
>> -    pvesr.service        \
>> -    pvesr.timer
>> +    pvescheduler.service
>>   .PHONY: install
>>   install: ${SERVICES}
>> diff --git a/services/pvescheduler.service b/services/pvescheduler.service
>> new file mode 100644
>> index 00000000..11769e80
>> --- /dev/null
>> +++ b/services/pvescheduler.service
>> @@ -0,0 +1,16 @@
>> +[Unit]
>> +Description=Proxmox VE scheduler
>> +ConditionPathExists=/usr/bin/pvescheduler
>> +Wants=pve-cluster.service
>> +After=pve-cluster.service
>> +After=pve-storage.target
>> +
>> +[Service]
>> +ExecStart=/usr/bin/pvescheduler start
>> +ExecStop=/usr/bin/pvescheduler stop
>> +PIDFile=/var/run/pvescheduler.pid
>> +KillMode=process
>> +Type=forking
>> +
>> +[Install]
>> +WantedBy=multi-user.target
>> diff --git a/services/pvesr.service b/services/pvesr.service
>> deleted file mode 100644
>> index cbaed1ca..00000000
>> --- a/services/pvesr.service
>> +++ /dev/null
>> @@ -1,8 +0,0 @@
>> -[Unit]
>> -Description=Proxmox VE replication runner
>> -ConditionPathExists=/usr/bin/pvesr
>> -After=pve-cluster.service
>> -
>> -[Service]
>> -Type=oneshot
>> -ExecStart=/usr/bin/pvesr run --mail 1
>> diff --git a/services/pvesr.timer b/services/pvesr.timer
>> deleted file mode 100644
>> index 01d7b9c7..00000000
>> --- a/services/pvesr.timer
>> +++ /dev/null
>> @@ -1,12 +0,0 @@
>> -[Unit]
>> -Description=Proxmox VE replication runner
>> -
>> -[Timer]
>> -AccuracySec=1
>> -RemainAfterElapse=no
>> -
>> -[Timer]
>> -OnCalendar=minutely
>> -
>> -[Install]
>> -WantedBy=timers.target
>> \ No newline at end of file
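as for the adaptive sleep interval mentioned in the commit message, i
imagine it roughly like this ($next_job_time is a made-up helper here
that would have to return the earliest scheduled run from the jobs
config, so this is just a sketch of the idea):

    my $sleep_time = 60; # fall back to once a minute, as we do now
    if (defined(my $next = $next_job_time->())) {
        my $now = time();
        $sleep_time = $next - $now if $next > $now;
    }
    # a SIGUSR1 handler only needs to interrupt the sleep; the main loop
    # then re-evaluates $next_job_time and picks up narrower intervals
    $SIG{USR1} = sub { };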