From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 1E3C47C42A
 for <pve-devel@lists.proxmox.com>; Fri, 15 Jul 2022 10:51:53 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 119ED25D16
 for <pve-devel@lists.proxmox.com>; Fri, 15 Jul 2022 10:51:53 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-devel@lists.proxmox.com>; Fri, 15 Jul 2022 10:51:52 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 06D674231C
 for <pve-devel@lists.proxmox.com>; Fri, 15 Jul 2022 10:51:52 +0200 (CEST)
Message-ID: <8be282a5-772c-a978-4b4a-3e76deea9f80@proxmox.com>
Date: Fri, 15 Jul 2022 10:51:50 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.11.0
Content-Language: en-US
To: pve-devel@lists.proxmox.com, Dominik Csapak <d.csapak@proxmox.com>
References: <20220714074202.1298324-1-d.csapak@proxmox.com>
From: Fabian Ebner <f.ebner@proxmox.com>
In-Reply-To: <20220714074202.1298324-1-d.csapak@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.047 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 T_SCC_BODY_TEXT_LINE    -0.01 -
Subject: Re: [pve-devel] [PATCH manager] Jobs: fix scheduling when updating
 on unrelated nodes
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Fri, 15 Jul 2022 08:51:53 -0000

In the subject, I wouldn't call the nodes "unrelated". How about "after
updating job from a different node"?

On 14.07.22 at 09:42, Dominik Csapak wrote:
> since the jobs are configured clusterwide in pmxcfs, a user can use any
> node to update the config of them. for some configs (schedule/enabled)
> we need to update the last runtime in the state file, but this
> is sadly only node-local.
> 
> to also update the state file on the other nodes, we introduce
> a new 'update_job_props' function that saves relevant properties from
> the config to the statefile each round of the scheduler if they changed.
> 
> this way, we can detect changes in those and update the last runtime too.
> 
> the only situation where that would not be enough is when a user
> changes schedules and back to the original one within a single minute
> (so between scheduler runs). in that case, the other nodes won't
> detect that change, but it seems to be a rather unlikely edge case
> that we can ignore.

Even with that edge case, there's no effect on when the job actually
runs, right? Only the 'updated' time stamp in the job state will be
incorrect (from a global perspective) on the other nodes until the job
runs again.
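
To illustrate the edge case, here is a minimal sketch that reuses the
string comparison from the quoted update_job_props. The helper name
props_changed and the example values are hypothetical, and locking and
file handling are left out:

```perl
use strict;
use warnings;

# Same comparison as in the quoted update_job_props:
# undef and '' are treated as equal.
sub props_changed {
    my ($state, $cfg, $props) = @_;

    for my $prop (@$props) {
        my $old = $state->{$prop} // '';
        my $new = $cfg->{$prop} // '';
        return 1 if "$old" ne "$new";
    }
    return 0;
}

my $props = [qw(enabled schedule)];
my $state = { schedule => 'daily', enabled => 1 }; # saved in the last round

# The config goes daily -> hourly -> daily between two scheduler rounds;
# only the final value is visible when the next round compares.
my $cfg_at_next_round = { schedule => 'daily', enabled => 1 };
my $missed = !props_changed($state, $cfg_at_next_round, $props);

# A change that is still in place at the next round is detected.
my $detected = props_changed($state, { schedule => 'hourly', enabled => 1 }, $props);
```

Here $missed is true because the state file and the config agree again
by the time the other node compares them.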

> 
> if we really want to solve that too, we'd have to save the 'updated'
> timestamp in the config too, just to sync it to the job state file
> later.
> 
> in 'synchronize_job_states_with_config' we switch from reading the
> jobstate unconditionally to check the existing of the statefile
> (which is the only condition where that can return undef anyway)

typos:
s/existing/existence/
'where' should be dropped

> so that we don't read the file multiple times each round.
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>

What about starting_job and started_job? The saved_props are lost when
those functions write their new state. Maybe there should be a wrapper
for updating the job state that always preserves certain properties.
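
A minimal sketch of such a wrapper, assuming a hypothetical helper name
preserve_state_props and a hypothetical list of preserved keys (neither
exists in PVE/Jobs.pm). The locking and file writing done by the real
code are left out:

```perl
use strict;
use warnings;

# Hypothetical list: $saved_props from the patch plus 'updated'.
my $preserved_props = [qw(updated enabled schedule)];

# Copies every preserved property that is defined in $old_state into
# $new_state, unless $new_state already sets it. Returns $new_state.
sub preserve_state_props {
    my ($old_state, $new_state) = @_;

    for my $prop (@$preserved_props) {
        next if defined($new_state->{$prop});
        next if !defined($old_state->{$prop});
        $new_state->{$prop} = $old_state->{$prop};
    }

    return $new_state;
}

# Example: a 'started' state written over an old state keeps the props.
my $old = {
    state => 'created',
    time => 100,
    updated => 90,
    schedule => 'daily',
    enabled => 1,
};
my $new = preserve_state_props(
    $old,
    { state => 'started', time => 200, upid => 'UPID:example' },
);
```

Each function that builds a fresh state hash could pass it through such
a helper before encoding it, so new properties only have to be added in
one place.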

> ---
>  PVE/Jobs.pm | 62 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 57 insertions(+), 5 deletions(-)
> 
> diff --git a/PVE/Jobs.pm b/PVE/Jobs.pm
> index 1091bc22..822f0454 100644
> --- a/PVE/Jobs.pm
> +++ b/PVE/Jobs.pm
> @@ -25,6 +25,8 @@ my $default_state = {
>      time => 0,
>  };
>  
> +my $saved_props = [qw(enabled schedule)];

Maybe move update_job_props to right below here, so the comment
describing the use is closer? Or maybe something like
runtime_updating_props is more descriptive?

> +
>  # lockless, since we use file_get_contents, which is atomic
>  sub read_job_state {
>      my ($jobid, $type) = @_;
> @@ -93,8 +95,15 @@ sub update_job_stopped {
>  		upid => $state->{upid},
>  	    };
>  
> -	    if ($state->{updated}) { # save updated time stamp
> -		$new_state->{updated} = $state->{updated};
> +	    # save some old props
> +	    if (my $updated = $state->{updated}) {
> +		$new_state->{updated} = $updated;
> +	    }
> +
> +	    for my $prop (@$saved_props) {
> +		if (defined($state->{$prop})) {
> +		    $new_state->{$prop} = $state->{$prop};
> +		}
>  	    }
>  
>  	    my $path = $get_state_file->($jobid, $type);
> @@ -105,7 +114,7 @@ sub update_job_stopped {
>  
>  # must be called when the job is first created
>  sub create_job {
> -    my ($jobid, $type) = @_;
> +    my ($jobid, $type, $cfg) = @_;

The caller in PVE/API2/Backup.pm could also be adapted to this change.
Although I suppose any new job will be caught by
synchronize_job_states_with_config anyway, just as it is on nodes other
than the one on which it was created.

>  
>      lock_job_state($jobid, $type, sub {
>  	my $state = read_job_state($jobid, $type) // $default_state;
> @@ -115,6 +124,11 @@ sub create_job {
>  	}
>  
>  	$state->{time} = time();
> +	for my $prop (@$saved_props) {
> +	    if (defined($cfg->{$prop})) {
> +		$state->{$prop} = $cfg->{$prop};
> +	    }
> +	}
>  
>  	my $path = $get_state_file->($jobid, $type);
>  	PVE::Tools::file_set_contents($path, encode_json($state));
> @@ -192,6 +206,39 @@ sub update_last_runtime {
>      });
>  }
>  
> +# saves some properties of the jobcfg into the jobstate so we can track
> +# them on different nodes (where the update was not done)
> +# and update the last runtime when they change
> +sub update_job_props {

update_saved_props or detect_changed_runtime_props might be a bit more
telling.

> +    my ($jobid, $type, $cfg) = @_;
> +
> +    lock_job_state($jobid, $type, sub {
> +	my $old_state = read_job_state($jobid, $type) // $default_state;
> +
> +	my $updated = 0;
> +	for my $prop (@$saved_props) {
> +	    my $old_prop = $old_state->{$prop} // '';
> +	    my $new_prop = $cfg->{$prop} // '';
> +	    next if "$old_prop" eq "$new_prop";
> +
> +	    if (defined($cfg->{$prop})) {
> +		$old_state->{$prop} = $cfg->{$prop};
> +	    } else {
> +		delete $old_state->{$prop};
> +	    }
> +
> +	    $updated = 1;
> +	}
> +
> +	return if !$updated;
> +	$old_state->{updated} = time();
> +
> +	my $path = $get_state_file->($jobid, $type);
> +	PVE::Tools::file_set_contents($path, encode_json($old_state));
> +    });
> +}
> +
> +
>  sub get_last_runtime {
>      my ($jobid, $type) = @_;
>