public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH manager] Jobs: fix scheduling when updating on unrelated nodes
@ 2022-07-14  7:42 Dominik Csapak
  2022-07-15  8:51 ` Fabian Ebner
  0 siblings, 1 reply; 4+ messages in thread
From: Dominik Csapak @ 2022-07-14  7:42 UTC (permalink / raw)
  To: pve-devel

since the jobs are configured clusterwide in pmxcfs, a user can use any
node to update the config of them. for some configs (schedule/enabled)
we need to update the last runtime in the state file, but this
is sadly only node-local.

to also update the state file on the other nodes, we introduce
a new 'update_job_props' function that saves relevant properties from
the config to the statefile each round of the scheduler if they changed.

this way, we can detect changes in those and update the last runtime too.

the only situation where that would not be enough is when a user
changes schedules and back to the original one within a single minute
(so between scheduler runs). in that case, the other nodes won't
detect that change, but it seems to be a rather unlikely edge case
that we can ignore.

if we really want to solve that too, we'd have to save the 'updated'
timestamp in the config too, just to sync it to the job state file
later.

in 'synchronize_job_states_with_config' we switch from reading the
jobstate unconditionally to check the existing of the statefile
(which is the only condition where that can return undef anyway)
so that we don't read the file multiple times each round.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 PVE/Jobs.pm | 62 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 57 insertions(+), 5 deletions(-)

diff --git a/PVE/Jobs.pm b/PVE/Jobs.pm
index 1091bc22..822f0454 100644
--- a/PVE/Jobs.pm
+++ b/PVE/Jobs.pm
@@ -25,6 +25,8 @@ my $default_state = {
     time => 0,
 };
 
+my $saved_props = [qw(enabled schedule)];
+
 # lockless, since we use file_get_contents, which is atomic
 sub read_job_state {
     my ($jobid, $type) = @_;
@@ -93,8 +95,15 @@ sub update_job_stopped {
 		upid => $state->{upid},
 	    };
 
-	    if ($state->{updated}) { # save updated time stamp
-		$new_state->{updated} = $state->{updated};
+	    # save some old props
+	    if (my $updated = $state->{updated}) {
+		$new_state->{updated} = $updated;
+	    }
+
+	    for my $prop (@$saved_props) {
+		if (defined($state->{$prop})) {
+		    $new_state->{$prop} = $state->{$prop};
+		}
 	    }
 
 	    my $path = $get_state_file->($jobid, $type);
@@ -105,7 +114,7 @@ sub update_job_stopped {
 
 # must be called when the job is first created
 sub create_job {
-    my ($jobid, $type) = @_;
+    my ($jobid, $type, $cfg) = @_;
 
     lock_job_state($jobid, $type, sub {
 	my $state = read_job_state($jobid, $type) // $default_state;
@@ -115,6 +124,11 @@ sub create_job {
 	}
 
 	$state->{time} = time();
+	for my $prop (@$saved_props) {
+	    if (defined($cfg->{$prop})) {
+		$state->{$prop} = $cfg->{$prop};
+	    }
+	}
 
 	my $path = $get_state_file->($jobid, $type);
 	PVE::Tools::file_set_contents($path, encode_json($state));
@@ -192,6 +206,39 @@ sub update_last_runtime {
     });
 }
 
+# saves some properties of the jobcfg into the jobstate so we can track
+# them on different nodes (where the update was not done)
+# and update the last runtime when they change
+sub update_job_props {
+    my ($jobid, $type, $cfg) = @_;
+
+    lock_job_state($jobid, $type, sub {
+	my $old_state = read_job_state($jobid, $type) // $default_state;
+
+	my $updated = 0;
+	for my $prop (@$saved_props) {
+	    my $old_prop = $old_state->{$prop} // '';
+	    my $new_prop = $cfg->{$prop} // '';
+	    next if "$old_prop" eq "$new_prop";
+
+	    if (defined($cfg->{$prop})) {
+		$old_state->{$prop} = $cfg->{$prop};
+	    } else {
+		delete $old_state->{$prop};
+	    }
+
+	    $updated = 1;
+	}
+
+	return if !$updated;
+	$old_state->{updated} = time();
+
+	my $path = $get_state_file->($jobid, $type);
+	PVE::Tools::file_set_contents($path, encode_json($old_state));
+    });
+}
+
+
 sub get_last_runtime {
     my ($jobid, $type) = @_;
 
@@ -265,8 +312,13 @@ sub synchronize_job_states_with_config {
 	for my $id (keys $data->{ids}->%*) {
 	    my $job = $data->{ids}->{$id};
 	    my $type = $job->{type};
-	    my $jobstate = read_job_state($id, $type);
-	    create_job($id, $type) if !defined($jobstate);
+
+	    my $path = $get_state_file->($id, $type);
+	    if (-e $path) {
+		update_job_props($id, $type, $job);
+	    } else {
+		create_job($id, $type, $job);
+	    }
 	}
 
 	PVE::Tools::dir_glob_foreach($state_dir, '(.*?)-(.*).json', sub {
-- 
2.30.2
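[For readers who don't speak Perl: the compare-and-sync step at the heart of the new update_job_props can be modeled in a few lines. This is a rough, hypothetical Python sketch of the same logic (names invented here), not the actual PVE code:]

```python
import time

# mirrors $saved_props in the patch
SAVED_PROPS = ["enabled", "schedule"]

def sync_props(state, cfg, now=None):
    """Copy tracked config properties into the job state dict.

    Returns True and stamps state['updated'] when any tracked
    property differs, mirroring update_job_props' detection step.
    """
    changed = False
    for prop in SAVED_PROPS:
        if state.get(prop) == cfg.get(prop):
            continue
        if cfg.get(prop) is not None:
            state[prop] = cfg[prop]
        else:
            state.pop(prop, None)
        changed = True
    if changed:
        state["updated"] = now if now is not None else int(time.time())
    return changed
```

[When nothing differs, the state-file write is skipped entirely, which is what keeps each per-minute scheduler round cheap.]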

* Re: [pve-devel] [PATCH manager] Jobs: fix scheduling when updating on unrelated nodes
  2022-07-14  7:42 [pve-devel] [PATCH manager] Jobs: fix scheduling when updating on unrelated nodes Dominik Csapak
@ 2022-07-15  8:51 ` Fabian Ebner
  2022-07-15  9:01   ` Dominik Csapak
  0 siblings, 1 reply; 4+ messages in thread
From: Fabian Ebner @ 2022-07-15  8:51 UTC (permalink / raw)
  To: pve-devel, Dominik Csapak

In the subject, I wouldn't call the nodes "unrelated". How about "after
updating job from a different node"?

On 14.07.22 09:42, Dominik Csapak wrote:
> since the jobs are configured clusterwide in pmxcfs, a user can use any
> node to update the config of them. for some configs (schedule/enabled)
> we need to update the last runtime in the state file, but this
> is sadly only node-local.
> 
> to also update the state file on the other nodes, we introduce
> a new 'update_job_props' function that saves relevant properties from
> the config to the statefile each round of the scheduler if they changed.
> 
> this way, we can detect changes in those and update the last runtime too.
> 
> the only situation where that would not be enough is when a user
> changes schedules and back to the original one within a single minute
> (so between scheduler runs). in that case, the other nodes won't
> detect that change, but it seems to be a rather unlikely edge case
> that we can ignore.

Even with that edge case, there's no effect on when the job actually
runs, or? Just the 'updated' time stamp in the job state will not be
correct (from a global perspective) on the other nodes until the job
runs again.

> 
> if we really want to solve that too, we'd have to save the 'updated'
> timestamp in the config too, just to sync it to the job state file
> later.
> 
> in 'synchronize_job_states_with_config' we switch from reading the
> jobstate unconditionally to check the existing of the statefile
> (which is the only condition where that can return undef anyway)

typos:
s/existing/existence/
'where' should be dropped

> so that we don't read the file multiple times each round.
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>

What about starting_job and started_job? The saved_props are lost when
that function writes its new state. Maybe there should be a wrapper for
updating the job state that always preserves certain properties.
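[A wrapper along these lines could look roughly like the following; a hedged Python sketch with invented names (write_job_state, PRESERVED_PROPS are not from the patch), just to illustrate the idea that every state writer funnels through one place that carries the tracked properties over:]

```python
import json

# properties every state transition should carry over (illustrative)
PRESERVED_PROPS = ["updated", "enabled", "schedule"]

def write_job_state(path, old_state, new_state):
    """Persist new_state, preserving tracked props from old_state.

    If starting_job/started_job/stopped all wrote through a helper
    like this, the saved props could not be lost on a transition.
    """
    for prop in PRESERVED_PROPS:
        if prop not in new_state and old_state.get(prop) is not None:
            new_state[prop] = old_state[prop]
    with open(path, "w") as fh:
        json.dump(new_state, fh)
    return new_state
```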

> ---
>  PVE/Jobs.pm | 62 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 57 insertions(+), 5 deletions(-)
> 
> diff --git a/PVE/Jobs.pm b/PVE/Jobs.pm
> index 1091bc22..822f0454 100644
> --- a/PVE/Jobs.pm
> +++ b/PVE/Jobs.pm
> @@ -25,6 +25,8 @@ my $default_state = {
>      time => 0,
>  };
>  
> +my $saved_props = [qw(enabled schedule)];

Maybe move update_job_props to right below here, so the comment
describing the use is closer? Or maybe something like
runtime_updating_props is more descriptive?

> +
>  # lockless, since we use file_get_contents, which is atomic
>  sub read_job_state {
>      my ($jobid, $type) = @_;
> @@ -93,8 +95,15 @@ sub update_job_stopped {
>  		upid => $state->{upid},
>  	    };
>  
> -	    if ($state->{updated}) { # save updated time stamp
> -		$new_state->{updated} = $state->{updated};
> +	    # save some old props
> +	    if (my $updated = $state->{updated}) {
> +		$new_state->{updated} = $updated;
> +	    }
> +
> +	    for my $prop (@$saved_props) {
> +		if (defined($state->{$prop})) {
> +		    $new_state->{$prop} = $state->{$prop};
> +		}
>  	    }
>  
>  	    my $path = $get_state_file->($jobid, $type);
> @@ -105,7 +114,7 @@ sub update_job_stopped {
>  
>  # must be called when the job is first created
>  sub create_job {
> -    my ($jobid, $type) = @_;
> +    my ($jobid, $type, $cfg) = @_;

The caller in PVE/API2/Backup.pm could also be adapted to this change.
Although I suppose any new job will be caught by
synchronize_job_states_with_config, like on nodes different from the one
on which it was created.

>  
>      lock_job_state($jobid, $type, sub {
>  	my $state = read_job_state($jobid, $type) // $default_state;
> @@ -115,6 +124,11 @@ sub create_job {
>  	}
>  
>  	$state->{time} = time();
> +	for my $prop (@$saved_props) {
> +	    if (defined($cfg->{$prop})) {
> +		$state->{$prop} = $cfg->{$prop};
> +	    }
> +	}
>  
>  	my $path = $get_state_file->($jobid, $type);
>  	PVE::Tools::file_set_contents($path, encode_json($state));
> @@ -192,6 +206,39 @@ sub update_last_runtime {
>      });
>  }
>  
> +# saves some properties of the jobcfg into the jobstate so we can track
> +# them on different nodes (where the update was not done)
> +# and update the last runtime when they change
> +sub update_job_props {

update_saved_props or detect_changed_runtime_props might be a bit more
telling

> +    my ($jobid, $type, $cfg) = @_;
> +
> +    lock_job_state($jobid, $type, sub {
> +	my $old_state = read_job_state($jobid, $type) // $default_state;
> +
> +	my $updated = 0;
> +	for my $prop (@$saved_props) {
> +	    my $old_prop = $old_state->{$prop} // '';
> +	    my $new_prop = $cfg->{$prop} // '';
> +	    next if "$old_prop" eq "$new_prop";
> +
> +	    if (defined($cfg->{$prop})) {
> +		$old_state->{$prop} = $cfg->{$prop};
> +	    } else {
> +		delete $old_state->{$prop};
> +	    }
> +
> +	    $updated = 1;
> +	}
> +
> +	return if !$updated;
> +	$old_state->{updated} = time();
> +
> +	my $path = $get_state_file->($jobid, $type);
> +	PVE::Tools::file_set_contents($path, encode_json($old_state));
> +    });
> +}
> +
> +
>  sub get_last_runtime {
>      my ($jobid, $type) = @_;
>  


* Re: [pve-devel] [PATCH manager] Jobs: fix scheduling when updating on unrelated nodes
  2022-07-15  8:51 ` Fabian Ebner
@ 2022-07-15  9:01   ` Dominik Csapak
  2022-07-15  9:20     ` Fabian Ebner
  0 siblings, 1 reply; 4+ messages in thread
From: Dominik Csapak @ 2022-07-15  9:01 UTC (permalink / raw)
  To: Fabian Ebner, pve-devel



On 7/15/22 10:51, Fabian Ebner wrote:
> In the subject, I wouldn't call the nodes "unrelated". How about "after
> updating job from a different node"?
> 

sure makes sense

> On 14.07.22 09:42, Dominik Csapak wrote:
>> since the jobs are configured clusterwide in pmxcfs, a user can use any
>> node to update the config of them. for some configs (schedule/enabled)
>> we need to update the last runtime in the state file, but this
>> is sadly only node-local.
>>
>> to also update the state file on the other nodes, we introduce
>> a new 'update_job_props' function that saves relevant properties from
>> the config to the statefile each round of the scheduler if they changed.
>>
>> this way, we can detect changes in those and update the last runtime too.
>>
>> the only situation where that would not be enough is when a user
>> changes schedules and back to the original one within a single minute
>> (so between scheduler runs). in that case, the other nodes won't
>> detect that change, but it seems to be a rather unlikely edge case
>> that we can ignore.
> 
> Even with that edge case, there's no effect on when the job actually
> runs, or? Just the 'updated' time stamp in the job state will not be
> correct (from a global perspective) on the other nodes until the job
> runs again.

the more i think about it, the more i think you're right

for some reason i thought that not updating the timestamp in
this scenario might mean that it can run instantly, but that
can only happen when the last runtime is older than it should be.
(e.g. even when the node is offline/pvescheduler is not running,
we'll (by default) update the timestamp on the first iteration)

so yes, i think we can safely ignore that edge case then
> 
>>
>> if we really want to solve that too, we'd have to save the 'updated'
>> timestamp in the config too, just to sync it to the job state file
>> later.
>>
>> in 'synchronize_job_states_with_config' we switch from reading the
>> jobstate unconditionally to check the existing of the statefile
>> (which is the only condition where that can return undef anyway)
> 
> typos:
> s/existing/existence/
> 'where' should be dropped

ok

> 
>> so that we don't read the file multiple times each round.
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> 
> What about starting_job and started_job? The saved_props are lost when
> that function writes its new state. Maybe there should be a wrapper for
> updating the job state that always preserves certain properties.

i guess you're right, but currently that makes no difference since
we're only concerned with not running too early which is irrelevant
for the starting/started case
(and it'll be synced up again after the next iteration)

> 
>> ---
>>   PVE/Jobs.pm | 62 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 57 insertions(+), 5 deletions(-)
>>
>> diff --git a/PVE/Jobs.pm b/PVE/Jobs.pm
>> index 1091bc22..822f0454 100644
>> --- a/PVE/Jobs.pm
>> +++ b/PVE/Jobs.pm
>> @@ -25,6 +25,8 @@ my $default_state = {
>>       time => 0,
>>   };
>>   
>> +my $saved_props = [qw(enabled schedule)];
> 
> Maybe move update_job_props to right below here, so the comment
> describing the use is closer? Or maybe something like
> runtime_updating_props is more descriptive?
> 
>> +
>>   # lockless, since we use file_get_contents, which is atomic
>>   sub read_job_state {
>>       my ($jobid, $type) = @_;
>> @@ -93,8 +95,15 @@ sub update_job_stopped {
>>   		upid => $state->{upid},
>>   	    };
>>   
>> -	    if ($state->{updated}) { # save updated time stamp
>> -		$new_state->{updated} = $state->{updated};
>> +	    # save some old props
>> +	    if (my $updated = $state->{updated}) {
>> +		$new_state->{updated} = $updated;
>> +	    }
>> +
>> +	    for my $prop (@$saved_props) {
>> +		if (defined($state->{$prop})) {
>> +		    $new_state->{$prop} = $state->{$prop};
>> +		}
>>   	    }
>>   
>>   	    my $path = $get_state_file->($jobid, $type);
>> @@ -105,7 +114,7 @@ sub update_job_stopped {
>>   
>>   # must be called when the job is first created
>>   sub create_job {
>> -    my ($jobid, $type) = @_;
>> +    my ($jobid, $type, $cfg) = @_;
> 
> The caller in PVE/API2/Backup.pm could also be adapted to this change.
> Although I suppose any new job will be caught by
> synchronize_job_states_with_config, like on nodes different from the one
> on which it was created.

true, better still to give the config right away

> 
>>   
>>       lock_job_state($jobid, $type, sub {
>>   	my $state = read_job_state($jobid, $type) // $default_state;
>> @@ -115,6 +124,11 @@ sub create_job {
>>   	}
>>   
>>   	$state->{time} = time();
>> +	for my $prop (@$saved_props) {
>> +	    if (defined($cfg->{$prop})) {
>> +		$state->{$prop} = $cfg->{$prop};
>> +	    }
>> +	}
>>   
>>   	my $path = $get_state_file->($jobid, $type);
>>   	PVE::Tools::file_set_contents($path, encode_json($state));
>> @@ -192,6 +206,39 @@ sub update_last_runtime {
>>       });
>>   }
>>   
>> +# saves some properties of the jobcfg into the jobstate so we can track
>> +# them on different nodes (where the update was not done)
>> +# and update the last runtime when they change
>> +sub update_job_props {
> 
> update_saved_props or detect_changed_runtime_props might be a bit more
> telling

then i'd opt for 'detect_changed_runtime_props' since it's a bit more
verbose imho

> 
>> +    my ($jobid, $type, $cfg) = @_;
>> +
>> +    lock_job_state($jobid, $type, sub {
>> +	my $old_state = read_job_state($jobid, $type) // $default_state;
>> +
>> +	my $updated = 0;
>> +	for my $prop (@$saved_props) {
>> +	    my $old_prop = $old_state->{$prop} // '';
>> +	    my $new_prop = $cfg->{$prop} // '';
>> +	    next if "$old_prop" eq "$new_prop";
>> +
>> +	    if (defined($cfg->{$prop})) {
>> +		$old_state->{$prop} = $cfg->{$prop};
>> +	    } else {
>> +		delete $old_state->{$prop};
>> +	    }
>> +
>> +	    $updated = 1;
>> +	}
>> +
>> +	return if !$updated;
>> +	$old_state->{updated} = time();
>> +
>> +	my $path = $get_state_file->($jobid, $type);
>> +	PVE::Tools::file_set_contents($path, encode_json($old_state));
>> +    });
>> +}
>> +
>> +
>>   sub get_last_runtime {
>>       my ($jobid, $type) = @_;
>>   


* Re: [pve-devel] [PATCH manager] Jobs: fix scheduling when updating on unrelated nodes
  2022-07-15  9:01   ` Dominik Csapak
@ 2022-07-15  9:20     ` Fabian Ebner
  0 siblings, 0 replies; 4+ messages in thread
From: Fabian Ebner @ 2022-07-15  9:20 UTC (permalink / raw)
  To: Dominik Csapak, pve-devel

On 15.07.22 11:01, Dominik Csapak wrote:
> On 7/15/22 10:51, Fabian Ebner wrote:
>>
>>> so that we don't read the file multiple times each round.
>>>

Could add
Fixes: 530b0a71 ("fix #4053: don't run vzdump jobs when they change from
disabled->enabled")
for completeness.

>>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>>
>> What about starting_job and started_job? The saved_props are lost when
>> that function writes its new state. Maybe there should be a wrapper for
>> updating the job state that always preserves certain properties.
> 
> i guess you're right, but currently that makes no difference since
> we're only concerned with not running too early which is irrelevant
> for the starting/started case
> (and it'll be synced up again after the next iteration)
> 

Hmm, it won't work for (at least) a minutely job. The job will run, the
state will lose the saved_props and then
synchronize_job_states_with_config will update the last runtime in the
next run_jobs, and the job won't run that iteration.
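[That scenario can be made concrete with a tiny standalone simulation (hypothetical Python with invented names, not PVE code): once the starting/started writers drop the saved props, the next sync round sees them as "changed" and bumps the updated stamp, so the minutely job misses an iteration:]

```python
def detect_changed(state, cfg, props=("enabled", "schedule"), now=0):
    """Bump state['updated'] when a tracked prop differs from the config."""
    changed = False
    for prop in props:
        if state.get(prop) != cfg.get(prop):
            if cfg.get(prop) is not None:
                state[prop] = cfg[prop]
            else:
                state.pop(prop, None)
            changed = True
    if changed:
        state["updated"] = now
    return changed

cfg = {"schedule": "* * * * *"}            # a minutely job
state = {"schedule": "* * * * *"}          # props in sync at minute 0

# minute 0: the job runs; starting_job/started_job (untouched by the
# patch) rewrite the state without the saved props
state = {"last_run": 0}

# minute 1: the sync sees the props as "changed" and bumps 'updated',
# so the scheduler treats the job as freshly edited and skips it
assert detect_changed(state, cfg, now=60)
assert state["updated"] == 60
```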

>>> @@ -192,6 +206,39 @@ sub update_last_runtime {
>>>       });
>>>   }
>>>   +# saves some properties of the jobcfg into the jobstate so we can
>>> track
>>> +# them on different nodes (where the update was not done)
>>> +# and update the last runtime when they change
>>> +sub update_job_props {
>>
>> update_saved_props or detect_changed_runtime_props might be a bit more
>> telling
> 
> then i'd opt for 'detect_changed_runtime_props' since it's a bit more
> verbose imho
> 

While the logic for updating the last run time in PVE/API2/Backup.pm's
update_job call is slightly different (won't update when going from
enabled to disabled), I feel like we could switch to (unconditionally)
calling update_job_props there too?
