public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Fabian Ebner <f.ebner@proxmox.com>
Subject: Re: [pve-devel] [PATCH zsync] fix #2821: only abort if there really is a waiting/syncing job instance already
Date: Mon, 14 Dec 2020 14:47:30 +0100	[thread overview]
Message-ID: <5c12de59-cec9-d1f3-9c3d-17a99c67e872@proxmox.com> (raw)
In-Reply-To: <20201214130039.9997-1-f.ebner@proxmox.com>

On 14.12.20 14:00, Fabian Ebner wrote:
> By remembering the instance via PID and start time and checking for that
> information in later instances. If the old instance can't be found, the new one
> will continue and register itself in the state.
> 
> After updating, if there is a waiting instance running the old version, one more
> might be created, because there is no instance_id yet. But the new instance will
> set the instance_id, which any later instance will see.
> 
> More importantly, if the state is wrongly 'waiting' or 'syncing', e.g.
> because an instance was terminated before finishing, we don't abort anymore and
> recover from the wrong state, thus fixing the bug.
> 
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
> 
> I couldn't find a better unique identifier that can be easily verfied from
> within another instance, but PID and start time should be good enough for the
> intended purpose.
> 
> Another alternative would be to introduce job-specific locking around the whole
> sync() block, but then we would have some three lock-level deep code...
> 
> @Thomas: I felt like this was more complete than the "clear state after boot"-
> solution, because it also works when the processes are killed for different
> reasons than during shutdown.

that's true, and it seems like a quite nice and short approach to me, great!

> 
>  pve-zsync | 40 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/pve-zsync b/pve-zsync
> index f3b98c4..506bfab 100755
> --- a/pve-zsync
> +++ b/pve-zsync
> @@ -266,6 +266,7 @@ sub add_state_to_job {
>      $job->{state} = $state->{state};
>      $job->{lsync} = $state->{lsync};
>      $job->{vm_type} = $state->{vm_type};
> +    $job->{instance_id} = $state->{instance_id};
>  
>      for (my $i = 0; $state->{"snap$i"}; $i++) {
>  	$job->{"snap$i"} = $state->{"snap$i"};
> @@ -365,6 +366,7 @@ sub update_state {
>      if ($job->{state} ne "del") {
>  	$state->{state} = $job->{state};
>  	$state->{lsync} = $job->{lsync};
> +	$state->{instance_id} = $job->{instance_id};
>  	$state->{vm_type} = $job->{vm_type};
>  
>  	for (my $i = 0; $job->{"snap$i"} ; $i++) {
> @@ -584,6 +586,33 @@ sub destroy_job {
>      });
>  }
>  
> +sub get_process_start_time {
> +    my ($pid) = @_;
> +
> +    return eval { run_cmd(['ps', '-o', 'lstart=', '-p', "$pid"]); };

instead of fork+exec do a much cheaper file read?

I.e., copying over file_read_firstline from PVE::Tools then:

sub get_process_start_time {
    my $stat_str = file_read_firstline("/proc/$pid/stat");
    my $stat = [ split(/\s+/, $stat_str) ];

    return $stat->[21];
}

plus some error handling (note I did not test above)

> +}
> +
> +sub get_instance_id {
> +    my ($pid) = @_;
> +
> +    my $starttime = get_process_start_time($pid)
> +	or die "could not determine start time for process '$pid'\n";
> +
> +    return "${pid}:${starttime}";
> +}
> +
> +sub instance_exists {
> +    my ($instance_id) = @_;
> +
> +    if (defined($instance_id) && $instance_id =~ m/^([1-9][0-9]*):(.*)$/) {
> +	my ($pid, $starttime) = ($1, $2);
> +	my $actual_starttime = get_process_start_time($pid);
> +	return defined($actual_starttime) && $starttime eq $actual_starttime;
> +    }
> +
> +    return 0;
> +}
> +
>  sub sync {
>      my ($param) = @_;
>  
> @@ -593,11 +622,18 @@ sub sync {
>  	eval { $job = get_job($param) };
>  
>  	if ($job) {
> -	    if (defined($job->{state}) && ($job->{state} eq "syncing" || $job->{state} eq "waiting")) {
> +	    my $state = $job->{state} // 'ok';
> +	    $state = 'ok' if !instance_exists($job->{instance_id});
> +
> +	    if ($state eq "syncing" || $state eq "waiting") {
>  		die "Job --source $param->{source} --name $param->{name} is already scheduled to sync\n";
>  	    }
>  
>  	    $job->{state} = "waiting";
> +
> +	    eval { $job->{instance_id} = get_instance_id($$); };

I'd query and cache the local instance ID from the current process on startup, this
would have the nice side effect of avoiding error potential here completely

> +	    warn "Could not set instance ID - $@" if $@;
> +
>  	    update_state($job);
>  	}
>      });
> @@ -671,6 +707,7 @@ sub sync {
>  		eval { $job = get_job($param); };
>  		if ($job) {
>  		    $job->{state} = "error";
> +		    delete $job->{instance_id};
>  		    update_state($job);
>  		}
>  	    });
> @@ -687,6 +724,7 @@ sub sync {
>  		    $job->{state} = "ok";
>  		}
>  		$job->{lsync} = $date;
> +		delete $job->{instance_id};
>  		update_state($job);
>  	    }
>  	});
> 






  reply	other threads:[~2020-12-14 13:47 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-14 13:00 Fabian Ebner
2020-12-14 13:47 ` Thomas Lamprecht [this message]
2020-12-17  8:40   ` Fabian Ebner
2020-12-17  9:23     ` Thomas Lamprecht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5c12de59-cec9-d1f3-9c3d-17a99c67e872@proxmox.com \
    --to=t.lamprecht@proxmox.com \
    --cc=f.ebner@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal