From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Fabian Ebner <f.ebner@proxmox.com>
Subject: Re: [pve-devel] [PATCH zsync] fix #2821: only abort if there really is a waiting/syncing job instance already
Date: Mon, 14 Dec 2020 14:47:30 +0100
Message-ID: <5c12de59-cec9-d1f3-9c3d-17a99c67e872@proxmox.com>
In-Reply-To: <20201214130039.9997-1-f.ebner@proxmox.com>

On 14.12.20 14:00, Fabian Ebner wrote:
> By remembering the instance via PID and start time and checking for that
> information in later instances. If the old instance can't be found, the new one
> will continue and register itself in the state.
> 
> After updating, if there is a waiting instance running the old version, one more
> might be created, because there is no instance_id yet. But the new instance will
> set the instance_id, which any later instance will see.
> 
> More importantly, if the state is wrongly 'waiting' or 'syncing', e.g.
> because an instance was terminated before finishing, we don't abort anymore and
> recover from the wrong state, thus fixing the bug.
> 
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
> 
> I couldn't find a better unique identifier that can be easily verified from
> within another instance, but PID and start time should be good enough for the
> intended purpose.
> 
> Another alternative would be to introduce job-specific locking around the whole
> sync() block, but then the code would be nested some three lock levels deep...
> 
> @Thomas: I felt like this was more complete than the "clear state after boot"-
> solution, because it also works when the processes are killed for different
> reasons than during shutdown.

That's true, and it seems like quite a nice and short approach to me, great!

> 
>  pve-zsync | 40 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/pve-zsync b/pve-zsync
> index f3b98c4..506bfab 100755
> --- a/pve-zsync
> +++ b/pve-zsync
> @@ -266,6 +266,7 @@ sub add_state_to_job {
>      $job->{state} = $state->{state};
>      $job->{lsync} = $state->{lsync};
>      $job->{vm_type} = $state->{vm_type};
> +    $job->{instance_id} = $state->{instance_id};
>  
>      for (my $i = 0; $state->{"snap$i"}; $i++) {
>  	$job->{"snap$i"} = $state->{"snap$i"};
> @@ -365,6 +366,7 @@ sub update_state {
>      if ($job->{state} ne "del") {
>  	$state->{state} = $job->{state};
>  	$state->{lsync} = $job->{lsync};
> +	$state->{instance_id} = $job->{instance_id};
>  	$state->{vm_type} = $job->{vm_type};
>  
>  	for (my $i = 0; $job->{"snap$i"} ; $i++) {
> @@ -584,6 +586,33 @@ sub destroy_job {
>      });
>  }
>  
> +sub get_process_start_time {
> +    my ($pid) = @_;
> +
> +    return eval { run_cmd(['ps', '-o', 'lstart=', '-p', "$pid"]); };

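Instead of a fork+exec, how about a much cheaper file read?

I.e., copy over file_read_firstline from PVE::Tools, then:

sub get_process_start_time {
    my ($pid) = @_;

    # no stat line, e.g. because the process is gone
    my $stat_str = file_read_firstline("/proc/$pid/stat");
    return undef if !defined($stat_str);

    # field 22 is starttime; assumes the comm field contains no whitespace
    my $stat = [ split(/\s+/, $stat_str) ];
    return $stat->[21];
}

plus some more error handling (note: I did not test the above)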
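
One thing a plain whitespace split does not cover: the second field of
/proc/<pid>/stat (comm) is wrapped in parentheses and may itself contain
spaces, which would shift every later index. A minimal, untested-in-pve-zsync
sketch that splits only what follows the last ')' could look like the
following. It assumes Linux procfs; parse_stat_starttime is a hypothetical
helper name, not something that exists in pve-zsync or PVE::Tools.

```perl
use strict;
use warnings;

# Hypothetical helper: extract starttime from one line of /proc/<pid>/stat.
# The comm field (2nd) is parenthesized and may contain spaces or ')',
# so only the part after the last ')' is split on whitespace.
sub parse_stat_starttime {
    my ($stat_str) = @_;

    return undef if !defined($stat_str);

    # greedy match: $rest begins with field 3 (state)
    my ($rest) = $stat_str =~ m/^.*\)\s+(.*)$/s;
    return undef if !defined($rest);

    my @fields = split(/\s+/, $rest);

    # starttime is field 22 overall, i.e. index 19 counting from field 3
    return $fields[19];
}

sub get_process_start_time {
    my ($pid) = @_;

    # a missing file simply means there is no such process
    open(my $fh, '<', "/proc/$pid/stat") or return undef;
    my $stat_str = <$fh>;
    close($fh);

    return parse_stat_starttime($stat_str);
}
```

The value is the start time in clock ticks since boot, so it is only
meaningful for comparisons within the same boot, which is all the
instance check needs.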

> +}
> +
> +sub get_instance_id {
> +    my ($pid) = @_;
> +
> +    my $starttime = get_process_start_time($pid)
> +	or die "could not determine start time for process '$pid'\n";
> +
> +    return "${pid}:${starttime}";
> +}
> +
> +sub instance_exists {
> +    my ($instance_id) = @_;
> +
> +    if (defined($instance_id) && $instance_id =~ m/^([1-9][0-9]*):(.*)$/) {
> +	my ($pid, $starttime) = ($1, $2);
> +	my $actual_starttime = get_process_start_time($pid);
> +	return defined($actual_starttime) && $starttime eq $actual_starttime;
> +    }
> +
> +    return 0;
> +}
> +
>  sub sync {
>      my ($param) = @_;
>  
> @@ -593,11 +622,18 @@ sub sync {
>  	eval { $job = get_job($param) };
>  
>  	if ($job) {
> -	    if (defined($job->{state}) && ($job->{state} eq "syncing" || $job->{state} eq "waiting")) {
> +	    my $state = $job->{state} // 'ok';
> +	    $state = 'ok' if !instance_exists($job->{instance_id});
> +
> +	    if ($state eq "syncing" || $state eq "waiting") {
>  		die "Job --source $param->{source} --name $param->{name} is already scheduled to sync\n";
>  	    }
>  
>  	    $job->{state} = "waiting";
> +
> +	    eval { $job->{instance_id} = get_instance_id($$); };

I'd query and cache the instance ID of the current process once on startup; this
has the nice side effect of avoiding the error potential here completely.
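
Roughly what I have in mind, as an untested sketch: the ID is computed once
when the script starts, so a failure surfaces before any job state is
touched, and sync() only copies the cached value. The name $INSTANCE_ID is
an assumption, and get_process_start_time() here is just a /proc-based
stand-in (Linux only) so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Stand-in for the patch's get_process_start_time(), reading procfs.
sub get_process_start_time {
    my ($pid) = @_;

    open(my $fh, '<', "/proc/$pid/stat") or return undef;
    my $stat_str = <$fh>;
    close($fh);
    return undef if !defined($stat_str);

    # skip past the parenthesized comm field, then pick field 22
    my ($rest) = $stat_str =~ m/^.*\)\s+(.*)$/s;
    return undef if !defined($rest);

    return (split(/\s+/, $rest))[19];
}

# Same logic as get_instance_id() in the patch.
sub get_instance_id {
    my ($pid) = @_;

    my $starttime = get_process_start_time($pid)
        or die "could not determine start time for process '$pid'\n";

    return "${pid}:${starttime}";
}

# Computed once at startup; dies early if the ID cannot be determined.
my $INSTANCE_ID = get_instance_id($$);

# Later, inside sync(), no eval/warn is needed anymore:
#     $job->{state} = "waiting";
#     $job->{instance_id} = $INSTANCE_ID;
#     update_state($job);
```

With that, the eval and the "Could not set instance ID" warning in sync()
could go away, since the value is already known to be valid at that point.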

> +	    warn "Could not set instance ID - $@" if $@;
> +
>  	    update_state($job);
>  	}
>      });
> @@ -671,6 +707,7 @@ sub sync {
>  		eval { $job = get_job($param); };
>  		if ($job) {
>  		    $job->{state} = "error";
> +		    delete $job->{instance_id};
>  		    update_state($job);
>  		}
>  	    });
> @@ -687,6 +724,7 @@ sub sync {
>  		    $job->{state} = "ok";
>  		}
>  		$job->{lsync} = $date;
> +		delete $job->{instance_id};
>  		update_state($job);
>  	    }
>  	});
> 






