public inbox for pve-devel@lists.proxmox.com
From: Fabian Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH v2 zsync 3/3] fix #2821: only abort if there really is a waiting/syncing job
Date: Thu, 17 Dec 2020 15:17:39 +0100
Message-ID: <20201217141739.22535-3-f.ebner@proxmox.com>
In-Reply-To: <20201217141739.22535-1-f.ebner@proxmox.com>

Remember the running process in the job state via PID + start time + boot ID
and check for that information in the new instance. If the old instance can't
be found, the new one will continue and register itself in the state.

After updating the pve-zsync package, if an instance running the old version is
still waiting, one additional instance might be started, because no instance_id
has been recorded yet. That new instance sets the instance_id, which any later
instance will see.

More importantly, if the state is wrongly 'waiting' or 'syncing', e.g.
because an instance was terminated before finishing, we no longer abort but
recover from the wrong state, thus fixing the bug.
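
The decision described above can be sketched in isolation as follows. This is
only an illustration: effective_state() and must_abort() are hypothetical
names that do not exist in pve-zsync, and $alive stands in for the result of
looking up the recorded instance.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pve-zsync): the state the new instance
# should act on, given the recorded state and whether the recorded instance
# is still alive.
sub effective_state {
    my ($recorded_state, $alive) = @_;

    my $state = $recorded_state // 'ok';
    # A recorded 'waiting'/'syncing' without a live owner is stale: ignore it.
    $state = 'ok' if !$alive;

    return $state;
}

# Hypothetical helper: abort only if a live instance is waiting or syncing.
sub must_abort {
    my ($recorded_state, $alive) = @_;

    my $state = effective_state($recorded_state, $alive);
    return ($state eq 'syncing' || $state eq 'waiting') ? 1 : 0;
}
```

With this, a stale 'waiting' entry (recorded instance gone) no longer blocks a
new run, while a live waiting or syncing instance still does.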

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

Changes from v1:
    * use file reads instead of spawning 'ps'
    * use PID + boot-specific start time + boot ID instead of PID + absolute start time
    * collapse get_process_start_time() and get_instance_id() into one function
    * query the ID of the current instance on startup
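
For illustration, a standalone sketch of building such an ID from file reads.
It is not the code in this patch: slurp(), parse_starttime() and
sketch_instance_id() are hypothetical names, and the sketch assumes the
/proc/<pid>/stat layout from proc(5), where starttime is field 22. Unlike a
plain whitespace split, it parses from the last ')' so that a command name
containing spaces would not shift the fields.

```perl
use strict;
use warnings;

# Hypothetical helper (not the series' read_file()): return file content or undef.
sub slurp {
    my ($path) = @_;

    open(my $fh, '<', $path) or return undef;
    local $/ = undef;
    my $content = <$fh>;
    close($fh);

    return $content;
}

# Extract starttime (field 22) from the content of /proc/<pid>/stat.
# comm (field 2) is wrapped in parentheses and may contain spaces or
# parentheses itself, so only the part after the last ')' is split.
sub parse_starttime {
    my ($stat) = @_;

    return undef if !defined($stat) || $stat !~ m/^\d+ \(.*\) (.*)$/s;
    my @rest = split(/\s+/, $1);    # $rest[0] is field 3 (state)

    return $rest[19];               # field 22: start time in clock ticks since boot
}

# Build "<pid>:<starttime>:<boot_id>" for the given PID.
sub sketch_instance_id {
    my ($pid) = @_;

    my $starttime = parse_starttime(slurp("/proc/$pid/stat"));
    die "unable to read process stats\n" if !defined($starttime);

    my $boot_id = slurp("/proc/sys/kernel/random/boot_id");
    die "unable to read boot ID\n" if !defined($boot_id);
    chomp($boot_id);

    return "${pid}:${starttime}:${boot_id}";
}

# Usage: print the ID of the current process where /proc is available.
print sketch_instance_id($$), "\n" if -r "/proc/$$/stat";
```

Because the start time is counted since boot, including the boot ID keeps an
ID from a previous boot from matching a process that happens to get the same
PID and start time after a reboot.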

 pve-zsync | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/pve-zsync b/pve-zsync
index 76e12ce..5c95955 100755
--- a/pve-zsync
+++ b/pve-zsync
@@ -55,6 +55,8 @@ my $TARGETRE = qr!^(?:($HOSTRE):)?(\d+|(?:[\w\-_]+)(/.+)?)$!;
 
 my $DISK_KEY_RE = qr/^(?:(?:(?:virtio|ide|scsi|sata|efidisk|mp)\d+)|rootfs): /;
 
+my $INSTANCE_ID = get_instance_id($$);
+
 my $command = $ARGV[0];
 
 if (defined($command) && $command ne 'help' && $command ne 'printpod') {
@@ -274,6 +276,7 @@ sub add_state_to_job {
     $job->{state} = $state->{state};
     $job->{lsync} = $state->{lsync};
     $job->{vm_type} = $state->{vm_type};
+    $job->{instance_id} = $state->{instance_id};
 
     for (my $i = 0; $state->{"snap$i"}; $i++) {
 	$job->{"snap$i"} = $state->{"snap$i"};
@@ -359,6 +362,7 @@ sub update_state {
     if ($job->{state} ne "del") {
 	$state->{state} = $job->{state};
 	$state->{lsync} = $job->{lsync};
+	$state->{instance_id} = $job->{instance_id};
 	$state->{vm_type} = $job->{vm_type};
 
 	for (my $i = 0; $job->{"snap$i"} ; $i++) {
@@ -571,6 +575,33 @@ sub destroy_job {
     });
 }
 
+sub get_instance_id {
+    my ($pid) = @_;
+
+    my $stat = read_file("/proc/$pid/stat", 1)
+	or die "unable to read process stats\n";
+    my $boot_id = read_file("/proc/sys/kernel/random/boot_id", 1)
+	or die "unable to read boot ID\n";
+
+    my $stats = [ split(/\s+/, $stat) ];
+    my $starttime = $stats->[21];
+    chomp($boot_id);
+
+    return "${pid}:${starttime}:${boot_id}";
+}
+
+sub instance_exists {
+    my ($instance_id) = @_;
+
+    if (defined($instance_id) && $instance_id =~ m/^([1-9][0-9]*):/) {
+	my $pid = $1;
+	my $actual_id = eval { get_instance_id($pid); };
+	return defined($actual_id) && $actual_id eq $instance_id;
+    }
+
+    return 0;
+}
+
 sub sync {
     my ($param) = @_;
 
@@ -580,11 +611,16 @@ sub sync {
 	eval { $job = get_job($param) };
 
 	if ($job) {
-	    if (defined($job->{state}) && ($job->{state} eq "syncing" || $job->{state} eq "waiting")) {
+	    my $state = $job->{state} // 'ok';
+	    $state = 'ok' if !instance_exists($job->{instance_id});
+
+	    if ($state eq "syncing" || $state eq "waiting") {
 		die "Job --source $param->{source} --name $param->{name} is already scheduled to sync\n";
 	    }
 
 	    $job->{state} = "waiting";
+	    $job->{instance_id} = $INSTANCE_ID;
+
 	    update_state($job);
 	}
     });
@@ -658,6 +694,7 @@ sub sync {
 		eval { $job = get_job($param); };
 		if ($job) {
 		    $job->{state} = "error";
+		    delete $job->{instance_id};
 		    update_state($job);
 		}
 	    });
@@ -674,6 +711,7 @@ sub sync {
 		    $job->{state} = "ok";
 		}
 		$job->{lsync} = $date;
+		delete $job->{instance_id};
 		update_state($job);
 	    }
 	});
-- 
2.20.1






Thread overview: 5+ messages
2020-12-17 14:17 [pve-devel] [PATCH v2 zsync 1/3] remove unused function write_cron Fabian Ebner
2020-12-17 14:17 ` [pve-devel] [PATCH v2 zsync 2/3] introduce and use read_file helper Fabian Ebner
2020-12-18 16:44   ` Thomas Lamprecht
2020-12-17 14:17 ` Fabian Ebner [this message]
2020-12-18 16:43 ` [pve-devel] applied-series: Re: [PATCH v2 zsync 1/3] remove unused function write_cron Thomas Lamprecht
