From: Fabian Ebner
Date: Tue, 7 Jun 2022 11:12:14 +0200
To: pve-devel@lists.proxmox.com, Dominik Csapak
In-Reply-To: <20220603071630.374408-1-d.csapak@proxmox.com>
Subject: Re: [pve-devel] [PATCH guest-common v2 1/2] ReplicationState: purge state from non local vms

On 03.06.22 at 09:16, Dominik Csapak wrote:
> When running replication, we don't want to keep replication states for
> non-local VMs. Normally this would not be a problem, since on migration
> we transfer the states anyway, but when the ha-manager steals a VM, it
> cannot do that. In that case, having an old state lying around is
> harmful, since the code does not expect the state to be out of sync
> with the actual snapshots on disk.
>
> One such problem is the following:
>
> Replicate VM 100 from node A to nodes B and C, and activate HA. When node
> A dies, the VM will be relocated to, e.g., node B and start replicating
> from there. If node B now had an old state lying around for its sync to
> node C, it might delete the common base snapshots of B and C and could
> not sync again.

To be even more robust, we could ensure that the last_sync snapshot
mentioned in the job state is actually present before starting to remove
replication snapshots in prepare() on the source side, or change it to
only remove older snapshots. But prepare() is also used on the target
side to remove stale volumes, so we'd have to be careful not to break
that logic. I'm working on the v2 of a series for improving the removal
of stale volumes anyway, so I'll see if I can add something there.

>
> Deleting the state for all non-local guests fixes that issue, since it
> always starts fresh, and the potentially existing old state cannot be
> valid anyway, since we just relocated the VM here (from a dead node).
>
> Signed-off-by: Dominik Csapak
> Reviewed-by: Fabian Grünbichler

Both patches:

Reviewed-by: Fabian Ebner
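For illustration only, here is a minimal Python sketch of the two ideas discussed in this thread: dropping replication state for guests that are not local to the node, and, on the source side, only deleting replication snapshots when the last_sync snapshot recorded in the job state actually exists, and then only ones older than it. The real code in guest-common is Perl; the state layout (vmid -> job id -> JobState), the helper names, and the `__replicate_<jobid>_<timestamp>__` snapshot naming are assumptions made for this sketch, not the actual PVE::ReplicationState API.

```python
from dataclasses import dataclass


@dataclass
class JobState:
    # Hypothetical job state: timestamp of the last successful sync (0 = never).
    last_sync: int


def purge_non_local_states(states, local_vmids):
    """Keep replication state only for guests that live on this node.

    states: hypothetical dict mapping vmid -> {job_id: JobState}.
    A stolen (HA-relocated) guest then starts with a fresh, empty state.
    """
    return {vmid: jobs for vmid, jobs in states.items() if vmid in local_vmids}


def snapshot_name(job_id, timestamp):
    # Assumed naming scheme for replication snapshots.
    return f"__replicate_{job_id}_{timestamp}__"


def snapshots_to_remove(job_id, last_sync, existing):
    """Return replication snapshots of job_id that are safe to delete.

    If the state claims a last_sync whose snapshot is not on disk, the state
    is out of sync with reality, so nothing is removed (this protects the
    common base). Otherwise only snapshots strictly older than last_sync go.
    """
    base = snapshot_name(job_id, last_sync)
    if last_sync == 0 or base not in existing:
        return []
    prefix = f"__replicate_{job_id}_"
    older = []
    for snap in existing:
        if not (snap.startswith(prefix) and snap.endswith("__")):
            continue  # belongs to another job or is not a replication snapshot
        stamp = snap[len(prefix):-2]
        if stamp.isdigit() and int(stamp) < last_sync:
            older.append((int(stamp), snap))
    # Sort numerically by timestamp, oldest first.
    return [snap for _, snap in sorted(older)]
```

With a state claiming last_sync=10 and snapshots at 5, 10 and 20 on disk, only the one at 5 would be removed; with a stale state claiming last_sync=15, nothing would be removed. As noted above, the target side also uses prepare() to clean up stale volumes, so a real change would need to keep that path working.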