From: Marco Gaiarin
Subject: Replication error: dataset is busy
Date: Fri, 13 Feb 2026 09:43:08 +0100
To: pve-user@lists.proxmox.com

Situation: a couple of PVE nodes, with a direct link between the two used for replication and migration. Due to a hardware failure, the NIC on one of the servers (cnpve2) died and the server had to be powered off.
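For reference, the direct link is the dedicated migration/replication network; a minimal sketch of what that typically looks like in /etc/pve/datacenter.cfg, assuming the insecure transport and the 10.10.251.0/24 network that appear in the replication log below (the actual configuration of these nodes is not shown in this post):

# /etc/pve/datacenter.cfg (sketch only, not the actual file from these nodes)
# send migration/replication traffic over the direct link, insecure transport
migration: insecure,network=10.10.251.0/24

After the reboot, the per-node replication jobs and their last state can be listed with 'pvesr status' on each node.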
After node cnpve2 rebooted, all replication jobs recovered except one (for a VM running on cnpve2):

2026-02-13 09:26:01 121-0: start replication job
2026-02-13 09:26:01 121-0: guest => VM 121, running => 4345
2026-02-13 09:26:01 121-0: volumes => local-zfs:vm-121-disk-0,rpool-data:vm-121-disk-0,rpool-data:vm-121-disk-1
2026-02-13 09:26:04 121-0: freeze guest filesystem
2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on local-zfs:vm-121-disk-0
2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-0
2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-1
2026-02-13 09:26:06 121-0: thaw guest filesystem
2026-02-13 09:26:06 121-0: using insecure transmission, rate limit: 10 MByte/s
2026-02-13 09:26:06 121-0: incremental sync 'local-zfs:vm-121-disk-0' (__replicate_121-0_1770876001__ => __replicate_121-0_1770971161__)
2026-02-13 09:26:06 121-0: using a bandwidth limit of 10000000 bytes per second for transferring 'local-zfs:vm-121-disk-0'
2026-02-13 09:26:08 121-0: send from @__replicate_121-0_1770876001__ to rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__ estimated size is 2.76G
2026-02-13 09:26:08 121-0: total estimated size is 2.76G
2026-02-13 09:26:08 121-0: TIME        SENT   SNAPSHOT rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__
2026-02-13 09:26:08 121-0: 663540 B 648.0 KB 0.69 s 964531 B/s 941.92 KB/s
2026-02-13 09:26:08 121-0: write: Broken pipe
2026-02-13 09:26:08 121-0: warning: cannot send 'rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__': signal received
2026-02-13 09:26:08 121-0: cannot send 'rpool/data/vm-121-disk-0': I/O error
2026-02-13 09:26:08 121-0: command 'zfs send -Rpv -I __replicate_121-0_1770876001__ -- rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__' failed: exit code 1
2026-02-13 09:26:08 121-0: [cnpve1] cannot receive incremental stream: dataset is busy
2026-02-13 09:26:08 121-0: [cnpve1] command 'zfs recv -F -- rpool/data/vm-121-disk-0' failed: exit code 1
2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on local-zfs:vm-121-disk-0
2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-0
2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-1
2026-02-13 09:26:08 121-0: end replication job with error: failed to run insecure migration: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=cnpve1' -o 'UserKnownHostsFile=/etc/pve/nodes/cnpve1/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@10.10.251.21 -- pvesm import local-zfs:vm-121-disk-0 zfs tcp://10.10.251.0/24 -with-snapshots 1 -snapshot __replicate_121-0_1770971161__ -allow-rename 0 -base __replicate_121-0_1770876001__' failed: exit code 255

On the rebooted node there are no holds:

root@cnpve2:~# zfs list -t snapshot | grep 121
rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__  5.99G      -  2.49T  -
rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__   115M      -  22.1G  -
rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__  1.57G      -  35.4G  -
root@cnpve2:~# zfs holds rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP
root@cnpve2:~# zfs holds rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP
root@cnpve2:~# zfs holds rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP

And none on the opposite node either:

root@cnpve1:~# zfs list -t snapshot | grep 121
rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__     0B      -  2.49T  -
rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__     0B      -  22.1G  -
rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__     0B      -  35.4G  -
root@cnpve1:~# zfs holds rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP
root@cnpve1:~# zfs holds rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP
root@cnpve1:~# zfs holds rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP

It is clear that something remains 'locked' on the non-rebooted node (cnpve1), but how can I identify and unlock it?

Thanks.

--
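P.S.: a sketch of what could be checked on cnpve1, assuming the broken connection left receive state behind (standard ZFS/Linux commands, not taken from this thread and untested here; the zvol device path is simply derived from the dataset name):

root@cnpve1:~# ps aux | grep '[z]fs recv'                             # a receive still hanging from the broken transfer?
root@cnpve1:~# zfs get receive_resume_token rpool/data/vm-121-disk-0  # '-' means no saved partial receive state
root@cnpve1:~# fuser -v /dev/zvol/rpool/data/vm-121-disk-0            # is something keeping the zvol device open?
root@cnpve1:~# zfs receive -A rpool/data/vm-121-disk-0                # discard a partially received state, if any

A 'zfs recv' process left hanging by the broken connection would explain the 'dataset is busy' on the receiving side and could simply be killed; otherwise a reboot of cnpve1 would presumably clear whatever still holds the dataset.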