From: Marco Gaiarin <gaio@lilliput.linux.it>
To: pve-user@lists.proxmox.com
Subject: Replication error: dataset is busy
Date: Fri, 13 Feb 2026 09:43:08 +0100
Message-ID: <qp436m-odc.ln1@leia.lilliput.linux.it>
Situation: a couple of PVE nodes, with a direct link between the two to
handle replication and migration.
Due to a hardware failure, the NIC on one of the servers (cnpve2) died, and
the server had to be powered off.
After node cnpve2 rebooted, all replication jobs recovered apart from one
(running on cnpve2):
2026-02-13 09:26:01 121-0: start replication job
2026-02-13 09:26:01 121-0: guest => VM 121, running => 4345
2026-02-13 09:26:01 121-0: volumes => local-zfs:vm-121-disk-0,rpool-data:vm-121-disk-0,rpool-data:vm-121-disk-1
2026-02-13 09:26:04 121-0: freeze guest filesystem
2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on local-zfs:vm-121-disk-0
2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-0
2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-1
2026-02-13 09:26:06 121-0: thaw guest filesystem
2026-02-13 09:26:06 121-0: using insecure transmission, rate limit: 10 MByte/s
2026-02-13 09:26:06 121-0: incremental sync 'local-zfs:vm-121-disk-0' (__replicate_121-0_1770876001__ => __replicate_121-0_1770971161__)
2026-02-13 09:26:06 121-0: using a bandwidth limit of 10000000 bytes per second for transferring 'local-zfs:vm-121-disk-0'
2026-02-13 09:26:08 121-0: send from @__replicate_121-0_1770876001__ to rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__ estimated size is 2.76G
2026-02-13 09:26:08 121-0: total estimated size is 2.76G
2026-02-13 09:26:08 121-0: TIME SENT SNAPSHOT rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__
2026-02-13 09:26:08 121-0: 663540 B 648.0 KB 0.69 s 964531 B/s 941.92 KB/s
2026-02-13 09:26:08 121-0: write: Broken pipe
2026-02-13 09:26:08 121-0: warning: cannot send 'rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__': signal received
2026-02-13 09:26:08 121-0: cannot send 'rpool/data/vm-121-disk-0': I/O error
2026-02-13 09:26:08 121-0: command 'zfs send -Rpv -I __replicate_121-0_1770876001__ -- rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__' failed: exit code 1
2026-02-13 09:26:08 121-0: [cnpve1] cannot receive incremental stream: dataset is busy
2026-02-13 09:26:08 121-0: [cnpve1] command 'zfs recv -F -- rpool/data/vm-121-disk-0' failed: exit code 1
2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on local-zfs:vm-121-disk-0
2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-0
2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-1
2026-02-13 09:26:08 121-0: end replication job with error: failed to run insecure migration: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=cnpve1' -o 'UserKnownHostsFile=/etc/pve/nodes/cnpve1/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@10.10.251.21 -- pvesm import local-zfs:vm-121-disk-0 zfs tcp://10.10.251.0/24 -with-snapshots 1 -snapshot __replicate_121-0_1770971161__ -allow-rename 0 -base __replicate_121-0_1770876001__' failed: exit code 255
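Since the receiving side is the one reporting 'dataset is busy', my first
guess is that a 'zfs recv' (or the 'pvesm import' wrapping it) from a
previous, interrupted run is still hanging around on cnpve1; I suppose
something like this on cnpve1 would show it (just a guess on my part):

# look for a leftover receive process from a failed replication run
pgrep -af 'zfs recv'
pgrep -af 'pvesm import'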
On the rebooted node there are no holds:
root@cnpve2:~# zfs list -t snapshot | grep 121
rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__ 5.99G - 2.49T -
rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__ 115M - 22.1G -
rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__ 1.57G - 35.4G -
root@cnpve2:~# zfs holds rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME TAG TIMESTAMP
root@cnpve2:~# zfs holds rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__
NAME TAG TIMESTAMP
root@cnpve2:~# zfs holds rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME TAG TIMESTAMP
and none on the opposite node either:
root@cnpve1:~# zfs list -t snapshot | grep 121
rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__ 0B - 2.49T -
rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__ 0B - 22.1G -
rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__ 0B - 35.4G -
root@cnpve1:~# zfs holds rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME TAG TIMESTAMP
root@cnpve1:~# zfs holds rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__
NAME TAG TIMESTAMP
root@cnpve1:~# zfs holds rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME TAG TIMESTAMP
It is clear that something remains 'locked' on the non-rebooted node
(cnpve1), but how can I identify and unlock it?
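If no leftover process turns up, my other guess is that the interrupted
receive left some partially-received state behind on cnpve1. I'm not even
sure that applies here, since the log shows a plain 'zfs recv -F' (not a
resumable receive), but if I read zfs(8) correctly something like this on
cnpve1 should show and clear it:

# '-' means there is no saved partial receive state on the dataset
zfs get receive_resume_token rpool/data/vm-121-disk-0
# abort an interrupted receive and discard its partial state, if any
zfs receive -A rpool/data/vm-121-disk-0

Am I looking in the right direction, or is there some other state on
cnpve1 I should be checking?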
Thanks.