Subject: Replication error: dataset is busy
From: Marco Gaiarin @ 2026-02-13  8:43 UTC
  To: pve-user


Situation: a pair of PVE nodes, with a direct link between the two used for
replication and migration.
Due to a hardware failure, the NIC on one of the servers (cnpve2) died and the
server had to be powered off.

After node cnpve2 rebooted, all replication jobs recovered apart from one (running on cnpve2):

 2026-02-13 09:26:01 121-0: start replication job
 2026-02-13 09:26:01 121-0: guest => VM 121, running => 4345
 2026-02-13 09:26:01 121-0: volumes => local-zfs:vm-121-disk-0,rpool-data:vm-121-disk-0,rpool-data:vm-121-disk-1
 2026-02-13 09:26:04 121-0: freeze guest filesystem
 2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on local-zfs:vm-121-disk-0
 2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-0
 2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-1
 2026-02-13 09:26:06 121-0: thaw guest filesystem
 2026-02-13 09:26:06 121-0: using insecure transmission, rate limit: 10 MByte/s
 2026-02-13 09:26:06 121-0: incremental sync 'local-zfs:vm-121-disk-0' (__replicate_121-0_1770876001__ => __replicate_121-0_1770971161__)
 2026-02-13 09:26:06 121-0: using a bandwidth limit of 10000000 bytes per second for transferring 'local-zfs:vm-121-disk-0'
 2026-02-13 09:26:08 121-0: send from @__replicate_121-0_1770876001__ to rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__ estimated size is 2.76G
 2026-02-13 09:26:08 121-0: total estimated size is 2.76G
 2026-02-13 09:26:08 121-0: TIME        SENT   SNAPSHOT rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__
 2026-02-13 09:26:08 121-0: 663540 B 648.0 KB 0.69 s 964531 B/s 941.92 KB/s
 2026-02-13 09:26:08 121-0: write: Broken pipe
 2026-02-13 09:26:08 121-0: warning: cannot send 'rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__': signal received
 2026-02-13 09:26:08 121-0: cannot send 'rpool/data/vm-121-disk-0': I/O error
 2026-02-13 09:26:08 121-0: command 'zfs send -Rpv -I __replicate_121-0_1770876001__ -- rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__' failed: exit code 1
 2026-02-13 09:26:08 121-0: [cnpve1] cannot receive incremental stream: dataset is busy
 2026-02-13 09:26:08 121-0: [cnpve1] command 'zfs recv -F -- rpool/data/vm-121-disk-0' failed: exit code 1
 2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on local-zfs:vm-121-disk-0
 2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-0
 2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-1
 2026-02-13 09:26:08 121-0: end replication job with error: failed to run insecure migration: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=cnpve1' -o 'UserKnownHostsFile=/etc/pve/nodes/cnpve1/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@10.10.251.21 -- pvesm import local-zfs:vm-121-disk-0 zfs tcp://10.10.251.0/24 -with-snapshots 1 -snapshot __replicate_121-0_1770971161__ -allow-rename 0 -base __replicate_121-0_1770876001__' failed: exit code 255
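
The "dataset is busy" error comes from the 'zfs recv -F' that runs on cnpve1, so
my first guess (only a guess) is that a receive started before the link went
down is still alive on that side and keeps the target dataset open. Something
along these lines should reveal any leftover receive or import process:

 root@cnpve1:~# pgrep -af 'zfs recv'
 root@cnpve1:~# pgrep -af 'pvesm import'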

On the rebooted node there are no holds:

 root@cnpve2:~# zfs list -t snapshot | grep 121
 rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__      5.99G      -  2.49T  -
 rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__       115M      -  22.1G  -
 rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__      1.57G      -  35.4G  -
 root@cnpve2:~# zfs holds rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__
 NAME                                                     TAG  TIMESTAMP
 root@cnpve2:~# zfs holds rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__
 NAME                                                     TAG  TIMESTAMP
 root@cnpve2:~# zfs holds rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__
 NAME                                                     TAG  TIMESTAMP

nor are there any on the opposite node:

 root@cnpve1:~# zfs list -t snapshot | grep 121
 rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__         0B      -  2.49T  -
 rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__         0B      -  22.1G  -
 rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__         0B      -  35.4G  -
 root@cnpve1:~# zfs holds rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__
 NAME                                                     TAG  TIMESTAMP
 root@cnpve1:~# zfs holds rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__
 NAME                                                     TAG  TIMESTAMP
 root@cnpve1:~# zfs holds rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__
 NAME                                                     TAG  TIMESTAMP
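
Another thing that seems worth checking on cnpve1, assuming the busy state is
left by a partially received stream rather than by a running process, is
whether the target datasets carry an interrupted-receive state; the
'receive_resume_token' property should read '-' if there is none:

 root@cnpve1:~# zfs get receive_resume_token rpool/data/vm-121-disk-0 rpool-data/vm-121-disk-0 rpool-data/vm-121-disk-1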

It is clear that something remains 'locked' on the non-rebooted node
(cnpve1), but how can I identify and unlock it?
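
If it really is a leftover receive state, I suppose the way out would be to
abort it on cnpve1 with something like (again an assumption on my part, not
something I have verified):

 root@cnpve1:~# zfs recv -A rpool/data/vm-121-disk-0

and, if instead a stale 'zfs recv' process turns up, to kill it and re-run the
replication job. Does that sound right, or is there a cleaner way?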


Thanks.
