public inbox for pve-user@lists.proxmox.com
* [PVE-User] Replication blocked issue
@ 2021-04-28 15:34 Bertorello, Marco
From: Bertorello, Marco @ 2021-04-28 15:34 UTC
  To: pve-user

Dear PVE users,

I have a 3-node cluster with ZFS storage.
Each node uses its own local storage, and the VMs/LXCs are replicated to
the other nodes every 10 minutes.

Sometimes a replication job starts and never finishes.

For example, right now I have a replication job that started yesterday:

2021-04-27 07:20:01 101-1: start replication job
2021-04-27 07:20:01 101-1: guest => CT 101, running => 1
2021-04-27 07:20:01 101-1: volumes => DS1:subvol-101-disk-1
2021-04-27 07:20:02 101-1: freeze guest filesystem
2021-04-27 07:20:05 101-1: create snapshot '__replicate_101-1_1619500801__' on DS1:subvol-101-disk-1
2021-04-27 07:20:06 101-1: thaw guest filesystem
2021-04-27 07:20:06 101-1: using secure transmission, rate limit: none
2021-04-27 07:20:06 101-1: incremental sync 'DS1:subvol-101-disk-1' (__replicate_101-1_1619500201__ => __replicate_101-1_1619500801__)
2021-04-27 07:20:08 101-1: send from @__replicate_101-1_1619500201__ to zp1/subvol-101-disk-1@__replicate_101-0_1619500211__ estimated size is 213K
2021-04-27 07:20:08 101-1: send from @__replicate_101-0_1619500211__ to zp1/subvol-101-disk-1@__replicate_101-1_1619500801__ estimated size is 26.1M
2021-04-27 07:20:08 101-1: total estimated size is 26.4M
2021-04-27 07:20:09 101-1: TIME        SENT   SNAPSHOT zp1/subvol-101-disk-1@__replicate_101-1_1619500801__
2021-04-27 07:20:09 101-1: 07:20:09   3.18M zp1/subvol-101-disk-1@__replicate_101-1_1619500801__
[...]
2021-04-28 17:27:25 101-1: 17:27:25   3.18M zp1/subvol-101-disk-1@__replicate_101-1_1619500801__
2021-04-28 17:27:26 101-1: 17:27:26   3.18M zp1/subvol-101-disk-1@__replicate_101-1_1619500801__
2021-04-28 17:27:27 101-1: 17:27:27   3.18M zp1/subvol-101-disk-1@__replicate_101-1_1619500801__
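The replication snapshot names in the log above follow the pattern `__replicate_<job>_<number>__`. The trailing number appears to be a Unix timestamp: decoding it reproduces the log's own times, assuming the node's clock is on UTC+2 (an assumption, not stated in the log). The minimal sketch below decodes the three names involved in this incremental sync; `decode_snapshot` and the `utc_offset_hours` parameter are hypothetical helpers for illustration, not part of Proxmox.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches names such as "__replicate_101-1_1619500801__" (job ID, then number).
SNAP_RE = re.compile(r"__replicate_(\d+-\d+)_(\d+)__")


def decode_snapshot(name, utc_offset_hours=2):
    """Hypothetical helper: return (job_id, local datetime) for a replication
    snapshot name, treating the trailing number as a Unix timestamp.
    utc_offset_hours=2 is an assumed local offset (UTC+2)."""
    m = SNAP_RE.search(name)
    if m is None:
        raise ValueError(f"not a replication snapshot name: {name!r}")
    job, epoch = m.group(1), int(m.group(2))
    tz = timezone(timedelta(hours=utc_offset_hours))
    return job, datetime.fromtimestamp(epoch, tz)


for name in (
    "__replicate_101-1_1619500201__",  # previous base snapshot of job 101-1
    "__replicate_101-0_1619500211__",  # snapshot left by job 101-0
    "__replicate_101-1_1619500801__",  # snapshot created by this run
):
    job, when = decode_snapshot(name)
    print(job, when.strftime("%Y-%m-%d %H:%M:%S"))
# prints:
# 101-1 2021-04-27 07:10:01
# 101-0 2021-04-27 07:10:11
# 101-1 2021-04-27 07:20:01
```

Under that assumption the two 101-1 snapshots are exactly 600 seconds apart, matching the 10-minute schedule, and the new snapshot matches the "create snapshot" line at 07:20:05 to within a few seconds.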

As the log shows, there has been no progress over this whole period: the amount sent is still 3.18M.
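A stall like this can be spotted mechanically from the per-second progress lines, where the SENT column stops changing. The following is a minimal sketch, assuming the log lines come from a single job and have the format shown above; `find_stall` and its 30-minute `threshold` are hypothetical choices for illustration, not a Proxmox feature.

```python
import re
from datetime import datetime, timedelta

# Matches progress lines such as
# "2021-04-28 17:27:27 101-1: 17:27:27   3.18M zp1/...@__replicate_...__"
PROGRESS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+): "
    r"\d{2}:\d{2}:\d{2}\s+(\S+)\s+(\S+)$"
)


def find_stall(lines, threshold=timedelta(minutes=30)):
    """Hypothetical helper: return (job, sent, stuck_since, last_seen) if the
    SENT value has not changed for at least `threshold`, else None.
    Assumes all lines belong to one replication job."""
    job = last_sent = stuck_since = last_seen = None
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if not m:
            continue  # skip the TIME/SENT header, "[...]" and other log lines
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        job, sent = m.group(2), m.group(3)
        if sent != last_sent:
            # the amount sent changed, so the stall window restarts here
            last_sent, stuck_since = sent, ts
        last_seen = ts
    if stuck_since is not None and last_seen - stuck_since >= threshold:
        return job, last_sent, stuck_since, last_seen
    return None
```

Fed the lines above, it reports job 101-1 stuck at 3.18M from 2021-04-27 07:20:09 until the last line at 2021-04-28 17:27:27. Comparing the SENT string rather than converting it to bytes keeps the sketch simple, since only "changed or not" matters here.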

This causes two big problems:

1) the blocked job prevents the other replication jobs scheduled on the
source node from running until it finishes or fails;

2) the only way I have found to get out of this situation is to reboot
the destination node.

I tried to kill the process on the destination node, but it is in D
state (uninterruptible sleep) and cannot be killed.
Is there a way to recover from this without rebooting the nodes?
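A process in D state is in uninterruptible sleep, usually blocked inside the kernel waiting for I/O; a pending SIGKILL is generally not acted on until that wait returns, which is why kill has no effect. To see which processes on the destination node are stuck this way, one can read the state field of /proc/<pid>/stat. The minimal Linux-only sketch below does that; `parse_stat` and `d_state_processes` are hypothetical helper names, and the sketch only identifies stuck processes, it does not unblock them.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state).
    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace."""
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def d_state_processes(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state 'D'
    (uninterruptible sleep). Linux-only."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm
```

Usage, on the destination node: `for pid, comm in d_state_processes(): print(pid, comm)`. Parsing /proc directly avoids depending on the output format of any particular ps version.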

Thanks a lot and best regards,

-- 
Marco Bertorello
https://www.marcobertorello.it