* [PVE-User] Replication failed, got timeout?
From: Marco Gaiarin @ 2022-03-31 16:24 UTC
To: pve-user
Newly installed PVE 6 two-node cluster, with no load at all: just a few test
VMs that are replicated between the two nodes, which are connected via a
direct 10G cable.
Sometimes we get:
command 'zfs snapshot rpool/data/vm-103-disk-0@__replicate_103-0_1648656014__' failed: got timeout
What could it be? Thanks.
--
I still have the strength not to back down, [...]
to count the friends who are gone and say ``see you later''
(F. Guccini)
* Re: [PVE-User] Replication failed, got timeout?
From: Marco Gaiarin @ 2022-04-04 8:17 UTC
To: Marco Gaiarin; +Cc: pve-user
> Newly installed PVE 6 two-node cluster, with no load at all: just a few test
> VMs that are replicated between the two nodes, which are connected via a
> direct 10G cable.
> Sometimes we get:
> command 'zfs snapshot rpool/data/vm-103-disk-0@__replicate_103-0_1648656014__' failed: got timeout
> What could it be? Thanks.
We caught a log in /var/log/pve/replicate/, but it does not seem, at least
to me, to provide any further clue:
2022-04-02 14:00:14 103-0: start replication job
2022-04-02 14:00:14 103-0: guest => VM 103, running => 5167
2022-04-02 14:00:14 103-0: volumes => local-zfs:vm-103-disk-0
2022-04-02 14:00:16 103-0: create snapshot '__replicate_103-0_1648900814__' on local-zfs:vm-103-disk-0
2022-04-02 14:00:21 103-0: end replication job with error: command 'zfs snapshot rpool/data/vm-103-disk-0@__replicate_103-0_1648900814__' failed: got timeout
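Reading the timestamps in the log above: the snapshot step starts at 14:00:16 and the job ends with an error at 14:00:21, so the `zfs snapshot` call was abandoned after roughly 5 seconds (the log only has one-second resolution). A minimal sketch of that arithmetic, assuming GNU `date` is available; `elapsed_seconds` is a hypothetical helper name, not part of any PVE tool:

```shell
#!/bin/sh
# Sketch: how long did the snapshot step run before the job gave up?
# Assumes GNU date (for -d), as shipped on Debian-based systems.

# Print the number of seconds between two "YYYY-MM-DD HH:MM:SS" timestamps.
elapsed_seconds() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $((end - start))
}

# Timestamps copied from the "create snapshot" and "end replication job" lines.
elapsed_seconds "2022-04-02 14:00:16" "2022-04-02 14:00:21"
```

With the two timestamps from the log this prints 5.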
I'm seeking info. Thanks.
--
As long as the color of the skin is more important than the color
of the eyes, there will always be war. (Bob Marley)
* Re: [PVE-User] Replication failed, got timeout?
From: Aaron Lauterer @ 2022-04-05 7:26 UTC
To: Proxmox VE user list, Marco Gaiarin
Is the pool using HDDs? It could be that other things are happening on the
pool at that moment, and HDDs are really not great at random I/O. I ran into
that as well sometimes; it went away when I switched to SSDs.
A dedicated special device vdev on (mirrored) SSDs should also improve the
situation without needing as many SSDs. Snapshots are a metadata operation,
and metadata is what a special device holds.
See [0] or `man zpoolconcepts` and look for "special device".
Cheers,
Aaron
[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_zfs_special_device
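As an illustration of the special device suggestion, here is a minimal sketch of adding a mirrored special vdev. The pool name `rpool` comes from the error message in this thread; the two device paths and the `special_add_cmd` helper are placeholders, not real names. The script only prints the command unless `RUN=1` is set, because the special device becomes a point of failure for the whole pool and the change should be reviewed first.

```shell
#!/bin/sh
# Sketch only: add a mirrored special device to an existing pool.
POOL="rpool"                              # pool name seen in the thread
SSD_A="/dev/disk/by-id/EXAMPLE-SSD-A"     # hypothetical device path
SSD_B="/dev/disk/by-id/EXAMPLE-SSD-B"     # hypothetical device path

# Build the command as text so it can be reviewed before anything runs.
special_add_cmd() {
    printf 'zpool add %s special mirror %s %s\n' "$1" "$2" "$3"
}

cmd=$(special_add_cmd "$POOL" "$SSD_A" "$SSD_B")
if [ "${RUN:-0}" = "1" ]; then
    # Executes only when RUN=1 is set explicitly.
    $cmd
else
    echo "would run: $cmd"
fi
```

Note that only metadata written after the device is added lands on the SSDs; existing metadata stays on the HDDs until it is rewritten.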
On 4/4/22 10:17, Marco Gaiarin wrote:
>> Newly installed PVE 6 two-node cluster, with no load at all: just a few test
>> VMs that are replicated between the two nodes, which are connected via a
>> direct 10G cable.
>> Sometimes we get:
>> command 'zfs snapshot rpool/data/vm-103-disk-0@__replicate_103-0_1648656014__' failed: got timeout
>> What could it be? Thanks.
>
> We caught a log in /var/log/pve/replicate/, but it does not seem, at least
> to me, to provide any further clue:
>
> 2022-04-02 14:00:14 103-0: start replication job
> 2022-04-02 14:00:14 103-0: guest => VM 103, running => 5167
> 2022-04-02 14:00:14 103-0: volumes => local-zfs:vm-103-disk-0
> 2022-04-02 14:00:16 103-0: create snapshot '__replicate_103-0_1648900814__' on local-zfs:vm-103-disk-0
> 2022-04-02 14:00:21 103-0: end replication job with error: command 'zfs snapshot rpool/data/vm-103-disk-0@__replicate_103-0_1648900814__' failed: got timeout
>
> I'm seeking info. Thanks.
>
* Re: [PVE-User] Replication failed, got timeout?
From: Marco Gaiarin @ 2022-04-05 15:55 UTC
To: Aaron Lauterer, pve-user
Hi! Aaron Lauterer wrote:
> Is the pool using HDDs?
Yes. After fiddling a bit, we suspected a problem with I/O peaks, and this
confirms it.
Since we normally don't need strict replication timing, for now we have
limited the replication bandwidth, and that seems to work.
Thanks for all the info.
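The thread does not say how the bandwidth was limited. One way, assuming the limit is set on the replication job itself, is the `--rate` option of `pvesr update`, which takes a value in MB/s. In the sketch below the job ID `103-0` is the one from the log earlier in the thread, while the rate value and the `rate_limit_cmd` helper are purely illustrative. The command is only printed unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch: cap the transfer rate of one replication job.
JOB_ID="103-0"   # job ID as it appears in the replication log
RATE="10"        # assumed value in MB/s; pick what the HDDs can sustain

# Build the command as text so it can be reviewed before anything runs.
rate_limit_cmd() {
    printf 'pvesr update %s --rate %s\n' "$1" "$2"
}

cmd=$(rate_limit_cmd "$JOB_ID" "$RATE")
if [ "${RUN:-0}" = "1" ]; then
    # Executes only when RUN=1 is set explicitly, on a PVE node.
    $cmd
    pvesr status   # show the job state after the change
else
    echo "would run: $cmd"
fi
```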