* Re: [pve-devel] How does proxmox handle loss of connection / reboot of iSCSI storage
[not found] <DM6PR17MB34665CA6AB7E651E1675C791D01EA@DM6PR17MB3466.namprd17.prod.outlook.com>
@ 2025-09-26 13:32 ` Max R. Carrara
2025-09-26 16:41 ` Lorne Guse via pve-devel
0 siblings, 1 reply; 3+ messages in thread
From: Max R. Carrara @ 2025-09-26 13:32 UTC (permalink / raw)
To: Lorne Guse, Proxmox VE development discussion
On Fri Sep 26, 2025 at 4:06 AM CEST, Lorne Guse wrote:
> RE: TrueNAS over iSCSI Custom Storage Plugin
>
> TrueNAS has asked me to investigate how Proxmox reacts to reboot of the storage server while VMs and cluster are active. This is especially relevant for updates to TrueNAS.
>
> >The one test we'd like to see work is reboot of TrueNAS node while VMs and cluster are operational… does it "resume" cleanly? A TrueNAS software update will be similar.
>
> I don't think the storage plugin is responsible for this level of interaction with the storage server. Is there anything that can be done at the storage plugin level to facilitate graceful recovery when the storage server goes down?
>
>
> --
> Lorne Guse
From what I have experienced, it depends entirely on the underlying
storage implementation. Since you nerd-sniped me a little here, I
decided to do some testing.
On ZFS over iSCSI (using LIO), the downtime does not affect the VM at
all, except that I/O is stalled while the remote storage is rebooting.
So while I/O operations might take a little while to go through from the
VM's perspective, nothing broke here (in my Debian VM, at least).
Note that by "broke" I mean the VM kept on running, the OS and its
components didn't throw any errors, no systemd units failed, etc.
Of course, if an application running inside the VM sets a timeout on
some disk operation and throws an error because of that, that's an
"issue" with the application.
I even shut down the ZFS-over-iSCSI-via-LIO remote for a couple of
minutes to see if it would throw any errors eventually, but nope, it
didn't; things just took a while:
Starting: Fri Sep 26 02:32:52 PM CEST 2025
d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
Done: Fri Sep 26 02:32:58 PM CEST 2025
Starting: Fri Sep 26 02:32:59 PM CEST 2025
d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
Done: Fri Sep 26 02:33:04 PM CEST 2025
Starting: Fri Sep 26 02:33:05 PM CEST 2025
d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
Done: Fri Sep 26 02:36:16 PM CEST 2025
Starting: Fri Sep 26 02:36:17 PM CEST 2025
d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
Done: Fri Sep 26 02:36:23 PM CEST 2025
Starting: Fri Sep 26 02:36:24 PM CEST 2025
d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
Done: Fri Sep 26 02:36:29 PM CEST 2025
The timestamps there show that the storage was down for ~3 minutes,
which is a *lot*, but nevertheless everything kept on running.
The above is the output of the following:
while sleep 1; do echo "Starting: $(date)"; sha256sum foo; echo "Done: $(date)"; done
... where "foo" is a ~4 GiB file I had created with:
dd if=/dev/urandom of=./foo bs=1M count=4000
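For longer tests, it might be handy to have the loop flag the
iterations where I/O actually stalled. A small helper along these
lines could do that — the 10-second threshold and the file name "foo"
are arbitrary choices for this sketch:

```shell
#!/bin/sh
# Flag iterations that took suspiciously long, hinting at a storage
# stall. The threshold (in seconds) defaults to 10, which is an
# arbitrary choice.
check_stall() {
    start=$1
    end=$2
    threshold=${3:-10}
    if [ $((end - start)) -gt "$threshold" ]; then
        echo "possible storage stall: $((end - start))s"
    fi
}

# Usage, wrapping the loop from above:
# while sleep 1; do
#     s=$(date +%s)
#     sha256sum foo > /dev/null
#     e=$(date +%s)
#     check_stall "$s" "$e"
# done
```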
With the TrueNAS legacy plugin (also ZFS over iSCSI, as you know),
reboots of TrueNAS are also handled gracefully in this way; I was able
to observe the same behavior as with the LIO iSCSI provider. So if you
keep using iSCSI for the new plugin (which I think you do, IIRC),
everything should be fine. But as I said, it's up to the applications
inside the guest whether long disk I/O latencies are a problem or not.
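On that note, for Linux guests whose disks go through the SCSI layer
(e.g. virtio-scsi), one knob worth knowing is the kernel's per-device
SCSI command timeout in sysfs; raising it widens the window a storage
reboot may take before the guest gives up on a command. A sketch —
the device name "sda" is an assumption, and the sysfs root is
parameterized (defaulting to /sys) purely for illustration:

```shell
#!/bin/sh
# Read the guest kernel's SCSI command timeout (in seconds) for a
# disk. Applies to Linux guests whose disks use the sd driver; "sda"
# is an assumption, adjust for your setup.
show_scsi_timeout() {
    dev=${1:-sda}
    sysfs=${2:-/sys}
    cat "$sysfs/block/$dev/device/timeout"
}

# Raising it, e.g. to ride out a longer storage reboot:
# echo 180 > /sys/block/sda/device/timeout
```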
On a side note, I'm not too familiar with how QEMU handles iSCSI
sessions in particular, but it seems to simply wait until the iSCSI
session resumes; at least that's what I'm assuming here.
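For completeness: that concerns QEMU's built-in (libiscsi) initiator.
When the host's kernel initiator (open-iscsi) is in the picture
instead — e.g. with plain iSCSI storage — the relevant knob, as far
as I know, is the replacement timeout in /etc/iscsi/iscsid.conf,
which controls how long the initiator waits for a dead session to
come back before failing I/O upward; the value below is illustrative:

```shell
# /etc/iscsi/iscsid.conf -- open-iscsi kernel initiator only; QEMU's
# built-in libiscsi initiator does not read this file.
#
# Seconds to wait for a dead session to re-establish before failing
# I/O back to the upper layers; raising it (300 here is illustrative)
# widens the window a storage reboot may take.
node.session.timeo.replacement_timeout = 300
```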
For curiosity's sake I also tested this with my SSHFS plugin [0], and
in that case the VM remained online, but threw I/O errors immediately
and remained in an unusable state even once the storage was up again.
(I'll actually see if I can prevent that from happening; IIRC there's
an option for reconnecting, unless I'm mistaken.)
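In case it's useful to anyone reading along: sshfs does have a
reconnect option, best combined with SSH keepalives so dead
connections actually get noticed. A mount sketch — "storage:/export"
and "/mnt/sshfs" are placeholders:

```shell
# /etc/fstab entry (sketch; host, export path, and mountpoint are
# placeholders). "reconnect" makes sshfs re-establish the SSH
# connection once the server returns; the ServerAlive options make a
# dead connection get detected instead of hanging indefinitely.
storage:/export  /mnt/sshfs  fuse.sshfs  reconnect,ServerAliveInterval=15,ServerAliveCountMax=3,_netdev  0  0
```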
Regarding your question about what the plugin can do to facilitate
graceful recovery: in your case, things should be fine "out of the
box" because of the magic intricacies of iSCSI + QEMU; with other
plugins & storage implementations, it really depends.
Hope that helps clear some things up!
[0]: https://git.proxmox.com/?p=pve-storage-plugin-examples.git;a=blob;f=plugin-sshfs/src/PVE/Storage/Custom/SSHFSPlugin.pm;h=2d1612b139a3342e7a91b9d2809c2cf209ed9b05;hb=refs/heads/master
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [pve-devel] How does proxmox handle loss of connection / reboot of iSCSI storage
2025-09-26 13:32 ` [pve-devel] How does proxmox handle loss of connection / reboot of iSCSI storage Max R. Carrara
@ 2025-09-26 16:41 ` Lorne Guse via pve-devel
0 siblings, 0 replies; 3+ messages in thread
From: Lorne Guse via pve-devel @ 2025-09-26 16:41 UTC (permalink / raw)
To: Max R. Carrara, Proxmox VE development discussion; +Cc: Lorne Guse
[-- Attachment #1: Type: message/rfc822, Size: 16155 bytes --]
From: Lorne Guse <boomshankerx@hotmail.com>
To: "Max R. Carrara" <m.carrara@proxmox.com>, Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: How does proxmox handle loss of connection / reboot of iSCSI storage
Date: Fri, 26 Sep 2025 16:41:36 +0000
Message-ID: <DM6PR17MB346689876FD08DC2A051F0A1D01EA@DM6PR17MB3466.namprd17.prod.outlook.com>
TIL what nerd-sniping is. I was worried that I had broken some kind of rule at first. LOL
Thank you for your response. I will do some more extensive testing to see if there is a limit. Some TrueNAS updates can take longer than 3 min.
I imagine it might be guest-dependent.
I always assumed that I had to shut down my VMs before updating TrueNAS. On the next update I'll run some backups and update while my proxmox cluster is online.
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* [pve-devel] How does proxmox handle loss of connection / reboot of iSCSI storage
@ 2025-09-26 2:06 Lorne Guse via pve-devel
0 siblings, 0 replies; 3+ messages in thread
From: Lorne Guse via pve-devel @ 2025-09-26 2:06 UTC (permalink / raw)
To: Proxmox VE development discussion, m.carrara; +Cc: Lorne Guse
[-- Attachment #1: Type: message/rfc822, Size: 10496 bytes --]
From: Lorne Guse <boomshankerx@hotmail.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, "m.carrara@proxmox.com" <m.carrara@proxmox.com>
Subject: How does proxmox handle loss of connection / reboot of iSCSI storage
Date: Fri, 26 Sep 2025 02:06:13 +0000
Message-ID: <DM6PR17MB34665CA6AB7E651E1675C791D01EA@DM6PR17MB3466.namprd17.prod.outlook.com>
RE: TrueNAS over iSCSI Custom Storage Plugin
TrueNAS has asked me to investigate how Proxmox reacts to reboot of the storage server while VMs and cluster are active. This is especially relevant for updates to TrueNAS.
>The one test we'd like to see work is reboot of TrueNAS node while VMs and cluster are operational… does it "resume" cleanly? A TrueNAS software update will be similar.
I don't think the storage plugin is responsible for this level of interaction with the storage server. Is there anything that can be done at the storage plugin level to facilitate graceful recovery when the storage server goes down?
--
Lorne Guse
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-09-26 16:56 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <DM6PR17MB34665CA6AB7E651E1675C791D01EA@DM6PR17MB3466.namprd17.prod.outlook.com>
2025-09-26 13:32 ` [pve-devel] How does proxmox handle loss of connection / reboot of iSCSI storage Max R. Carrara
2025-09-26 16:41 ` Lorne Guse via pve-devel
2025-09-26 2:06 Lorne Guse via pve-devel