From: "Max R. Carrara" <m.carrara@proxmox.com>
To: "Lorne Guse" <boomshankerx@hotmail.com>,
"Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] How does proxmox handle loss of connection / reboot of iSCSI storage
Date: Fri, 26 Sep 2025 15:32:21 +0200 [thread overview]
Message-ID: <DD2RQ25PEUQJ.3UCAOIFZYCS0I@proxmox.com> (raw)
In-Reply-To: <DM6PR17MB34665CA6AB7E651E1675C791D01EA@DM6PR17MB3466.namprd17.prod.outlook.com>
On Fri Sep 26, 2025 at 4:06 AM CEST, Lorne Guse wrote:
> RE: TrueNAS over iSCSI Custom Storage Plugin
>
> TrueNAS has asked me to investigate how Proxmox reacts to reboot of the storage server while VMs and cluster are active. This is especially relevant for updates to TrueNAS.
>
> >The one test we'd like to see work is reboot of TrueNAS node while VMs and cluster are operational… does it "resume" cleanly? A TrueNAS software update will be similar.
>
> I don't think the storage plugin is responsible for this level of interaction with the storage server. Is there anything that can be done at the storage plugin level to facilitate graceful recovery when the storage server goes down?
>
>
> --
> Lorne Guse
From what I have experienced, it depends entirely on the underlying
storage implementation. Since you nerd-sniped me a little here, I
decided to do some testing.
On ZFS over iSCSI (using LIO), the downtime does not affect the VM at
all, except that I/O is stalled while the remote storage is rebooting.
So while I/O operations might take a little while to go through from the
VM's perspective, nothing broke here (in my Debian VM at least).
Note that by "nothing broke" I mean that the VM kept on running, the OS
and its components didn't throw any errors, no systemd units failed, etc.
Of course, if an application running inside the VM sets a timeout on
some disk operation, for example, and throws an error because of that,
that's an "issue" with the application itself.
I even shut down the ZFS-over-iSCSI-via-LIO remote for a couple of
minutes to see if it would throw any errors eventually, but nope, it
didn't; things just took a while:
Starting: Fri Sep 26 02:32:52 PM CEST 2025
d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
Done: Fri Sep 26 02:32:58 PM CEST 2025
Starting: Fri Sep 26 02:32:59 PM CEST 2025
d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
Done: Fri Sep 26 02:33:04 PM CEST 2025
Starting: Fri Sep 26 02:33:05 PM CEST 2025
d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
Done: Fri Sep 26 02:36:16 PM CEST 2025
Starting: Fri Sep 26 02:36:17 PM CEST 2025
d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
Done: Fri Sep 26 02:36:23 PM CEST 2025
Starting: Fri Sep 26 02:36:24 PM CEST 2025
d5ae75665497b917c70216497a480104b0395e0b53c6256b1f1e3de96c29eb87 foo
Done: Fri Sep 26 02:36:29 PM CEST 2025
The timestamps there show that the storage was down for ~3 minutes,
which is a *lot*, but nevertheless everything kept on running.
The above is the output of the following:
while sleep 1; do echo "Starting: $(date)"; sha256sum foo; echo "Done: $(date)"; done
... where "foo" is a ~4 GiB file I had created with:
dd if=/dev/urandom of=./foo bs=1M count=4000
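(In case anyone wants to reproduce this: a quick sanity check I'd run
inside the guest afterwards to confirm that no disk errors were actually
logged; the grep patterns below are just my assumption of what such
errors would look like.)

# inside the guest, once the storage is back up
dmesg -T | grep -iE 'blk_update_request|i/o error|timed out' || echo "no block-layer errors in dmesg"
journalctl -p err -b --no-pager | grep -iE 'scsi|i/o error' || echo "no storage-related errors in the journal"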
With the TrueNAS legacy plugin (also ZFS over iSCSI, as you know),
reboots of TrueNAS are also handled "gracefully" in this way; I was able
to observe the same behavior as with the LIO iSCSI provider. So if you
keep using iSCSI for the new plugin (which I think you do, IIRC),
everything should be fine. But as I said, it's up to the applications
inside the guest whether long disk I/O latencies are a problem or not.
On a side note, I'm not too familiar with how QEMU handles iSCSI
sessions in particular, but from what I can tell it just waits until the
iSCSI session resumes; at least that's what I'm assuming here.
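(Also purely as an assumption on my part, since this is more QEMU than
plugin territory: the per-drive werror/rerror policies should influence
what the guest sees if a request ever does fail outright instead of just
stalling. Something along these lines, where the VMID and volume name
are placeholders; note that "qm set" re-specifies the whole drive entry.)

# hypothetical example: pause the VM on I/O errors instead of forwarding them to the guest
# (VMID 100 and tank:vm-100-disk-0 are placeholders)
qm set 100 --scsi0 tank:vm-100-disk-0,werror=stop,rerror=stop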
For curiosity's sake I also tested this with my SSHFS plugin [0]; in
that case the VM stayed online, but threw I/O errors immediately and
remained in an unusable state even once the storage was back up.
(I'll see if I can prevent that from happening; IIRC sshfs has an
option for reconnecting.)
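(If it's the option I'm thinking of, it would be something like the
following; the host, export path and mountpoint are placeholders, and I
haven't verified this yet.)

# hypothetical mount: -o reconnect re-establishes the SSH connection after it drops,
# the keep-alive options make the client notice the outage in the first place
sshfs -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3 \
    root@storage-host:/export/vmdata /mnt/pve/sshfs-example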
Regarding your question about what the plugin can do to facilitate
graceful recovery: In your case, things should be fine "out of the box"
because of the magic intricacies of iSCSI + QEMU; with other plugins &
storage implementations it really depends.
Hope that helps clear some things up!
[0]: https://git.proxmox.com/?p=pve-storage-plugin-examples.git;a=blob;f=plugin-sshfs/src/PVE/Storage/Custom/SSHFSPlugin.pm;h=2d1612b139a3342e7a91b9d2809c2cf209ed9b05;hb=refs/heads/master