public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Fiona Ebner <f.ebner@proxmox.com>
To: Erik Fastermann <e.fastermann@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH qemu-server 2/3] fix #1989: disk: add qcow2 cache options
Date: Fri, 29 May 2026 17:26:00 +0200	[thread overview]
Message-ID: <94d54894-19fe-4524-ac62-5eee6191f0ff@proxmox.com> (raw)
In-Reply-To: <20260529125808.204983-3-e.fastermann@proxmox.com>

Am 29.05.26 um 2:58 PM schrieb Erik Fastermann:
> Add multiple options to configure the qcow2 L2/refcount cache. This
> enables huge performance gains in some cases.
> 
> For a detailed explanation of the options see the QEMU docs [0].
> Additionally the cache size can be configured based on the size of the
> disk image automatically.
> 
> Both blockdev and the older drive commandline are supported, which makes
> the feature accessible to all QEMU machine versions supported by PVE.
> Options which only apply to disk creation (cluster_size, refcount_bits)
> are not considered in this patch.
> 
> [0] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/qcow2-cache.txt
> 
> Signed-off-by: Erik Fastermann <e.fastermann@proxmox.com>
> ---
>  src/PVE/QemuServer.pm          | 33 ++++++++++++
>  src/PVE/QemuServer/Blockdev.pm | 33 ++++++++++++
>  src/PVE/QemuServer/Drive.pm    | 99 ++++++++++++++++++++++++++++++++++
>  3 files changed, 165 insertions(+)
> 
> diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
> index 118f26bc..e53ae760 100644
> --- a/src/PVE/QemuServer.pm
> +++ b/src/PVE/QemuServer.pm
> @@ -1294,6 +1294,39 @@ sub print_drive_commandline_full {
>          $opts .= ",auto-remove=on";
>      }
>  
> +    if (defined($drive->{cache_size})) {
> +        if ($drive->{cache_size} eq 'based-on-disk') {
> +            # Automatically set the cache-size based on the disk size. The formula
> +            # is adapted from here:
> +            # https://gitlab.com/qemu-project/qemu/-/blob/master/docs/qcow2-cache.txt#L97-98
> +            # Assumes the default values for cluster_size and refcount_bits and
> +            # that extended L2 entries are not used. As this combines the refcount
> +            # and L2 cache sizes, QEMU is free to choose a different value for each
> +            # of them, which means the L2 cache is not necessarily 4 times bigger
> +            # than the refcount cache.

For snapshot-as-volume-chain, for backed disks, we use extended L2
entries and 128k cluster size. Can we query the disk here to get the
actual params (would prepare for the future if we allow changing these
upon creation too, and it's in the context of startup so that should not
be too expensive)?

Same below for blockdev

> +            my $cache_size = $drive->{size} / 1024 / 1024 / 1024 * (131072 + 32768);
> +            $opts .= ",cache-size=" . int($cache_size);
> +        } else {
> +            $opts .= ",cache-size=" . int($drive->{cache_size});
> +        }
> +    }
> +
> +    if (defined($drive->{l2_cache_size})) {
> +        $opts .= ",cache-size=" . int($drive->{l2_cache_size});
> +    }
> +
> +    if (defined($drive->{l2_cache_entry_size})) {
> +        $opts .= ",l2-cache-entry-size=" . int($drive->{l2_cache_entry_size});
> +    }
> +
> +    if (defined($drive->{refcount_cache_size})) {
> +        $opts .= ",refcount-cache-size=" . int($drive->{refcount_cache_size});
> +    }
> +
> +    if (defined($drive->{cache_clean_interval})) {
> +        $opts .= ",cache-clean-interval=" . int($drive->{cache_clean_interval});
> +    }
> +
>      # my $file_param = $live_restore_name ? "file.file.filename" : "file";
>      my $file_param = "file";
>      if ($live_restore_name) {
> diff --git a/src/PVE/QemuServer/Blockdev.pm b/src/PVE/QemuServer/Blockdev.pm
> index 101c747c..7cb7dccb 100644
> --- a/src/PVE/QemuServer/Blockdev.pm
> +++ b/src/PVE/QemuServer/Blockdev.pm
> @@ -403,6 +403,39 @@ sub generate_format_blockdev {
>          $blockdev->{'discard-no-unref'} = JSON::true if $format eq 'qcow2';
>      }
>  
> +    if (defined($drive->{cache_size})) {
> +        if ($drive->{cache_size} eq 'based-on-disk') {
> +            # Automatically set the cache-size based on the disk size. The formula
> +            # is adapted from here:
> +            # https://gitlab.com/qemu-project/qemu/-/blob/master/docs/qcow2-cache.txt#L97-98
> +            # Assumes the default values for cluster_size and refcount_bits and
> +            # that extended L2 entries are not used. As this combines the refcount
> +            # and L2 cache sizes, QEMU is free to choose a different value for each
> +            # of them, which means the L2 cache is not necessarily 4 times bigger
> +            # than the refcount cache.
> +            my $cache_size = $drive->{size} / 1024 / 1024 / 1024 * (131072 + 32768);
> +            $blockdev->{'cache-size'} = int($cache_size);
> +        } else {
> +            $blockdev->{'cache-size'} = int($drive->{cache_size});
> +        }
> +    }
> +
> +    if (defined($drive->{l2_cache_size})) {
> +        $blockdev->{'l2-cache-size'} = int($drive->{l2_cache_size});
> +    }
> +
> +    if (defined($drive->{l2_cache_entry_size})) {
> +        $blockdev->{'l2-cache-entry-size'} = int($drive->{l2_cache_entry_size});
> +    }
> +
> +    if (defined($drive->{refcount_cache_size})) {
> +        $blockdev->{'refcount-cache-size'} = int($drive->{refcount_cache_size});
> +    }
> +
> +    if (defined($drive->{cache_clean_interval})) {
> +        $blockdev->{'cache-clean-interval'} = int($drive->{cache_clean_interval});
> +    }
> +
>      return $blockdev;
>  }
>  
> diff --git a/src/PVE/QemuServer/Drive.pm b/src/PVE/QemuServer/Drive.pm
> index b80b7dbb..ff4be684 100644
> --- a/src/PVE/QemuServer/Drive.pm
> +++ b/src/PVE/QemuServer/Drive.pm
> @@ -256,6 +256,51 @@ my %drivedesc_base = (
>          optional => 1,
>          default => 0,
>      },
> +    cache_size => {

For new options, please use kebap-case. All options are qcow2-specific,
so I wonder if we should use 'qcow2-' as a prefix for the options.

> +        type => 'string',
> +        oneOf => [
> +            {
> +                type => 'integer',
> +                minimum => 1,
> +                description => 'Cache size in bytes',
> +                optional => 1,
> +            },
> +            {
> +                type => 'string',
> +                enum => ['based-on-disk'],
> +                description =>
> +                    'Automatically pick a cache size based on the configured disk size',
> +                optional => 1,
> +            },
> +        ],
> +        description => 'Cache size for qcow2 disks in bytes',
> +        typetext => '(<bytes> | based-on-disk)',
> +        optional => 1,
> +    },
> +    l2_cache_size => {
> +        type => 'integer',
> +        minimum => 1,
> +        description => 'L2 cache size for qcow2 disks in bytes',
> +        optional => 1,
> +    },
> +    l2_cache_entry_size => {
> +        type => 'integer',
> +        minimum => 512,
> +        description => 'L2 cache entry size for qcow2 disks in bytes',
> +        optional => 1,
> +    },
> +    refcount_cache_size => {
> +        type => 'integer',
> +        minimum => 1,
> +        description => 'Refcount cache size for qcow2 disks in bytes',
> +        optional => 1,
> +    },
> +    cache_clean_interval => {
> +        type => 'integer',
> +        minimum => 0,
> +        description => 'Cache clean interval for qcow2 disks in seconds',
> +        optional => 1,
> +    },
>  );
>  
>  my %iothread_fmt = (
> @@ -838,6 +883,60 @@ sub parse_drive {
>          }
>      }
>  
> +    my $is_qcow2 =
> +        (defined($res->{format}) && $res->{format} eq 'qcow2') || $res->{file} =~ /\.qcow2$/;

Please ask the storage layer what the format is. There is a
checked_volume_format() method for this.

> +
> +    my $any_qcow2_cache_option =
> +        defined($res->{cache_size})
> +        || defined($res->{l2_cache_size})
> +        || defined($res->{l2_cache_entry_size})
> +        || defined($res->{refcount_cache_size})
> +        || defined($res->{cache_clean_interval});
> +
> +    if ($any_qcow2_cache_option && !$is_qcow2) {
> +        warn "qcow2 cache options require disk format qcow2\n";

I think it might be better to iterate and explicitly mention the option
so the user knows right away which one it is. We should also mention the
drive ID and we should use log_warn() so the warning is more prominent.

> +        ++$error;

I'd prefer to just ignore the options if the format is wrong (with a
warning mentioning that this happens) rather than return an undef value
as a result later.

> +    }
> +
> +    my $cache_option_count =
> +        defined($res->{cache_size}) +
> +        defined($res->{l2_cache_size}) +
> +        defined($res->{refcount_cache_size});
> +
> +    if ($cache_option_count > 2) {
> +        warn "at most two of {,l2_,refcount_}cache_size can be set simultaneously\n";

Some users might not be familiar with the {,,} templating

> +        ++$error;
> +    }
> +
> +    sub is_power_of_two {

We usually don't declare such nested subs, but use a closure or separate
private helper sub.

> +        my ($n) = @_;
> +        return $n > 0 && (($n & ($n - 1)) == 0);
> +    }
> +
> +    if (defined($res->{l2_cache_entry_size}) && !is_power_of_two($res->{l2_cache_entry_size})) {
> +        warn "l2_cache_entry_size must be a power of two\n";
> +        ++$error;
> +    }
> +
> +    if (defined($res->{cache_size}) && $res->{cache_size} eq 'based-on-disk') {
> +        if (defined($res->{l2_cache_size}) || defined($res->{refcount_cache_size})) {
> +            warn "cache_size 'based-on-disk' not compatible with l2 or refcount cache size set\n";
> +            ++$error;
> +        }
> +    }
> +
> +    if (defined($res->{cache_size}) && $res->{cache_size} ne 'based-on-disk') {
> +        if (($res->{l2_cache_size} // 0) >= $res->{cache_size}) {
> +            warn "l2_cache_size is larger than or equal to cache_size\n";
> +            ++$error;
> +        }
> +
> +        if (($res->{refcount_cache_size} // 0) >= $res->{cache_size}) {
> +            warn "refcount_cache_size is larger than or equal to cache_size\n";
> +            ++$error;
> +        }
> +    }
> +
>      return if $error;
>  
>      return if $res->{mbps_rd} && $res->{mbps};





  reply	other threads:[~2026-05-29 15:26 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-29 12:58 [RFC common/manager/qemu-server 0/3] fix #1989: qemu: add qcow2 cache options Erik Fastermann
2026-05-29 12:58 ` [PATCH pve-common 1/3] json schema: display nested oneOf error messages Erik Fastermann
2026-05-29 15:25   ` Fiona Ebner
2026-05-29 12:58 ` [PATCH qemu-server 2/3] fix #1989: disk: add qcow2 cache options Erik Fastermann
2026-05-29 15:26   ` Fiona Ebner [this message]
2026-05-29 12:58 ` [PATCH pve-manager 3/3] fix #1989: qemu: disk: add cache size config Erik Fastermann
2026-05-29 15:26 ` [RFC common/manager/qemu-server 0/3] fix #1989: qemu: add qcow2 cache options Fiona Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=94d54894-19fe-4524-ac62-5eee6191f0ff@proxmox.com \
    --to=f.ebner@proxmox.com \
    --cc=e.fastermann@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal