From: Fiona Ebner <f.ebner@proxmox.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
	"Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH storage 2/2] rbd plugin: status: explain why percentage value can be different from Ceph
Date: Wed, 14 May 2025 11:31:17 +0200
Message-ID: <651c22bb-69b3-43f4-9ed8-9357ce828bcf@proxmox.com>
In-Reply-To: <451129351.14846.1747213617524@webmail.proxmox.com>

On 14.05.25 at 11:06, Fabian Grünbichler wrote:
>> Fiona Ebner <f.ebner@proxmox.com> wrote on 14.05.2025 at 10:22 CEST:
>>
>>  
>> On 13.05.25 at 15:31, Fiona Ebner wrote:
>>> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
>>> ---
>>>  src/PVE/Storage/RBDPlugin.pm | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/src/PVE/Storage/RBDPlugin.pm b/src/PVE/Storage/RBDPlugin.pm
>>> index 154fa00..b56f8e4 100644
>>> --- a/src/PVE/Storage/RBDPlugin.pm
>>> +++ b/src/PVE/Storage/RBDPlugin.pm
>>> @@ -703,6 +703,12 @@ sub status {
>>>  
>>>      # max_avail -> max available space for data w/o replication in the pool
>>>      # stored -> amount of user data w/o replication in the pool
>>> +    # NOTE These values are used because they are most natural from a user perspective.
>>> +    # However, the %USED/percent_used value in Ceph is calculated from values before factoring out
>>> +    # replication, namely 'bytes_used / (bytes_used + avail_raw)'. In certain setups, e.g. with LZ4
>>> +    # compression, this percentage can be noticeably different from the percentage
>>> +    # 'stored / (stored + max_avail)' shown in the Proxmox VE CLI/UI. See also src/mon/PGMap.cc from
>>> +    # the Ceph source code, which also mentions that 'stored' is an approximation.
>>>      my $free = $d->{stats}->{max_avail};
>>>      my $used = $d->{stats}->{stored};
>>>      my $total = $used + $free;
>>
>> Thinking about this again, I don't think continuing to use 'stored' is
>> best after all, because that is before compression. And this is where
>> the mismatch really comes from AFAICT. For highly compressible data, the
>> mismatch between actual usage on the storage and 'stored' can be very
>> big (in a quick test using the 'yes' command to fill an RBD image, I got
>> stored = 2 * (used / replication_count)). And here in the storage stats
>> we are interested in the usage on the storage, not the actual amount of
>> data written by the user. For ZFS we also don't use 'logicalused', but
>> 'used'.
> 
> but for ZFS, we actually use the "logical" view provided by `zfs list/get`,
> not the "physical" view provided by `zpool list/get` (and even the latter
> would already account for redundancy).

But that is not the same logical view as 'logicalused', which would not
consider compression.
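
For illustration (assuming I'm reading the ZFS properties right), on a
dataset with compression enabled something like

$ zfs get used,logicalused,compressratio testpool/data

should show 'used' (after compression) well below 'logicalused' (before
compression).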

> 
> e.g., with a testpool consisting of three mirrored vdevs of size 1G, with
> a single dataset filled with a file with 512MB of random data:
> 
> $ zpool list -v testpool
> NAME                 SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> testpool             960M   513M   447M        -         -    42%    53%  1.00x    ONLINE  -
>   mirror-0           960M   513M   447M        -         -    42%  53.4%      -    ONLINE
>     /tmp/vdev1.img     1G      -      -        -         -      -      -      -    ONLINE
>     /tmp/vdev2.img     1G      -      -        -         -      -      -      -    ONLINE
>     /tmp/vdev3.img     1G      -      -        -         -      -      -      -    ONLINE
> 
> and what we use for the storage status:
> 
> $ zfs get available,used testpool/data
> NAME           PROPERTY   VALUE  SOURCE
> testpool/data  available  319M   -
> testpool/data  used       512M   -
> 
> if we switch away from `stored`, we'd have to account for replication
> ourselves to match that, right? and we don't have that information
> readily available (and also no idea how to handle EC pools?)? wouldn't
> we just exchange one wrong set of numbers with another (differently)
> wrong set of numbers?

I would've used avail_raw / max_avail to calculate the replication
factor and apply that to bytes_used. Sure, it won't be perfect, but it
should lead to matching the percent_used reported by Ceph:

percent_used = bytes_used / (bytes_used + avail_raw)
max_avail = avail_raw / rep

(rep is called raw_used_rate in the Ceph source, but I'm shortening it for
readability; bytes_used and avail_raw are the raw values from 'ceph df detail')

Thus:
rep = avail_raw / max_avail

our_used = bytes_used / rep
our_avail = max_avail = avail_raw / rep

our_percentage = our_used / (our_used + our_avail)
               = (bytes_used/rep) / (bytes_used/rep + avail_raw/rep)
               = bytes_used / (bytes_used + avail_raw)    (rep cancels out)
               = percent_used from Ceph

The point is that it'd be much better than not considering compression.
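
Concretely, something like this instead of the current assignments (an
untested sketch; it assumes 'bytes_used' and 'avail_raw' end up in
$d->{stats} just like in the 'ceph df detail' output you quote below):

    my $stats = $d->{stats};
    # effective replication/EC factor: raw free space vs. replication-adjusted free space
    my $rep = $stats->{max_avail} ? $stats->{avail_raw} / $stats->{max_avail} : 1;
    my $free = $stats->{max_avail};
    # bytes_used is the actual raw consumption (post-compression, all replicas),
    # so scale it down by the replication factor
    my $used = int($stats->{bytes_used} / $rep);
    my $total = $used + $free;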

> 
> FWIW, we already provide raw numbers in the pool view, and could maybe
> expand that view to provide more details?
> 
> e.g., for my test rbd pool the pool view shows 50.29% used amounting to
> 163.43GiB, whereas the storage status says 51.38% used amounting to
> 61.11GB of 118.94GB, with the default 3/2 replication
> 
> ceph df detail says:
> 
> {
>       "name": "rbd",
>       "id": 2,
>       "stats": {
>         "stored": 61108710142,               => /1000/1000/1000 == storage used

But this is not really "storage used". This is the amount of user data,
before compression. The actual usage on the storage can be much lower
than this.

>         "stored_data": 61108699136,
>         "stored_omap": 11006,
>         "objects": 15579,
>         "kb_used": 171373017,
>         "bytes_used": 175485968635,          => /1024/1024/1024 == pool used
>         "data_bytes_used": 175485935616,
>         "omap_bytes_used": 33019,
>         "percent_used": 0.5028545260429382,  => rounded this is the pool view percentage
>         "max_avail": 57831211008,            => (this + stored)/1000/1000/1000 storage total
>         "quota_objects": 0,
>         "quota_bytes": 0,
>         "dirty": 0,
>         "rd": 253354,
>         "rd_bytes": 38036885504,
>         "wr": 75833,
>         "wr_bytes": 33857918976,
>         "compress_bytes_used": 0,
>         "compress_under_bytes": 0,
>         "stored_raw": 183326130176,
>         "avail_raw": 173493638191
>       }
>     },
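
Spelling out the two percentages with these numbers:

storage status: stored / (stored + max_avail)
                = 61108710142 / (61108710142 + 57831211008) ~ 0.5138 -> 51.38%
pool view:      bytes_used / (bytes_used + avail_raw)
                = 175485968635 / (175485968635 + 173493638191) ~ 0.5029 -> 50.29%

so the gap you see is exactly the difference between using the
replication-adjusted 'stored'/'max_avail' pair and the raw
'bytes_used'/'avail_raw' pair.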
> 
> 
>> From src/osd/osd_types.h:
>>
>>>   int64_t data_stored = 0;                ///< Bytes actually stored by the user
>>>   int64_t data_compressed = 0;            ///< Bytes stored after compression
>>>   int64_t data_compressed_allocated = 0;  ///< Bytes allocated for compressed data
>>>   int64_t data_compressed_original = 0;   ///< Bytes that were compressed



