* [PVE-User] "nearfull" status in PVE Dashboard not consistent @ 2024-09-07 19:04 Frank Thommen 2024-09-07 19:07 ` Frank Thommen 0 siblings, 1 reply; 17+ messages in thread From: Frank Thommen @ 2024-09-07 19:04 UTC (permalink / raw) To: Proxmox VE user list Dear all, I am currently in the process to add SSDs for DB/WAL to our "converged" 3-node Ceph cluster. After having done so on two of three nodes, the PVE Ceph dashboard now reports "5 pool(s) nearfull": HEALTH_WARN: 5 pool(s) nearfull pool 'pve-pool1' is nearfull pool 'cephfs_data' is nearfull pool 'cephfs_metadata' is nearfull pool '.mgr' is nearfull pool '.rgw.root' is nearfull (see also attached Ceph_dashboard_nearfull_warning.jpg). The storage in general is 73% full ("40.87 TiB of 55.67 TiB"). However when looking at the pool overview in PVE, the pools don't seem to be very full at all. Some of them are even reported as being completely empty (see the attached Ceph_pool_overview.jpg). Please note: All Ceph manipulations have been done from the PVE UI, as we are not very experienced with the Ceph CLI. We are running PVE 8.2.3 and Ceph runs on version 17.2.7. Is this inconsistency normal or a problem? And if the latter, then (how) can it be fixed? Cheers, Frank _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:04 [PVE-User] "nearfull" status in PVE Dashboard not consistent Frank Thommen @ 2024-09-07 19:07 ` Frank Thommen 2024-09-07 19:14 ` Frank Thommen 0 siblings, 1 reply; 17+ messages in thread From: Frank Thommen @ 2024-09-07 19:07 UTC (permalink / raw) To: pve-user It seems, the attachments got lost on their way. Here they are (again) Frank On 07.09.24 21:04, Frank Thommen wrote: > Dear all, > > I am currently in the process to add SSDs for DB/WAL to our "converged" > 3-node Ceph cluster. After having done so on two of three nodes, the PVE > Ceph dashboard now reports "5 pool(s) nearfull": > > HEALTH_WARN: 5 pool(s) nearfull > pool 'pve-pool1' is nearfull > pool 'cephfs_data' is nearfull > pool 'cephfs_metadata' is nearfull > pool '.mgr' is nearfull > pool '.rgw.root' is nearfull > > (see also attached Ceph_dashboard_nearfull_warning.jpg). The storage in > general is 73% full ("40.87 TiB of 55.67 TiB"). > > However when looking at the pool overview in PVE, the pools don't seem > to be very full at all. Some of them are even reported as being > completely empty (see the attached Ceph_pool_overview.jpg). > > Please note: All Ceph manipulations have been done from the PVE UI, as > we are not very experienced with the Ceph CLI. > > We are running PVE 8.2.3 and Ceph runs on version 17.2.7. > > Is this inconsistency normal or a problem? And if the latter, then (how) > can it be fixed? > > Cheers, Frank > _______________________________________________ > pve-user mailing list > pve-user@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:07 ` Frank Thommen @ 2024-09-07 19:14 ` Frank Thommen 2024-09-07 19:27 ` David der Nederlanden | ITTY via pve-user 2024-09-09 0:46 ` Bryan Fields 0 siblings, 2 replies; 17+ messages in thread From: Frank Thommen @ 2024-09-07 19:14 UTC (permalink / raw) To: pve-user Mailman is making fun of me: First it does not accept the mail because of too big attachments, now that I reduced the size, it removes them completely :-( Sorry, I digress... Please find the two images here: * https://pasteboard.co/CUPNjkTmyYV8.jpg (Ceph_dashboard_nearfull_warning.jpg) * https://pasteboard.co/34GBggOiUNII.jpg (Ceph_pool_overview.jpg) HTH, Frank On 07.09.24 21:07, Frank Thommen wrote: > It seems, the attachments got lost on their way. Here they are (again) > Frank > > On 07.09.24 21:04, Frank Thommen wrote: >> Dear all, >> >> I am currently in the process to add SSDs for DB/WAL to our >> "converged" 3-node Ceph cluster. After having done so on two of three >> nodes, the PVE Ceph dashboard now reports "5 pool(s) nearfull": >> >> HEALTH_WARN: 5 pool(s) nearfull >> pool 'pve-pool1' is nearfull >> pool 'cephfs_data' is nearfull >> pool 'cephfs_metadata' is nearfull >> pool '.mgr' is nearfull >> pool '.rgw.root' is nearfull >> >> (see also attached Ceph_dashboard_nearfull_warning.jpg). The storage >> in general is 73% full ("40.87 TiB of 55.67 TiB"). >> >> However when looking at the pool overview in PVE, the pools don't seem >> to be very full at all. Some of them are even reported as being >> completely empty (see the attached Ceph_pool_overview.jpg). >> >> Please note: All Ceph manipulations have been done from the PVE UI, as >> we are not very experienced with the Ceph CLI. >> >> We are running PVE 8.2.3 and Ceph runs on version 17.2.7. >> >> Is this inconsistency normal or a problem? And if the latter, then >> (how) can it be fixed? >> >> Cheers, Frank >> _______________________________________________ >> pve-user mailing list >> pve-user@lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > _______________________________________________ > pve-user mailing list > pve-user@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:14 ` Frank Thommen @ 2024-09-07 19:27 ` David der Nederlanden | ITTY via pve-user 2024-09-07 19:49 ` Peter Eisch via pve-user 2024-09-08 12:17 ` [PVE-User] " Frank Thommen 2024-09-09 0:46 ` Bryan Fields 1 sibling, 2 replies; 17+ messages in thread From: David der Nederlanden | ITTY via pve-user @ 2024-09-07 19:27 UTC (permalink / raw) To: Proxmox VE user list; +Cc: David der Nederlanden | ITTY [-- Attachment #1: Type: message/rfc822, Size: 15553 bytes --] From: David der Nederlanden | ITTY <david@itty.nl> To: Proxmox VE user list <pve-user@lists.proxmox.com> Subject: RE: [PVE-User] "nearfull" status in PVE Dashboard not consistent Date: Sat, 7 Sep 2024 19:27:38 +0000 Message-ID: <AM8P193MB11393CE9AC8873699D5E5950B89F2@AM8P193MB1139.EURP193.PROD.OUTLOOK.COM> Hi Frank, Can you share your OSD layout too? My first thought is that you added the SSD's as OSD which caused that OSD to get full, with a near full pool as a result. You can get some insights with: `ceph osd tree` And if needed you can reweight the OSD's, but that would require a good OSD layout: `ceph osd reweight-by-utilization` Sources: https://forum.proxmox.com/threads/ceph-pool-full.47810/ https://docs.ceph.com/en/reef/rados/operations/health-checks/#pool-near-full Kind regards, David der Nederlanden -----Oorspronkelijk bericht----- Van: pve-user <pve-user-bounces@lists.proxmox.com> Namens Frank Thommen Verzonden: Saturday, September 7, 2024 21:15 Aan: pve-user@lists.proxmox.com Onderwerp: Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent Mailman is making fun of me: First it does not accept the mail because of too big attachments, now that I reduced the size, it removes them completely :-( Sorry, I digress... Please find the two images here: * https://pasteboard.co/CUPNjkTmyYV8.jpg (Ceph_dashboard_nearfull_warning.jpg) * https://pasteboard.co/34GBggOiUNII.jpg (Ceph_pool_overview.jpg) HTH, Frank On 07.09.24 21:07, Frank Thommen wrote: > It seems, the attachments got lost on their way. Here they are (again) > Frank > > On 07.09.24 21:04, Frank Thommen wrote: >> Dear all, >> >> I am currently in the process to add SSDs for DB/WAL to our >> "converged" 3-node Ceph cluster. After having done so on two of three >> nodes, the PVE Ceph dashboard now reports "5 pool(s) nearfull": >> >> HEALTH_WARN: 5 pool(s) nearfull >> pool 'pve-pool1' is nearfull >> pool 'cephfs_data' is nearfull >> pool 'cephfs_metadata' is nearfull >> pool '.mgr' is nearfull >> pool '.rgw.root' is nearfull >> >> (see also attached Ceph_dashboard_nearfull_warning.jpg). The storage >> in general is 73% full ("40.87 TiB of 55.67 TiB"). >> >> However when looking at the pool overview in PVE, the pools don't >> seem to be very full at all. Some of them are even reported as being >> completely empty (see the attached Ceph_pool_overview.jpg). >> >> Please note: All Ceph manipulations have been done from the PVE UI, >> as we are not very experienced with the Ceph CLI. >> >> We are running PVE 8.2.3 and Ceph runs on version 17.2.7. >> >> Is this inconsistency normal or a problem? And if the latter, then >> (how) can it be fixed? 
>> >> Cheers, Frank >> _______________________________________________ >> pve-user mailing list >> pve-user@lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > _______________________________________________ > pve-user mailing list > pve-user@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user [-- Attachment #2: Type: text/plain, Size: 157 bytes --] _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
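A minimal sketch of the checks suggested above, for anyone working through the
archive later (standard Ceph CLI; the values in the comments are only the
upstream defaults and may have been changed on a given cluster):

----------------------------------
# Which health check is firing, and for which pools/OSDs
$ ceph health detail

# The ratios the nearfull/backfillfull/full checks are evaluated against
# (upstream defaults: 0.85 / 0.90 / 0.95)
$ ceph osd dump | grep ratio
----------------------------------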
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:27 ` David der Nederlanden | ITTY via pve-user @ 2024-09-07 19:49 ` Peter Eisch via pve-user 2024-09-08 12:17 ` [PVE-User] [Extern] - " Frank Thommen 2024-09-08 12:17 ` [PVE-User] " Frank Thommen 1 sibling, 1 reply; 17+ messages in thread From: Peter Eisch via pve-user @ 2024-09-07 19:49 UTC (permalink / raw) To: Proxmox VE user list; +Cc: Peter Eisch [-- Attachment #1: Type: message/rfc822, Size: 4497 bytes --] From: Peter Eisch <peter@boku.net> To: Proxmox VE user list <pve-user@lists.proxmox.com> Subject: Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent Date: Sat, 7 Sep 2024 14:49:39 -0500 (CDT) Message-ID: <854037bc-bd1a-4dc7-91e7-73ccc68c73db@boku.net> Also 'ceph osd df' would be useful to help. peter Sep 7, 2024 14:43:18 David der Nederlanden | ITTY via pve-user <pve-user@lists.proxmox.com>: > _______________________________________________ > pve-user mailing list > pve-user@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user [-- Attachment #2: Type: text/plain, Size: 157 bytes --] _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-08 12:17 UTC

`ceph osd df` gives me:

----------------------------------
$ ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 0    hdd  3.68259   1.00000  3.7 TiB  3.0 TiB  2.9 TiB   38 MiB  8.9 GiB  731 GiB  80.63  1.10  189      up
 1    hdd  3.68259   1.00000  3.7 TiB  2.5 TiB  2.5 TiB   34 MiB  7.7 GiB  1.2 TiB  68.31  0.93  159      up
 2    hdd  3.68259   1.00000  3.7 TiB  3.0 TiB  2.9 TiB   32 MiB  7.6 GiB  713 GiB  81.10  1.10  174      up
 9    hdd  1.84380   1.00000  1.8 TiB  1.2 TiB  1.2 TiB   13 MiB  4.7 GiB  652 GiB  65.49  0.89   71      up
10    hdd  1.84380   1.00000  1.8 TiB  1.3 TiB  1.3 TiB   16 MiB  4.8 GiB  543 GiB  71.24  0.97   85      up
11    hdd  1.84380   1.00000  1.8 TiB  1.2 TiB  1.2 TiB   13 MiB  4.7 GiB  659 GiB  65.09  0.89   71      up
12    hdd  1.82909   1.00000  1.8 TiB  1.5 TiB  1.4 TiB   15 MiB  5.5 GiB  380 GiB  79.70  1.09   84      up
 3    hdd  3.81450   1.00000  3.8 TiB  2.8 TiB  2.6 TiB   28 MiB  7.2 GiB  1.0 TiB  73.65  1.00  158      up
 4    hdd  3.81450   1.00000  3.8 TiB  2.1 TiB  2.0 TiB   27 MiB  6.4 GiB  1.7 TiB  56.06  0.76  137      up
 5    hdd  3.81450   1.00000  3.8 TiB  3.3 TiB  3.1 TiB   37 MiB  8.5 GiB  568 GiB  85.45  1.16  194      up
13    hdd  1.90720   1.00000  1.9 TiB  1.4 TiB  1.3 TiB   14 MiB  3.3 GiB  543 GiB  72.22  0.98   78      up
14    hdd  1.90720   1.00000  1.9 TiB  1.6 TiB  1.5 TiB   19 MiB  5.1 GiB  297 GiB  84.78  1.15   91      up
15    hdd  1.90720   1.00000  1.9 TiB  1.5 TiB  1.4 TiB   18 MiB  4.9 GiB  444 GiB  77.28  1.05   82      up
16    hdd  1.90039   1.00000  1.9 TiB  1.6 TiB  1.6 TiB   19 MiB  5.4 GiB  261 GiB  86.59  1.18   93      up
 6    hdd  3.63869   1.00000  3.6 TiB  2.6 TiB  2.6 TiB   35 MiB  5.9 GiB  1.0 TiB  71.69  0.98  172      up
 7    hdd  3.63869   1.00000  3.6 TiB  2.8 TiB  2.8 TiB   30 MiB  6.3 GiB  875 GiB  76.53  1.04  173      up
 8    hdd  3.63869   1.00000  3.6 TiB  2.3 TiB  2.3 TiB   29 MiB  5.4 GiB  1.3 TiB  63.51  0.86  147      up
17    hdd  1.81929   1.00000  1.8 TiB  1.3 TiB  1.3 TiB   20 MiB  3.0 GiB  486 GiB  73.93  1.01   92      up
18    hdd  1.81929   1.00000  1.8 TiB  1.5 TiB  1.5 TiB   17 MiB  3.7 GiB  367 GiB  80.30  1.09   93      up
19    hdd  1.81940   1.00000  1.8 TiB  1.2 TiB  1.2 TiB   17 MiB  3.6 GiB  646 GiB  65.35  0.89   78      up
20    hdd  1.81929   1.00000  1.8 TiB  1.2 TiB  1.2 TiB   15 MiB  2.7 GiB  631 GiB  66.16  0.90   78      up
                       TOTAL   56 TiB   41 TiB   40 TiB  483 MiB  115 GiB   15 TiB  73.42
MIN/MAX VAR: 0.76/1.18  STDDEV: 8.03
----------------------------------

Frank

On 07.09.24 21:49, Peter Eisch via pve-user wrote:
> Also 'ceph osd df' would be useful to help.
>
> peter
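When reading output like the above, a combined view that maps the utilisation
onto the three hosts can be handy; a small sketch (standard Ceph CLI):

----------------------------------
# Same data as `ceph osd df`, but grouped per host along the CRUSH tree
$ ceph osd df tree
----------------------------------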
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Eneko Lacunza <elacunza@binovo.es>
  To:   pve-user@lists.proxmox.com
  Date: 2024-09-09 10:36 UTC

Hi Frank,

On 8/9/24 at 14:17, Frank Thommen wrote:
>  5    hdd  3.81450   1.00000  3.8 TiB  3.3 TiB  3.1 TiB  37 MiB  8.5 GiB  568 GiB  85.45  1.16  194  up
> 16    hdd  1.90039   1.00000  1.9 TiB  1.6 TiB  1.6 TiB  19 MiB  5.4 GiB  261 GiB  86.59  1.18   93  up

Those OSDs are nearfull, if you have kept the default ratio of 0.85. You can
try to lower their weight a bit.

Cheers

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
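A sketch of what "lower their weight a bit" can look like on the CLI. The
target values below are purely illustrative (not a recommendation for this
cluster), and any reweight triggers data movement, so it is worth changing one
OSD at a time and watching the cluster settle in between:

----------------------------------
# Current CRUSH weights and reweight values
$ ceph osd tree

# Temporarily lower the reweight of the two fullest OSDs a little
# (allowed range 0.0-1.0; 0.95 and 0.90 are just example values)
$ ceph osd reweight 5 0.95
$ ceph osd reweight 16 0.90

# Watch the resulting backfill and the utilisation evening out
$ ceph -s
$ ceph osd df
----------------------------------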
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-10 12:02 UTC

Yes, but Ceph also reports five(!) nearfull pools, and the pool overview
doesn't reflect that. I'd love to show this with the CLI, but I could not find
an appropriate command so far. Hopefully, inline images work:

From the Ceph overview dashboard:

[inline image did not make it to the list]

And from the pool overview:

[inline image did not make it to the list]

Two pools are quite full (80% and 82%), but far from nearfull, and the other
three are basically empty.

Frank

On 09.09.24 12:36, Eneko Lacunza via pve-user wrote:
> Those OSDs are nearfull, if you have kept the default ratio of 0.85. You can
> try to lower their weight a bit.
> [...]
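Regarding the "appropriate command": per-pool usage is also available on the
CLI. A small sketch (standard Ceph CLI; the exact column names vary a little
between releases):

----------------------------------
# Per-pool stored data, %USED and MAX AVAIL
# (roughly what the PVE pool overview displays)
$ ceph df detail

# Per-pool object and space statistics from the RADOS side
$ rados df
----------------------------------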
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: David der Nederlanden | ITTY <david@itty.nl>
  To:   Proxmox VE user list <pve-user@lists.proxmox.com>
  Date: 2024-09-10 18:31 UTC

Hi Frank,

The images didn't work :)

Pool and OSD nearfull are closely related: when OSDs get full, your pools also
get nearfull, because Ceph needs to be able to follow the CRUSH rules, which it
can't once one of the OSDs fills up. Hence the warning when an OSD gets
nearfull.

I see that you're mixing OSD sizes. Deleting and recreating the OSDs one by one
caused this; since the OSDs got new weights, you should be OK once you reweight
them. You can do this by hand or using reweight-by-utilization, whatever you
prefer.

Not quite sure about the pool sizes, but an RBD pool with size 3 / min_size 2
should never be above 80%, as this gives you a nearfull pool when it starts
backfilling after you lose one node, or even a full pool in the worst case,
rendering the pool read-only.

Kind regards,
David

On 10.09.24 14:02, Frank Thommen wrote:
> Yes, but Ceph also reports five(!) nearfull pools, and the pool overview
> doesn't reflect that.
> [...]
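Since reweight-by-utilization comes up here: it has a dry-run variant, which
makes it easier to see what would change before committing. A minimal sketch
(the threshold 110 means "OSDs above 110% of the average utilisation"; the
upstream default is 120, and further optional arguments exist):

----------------------------------
# Dry run: show which OSDs would be reweighted and by how much
$ ceph osd test-reweight-by-utilization 110

# Apply the same change for real, only once the dry run looks sane
$ ceph osd reweight-by-utilization 110

# Follow the resulting rebalance
$ ceph -s
----------------------------------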
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-11 11:00 UTC

The OSDs are of different sizes because we have 4 TB and 2 TB disks in the
systems.

We might give the reweight a try.

On 10.09.24 20:31, David der Nederlanden | ITTY via pve-user wrote:
> I see that you're mixing OSD sizes. [...] You can do this by hand or using
> reweight-by-utilization, whatever you prefer.
> [...]
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Daniel Oliver
  To:   Proxmox VE user list
  Date: 2024-09-11 11:52 UTC

The built-in Ceph balancer only balances based on PG numbers, and PGs can vary
wildly in size for several reasons.

I ended up disabling the built-in balancer and switching to
https://github.com/TheJJ/ceph-balancer, which we now run daily with the
following parameters:

  placementoptimizer.py balance --ignore-ideal-pgcounts=all --osdused=delta --osdfrom fullest

This keeps things nicely balanced from a fullness perspective. The most
important bit is --ignore-ideal-pgcounts, as it allows balancing decisions
outside of what the built-in balancer would make.

On 11.09.24 13:00, Frank Thommen wrote:
> The OSDs are of different sizes because we have 4 TB and 2 TB disks in the
> systems. We might give the reweight a try.
> [...]
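A hedged sketch of how such an external balancer is typically wired in, based
on the parameters given above. The exact flags and the apply step should be
taken from the project's README rather than from this sketch, as they may have
changed since this thread was written:

----------------------------------
# Hand placement over to the external tool: disable the built-in balancer
$ ceph balancer status
$ ceph balancer off

# Generate a set of movements; the tool prints ceph CLI commands
# (e.g. `ceph osd pg-upmap-items ...`) which should be reviewed first
$ ./placementoptimizer.py balance --ignore-ideal-pgcounts=all \
      --osdused=delta --osdfrom fullest | tee /tmp/balance-moves

# Apply only after reviewing the generated commands
$ bash /tmp/balance-moves
----------------------------------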
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-11 14:24 UTC

I will have a look. However, not having real working experience with Ceph,
using an external balancer requires a "leap of faith" on my side :-)

On 11.09.24 13:52, Daniel Oliver wrote:
> The built-in Ceph balancer only balances based on PG numbers, and PGs can
> vary wildly in size for several reasons.
> [...]
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-11 14:22 UTC

I'm not sure if this is related, but looking at the activity LEDs of the DB/WAL
SSD devices (two SSDs in RAID 1 each) in each host, the device in one host is
basically permanently active (LEDs flickering constantly without pause), while
on the other host the device seems almost completely inactive (one blink every
few seconds). The third host has no SSD DB device yet.

To me it looks as if one Ceph node is extremely active while the other isn't.
That also looks like an imbalance to me.

Frank

On 11.09.24 13:00, Frank Thommen wrote:
> The OSDs are of different sizes because we have 4 TB and 2 TB disks in the
> systems. We might give the reweight a try.
> [...]
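If one wants to back the LED observation with numbers, Ceph exposes some basic
per-OSD and per-pool activity counters; a small sketch (standard commands,
though the exact fields shown differ between releases):

----------------------------------
# Rough per-OSD commit/apply latency; a consistently busier node stands out
$ ceph osd perf

# Current client I/O per pool (reads/writes per second)
$ ceph osd pool stats

# Cluster-wide I/O summary
$ ceph -s
----------------------------------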
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-11 10:51 UTC

OK, here are (finally) the screenshots:

  * from the Ceph overview dashboard: https://postimg.cc/fVJFstqW
  * from the pool overview: https://postimg.cc/ykf52XPr

Frank

On 10.09.24 14:02, Frank Thommen wrote:
> Yes, but Ceph also reports five(!) nearfull pools, and the pool overview
> doesn't reflect that.
> [...]
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:27 ` David der Nederlanden | ITTY via pve-user 2024-09-07 19:49 ` Peter Eisch via pve-user @ 2024-09-08 12:17 ` Frank Thommen 1 sibling, 0 replies; 17+ messages in thread From: Frank Thommen @ 2024-09-08 12:17 UTC (permalink / raw) To: pve-user Hi David, I have deleted the OSDs one by one (sometimes two by two) and then recreated them, but this time with an SSD partition as DB device (previously the DB was on the OSD HDDs). So the SSDs should not have been added as OSDs. `ceph osd tree` gives me ---------------------------------- $ ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 55.66704 root default -3 18.40823 host pve01 0 hdd 3.68259 osd.0 up 1.00000 1.00000 1 hdd 3.68259 osd.1 up 1.00000 1.00000 2 hdd 3.68259 osd.2 up 1.00000 1.00000 9 hdd 1.84380 osd.9 up 1.00000 1.00000 10 hdd 1.84380 osd.10 up 1.00000 1.00000 11 hdd 1.84380 osd.11 up 1.00000 1.00000 12 hdd 1.82909 osd.12 up 1.00000 1.00000 -5 19.06548 host pve02 3 hdd 3.81450 osd.3 up 1.00000 1.00000 4 hdd 3.81450 osd.4 up 1.00000 1.00000 5 hdd 3.81450 osd.5 up 1.00000 1.00000 13 hdd 1.90720 osd.13 up 1.00000 1.00000 14 hdd 1.90720 osd.14 up 1.00000 1.00000 15 hdd 1.90720 osd.15 up 1.00000 1.00000 16 hdd 1.90039 osd.16 up 1.00000 1.00000 -7 18.19333 host pve03 6 hdd 3.63869 osd.6 up 1.00000 1.00000 7 hdd 3.63869 osd.7 up 1.00000 1.00000 8 hdd 3.63869 osd.8 up 1.00000 1.00000 17 hdd 1.81929 osd.17 up 1.00000 1.00000 18 hdd 1.81929 osd.18 up 1.00000 1.00000 19 hdd 1.81940 osd.19 up 1.00000 1.00000 20 hdd 1.81929 osd.20 up 1.00000 1.00000 $ ---------------------------------- Cheers, Frank On 07.09.24 21:27, David der Nederlanden | ITTY via pve-user wrote: > Hi Frank, > > Can you share your OSD layout too? > > My first thought is that you added the SSD's as OSD which caused that OSD to get full, with a near full pool as a result. > > You can get some insights with: > `ceph osd tree` > > And if needed you can reweight the OSD's, but that would require a good OSD layout: > `ceph osd reweight-by-utilization` > > Sources: > https://forum.proxmox.com/threads/ceph-pool-full.47810/ > https://docs.ceph.com/en/reef/rados/operations/health-checks/#pool-near-full > > Kind regards, > David der Nederlanden _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:14 ` Frank Thommen 2024-09-07 19:27 ` David der Nederlanden | ITTY via pve-user @ 2024-09-09 0:46 ` Bryan Fields 2024-09-10 11:48 ` [PVE-User] [Extern] - " Frank Thommen 1 sibling, 1 reply; 17+ messages in thread From: Bryan Fields @ 2024-09-09 0:46 UTC (permalink / raw) To: Proxmox VE user list On 9/7/24 3:14 PM, Frank Thommen wrote: > Sorry, I digress... Please find the two images here: > > * https://pasteboard.co/CUPNjkTmyYV8.jpg > (Ceph_dashboard_nearfull_warning.jpg) > * https://pasteboard.co/34GBggOiUNII.jpg (Ceph_pool_overview.jpg) > > HTH, Frank The links are broken, "502 Bad Gateway, nginx/1.1.19" -- Bryan Fields 727-409-1194 - Voice http://bryanfields.net _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   Proxmox VE user list, Bryan Fields
  Date: 2024-09-10 11:48 UTC

pasteboard.co seems to be out of order at the moment. Are there good
alternatives? It would really be easier if I could just send attachments to the
list :-/

On 09.09.24 02:46, Bryan Fields wrote:
> The links are broken: "502 Bad Gateway, nginx/1.1.19"
Thread overview: 17+ messages (end of thread; newest: 2024-09-11 14:24 UTC)

2024-09-07 19:04  [PVE-User] "nearfull" status in PVE Dashboard not consistent  Frank Thommen
2024-09-07 19:07  ` Frank Thommen
2024-09-07 19:14    ` Frank Thommen
2024-09-07 19:27      ` David der Nederlanden | ITTY via pve-user
2024-09-07 19:49        ` Peter Eisch via pve-user
2024-09-08 12:17          ` [PVE-User] [Extern] - " Frank Thommen
2024-09-09 10:36            ` Eneko Lacunza via pve-user
2024-09-10 12:02              ` Frank Thommen
2024-09-10 18:31                ` David der Nederlanden | ITTY via pve-user
2024-09-11 11:00                  ` Frank Thommen
2024-09-11 11:52                    ` Daniel Oliver
2024-09-11 14:24                      ` Frank Thommen
2024-09-11 14:22                    ` Frank Thommen
2024-09-11 10:51                ` Frank Thommen
2024-09-08 12:17      ` [PVE-User] " Frank Thommen
2024-09-09 00:46    ` Bryan Fields
2024-09-10 11:48      ` [PVE-User] [Extern] - " Frank Thommen