* [PVE-User] "nearfull" status in PVE Dashboard not consistent @ 2024-09-07 19:04 Frank Thommen 2024-09-07 19:07 ` Frank Thommen 0 siblings, 1 reply; 17+ messages in thread From: Frank Thommen @ 2024-09-07 19:04 UTC (permalink / raw) To: Proxmox VE user list Dear all, I am currently in the process to add SSDs for DB/WAL to our "converged" 3-node Ceph cluster. After having done so on two of three nodes, the PVE Ceph dashboard now reports "5 pool(s) nearfull": HEALTH_WARN: 5 pool(s) nearfull pool 'pve-pool1' is nearfull pool 'cephfs_data' is nearfull pool 'cephfs_metadata' is nearfull pool '.mgr' is nearfull pool '.rgw.root' is nearfull (see also attached Ceph_dashboard_nearfull_warning.jpg). The storage in general is 73% full ("40.87 TiB of 55.67 TiB"). However when looking at the pool overview in PVE, the pools don't seem to be very full at all. Some of them are even reported as being completely empty (see the attached Ceph_pool_overview.jpg). Please note: All Ceph manipulations have been done from the PVE UI, as we are not very experienced with the Ceph CLI. We are running PVE 8.2.3 and Ceph runs on version 17.2.7. Is this inconsistency normal or a problem? And if the latter, then (how) can it be fixed? Cheers, Frank _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:04 [PVE-User] "nearfull" status in PVE Dashboard not consistent Frank Thommen @ 2024-09-07 19:07 ` Frank Thommen 2024-09-07 19:14 ` Frank Thommen 0 siblings, 1 reply; 17+ messages in thread From: Frank Thommen @ 2024-09-07 19:07 UTC (permalink / raw) To: pve-user It seems, the attachments got lost on their way. Here they are (again) Frank On 07.09.24 21:04, Frank Thommen wrote: > Dear all, > > I am currently in the process to add SSDs for DB/WAL to our "converged" > 3-node Ceph cluster. After having done so on two of three nodes, the PVE > Ceph dashboard now reports "5 pool(s) nearfull": > > HEALTH_WARN: 5 pool(s) nearfull > pool 'pve-pool1' is nearfull > pool 'cephfs_data' is nearfull > pool 'cephfs_metadata' is nearfull > pool '.mgr' is nearfull > pool '.rgw.root' is nearfull > > (see also attached Ceph_dashboard_nearfull_warning.jpg). The storage in > general is 73% full ("40.87 TiB of 55.67 TiB"). > > However when looking at the pool overview in PVE, the pools don't seem > to be very full at all. Some of them are even reported as being > completely empty (see the attached Ceph_pool_overview.jpg). > > Please note: All Ceph manipulations have been done from the PVE UI, as > we are not very experienced with the Ceph CLI. > > We are running PVE 8.2.3 and Ceph runs on version 17.2.7. > > Is this inconsistency normal or a problem? And if the latter, then (how) > can it be fixed? > > Cheers, Frank > _______________________________________________ > pve-user mailing list > pve-user@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:07 ` Frank Thommen @ 2024-09-07 19:14 ` Frank Thommen 2024-09-07 19:27 ` David der Nederlanden | ITTY via pve-user 2024-09-09 0:46 ` Bryan Fields 0 siblings, 2 replies; 17+ messages in thread From: Frank Thommen @ 2024-09-07 19:14 UTC (permalink / raw) To: pve-user Mailman is making fun of me: First it does not accept the mail because of too big attachments, now that I reduced the size, it removes them completely :-( Sorry, I digress... Please find the two images here: * https://pasteboard.co/CUPNjkTmyYV8.jpg (Ceph_dashboard_nearfull_warning.jpg) * https://pasteboard.co/34GBggOiUNII.jpg (Ceph_pool_overview.jpg) HTH, Frank On 07.09.24 21:07, Frank Thommen wrote: > It seems, the attachments got lost on their way. Here they are (again) > Frank > > On 07.09.24 21:04, Frank Thommen wrote: >> Dear all, >> >> I am currently in the process to add SSDs for DB/WAL to our >> "converged" 3-node Ceph cluster. After having done so on two of three >> nodes, the PVE Ceph dashboard now reports "5 pool(s) nearfull": >> >> HEALTH_WARN: 5 pool(s) nearfull >> pool 'pve-pool1' is nearfull >> pool 'cephfs_data' is nearfull >> pool 'cephfs_metadata' is nearfull >> pool '.mgr' is nearfull >> pool '.rgw.root' is nearfull >> >> (see also attached Ceph_dashboard_nearfull_warning.jpg). The storage >> in general is 73% full ("40.87 TiB of 55.67 TiB"). >> >> However when looking at the pool overview in PVE, the pools don't seem >> to be very full at all. Some of them are even reported as being >> completely empty (see the attached Ceph_pool_overview.jpg). >> >> Please note: All Ceph manipulations have been done from the PVE UI, as >> we are not very experienced with the Ceph CLI. >> >> We are running PVE 8.2.3 and Ceph runs on version 17.2.7. >> >> Is this inconsistency normal or a problem? And if the latter, then >> (how) can it be fixed? >> >> Cheers, Frank >> _______________________________________________ >> pve-user mailing list >> pve-user@lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > _______________________________________________ > pve-user mailing list > pve-user@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:14 ` Frank Thommen @ 2024-09-07 19:27 ` David der Nederlanden | ITTY via pve-user 2024-09-07 19:49 ` Peter Eisch via pve-user 2024-09-08 12:17 ` [PVE-User] " Frank Thommen 2024-09-09 0:46 ` Bryan Fields 1 sibling, 2 replies; 17+ messages in thread From: David der Nederlanden | ITTY via pve-user @ 2024-09-07 19:27 UTC (permalink / raw) To: Proxmox VE user list; +Cc: David der Nederlanden | ITTY [-- Attachment #1: Type: message/rfc822, Size: 15553 bytes --] From: David der Nederlanden | ITTY <david@itty.nl> To: Proxmox VE user list <pve-user@lists.proxmox.com> Subject: RE: [PVE-User] "nearfull" status in PVE Dashboard not consistent Date: Sat, 7 Sep 2024 19:27:38 +0000 Message-ID: <AM8P193MB11393CE9AC8873699D5E5950B89F2@AM8P193MB1139.EURP193.PROD.OUTLOOK.COM> Hi Frank, Can you share your OSD layout too? My first thought is that you added the SSD's as OSD which caused that OSD to get full, with a near full pool as a result. You can get some insights with: `ceph osd tree` And if needed you can reweight the OSD's, but that would require a good OSD layout: `ceph osd reweight-by-utilization` Sources: https://forum.proxmox.com/threads/ceph-pool-full.47810/ https://docs.ceph.com/en/reef/rados/operations/health-checks/#pool-near-full Kind regards, David der Nederlanden -----Oorspronkelijk bericht----- Van: pve-user <pve-user-bounces@lists.proxmox.com> Namens Frank Thommen Verzonden: Saturday, September 7, 2024 21:15 Aan: pve-user@lists.proxmox.com Onderwerp: Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent Mailman is making fun of me: First it does not accept the mail because of too big attachments, now that I reduced the size, it removes them completely :-( Sorry, I digress... Please find the two images here: * https://pasteboard.co/CUPNjkTmyYV8.jpg (Ceph_dashboard_nearfull_warning.jpg) * https://pasteboard.co/34GBggOiUNII.jpg (Ceph_pool_overview.jpg) HTH, Frank On 07.09.24 21:07, Frank Thommen wrote: > It seems, the attachments got lost on their way. Here they are (again) > Frank > > On 07.09.24 21:04, Frank Thommen wrote: >> Dear all, >> >> I am currently in the process to add SSDs for DB/WAL to our >> "converged" 3-node Ceph cluster. After having done so on two of three >> nodes, the PVE Ceph dashboard now reports "5 pool(s) nearfull": >> >> HEALTH_WARN: 5 pool(s) nearfull >> pool 'pve-pool1' is nearfull >> pool 'cephfs_data' is nearfull >> pool 'cephfs_metadata' is nearfull >> pool '.mgr' is nearfull >> pool '.rgw.root' is nearfull >> >> (see also attached Ceph_dashboard_nearfull_warning.jpg). The storage >> in general is 73% full ("40.87 TiB of 55.67 TiB"). >> >> However when looking at the pool overview in PVE, the pools don't >> seem to be very full at all. Some of them are even reported as being >> completely empty (see the attached Ceph_pool_overview.jpg). >> >> Please note: All Ceph manipulations have been done from the PVE UI, >> as we are not very experienced with the Ceph CLI. >> >> We are running PVE 8.2.3 and Ceph runs on version 17.2.7. >> >> Is this inconsistency normal or a problem? And if the latter, then >> (how) can it be fixed? 
>> >> Cheers, Frank >> _______________________________________________ >> pve-user mailing list >> pve-user@lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > _______________________________________________ > pve-user mailing list > pve-user@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user [-- Attachment #2: Type: text/plain, Size: 157 bytes --] _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
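A minimal sketch of the checks suggested above, for anyone working through the
archive later (standard Ceph CLI; the values in the comments are only the
upstream defaults and may have been changed on a given cluster):

----------------------------------
# Which health check is firing, and for which pools/OSDs
$ ceph health detail

# The ratios the nearfull/backfillfull/full checks are evaluated against
# (upstream defaults: 0.85 / 0.90 / 0.95)
$ ceph osd dump | grep ratio
----------------------------------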
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:27 ` David der Nederlanden | ITTY via pve-user @ 2024-09-07 19:49 ` Peter Eisch via pve-user 2024-09-08 12:17 ` [PVE-User] [Extern] - " Frank Thommen 2024-09-08 12:17 ` [PVE-User] " Frank Thommen 1 sibling, 1 reply; 17+ messages in thread From: Peter Eisch via pve-user @ 2024-09-07 19:49 UTC (permalink / raw) To: Proxmox VE user list; +Cc: Peter Eisch [-- Attachment #1: Type: message/rfc822, Size: 4497 bytes --] From: Peter Eisch <peter@boku.net> To: Proxmox VE user list <pve-user@lists.proxmox.com> Subject: Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent Date: Sat, 7 Sep 2024 14:49:39 -0500 (CDT) Message-ID: <854037bc-bd1a-4dc7-91e7-73ccc68c73db@boku.net> Also 'ceph osd df' would be useful to help. peter Sep 7, 2024 14:43:18 David der Nederlanden | ITTY via pve-user <pve-user@lists.proxmox.com>: > _______________________________________________ > pve-user mailing list > pve-user@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user [-- Attachment #2: Type: text/plain, Size: 157 bytes --] _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-08 12:17 UTC

`ceph osd df` gives me:

----------------------------------
$ ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 0    hdd  3.68259   1.00000  3.7 TiB  3.0 TiB  2.9 TiB   38 MiB  8.9 GiB  731 GiB  80.63  1.10  189      up
 1    hdd  3.68259   1.00000  3.7 TiB  2.5 TiB  2.5 TiB   34 MiB  7.7 GiB  1.2 TiB  68.31  0.93  159      up
 2    hdd  3.68259   1.00000  3.7 TiB  3.0 TiB  2.9 TiB   32 MiB  7.6 GiB  713 GiB  81.10  1.10  174      up
 9    hdd  1.84380   1.00000  1.8 TiB  1.2 TiB  1.2 TiB   13 MiB  4.7 GiB  652 GiB  65.49  0.89   71      up
10    hdd  1.84380   1.00000  1.8 TiB  1.3 TiB  1.3 TiB   16 MiB  4.8 GiB  543 GiB  71.24  0.97   85      up
11    hdd  1.84380   1.00000  1.8 TiB  1.2 TiB  1.2 TiB   13 MiB  4.7 GiB  659 GiB  65.09  0.89   71      up
12    hdd  1.82909   1.00000  1.8 TiB  1.5 TiB  1.4 TiB   15 MiB  5.5 GiB  380 GiB  79.70  1.09   84      up
 3    hdd  3.81450   1.00000  3.8 TiB  2.8 TiB  2.6 TiB   28 MiB  7.2 GiB  1.0 TiB  73.65  1.00  158      up
 4    hdd  3.81450   1.00000  3.8 TiB  2.1 TiB  2.0 TiB   27 MiB  6.4 GiB  1.7 TiB  56.06  0.76  137      up
 5    hdd  3.81450   1.00000  3.8 TiB  3.3 TiB  3.1 TiB   37 MiB  8.5 GiB  568 GiB  85.45  1.16  194      up
13    hdd  1.90720   1.00000  1.9 TiB  1.4 TiB  1.3 TiB   14 MiB  3.3 GiB  543 GiB  72.22  0.98   78      up
14    hdd  1.90720   1.00000  1.9 TiB  1.6 TiB  1.5 TiB   19 MiB  5.1 GiB  297 GiB  84.78  1.15   91      up
15    hdd  1.90720   1.00000  1.9 TiB  1.5 TiB  1.4 TiB   18 MiB  4.9 GiB  444 GiB  77.28  1.05   82      up
16    hdd  1.90039   1.00000  1.9 TiB  1.6 TiB  1.6 TiB   19 MiB  5.4 GiB  261 GiB  86.59  1.18   93      up
 6    hdd  3.63869   1.00000  3.6 TiB  2.6 TiB  2.6 TiB   35 MiB  5.9 GiB  1.0 TiB  71.69  0.98  172      up
 7    hdd  3.63869   1.00000  3.6 TiB  2.8 TiB  2.8 TiB   30 MiB  6.3 GiB  875 GiB  76.53  1.04  173      up
 8    hdd  3.63869   1.00000  3.6 TiB  2.3 TiB  2.3 TiB   29 MiB  5.4 GiB  1.3 TiB  63.51  0.86  147      up
17    hdd  1.81929   1.00000  1.8 TiB  1.3 TiB  1.3 TiB   20 MiB  3.0 GiB  486 GiB  73.93  1.01   92      up
18    hdd  1.81929   1.00000  1.8 TiB  1.5 TiB  1.5 TiB   17 MiB  3.7 GiB  367 GiB  80.30  1.09   93      up
19    hdd  1.81940   1.00000  1.8 TiB  1.2 TiB  1.2 TiB   17 MiB  3.6 GiB  646 GiB  65.35  0.89   78      up
20    hdd  1.81929   1.00000  1.8 TiB  1.2 TiB  1.2 TiB   15 MiB  2.7 GiB  631 GiB  66.16  0.90   78      up
                       TOTAL   56 TiB   41 TiB   40 TiB  483 MiB  115 GiB   15 TiB  73.42
MIN/MAX VAR: 0.76/1.18  STDDEV: 8.03
----------------------------------

Frank

On 07.09.24 21:49, Peter Eisch via pve-user wrote:
> Also 'ceph osd df' would be useful to help.
>
> peter
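When reading output like the above, a combined view that maps the utilisation
onto the three hosts can be handy; a small sketch (standard Ceph CLI):

----------------------------------
# Same data as `ceph osd df`, but grouped per host along the CRUSH tree
$ ceph osd df tree
----------------------------------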
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Eneko Lacunza <elacunza@binovo.es>
  To:   pve-user@lists.proxmox.com
  Date: 2024-09-09 10:36 UTC

Hi Frank,

On 8/9/24 at 14:17, Frank Thommen wrote:
>  5    hdd  3.81450   1.00000  3.8 TiB  3.3 TiB  3.1 TiB  37 MiB  8.5 GiB  568 GiB  85.45  1.16  194  up
> 16    hdd  1.90039   1.00000  1.9 TiB  1.6 TiB  1.6 TiB  19 MiB  5.4 GiB  261 GiB  86.59  1.18   93  up

Those OSDs are nearfull, if you have kept the default ratio of 0.85. You can
try to lower their weight a bit.

Cheers

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
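A sketch of what "lower their weight a bit" can look like on the CLI. The
target values below are purely illustrative (not a recommendation for this
cluster), and any reweight triggers data movement, so it is worth changing one
OSD at a time and watching the cluster settle in between:

----------------------------------
# Current CRUSH weights and reweight values
$ ceph osd tree

# Temporarily lower the reweight of the two fullest OSDs a little
# (allowed range 0.0-1.0; 0.95 and 0.90 are just example values)
$ ceph osd reweight 5 0.95
$ ceph osd reweight 16 0.90

# Watch the resulting backfill and the utilisation evening out
$ ceph -s
$ ceph osd df
----------------------------------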
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-10 12:02 UTC

Yes, but Ceph also reports five(!) nearfull pools, and the pool overview
doesn't reflect that. I'd love to show this with the CLI, but I could not find
an appropriate command so far. Hopefully, inline images work:

From the Ceph overview dashboard:

[inline image did not make it to the list]

And from the pool overview:

[inline image did not make it to the list]

Two pools are quite full (80% and 82%), but far from nearfull, and the other
three are basically empty.

Frank

On 09.09.24 12:36, Eneko Lacunza via pve-user wrote:
> Those OSDs are nearfull, if you have kept the default ratio of 0.85. You can
> try to lower their weight a bit.
> [...]
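Regarding the "appropriate command": per-pool usage is also available on the
CLI. A small sketch (standard Ceph CLI; the exact column names vary a little
between releases):

----------------------------------
# Per-pool stored data, %USED and MAX AVAIL
# (roughly what the PVE pool overview displays)
$ ceph df detail

# Per-pool object and space statistics from the RADOS side
$ rados df
----------------------------------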
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: David der Nederlanden | ITTY <david@itty.nl>
  To:   Proxmox VE user list <pve-user@lists.proxmox.com>
  Date: 2024-09-10 18:31 UTC

Hi Frank,

The images didn't work :)

Pool and OSD nearfull are closely related: when OSDs get full, your pools also
get nearfull, because Ceph needs to be able to follow the CRUSH rules, which it
can't once one of the OSDs fills up. Hence the warning when an OSD gets
nearfull.

I see that you're mixing OSD sizes. Deleting and recreating the OSDs one by one
caused this; since the OSDs got new weights, you should be OK once you reweight
them. You can do this by hand or using reweight-by-utilization, whatever you
prefer.

Not quite sure about the pool sizes, but an RBD pool with size 3 / min_size 2
should never be above 80%, as this gives you a nearfull pool when it starts
backfilling after you lose one node, or even a full pool in the worst case,
rendering the pool read-only.

Kind regards,
David

On 10.09.24 14:02, Frank Thommen wrote:
> Yes, but Ceph also reports five(!) nearfull pools, and the pool overview
> doesn't reflect that.
> [...]
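Since reweight-by-utilization comes up here: it has a dry-run variant, which
makes it easier to see what would change before committing. A minimal sketch
(the threshold 110 means "OSDs above 110% of the average utilisation"; the
upstream default is 120, and further optional arguments exist):

----------------------------------
# Dry run: show which OSDs would be reweighted and by how much
$ ceph osd test-reweight-by-utilization 110

# Apply the same change for real, only once the dry run looks sane
$ ceph osd reweight-by-utilization 110

# Follow the resulting rebalance
$ ceph -s
----------------------------------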
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-11 11:00 UTC

The OSDs are of different sizes because we have 4 TB and 2 TB disks in the
systems.

We might give the reweight a try.

On 10.09.24 20:31, David der Nederlanden | ITTY via pve-user wrote:
> I see that you're mixing OSD sizes. [...] You can do this by hand or using
> reweight-by-utilization, whatever you prefer.
> [...]
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Daniel Oliver
  To:   Proxmox VE user list
  Date: 2024-09-11 11:52 UTC

The built-in Ceph balancer only balances based on PG numbers, and PGs can vary
wildly in size for several reasons.

I ended up disabling the built-in balancer and switching to
https://github.com/TheJJ/ceph-balancer, which we now run daily with the
following parameters:

  placementoptimizer.py balance --ignore-ideal-pgcounts=all --osdused=delta --osdfrom fullest

This keeps things nicely balanced from a fullness perspective. The most
important bit is --ignore-ideal-pgcounts, as it allows balancing decisions
outside of what the built-in balancer would make.

On 11.09.24 13:00, Frank Thommen wrote:
> The OSDs are of different sizes because we have 4 TB and 2 TB disks in the
> systems. We might give the reweight a try.
> [...]
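A hedged sketch of how such an external balancer is typically wired in, based
on the parameters given above. The exact flags and the apply step should be
taken from the project's README rather than from this sketch, as they may have
changed since this thread was written:

----------------------------------
# Hand placement over to the external tool: disable the built-in balancer
$ ceph balancer status
$ ceph balancer off

# Generate a set of movements; the tool prints ceph CLI commands
# (e.g. `ceph osd pg-upmap-items ...`) which should be reviewed first
$ ./placementoptimizer.py balance --ignore-ideal-pgcounts=all \
      --osdused=delta --osdfrom fullest | tee /tmp/balance-moves

# Apply only after reviewing the generated commands
$ bash /tmp/balance-moves
----------------------------------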
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-11 14:24 UTC

I will have a look. However, not having real working experience with Ceph,
using an external balancer requires a "leap of faith" on my side :-)

On 11.09.24 13:52, Daniel Oliver wrote:
> The built-in Ceph balancer only balances based on PG numbers, and PGs can
> vary wildly in size for several reasons.
> [...]
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-11 14:22 UTC

I'm not sure if this is related, but looking at the activity LEDs of the DB/WAL
SSD devices (two SSDs in RAID 1 each) in each host, the device in one host is
basically permanently active (LEDs flickering constantly without pause), while
on the other host the device seems almost completely inactive (one blink every
few seconds). The third host has no SSD DB device yet.

To me it looks as if one Ceph node is extremely active while the other isn't.
That also looks like an imbalance to me.

Frank

On 11.09.24 13:00, Frank Thommen wrote:
> The OSDs are of different sizes because we have 4 TB and 2 TB disks in the
> systems. We might give the reweight a try.
> [...]
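If one wants to back the LED observation with numbers, Ceph exposes some basic
per-OSD and per-pool activity counters; a small sketch (standard commands,
though the exact fields shown differ between releases):

----------------------------------
# Rough per-OSD commit/apply latency; a consistently busier node stands out
$ ceph osd perf

# Current client I/O per pool (reads/writes per second)
$ ceph osd pool stats

# Cluster-wide I/O summary
$ ceph -s
----------------------------------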
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   pve-user
  Date: 2024-09-11 10:51 UTC

OK, here are (finally) the screenshots:

  * from the Ceph overview dashboard: https://postimg.cc/fVJFstqW
  * from the pool overview: https://postimg.cc/ykf52XPr

Frank

On 10.09.24 14:02, Frank Thommen wrote:
> Yes, but Ceph also reports five(!) nearfull pools, and the pool overview
> doesn't reflect that.
> [...]
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:27 ` David der Nederlanden | ITTY via pve-user 2024-09-07 19:49 ` Peter Eisch via pve-user @ 2024-09-08 12:17 ` Frank Thommen 1 sibling, 0 replies; 17+ messages in thread From: Frank Thommen @ 2024-09-08 12:17 UTC (permalink / raw) To: pve-user Hi David, I have deleted the OSDs one by one (sometimes two by two) and then recreated them, but this time with an SSD partition as DB device (previously the DB was on the OSD HDDs). So the SSDs should not have been added as OSDs. `ceph osd tree` gives me ---------------------------------- $ ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 55.66704 root default -3 18.40823 host pve01 0 hdd 3.68259 osd.0 up 1.00000 1.00000 1 hdd 3.68259 osd.1 up 1.00000 1.00000 2 hdd 3.68259 osd.2 up 1.00000 1.00000 9 hdd 1.84380 osd.9 up 1.00000 1.00000 10 hdd 1.84380 osd.10 up 1.00000 1.00000 11 hdd 1.84380 osd.11 up 1.00000 1.00000 12 hdd 1.82909 osd.12 up 1.00000 1.00000 -5 19.06548 host pve02 3 hdd 3.81450 osd.3 up 1.00000 1.00000 4 hdd 3.81450 osd.4 up 1.00000 1.00000 5 hdd 3.81450 osd.5 up 1.00000 1.00000 13 hdd 1.90720 osd.13 up 1.00000 1.00000 14 hdd 1.90720 osd.14 up 1.00000 1.00000 15 hdd 1.90720 osd.15 up 1.00000 1.00000 16 hdd 1.90039 osd.16 up 1.00000 1.00000 -7 18.19333 host pve03 6 hdd 3.63869 osd.6 up 1.00000 1.00000 7 hdd 3.63869 osd.7 up 1.00000 1.00000 8 hdd 3.63869 osd.8 up 1.00000 1.00000 17 hdd 1.81929 osd.17 up 1.00000 1.00000 18 hdd 1.81929 osd.18 up 1.00000 1.00000 19 hdd 1.81940 osd.19 up 1.00000 1.00000 20 hdd 1.81929 osd.20 up 1.00000 1.00000 $ ---------------------------------- Cheers, Frank On 07.09.24 21:27, David der Nederlanden | ITTY via pve-user wrote: > Hi Frank, > > Can you share your OSD layout too? > > My first thought is that you added the SSD's as OSD which caused that OSD to get full, with a near full pool as a result. > > You can get some insights with: > `ceph osd tree` > > And if needed you can reweight the OSD's, but that would require a good OSD layout: > `ceph osd reweight-by-utilization` > > Sources: > https://forum.proxmox.com/threads/ceph-pool-full.47810/ > https://docs.ceph.com/en/reef/rados/operations/health-checks/#pool-near-full > > Kind regards, > David der Nederlanden _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] "nearfull" status in PVE Dashboard not consistent 2024-09-07 19:14 ` Frank Thommen 2024-09-07 19:27 ` David der Nederlanden | ITTY via pve-user @ 2024-09-09 0:46 ` Bryan Fields 2024-09-10 11:48 ` [PVE-User] [Extern] - " Frank Thommen 1 sibling, 1 reply; 17+ messages in thread From: Bryan Fields @ 2024-09-09 0:46 UTC (permalink / raw) To: Proxmox VE user list On 9/7/24 3:14 PM, Frank Thommen wrote: > Sorry, I digress... Please find the two images here: > > * https://pasteboard.co/CUPNjkTmyYV8.jpg > (Ceph_dashboard_nearfull_warning.jpg) > * https://pasteboard.co/34GBggOiUNII.jpg (Ceph_pool_overview.jpg) > > HTH, Frank The links are broken, "502 Bad Gateway, nginx/1.1.19" -- Bryan Fields 727-409-1194 - Voice http://bryanfields.net _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
  From: Frank Thommen
  To:   Proxmox VE user list, Bryan Fields
  Date: 2024-09-10 11:48 UTC

pasteboard.co seems to be out of order at the moment. Are there good
alternatives? It would really be easier if I could just send attachments to the
list :-/

On 09.09.24 02:46, Bryan Fields wrote:
> The links are broken: "502 Bad Gateway, nginx/1.1.19"
Thread overview: 17+ messages (end of thread; newest: 2024-09-11 14:24 UTC)

2024-09-07 19:04  [PVE-User] "nearfull" status in PVE Dashboard not consistent  Frank Thommen
2024-09-07 19:07  ` Frank Thommen
2024-09-07 19:14    ` Frank Thommen
2024-09-07 19:27      ` David der Nederlanden | ITTY via pve-user
2024-09-07 19:49        ` Peter Eisch via pve-user
2024-09-08 12:17          ` [PVE-User] [Extern] - " Frank Thommen
2024-09-09 10:36            ` Eneko Lacunza via pve-user
2024-09-10 12:02              ` Frank Thommen
2024-09-10 18:31                ` David der Nederlanden | ITTY via pve-user
2024-09-11 11:00                  ` Frank Thommen
2024-09-11 11:52                    ` Daniel Oliver
2024-09-11 14:24                      ` Frank Thommen
2024-09-11 14:22                    ` Frank Thommen
2024-09-11 10:51                ` Frank Thommen
2024-09-08 12:17      ` [PVE-User] " Frank Thommen
2024-09-09 00:46    ` Bryan Fields
2024-09-10 11:48      ` [PVE-User] [Extern] - " Frank Thommen