* [PVE-User] Ceph: sudden slow ops, freezes, and slow-downs
From: Branislav Viest @ 2022-06-22 18:39 UTC
To: pve-user

Hello,

I have run into serious difficulties with our Ceph cluster, which is built from 4 nodes. I have created a Proxmox forum topic about the problem, so I will not repeat all the details in this mailing thread. Please take a look, and if anyone has an idea what could be causing this or how to solve it, I would be very grateful. Any idea or experience is welcome.

We have already begun migrating to another provider, but I still want to understand the root cause so that we can prevent similar failures after the migration.

The topic is here: https://forum.proxmox.com/threads/ceph-sudden-slow-ops-freezes-and-slow-downs.111144/

Thank you all.

------------
Best Regards
Branislav Brian Viest
linux administrator
------------
Legal Disclaimer: This e-mail and any attached files are confidential and may be legally privileged. If you are not the addressee, any disclosure, reproduction, copying, distribution, or other dissemination or use of this communication is strictly prohibited. If you have received this transmission in error please notify the sender immediately and then delete this e-mail. The sender does not accept liability for the correct and complete transmission of the information, nor for any delay or interruption of the transmission, nor for damages arising from the use of or reliance on the information. All e-mail messages addressed to, received or sent by sender are deemed to be professional in nature. Accordingly, the sender or recipient of these messages agrees that they may be read by other sender employees than the official recipient or sender in order to ensure the continuity of work-related activities and allow supervision thereof.
[parent not found: <mailman.54.1655965821.338.pve-user@lists.proxmox.com>]
* Re: [PVE-User] Ceph: sudden slow ops, freezes, and slow-downs
From: Branislav Viest @ 2022-06-23 6:54 UTC
To: Proxmox VE user list

Hello,

ID  CLASS  WEIGHT    TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         15.52213  root default
-3          5.18097      host node1
 0    ssd   1.72699          osd.0       up   1.00000  1.00000
 1    ssd   1.72699          osd.1       up   1.00000  1.00000
 2    ssd   1.72699          osd.2       up   1.00000  1.00000
-5          3.45398      host node2
 3    ssd   1.72699          osd.3       up   1.00000  1.00000
 5    ssd   1.72699          osd.5       up   1.00000  1.00000
-7          1.70740      host node3
 6    ssd   0.85370          osd.6       up   1.00000  1.00000
 7    ssd   0.85370          osd.7       up   1.00000  1.00000
-9          5.17978      host node4
 8    ssd   1.72659          osd.8       up   1.00000  1.00000
 9    ssd   1.72659          osd.9       up   1.00000  1.00000
10    ssd   1.72659          osd.10      up   1.00000  1.00000

Since slow ops are reported on multiple OSDs most of the time, I did not try testing with some OSDs marked out. I have now checked the logs from the last 2-3 days, and slow ops are reported mostly on 5 of the 10 OSDs.

------------
Best Regards
Branislav Brian Viest
------------

----- Original message -----
From: "Eneko Lacunza via pve-user" <pve-user@lists.proxmox.com>
To: "pve-user" <pve-user@lists.proxmox.com>
Cc: "Eneko Lacunza" <elacunza@binovo.es>
Sent: Thursday, 23 June 2022 8:29:40
Subject: Re: [PVE-User] Ceph: sudden slow ops, freezes, and slow-downs
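As a rough illustration of how capacity is spread across the hosts in the tree above, the following Python sketch sums the OSD weights per host and prints each host's share of the total. The weights are copied from the tree in this message; the names `OSD_WEIGHTS` and `host_shares` are made up for this example and are not part of any Ceph tooling.

```python
# OSD weights per host, copied from the tree in this message.
# The structure and names here are illustrative only.
OSD_WEIGHTS = {
    "node1": {0: 1.72699, 1: 1.72699, 2: 1.72699},
    "node2": {3: 1.72699, 5: 1.72699},
    "node3": {6: 0.85370, 7: 0.85370},
    "node4": {8: 1.72659, 9: 1.72659, 10: 1.72659},
}


def host_shares(weights):
    """Return {host: fraction of the total weight held by that host}."""
    totals = {host: sum(osds.values()) for host, osds in weights.items()}
    grand_total = sum(totals.values())
    return {host: total / grand_total for host, total in totals.items()}


if __name__ == "__main__":
    for host, share in host_shares(OSD_WEIGHTS).items():
        print(f"{host}: {share:.1%} of total weight")
```

On these numbers node3 holds roughly 11% of the total weight, about a third of what node1 or node4 hold, and node2 sits in between with two OSDs instead of three.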
[parent not found: <4c507d5f-5ad9-9fb1-6eab-6ca1913784d8@binovo.es>]
* Re: [PVE-User] Ceph: sudden slow ops, freezes, and slow-downs
From: Branislav Viest @ 2022-06-23 8:24 UTC
To: Eneko Lacunza; +Cc: Proxmox VE user list

The affected OSDs are 1, 3, 5, 2 and 8.

All drives are Samsung NVMe, Model Number: SAMSUNG MZQLB1T9HAJR-00007.
According to the SMART "Percentage Used" value, all drives are at 10% or less.
The SMART overall-health self-assessment test result is PASSED for all of them.

------------
Best Regards
Branislav Brian Viest
------------

From: "Eneko Lacunza" <elacunza@binovo.es>
To: "Branislav Viest" <info@branoviest.com>, "Proxmox VE user list" <pve-user@lists.proxmox.com>
Sent: Thursday, 23 June 2022 9:22:08
Subject: Re: [PVE-User] Ceph: sudden slow ops, freezes, and slow-downs

Hi,

What numbers are those 5 OSDs?

Have you checked the SSD drive manufacturer and models?

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
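To see whether the affected OSDs cluster on a single host, the OSD numbers reported in this message can be cross-referenced with the OSD tree posted earlier in the thread. A minimal Python sketch follows; the OSD-to-host mapping is copied from that tree, while the names `OSD_HOST`, `AFFECTED`, and `affected_by_host` are illustrative only.

```python
# OSD id -> host, copied from the OSD tree earlier in the thread.
OSD_HOST = {
    0: "node1", 1: "node1", 2: "node1",
    3: "node2", 5: "node2",
    6: "node3", 7: "node3",
    8: "node4", 9: "node4", 10: "node4",
}

# OSDs reported with slow ops in this message.
AFFECTED = [1, 3, 5, 2, 8]


def affected_by_host(osd_host, affected):
    """Group affected OSD ids by host; hosts with none map to an empty list."""
    grouped = {host: [] for host in sorted(set(osd_host.values()))}
    for osd in sorted(affected):
        grouped[osd_host[osd]].append(osd)
    return grouped


if __name__ == "__main__":
    for host, osds in affected_by_host(OSD_HOST, AFFECTED).items():
        total = sum(1 for h in OSD_HOST.values() if h == host)
        print(f"{host}: {len(osds)}/{total} OSDs affected {osds}")
```

On this data the slow ops touch OSDs on three of the four hosts (node1, node2, node4) and none on node3, which suggests, without proving it, that the problem is not confined to one host.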
[parent not found: <1cdc99da-0537-7533-2987-8678223e2971@binovo.es>]
* Re: [PVE-User] Ceph: sudden slow ops, freezes, and slow-downs
From: Branislav Viest @ 2022-06-23 13:54 UTC
To: Eneko Lacunza; +Cc: Proxmox VE user list

The drives in 3 nodes have the same firmware (EDA5402Q); only the last (newest) node has a higher firmware version, EDA5702Q.

------------
Best Regards
Branislav Viest
------------

From: "Eneko Lacunza" <elacunza@binovo.es>
To: "Branislav Viest" <info@branoviest.com>
Cc: "Proxmox VE user list" <pve-user@lists.proxmox.com>
Sent: Thursday, 23 June 2022 13:12:23
Subject: Re: [PVE-User] Ceph: sudden slow ops, freezes, and slow-downs

The drives are DC (datacenter-class), so they should be good.

Do all of them have the same firmware version?
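Collecting model, firmware, and wear figures by hand across ten drives is error-prone, so a small script may help. The Python sketch below assumes `smartctl` from smartmontools is installed and supports JSON output (`--json`, available since smartmontools 7.0), and that the key names used (`model_name`, `firmware_version`, `smart_status.passed`, `nvme_smart_health_information_log.percentage_used`) match what your version emits; check them against real output before relying on this. The function names and device names are placeholders, not anything used in this thread.

```python
import json
import subprocess


def summarize(report):
    """Extract the fields discussed in this thread from a smartctl JSON report."""
    health = report.get("nvme_smart_health_information_log", {})
    return {
        "model": report.get("model_name"),
        "firmware": report.get("firmware_version"),
        "passed": report.get("smart_status", {}).get("passed"),
        "percentage_used": health.get("percentage_used"),
    }


def read_drive(device):
    """Run smartctl on one device (usually needs root) and return its summary."""
    # smartctl reports status as a bit mask in its exit code, so a non-zero
    # exit is not necessarily fatal; that is why check=True is not used here.
    proc = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
    )
    return summarize(json.loads(proc.stdout))


def group_by_firmware(summaries):
    """Map firmware version -> list of device names, to spot mismatches."""
    groups = {}
    for device, info in summaries.items():
        groups.setdefault(info["firmware"], []).append(device)
    return groups


if __name__ == "__main__":
    # Firmware strings are from this thread; the device names are hypothetical.
    sample = {
        "nvme0": {"firmware": "EDA5402Q"},
        "nvme1": {"firmware": "EDA5702Q"},
    }
    print(group_by_firmware(sample))
```

Running `read_drive` for every NVMe device on each node and feeding the results to `group_by_firmware` would show at a glance which drives run EDA5402Q and which run EDA5702Q, so the list can be compared with the OSDs that report slow ops.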