Message-ID: <335071eb-f206-41b6-b70d-e621c276af40@proxmox.com>
Date: Wed, 31 Jan 2024 15:22:53 +0100
From: Friedrich Weber
To: Proxmox VE development discussion, Max Carrara
Subject: Re: [pve-devel] [PATCH master ceph, quincy-stable-8 ceph, pve-storage, pve-manager 0/8] Fix #4759: Configure Permissions for ceph-crash.service
In-Reply-To: <20240130184041.1125674-1-m.carrara@proxmox.com>
References: <20240130184041.1125674-1-m.carrara@proxmox.com>

Thanks a lot for tackling this issue!

Gave this a quick spin on a pre-existing 3-node Quincy cluster on which I
provoked a few crashes with `kill -n11 $(pidof ceph-osd)`.

ceph-base with patch 2 applied (provided by Max off-list) correctly
changed the /var/lib/ceph/crash/posted permissions to ceph:ceph for me.

Installing pve-manager (with patch 8 applied) on node 1 created a keyring
and added the section to /etc/pve/ceph.conf. However, installing on node
2 added a second `keyring` line to the section:

    [client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring
        keyring = /etc/pve/ceph/$cluster.$name.keyring

Same thing happens on each `dpkg-reconfigure pve-manager` I think.
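
To illustrate what an idempotent update of the section could look like, here is a minimal Python sketch. It is not the code from patch 8 (pve-manager is written in Perl and its actual logic is not shown in this thread); the helper name `ensure_key` and the simplified INI-style parsing are assumptions made purely for illustration. The idea is to set the key exactly once per section and drop any repeats, so that running the step on every node or on every `dpkg-reconfigure` leaves the file unchanged.

```python
import re

# Hypothetical helper, not part of pve-manager: matches a "[section]" header.
SECTION_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")


def ensure_key(conf_text: str, section: str, key: str, value: str) -> str:
    """Return conf_text with `key = value` present exactly once in [section]."""
    key_re = re.compile(r"^(?P<indent>\s*)" + re.escape(key) + r"\s*=")
    out = []
    current = None        # name of the section we are currently inside
    seen_section = False  # did the target section appear at all?
    written = False       # has the key already been emitted?
    last_content = None   # index in `out` of the last non-blank target-section line

    def add_missing():
        # Called when leaving the target section: add the key if it was absent.
        nonlocal written
        if seen_section and not written:
            out.insert(last_content + 1, f"\t{key} = {value}")
            written = True

    for line in conf_text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            if current == section:
                add_missing()
            current = header.group("name").strip()
            seen_section = seen_section or current == section
        elif current == section:
            assignment = key_re.match(line)
            if assignment:
                if written:
                    continue  # drop the duplicate line
                line = f"{assignment.group('indent')}{key} = {value}"
                written = True
        out.append(line)
        if current == section and line.strip():
            last_content = len(out) - 1

    if current == section:
        add_missing()
    if not seen_section:
        # Section missing entirely: append it, separated by a blank line.
        if out and out[-1].strip():
            out.append("")
        out += [f"[{section}]", f"\t{key} = {value}"]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    keyring = "/etc/pve/ceph/$cluster.$name.keyring"
    conf = "[client.crash]\n\tkeyring = " + keyring + "\n\tkeyring = " + keyring + "\n"
    print(ensure_key(conf, "client.crash", "keyring", keyring), end="")
```

Applying the function a second time to its own output returns the same text, which is the property the second-node install and the reconfigure case would need.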

Also, looks like every time ceph-crash posts a report, the syslog reads:

Jan 31 15:02:30 ceph1 ceph-crash[110939]: WARNING:ceph-crash:post /var/lib/ceph/crash/2024-01-31T13:53:16.419342Z_1b5a078a-f665-4fcd-abd5-9bf602048d1f as client.crash.ceph1 failed: 2024-01-31T15:02:30.105+0100 7f10bf7ae6c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100 7f10bf7ae6c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100 7f10bf7ae6c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100 7f10bf7ae6c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100 7f10bf7ae6c0 -1 monclient: keyring not found
Jan 31 15:02:30 ceph1 ceph-crash[110939]: [errno 13] RADOS permission denied (error connecting to the cluster)

I remember you mentioned this before. Do I remember correctly that there
is no easy way to prevent these messages? Having them appear only when a
crash is posted is certainly better than every 10 minutes, but they are a
bit misleading as they very much look like an error that needs attention.