Date: Thu, 01 Feb 2024 14:35:22 +0100
From: Fabian Grünbichler
To: Max Carrara , Proxmox VE development discussion
Subject: Re: [pve-devel] [PATCH master ceph, quincy-stable-8 ceph, pve-storage, pve-manager 0/8] Fix #4759: Configure Permissions for ceph-crash.service
Message-Id: <1706793711.eyevq5nyt4.astroid@yuna.none>
In-Reply-To: <335071eb-f206-41b6-b70d-e621c276af40@proxmox.com>
References: <20240130184041.1125674-1-m.carrara@proxmox.com> <335071eb-f206-41b6-b70d-e621c276af40@proxmox.com>

On January 31, 2024 3:22 pm, Friedrich Weber wrote:
> Also, looks like every time ceph-crash posts a report, the syslog reads:
>
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: WARNING:ceph-crash:post
> /var/lib/ceph/crash/2024-01-31T13:53:16.419342Z_1b5a078a-f665-4fcd-abd5-9bf602048d1f
> as client.crash.ceph1 failed: 2024-01-31T15:02:30.105+0100 7f10bf7ae6c0
> -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 monclient: keyring not found
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: [errno 13] RADOS permission
> denied (error connecting to the cluster)
>
> I remember you mentioned this before. Do I remember correctly there is
> no easy way to prevent these messages?
> Having them appear only when a
> crash is posted is certainly better than every 10 minutes, but they are
> a bit misleading as they very much look like an error that needs attention.

so I did a few more experiments. ceph-crash does two things:

A) it executes `ceph -s` without specifying a client name, which means
   that part will always try to use the `client.admin` config/keyring
B) it tries to post crashes if they exist, using the keys
   `client.crash.$HOST`, `client.crash`, `client.admin`, in that order

A happens at startup to "exercise the key", irrespective of whether
crash files exist or not. to avoid it, we'd need to patch ceph-crash
once we have settled on which client name to use.

B happens for every crash. once posting has worked, the remaining
keyrings are not tried again for that particular crash, but they will be
for the next one. this means that to avoid warnings altogether, we'd
need to either make the first entry in auth_names work or patch the
`auth_names` part of the ceph-crash binary.

I played around a bit and it seems we could do the following:

- change the [client] section in our config to only affect
  [client.admin] (simple renaming is enough, all `ceph` invocations
  without `-n` or `-i` should continue to work as before, since
  "client.admin" is the default `-n` value)
- generate (on each node) a `client.crash.$HOSTNAME` keyring with crash
  profile and store it in /etc/ceph/ceph.client.crash.$HOSTNAME.keyring

ceph-crash will then (at least for crash posting purposes) invoke
`ceph -n client.crash.$HOSTNAME` first, which will pick up that keyring
since `/etc/ceph/$cluster.$name.keyring` is part of the default value(s)
for the client keyring.

this doesn't work without modifying our ceph.conf, since the current
global "client.keyring" setting overrides the built-in defaults for
*all* invocations, even for `ceph -n XXX`.

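to illustrate why the global override matters, here is a rough Python sketch of how the keyring path is derived from the client name. it only mimics the `$cluster`/`$name` substitution and is not Ceph's actual code. the full default search list is an assumption from my reading of the Ceph docs (only its first entry is confirmed above), the override value is inferred from the path in the quoted log, and `keyring_candidates` is a made-up helper name.

```python
# Sketch only: mimics the $cluster/$name substitution in keyring paths.
# This is NOT Ceph's implementation.

# Assumed built-in default search list; only the first entry is confirmed
# by the discussion above.
DEFAULT_KEYRING = (
    "/etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,"
    "/etc/ceph/keyring,/etc/ceph/keyring.bin"
)

# Override inferred from the path in the quoted log: with a global [client]
# section, every client name is pointed at pmxcfs.
PVE_OVERRIDE = "/etc/pve/priv/$cluster.$name.keyring"


def keyring_candidates(name, keyring_setting=None, cluster="ceph"):
    """Return the keyring paths tried for a client name, in order."""
    setting = DEFAULT_KEYRING if keyring_setting is None else keyring_setting
    return [
        path.strip().replace("$cluster", cluster).replace("$name", name)
        for path in setting.split(",")
    ]


# With the override, only the pmxcfs path is considered.
print(keyring_candidates("client.crash.ceph1", PVE_OVERRIDE))
# Without it, the per-host keyring below /etc/ceph is the first candidate.
print(keyring_candidates("client.crash.ceph1")[0])
```

with the override limited to [client.admin], the second case applies to `client.crash.$HOSTNAME`, which is what lets the per-node keyring below /etc/ceph be found.
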
using the current approach with "client.crash" and a key on pmxcfs also
works. to silence the warnings, we could then patch ceph-crash to use
that key (and client name) for `ceph -s` and remove
`client.crash.$HOSTNAME` from auth_names. but since that entry comes
first, I assume upstream actually expects people to use that keyring and
the rest are just fallbacks, so we'd need to watch for regressions when
pulling in updates.

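the fallback behaviour from B, and the effect of dropping the first auth_names entry, can be sketched like this. it is a simplified model and not the ceph-crash source: `post_with_fallback` and the `try_post` callback are hypothetical stand-ins for running `ceph -n NAME crash post`, and the hostname is just taken from the quoted log.

```python
# Simplified model of the auth_names fallback; NOT the ceph-crash source.

def post_with_fallback(auth_names, try_post):
    """Try each client name in order and stop at the first success.

    Returns (name_that_worked_or_None, names_that_failed_before_it).
    """
    failed = []
    for name in auth_names:
        if try_post(name):
            return name, failed
        failed.append(name)  # each failed attempt corresponds to a warning
    return None, failed


host = "ceph1"  # hostname from the quoted log, for illustration only
upstream_names = [f"client.crash.{host}", "client.crash", "client.admin"]

# Hypothetical stand-in for "does posting with this name succeed?". Here
# only client.crash has a usable keyring, as with the key on pmxcfs.
usable = {"client.crash"}


def can_post(name):
    return name in usable


# Upstream order: the per-host name fails first, then client.crash works.
print(post_with_fallback(upstream_names, can_post))
# Patched list without the per-host name: nothing fails before success.
print(post_with_fallback(upstream_names[1:], can_post))
```

in this model, the per-host attempt is the only source of warnings per crash, which is why either making that name work or removing it from the list silences them.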