From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <8db6f13c-6ff9-3a6a-b6a5-42b60766186d@proxmox.com>
Date: Wed, 10 Mar 2021 10:08:28 +0100
From: Thomas Lamprecht
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, Roland
References: <792c380c-6757-e058-55f6-f7d5436417f9@web.de>
 <3463e859-a6d9-ea66-481e-4f7548306e7d@proxmox.com>
 <39c0fe3b-8261-8477-e501-f728bcb4c051@web.de>
In-Reply-To: <39c0fe3b-8261-8477-e501-f728bcb4c051@web.de>
Subject: Re: [pve-devel] avoidable writes of pmxcfs to
 /var/lib/pve-cluster/config.db ?

On 10.03.21 09:18, Roland wrote:
>
>>> corruption in particular problem situations like server crash or whatever.
>> So the prime candidates for this write load are the PVE HA Local Resource
>> Manager services on each node: they update their status, and that is often
>> required to signal the current Cluster Resource Manager's master service
>> that the HA stack on that node is alive and well and that commands got
>> executed with result X. So yes, this is required and intentional.
>> There may be some room for optimization, but it's not that straightforward,
>> and (over-)clever solutions are often the wrong ones for an HA stack - as
>> failure here is something we really want to avoid. But yeah, some easier
>> to pick fruits could maybe be found here.
>>
>> The other thing I just noticed when checking out:
>> # ls -l "/proc/$(pidof pmxcfs)/fd"
>> to get the file descriptors for all DB-related files, and then watching
>> writes with:
>> # strace -v -s $[1<<16] -f -p "$(pidof pmxcfs)" -e write=4,5,6
>> Was additionally seeing some writes for the RSA key files, which should
>> just not be there - but I need to investigate this more closely, it seemed
>> a bit too odd to me.
> not only these, I also see constant rewrites of (non-changing?) VM
> configuration data, too.
>
> just cat config.db-wal |strings|grep ..... |sort | uniq -c   to see
> what's getting there.

But that's not a real issue, though: the WAL is dimensioned quite big
(4 MiB, while the DB is often only 1 or 2 MiB), so it will always contain
lots of DB data. This big WAL actually reduces additional writes+syncs, as
we do not need to checkpoint it that often, so at least for reads it should
be more performant.

Also, the WAL is accessed for reads and writes with offsets (e.g., via
pwrite64), and thus only some specific, small and contained parts are
actually written anew. So you cannot really conclude anything from the
total content in it, only from actual new writes (which can be seen with
my strace command).
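If one only wants to watch those offset writes, the trace can be narrowed
down to the pwrite64 syscall - roughly like this (using the default DB path
from the subject, adapt if yours differs):

# ls -lh /var/lib/pve-cluster/config.db*
# strace -f -y -e trace=pwrite64 -p "$(pidof pmxcfs)"

The first command compares the on-disk sizes of the DB, its WAL, and the
shared-memory index; in the second, -y resolves the FD numbers to the
actual file paths, so one sees directly whether the DB, the WAL, or
something else is written to.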
Regarding the extra data I mentioned: it could be that this is due to
sqlite handling memory pages directly; I still need to check that out more
closely.

> the weird thing is that it does not happen for every VM, just some. I'll
> send you an email with additional data (don't want to post all my VMs'
> MAC addresses in public)

For now I'm good, thanks - I can check that on my test clusters too, but
if I need anything I'll come back to this offer.

cheers,
Thomas