From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <8db6f13c-6ff9-3a6a-b6a5-42b60766186d@proxmox.com>
Date: Wed, 10 Mar 2021 10:08:28 +0100
From: Thomas Lamprecht
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, Roland
References: <792c380c-6757-e058-55f6-f7d5436417f9@web.de>
 <3463e859-a6d9-ea66-481e-4f7548306e7d@proxmox.com>
 <39c0fe3b-8261-8477-e501-f728bcb4c051@web.de>
In-Reply-To: <39c0fe3b-8261-8477-e501-f728bcb4c051@web.de>
Subject: Re: [pve-devel] avoidable writes of pmxcfs to
 /var/lib/pve-cluster/config.db ?

On 10.03.21 09:18, Roland wrote:
>
>>> corruption in particular problem situations like server crash or whatever.
>> So the prime candidates for this write load are the PVE HA Local Resource
>> Manager services on each node: they update their status, and that is often
>> required to signal the current Cluster Resource Manager's master service
>> that the HA stack on that node is alive and well and that commands got
>> executed with result X. So yes, this is required and intentional.
>> There may be some room for optimization, but it's not that straightforward,
>> and (over-)clever solutions are often the wrong ones for an HA stack - as
>> failure here is something we really want to avoid. But yeah, some easier
>> to pick fruits could maybe be found here.
>>
>> The other thing I just noticed when checking out:
>> # ls -l "/proc/$(pidof pmxcfs)/fd"
>> to get the file descriptors for all DB-related files, and then watching
>> writes with:
>> # strace -v -s $[1<<16] -f -p "$(pidof pmxcfs)" -e write=4,5,6
>> Was additionally seeing some writes for the RSA key files, which should
>> just not be there - but I need to investigate this more closely, it seemed
>> a bit too odd to me.
> not only these, I also see constant rewrites of (non-changing?) VM
> configuration data, too.
>
> just cat config.db-wal |strings|grep ..... |sort | uniq -c   to see
> what's getting there.

But that's not a real issue, though: the WAL is dimensioned quite big
(4 MiB, while the DB is often only 1 or 2 MiB), so it will always contain
lots of DB data. This big WAL actually reduces additional writes+syncs, as
we do not need to checkpoint it that often, so at least for reads it should
be more performant.

Also, the WAL is accessed for reads and writes with offsets (e.g., via
pwrite64), and thus only some specific, small and contained parts are
actually written anew. So you cannot really conclude anything from the
total content in it, only from actual new writes (which can be seen with
my strace command).
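If one only wants to watch those offset writes, the trace can be narrowed
down to the pwrite64 syscall - roughly like this (using the default DB path
from the subject, adapt if yours differs):

# ls -lh /var/lib/pve-cluster/config.db*
# strace -f -y -e trace=pwrite64 -p "$(pidof pmxcfs)"

The first command compares the on-disk sizes of the DB, its WAL, and the
shared-memory index; in the second, -y resolves the FD numbers to the
actual file paths, so one sees directly whether the DB, the WAL, or
something else is written to.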
Regarding the extra data I mentioned: it could be that this is due to
sqlite handling memory pages directly; I still need to check that out more
closely.

> the weird thing is that it does not happen for every VM, just some. I'll
> send you an email with additional data (don't want to post all my VMs'
> MAC addresses in public)

For now I'm good, thanks - I can check that on my test clusters too, but
if I need anything I'll come back to this offer.

cheers,
Thomas