From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Roland <devzero@web.de>
Subject: Re: [pve-devel] avoidable writes of pmxcfs to /var/lib/pve-cluster/config.db ?
Date: Wed, 10 Mar 2021 10:08:28 +0100	[thread overview]
Message-ID: <8db6f13c-6ff9-3a6a-b6a5-42b60766186d@proxmox.com> (raw)
In-Reply-To: <39c0fe3b-8261-8477-e501-f728bcb4c051@web.de>

On 10.03.21 09:18, Roland wrote:
> 
>>> corruption in particular problem situations like server crash or whatever.
>> So the prime candidates for this write load are the PVE HA Local Resource
>> Manager services on each node; they update their status, and that is often
>> required to signal the current Cluster Resource Manager's master service
>> that the HA stack on that node is alive and well and that commands got
>> executed with result X. So yes, this is required and intentional.
>> There may be some room for optimization, but it's not that straightforward,
>> and (over-)clever solutions are often the wrong ones for an HA stack, as
>> failure here is something we really want to avoid. But yeah, some
>> lower-hanging fruit could maybe be found here.
>>
>> The other thing I just noticed when checking out:
>> # ls -l "/proc/$(pidof pmxcfs)/fd"
>>
>> to get the FDs for all db related FDs and then watch writes with:
>> # strace -v -s $[1<<16] -f -p "$(pidof pmxcfs)" -e write=4,5,6
>>
>> Additionally, I was seeing some writes for the RSA key files which should
>> just not be there, but I need to investigate this more closely; it seemed
>> a bit too odd to me.
> not only these, I also see constant rewrites of (non-changing?) VM
> configuration data, too.
> 
> just cat config.db-wal | strings | grep ..... | sort | uniq -c   to see
> what ends up there.
> 

But that's not a real issue: the WAL is dimensioned quite big (4 MiB, while
the DB is often only 1 or 2 MiB), so it will always contain lots of DB data.
This big WAL actually reduces additional write+syncs, as we do not need to
checkpoint it that often, so at least for reads it should be more performant.
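
FWIW, a rough way to look at those numbers yourself (just a sketch: the
pragmas are plain SQLite ones, and as wal_autocheckpoint is a per-connection
setting, a copy of the DB only reports SQLite's built-in default of 1000
pages, i.e. roughly 4 MiB at the default 4 KiB page size, not necessarily
what pmxcfs actually configures; working on a copy also keeps the live DB
untouched):

# ls -lh /var/lib/pve-cluster/config.db /var/lib/pve-cluster/config.db-wal
# cp /var/lib/pve-cluster/config.db /tmp/config-copy.db
# sqlite3 /tmp/config-copy.db 'PRAGMA page_size; PRAGMA journal_mode; PRAGMA wal_autocheckpoint;'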

Also, the WAL is accessed for reads and writes with offsets (e.g., pwrite64),
and thus only some specific, small and contained parts are actually written
anew. So you cannot really conclude anything from the total content in it,
only from the actual new writes (which can be seen with my strace command
above).
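
If you want to watch just those new writes without dumping the full buffers,
something like the following should work (standard strace options; -y
resolves fd numbers to paths, so you do not need to look them up in /proc
first; the exact syscall mix you see may differ):

# strace -f -y -e trace=pwrite64,write,fdatasync -p "$(pidof pmxcfs)"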

Regarding the extra writes I mentioned, it could be that this is due to
SQLite handling memory pages directly; I still need to check that out more
closely.

> the weird thing is that it does not happen for every VM, just some. I'll
> send you an email with additional data (don't want to post all my VMs'
> MAC addresses in public)
> 

For now I'm good, thanks; I can check that on my test clusters too, but if
I need anything I'll come back to this offer.

cheers,
Thomas



