public inbox for pve-devel@lists.proxmox.com
From: Roland <devzero@web.de>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] avoidable writes of pmxcfs to /var/lib/pve-cluster/config.db ?
Date: Tue, 9 Mar 2021 21:45:11 +0100	[thread overview]
Message-ID: <792c380c-6757-e058-55f6-f7d5436417f9@web.de> (raw)

hello proxmox team,

I found that the pmxcfs process is quite "chatty" and one of the top
disk writers on our Proxmox nodes.

I took a closer look because I was curious why the wearout of our Samsung
EVO is already at 4%. Since the disk I/O of our VMs is typically very low,
we used lower-end SSDs for these machines.

It seems pmxcfs is constantly writing to config.db-wal at a rate of more
than 10 kB/s and more than 10 writes/s, whereas I can see only a few
changes in config.db itself.

From my rough calculation, these writes probably sum up to several
hundred gigabytes and more than 100 million write operations per year,
which isn't "just nothing" for lower-end SSDs (small and cheap SSDs may
only have a few tens of TBW of rated endurance).
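
For transparency, that rough calculation is just the measured rates (see
the strace/fatrace output below) extrapolated over a year; the exact
figures are of course only an estimate:

```shell
# Back-of-the-envelope projection from the rates measured below
# (~18.8 KiB/s into config.db-wal, ~11.5 write ops/s):
bytes_per_year=$(awk 'BEGIN { printf "%.0f", 18.8 * 1024 * 3600 * 24 * 365 }')
writes_per_year=$(awk 'BEGIN { printf "%.0f", 11.5 * 3600 * 24 * 365 }')
echo "~$(( bytes_per_year / 1000000000 )) GB and ~$(( writes_per_year / 1000000 )) million write ops per year"
# -> ~607 GB and ~362 million write ops per year
```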

I know that enterprise SSDs are recommended for Proxmox, but as they are
expensive, I also dislike avoidable wearout on any of our systems.


What makes me raise my eyebrows is that most of the data written to the
SQLite DB seems to be unchanged, i.e. I don't see significant changes in
config.db over time (compared with sqldiff), whereas the write-ahead log
config.db-wal has a quite high "flow rate".
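
The comparison method itself, sketched here with an equivalent
dump-and-diff on a scratch database so it is harmless to run (on a real
node I compared copies of /var/lib/pve-cluster/config.db taken minutes
apart; the table and values below are made up for illustration):

```shell
# Method sketch: dump the DB twice, some time apart, and diff the
# dumps; an empty diff despite ongoing WAL traffic means the logical
# content did not change.
DB=/tmp/scratch-config.db                 # stand-in for config.db
rm -f "$DB"
sqlite3 "$DB" "CREATE TABLE tree(name TEXT PRIMARY KEY, data BLOB);
               INSERT INTO tree VALUES('pve-www.key','KEYDATA');"
sqlite3 "$DB" .dump > /tmp/dump-a.sql
sleep 2                                   # wait minutes on a real node
sqlite3 "$DB" .dump > /tmp/dump-b.sql
diff -q /tmp/dump-a.sql /tmp/dump-b.sql && echo "no logical change"
```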

I cannot judge whether this really is a must-have, but it looks as if
(at least parts of) the cluster runtime data (like RSA key information)
are written in a "just dump it all into the database" way. That may be
easy at the implementation level and convenient for the programmer.
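
To illustrate what I mean at the SQL level (an illustrative sketch with
a made-up table and key, not how pmxcfs is actually implemented):

```shell
# Illustrative only: a guarded UPDATE skips the write (and the WAL
# entry) when the stored value is already identical.
db=/tmp/demo-config.db
rm -f "$db"
sqlite3 "$db" "CREATE TABLE tree(name TEXT PRIMARY KEY, data BLOB);
               INSERT INTO tree VALUES('pve-www.key','KEYDATA');"
# An unconditional INSERT OR REPLACE would rewrite the row every time;
# the guarded form touches nothing when the data is unchanged:
sqlite3 "$db" "UPDATE tree SET data='KEYDATA'
               WHERE name='pve-www.key' AND data IS NOT 'KEYDATA';
               SELECT changes();"
# -> 0 (no row touched, nothing added to the WAL)
```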

I would love to hear a comment on this finding.

Maybe there is will/room for optimisation here: avoiding these writes
would save unnecessary disk wearout (admittedly a tiny workload), and
probably also lower the risk of database corruption in problem
situations like a server crash.

regards
Roland Kletzing

PS: Sorry if this looks pointy-headed or bean-counting coming from a
non-involved person, and sorry for posting here; I was unsure whether
Bugzilla would have been better, especially because I could not select
"corosync-pve" as the component for a bug/RFE ticket. This is an
open-source project, and open source typically gets a closer look and
better optimisation, which often makes it superior. At least it should
be allowed to ask "is this intentional?".



# strace -f -p $(pidof pmxcfs) -s32768 2>&1 | grep pwrite64 | pv -br -i10 >/dev/null
1.93MiB [18.8KiB/s]
^C

# fatrace -c |stdbuf -oL grep pmxcfs | stdbuf -oL pv -lr -i60 >/dev/null
[11.5 /s]

# cat config.db-wal |strings|grep "BEGIN RSA" |cut -b 1-50|sort |uniq -c
     331 pve-www.key-----BEGIN RSA PRIVATE KEY-----

# cat config.db-wal |strings|grep "hp-ml350" |cut -b 1-50|sort |uniq -c
     114 hp-ml350 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDGN

# iotop -a

Total DISK READ:         0.00 B/s | Total DISK WRITE:        71.79 K/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:       0.00 B/s
   TID  PRIO  USER     DISK READ DISK WRITE>  SWAPIN      IO COMMAND
 26962  be/4  root      20.00 K   1295.00 K   0.00 %  0.00 % pmxcfs [cfs_loop]
  8425  be/4  root       0.00 B     32.00 K   0.00 %  0.00 % [kworker/u16:1-poll_mpt2sas0_statu]
 26992  be/4  root       0.00 B      8.00 K   0.00 %  0.00 % rrdcached -B -b /var/lib/rrdcached/db/ -j /var/lib/rrdcached/journal/ -p /var/run/rrdcached.pid -l unix:/var/run/rrdcached.sock
  7832  be/4  www-data   0.00 B   1024.00 B   0.00 %  0.00 % pveproxy worker
     1  be/4  root       0.00 B      0.00 B   0.00 %  0.00 % init
     2  be/4  root       0.00 B      0.00 B   0.00 %  0.00 % [kthreadd]
     3  be/0  root       0.00 B      0.00 B   0.00 %  0.00 % [rcu_gp]
     4  be/0  root       0.00 B      0.00 B   0.00 %  0.00 % [rcu_par_gp]
     6  be/0  root       0.00 B      0.00 B   0.00 %  0.00 % [kworker/0:0H-kblockd]
     8  be/0  root       0.00 B      0.00 B   0.00 %  0.00 % [mm_percpu_wq]
     9  be/4  root       0.00 B      0.00 B   0.00 %  0.00 % [ksoftirqd/0]


Thread overview: 4+ messages
2021-03-09 20:45 Roland [this message]
2021-03-10  6:55 ` Thomas Lamprecht
2021-03-10  8:18   ` Roland
2021-03-10  9:08     ` Thomas Lamprecht
