all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: Friedrich Weber <f.weber@proxmox.com>
Cc: pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api] fix #5189: cluster: do not synchronize statistics that already exist
Date: Wed, 21 Feb 2024 15:09:20 +0100	[thread overview]
Message-ID: <20240221150920.5301e870@rosa.proxmox.com> (raw)
In-Reply-To: <20240124135626.217720-1-f.weber@proxmox.com>

Thanks for finding this and directly tackling it! much appreciated

The approach works in general - and from looking through the code I think
resetting this on every sync makes more sense than once upon cluster-join
(e.g. the clusterinfo table gets truncated/wiped when restoring a backup)

I managed to reproduce the issue and verify the fix - question is why not
take the same approach for the cmailstore sync (from a quick glance the
only other table that gets synced by id not mtime) - and afair we had some
users who had to wait for their quarantine-lifetime to get their cluster
in sync again)

If you want to look even further into the cluster-stack - one of the more
common sources of constraint violation errors in a pmg cluster is removing
a cluster-node by deleting it's stanza from the config (or removing the
cluster.conf, and then rejoining that node, or a fresh one (the cid will
clash again):
https://forum.proxmox.com/threads/ha-cluster-warning-at-creation.122289/
https://forum.proxmox.com/threads/master-aus-postgresdb-entfernen.85255/

afaict this removing and joining a fresh node might be needed to get the
cmailstore sync to fail.


On Wed, 24 Jan 2024 14:56:26 +0100
Friedrich Weber <f.weber@proxmox.com> wrote:

> After restoring a backup from a cluster on a fresh node with
> statistics, and then creating a cluster, the following can happen
> (node 1 being master and node 2 being a node): `ClusterInfo` on node
> 1 has no record about the last-synchronized `CStatistic` row id of
> node 2. Thus, pmgmirror on node 1 initializes the record with -1 and
> tries to synchronize *all* `CStatistic` rows with cid 2 from node 2.
> But (some of) these rows may already exist on cid 1, because they
> were part of the backup, so pmgmirror on node 1 triggers a Postgres
> unique constraint violation, statistics synchronization on node 1
> fails, and node 1 remains in the "synchronizing" state.
> 
> Fix this as follows: If `ClusterInfo` does not have a record on the
> last-synchronized `CStatistic` row id of some node cid, initialize it
> with the maximum row id that exists in the local `CStatistic` for that
> node cid, or with -1 if the local `CStatistic` has no row for that
> node cid. For this, use a SQL query that is already used in
> DBTools::update_client_clusterinfo. As a result, pmgmirror will only
> synchronize rows with row ids larger than the maximum row id from the
> backup, and avoid running into a unique constraint violation.
> 
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
> 
> Notes:
>     Tested that the steps from #5189 [1] do not produce a constraint
>     violation anymore, and both cluster nodes eventually become "active".
>     Also tested that this patch doesn't break anything in the "restore
>     without statistics" scenario.
>     
>     However, not 100% sure I haven't overlooked any edge cases, so
>     additional testing is very welcome!
>     
>     [1] https://bugzilla.proxmox.com/show_bug.cgi?id=5189
> 
>  src/PMG/Cluster.pm |  3 +--
>  src/PMG/DBTools.pm | 15 +++++++++++++++
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/src/PMG/Cluster.pm b/src/PMG/Cluster.pm
> index 015e66a..3bf4c41 100644
> --- a/src/PMG/Cluster.pm
> +++ b/src/PMG/Cluster.pm
> @@ -698,8 +698,7 @@ sub sync_statistic_db {
>  
>      my $count;
>  
> -    PMG::DBTools::create_clusterinfo_default(
> -	$ldb, $rcid, 'lastid_CStatistic', -1, undef);
> +    PMG::DBTools::create_clusterinfo_statistic_default($ldb, $rcid);
>  
>      do { # get new values
>  
> diff --git a/src/PMG/DBTools.pm b/src/PMG/DBTools.pm
> index 9e133bc..6e682f8 100644
> --- a/src/PMG/DBTools.pm
> +++ b/src/PMG/DBTools.pm
> @@ -1139,6 +1139,21 @@ sub create_clusterinfo_default {
>      $sth->finish();
>  }
>  
> +sub create_clusterinfo_statistic_default {
> +    my ($dbh, $rcid) = @_;
> +
> +    my $sth = $dbh->prepare("SELECT * FROM ClusterInfo WHERE " .
> +	"CID = ? AND Name = 'lastid_CStatistic'");
> +    $sth->execute($rcid);
> +    if (!$sth->fetchrow_hashref()) {
> +	# initialize to maximum CStatistic rid locally present for $rcid (if
> +	# any), might already exist if the node was restored from backup
> +	$dbh->do("INSERT INTO ClusterInfo (cid, name, ivalue) SELECT $rcid, 'lastid_CStatistic', " .
> +		  "COALESCE (max (rid), -1) FROM CStatistic WHERE cid = $rcid");
> +    }
> +    $sth->finish();
> +}
> +
>  sub read_int_clusterinfo {
>      my ($dbh, $rcid, $name) = @_;
>  





      reply	other threads:[~2024-02-21 14:09 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-24 13:56 Friedrich Weber
2024-02-21 14:09 ` Stoiko Ivanov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240221150920.5301e870@rosa.proxmox.com \
    --to=s.ivanov@proxmox.com \
    --cc=f.weber@proxmox.com \
    --cc=pmg-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal