Date: Fri, 05 Apr 2024 12:49:26 +0200
From: "Gabriel Goller"
To: "Proxmox Backup Server development discussion"
References: <20240402133627.235028-1-h.duerr@proxmox.com> <20240402133627.235028-2-h.duerr@proxmox.com>
In-Reply-To: <20240402133627.235028-2-h.duerr@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup 1/3] docs: centralise and update garbage collection description
List-Id: Proxmox Backup Server development discussion

On Tue Apr 2, 2024 at 3:36 PM CEST, Hannes Duerr wrote:
> The "backup client usage" chapter describes a grace period that is 24
> hours and 5 minutes long, and unconnected to this a cut-off time is
> mentioned under "maintenance tasks", which leads to confusion. Therefore
> we summarise the entire description of garbage collection under
> "maintenance tasks" and link to it in the "backup client usage" chapter
>
> Signed-off-by: Hannes Duerr
> ---
>  docs/backup-client.rst | 16 +++---------
>  docs/maintenance.rst   | 55 ++++++++++++++++++++++++++++++------------
>  2 files changed, 43 insertions(+), 28 deletions(-)
>
> diff --git a/docs/backup-client.rst b/docs/backup-client.rst
> index 00a1abbb..d015b844 100644
> --- a/docs/backup-client.rst
> +++ b/docs/backup-client.rst
> @@ -735,25 +735,15 @@ command. It is recommended to carry out garbage collection on a regular basis.
>
>  The garbage collection works in two phases. In the first phase, all
>  data blocks that are still in use are marked. In the second phase,
> -unused data blocks are removed.
> +unused data blocks are removed. A more detailed description of the GC
> +can be found :ref:`here `.
> +
>
>  .. note:: This command needs to read all existing backup index files
>    and touches the complete chunk-store. This can take a long time
>    depending on the number of chunks and the speed of the underlying
>    disks.
>
> -.. note:: The garbage collection will only remove chunks that haven't been used
> -  for at least one day (exactly 24h 5m). This grace period is necessary because
> -  chunks in use are marked by touching the chunk which updates the ``atime``
> -  (access time) property. Filesystems are mounted with the ``relatime`` option
> -  by default. This results in a better performance by only updating the
> -  ``atime`` property if the last access has been at least 24 hours ago. The
> -  downside is that touching a chunk within these 24 hours will not always
> -  update its ``atime`` property.
> -
> -  Chunks in the grace period will be logged at the end of the garbage
> -  collection task as *Pending removals*.
> -
>  .. code-block:: console
>
>   # proxmox-backup-client garbage-collect
> diff --git a/docs/maintenance.rst b/docs/maintenance.rst
> index 6dbb6941..baa1241e 100644
> --- a/docs/maintenance.rst
> +++ b/docs/maintenance.rst
> @@ -171,7 +171,7 @@ It's recommended to setup a schedule to ensure that unused space is cleaned up
>  periodically. For most setups a weekly schedule provides a good interval to
>  start.
>
> -GC Background
> +Overview
>  ^^^^^^^^^^^^^

Small nit: adjust the length of the underline to match the length of the
title.

>
>  In `Proxmox Backup`_ Server, backup data is not saved directly, but rather as
> @@ -187,26 +187,51 @@ references to the same chunks on every snapshot deletion. Moreover, locking the
>  entire datastore is not feasible because new backups would be blocked until the deletion
>  process was complete.
>
> -Therefore, Proxmox Backup Server uses a garbage collection (GC) process to
> +Therefore, Proxmox Backup Server uses a `tracing garbage collection
> +`_ algorithm to
>  identify and remove the unused backup chunks that are no longer needed by any
> -snapshot in the datastore. The GC process is designed to efficiently reclaim
> +snapshot in the datastore. The GC algorithm is designed to efficiently reclaim
>  the space occupied by these chunks with low impact on the performance of the
>  datastore or interfering with other backups.
>
> -The garbage collection (GC) process is performed per datastore and is split
> -into two phases:
> +The GC is performed per datastore and is split into two phases:
>
> -- Phase one: Mark
> -  All index files are read, and the access time of the referred chunk files is
> -  updated.
> +- Phase one - Mark:
> +
> +  Read all index files and update the ``atime`` (access time) of the relevant
> +  chunk files.
> +
> +- Phase two - Sweep:
> +
> +  Iterate over all chunks and check the ``atime`` of the files. If
> +  the ``atime`` is older than the cut-off time, the chunk was neither
> +  referenced in a backup index nor is it part of a running backup that
> +  does not yet have an index to search. As such, safely remove the chunk.
> +
> +
> +Cut-off Time
> +^^^^^^^^^^^^
> +
> +The GC only clears the chunks that were last accessed before the
> +cut-off time. The cut-off time is determined by whichever is earlier:
> +
> +- 24 hours and 5 minutes before the start of the garbage collection
> +  due to the mounting of the data storage with relatime, or

I would also make relatime an inline literal like this: ``relatime``

> +
> +- the start time of the oldest active backup job that has been running
> +  for longer than 24 hours and 5 minutes at the beginning of the
> +  garbage collection. This is necessary because the newly created
> +  backup could refer to blocks, but the GC would not notice this as
> +  there is no index of the backup that could be searched.
> +
> +Chunks accessed after the cut-off time are marked as *Pending removals*
> +by the GC as it cannot be certain whether they are still needed.
> +
> +.. Note:: Mounting a volume with relatime means that the ``atime``

Same here.

> +  of the chunk files is not updated every time, but only when the
> +  data has changed or the ``atime`` was before a certain time,
> +  which is 24 hours by default.
>
> -- Phase two: Sweep
> -  The task iterates over all chunks, checks their file access time, and if it
> -  is older than the cutoff time (i.e., the time when GC started, plus some
> -  headroom for safety and Linux file system behavior), the task knows that the
> -  chunk was neither referred to in any backup index nor part of any currently
> -  running backup that has no index to scan for. As such, the chunk can be
> -  safely deleted.
>
>  Manually Starting GC
>  ^^^^^^^^^^^^^^^^^^^^
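
For readers following the "Cut-off Time" wording in the patch, the rule can be
sketched roughly as below. This is only an illustration in Python of the
"whichever is earlier" logic as the patch text describes it, not the Proxmox
Backup Server implementation; the names ``gc_cutoff``, ``is_removable`` and
``GRACE_PERIOD`` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Grace period as stated in the patch text: 24 hours plus 5 minutes.
GRACE_PERIOD = timedelta(hours=24, minutes=5)


def gc_cutoff(gc_start: datetime,
              oldest_backup_start: Optional[datetime] = None) -> datetime:
    """Return the cut-off time (hypothetical helper).

    The cut-off is whichever is earlier: the GC start minus the grace
    period, or the start time of the oldest still-running backup.
    """
    cutoff = gc_start - GRACE_PERIOD
    if oldest_backup_start is not None and oldest_backup_start < cutoff:
        cutoff = oldest_backup_start
    return cutoff


def is_removable(chunk_atime: datetime, cutoff: datetime) -> bool:
    """A chunk is swept only if its atime is older than the cut-off."""
    return chunk_atime < cutoff
```

With a GC starting at 12:00 and no long-running backup, a chunk last touched
at 11:00 the previous day would be removable, while one touched at 12:00 the
previous day would fall into the pending removals.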