Date: Wed, 13 Nov 2024 14:50:02 +0100
From: Fabian Grünbichler
To: Proxmox Backup Server development discussion
Subject: Re: [pbs-devel] [PATCH proxmox-backup 2/2] docs: deduplicate background details for garbage collection

On October 31, 2024 4:45 pm, Christian Ebner wrote:
> Currently, common details regarding garbage collection are documented
> in the backup client and the maintenance task. Deduplicate this
> information by moving the details to the background section of the
> maintenance task and reference that section in the backup client
> part.
>
> Signed-off-by: Christian Ebner
> ---
>  docs/backup-client.rst | 28 ++++++++++++----------------
>  docs/maintenance.rst   | 35 ++++++++++++++++++++++++-----------
>  2 files changed, 36 insertions(+), 27 deletions(-)
>
> diff --git a/docs/backup-client.rst b/docs/backup-client.rst
> index e56e0625b..892be11d9 100644
> --- a/docs/backup-client.rst
> +++ b/docs/backup-client.rst
> @@ -789,29 +789,25 @@ Garbage Collection
>  ------------------
>
>  The ``prune`` command removes only the backup index files, not the data
> -from the datastore. This task is left to the garbage collection
> -command. It is recommended to carry out garbage collection on a regular basis.
> +from the datastore. Deletion of unused backup data from the datastore is done by
> +:ref:`garbage collection<_maintenance_gc>`. It is therefore recommended to
> +schedule garbage collection tasks on a regular basis. The working principle of
> +garbage collection is described in more details in the related :ref:`background
> +section `.
>
> -The garbage collection works in two phases. In the first phase, all
> -data blocks that are still in use are marked. In the second phase,
> -unused data blocks are removed.
> +To start garbage collection from the client side, run the following command:
> +
> +.. code-block:: console
> +
> +  # proxmox-backup-client garbage-collect
>
>  .. note:: This command needs to read all existing backup index files
>     and touches the complete chunk-store. This can take a long time
>     depending on the number of chunks and the speed of the underlying
>     disks.
>
> -.. note:: The garbage collection will only remove chunks that haven't been used
> -   for at least one day (exactly 24h 5m). This grace period is necessary because
> -   chunks in use are marked by touching the chunk which updates the ``atime``
> -   (access time) property. Filesystems are mounted with the ``relatime`` option
> -   by default. This results in a better performance by only updating the
> -   ``atime`` property if the last access has been at least 24 hours ago. The
> -   downside is that touching a chunk within these 24 hours will not always
> -   update its ``atime`` property.
> -
> -   Chunks in the grace period will be logged at the end of the garbage
> -   collection task as *Pending removals*.
> +The progress of the garbage collection will be displayed as shown in the example
> +below:
>
>  .. code-block:: console
>
> diff --git a/docs/maintenance.rst b/docs/maintenance.rst
> index b6d42ecc2..01c24ea7d 100644
> --- a/docs/maintenance.rst
> +++ b/docs/maintenance.rst
> @@ -190,6 +190,8 @@ It's recommended to setup a schedule to ensure that unused space is cleaned up
>  periodically. For most setups a weekly schedule provides a good interval to
>  start.
>
> +.. _gc_background:
> +
>  GC Background
>  ^^^^^^^^^^^^^
>
> @@ -215,17 +217,28 @@ datastore or interfering with other backups.
>  The garbage collection (GC) process is performed per datastore and is split
>  into two phases:
>
> -- Phase one: Mark
> -  All index files are read, and the access time of the referred chunk files is
> -  updated.
> -
> -- Phase two: Sweep
> -  The task iterates over all chunks, checks their file access time, and if it
> -  is older than the cutoff time (i.e., the time when GC started, plus some
> -  headroom for safety and Linux file system behavior), the task knows that the
> -  chunk was neither referred to in any backup index nor part of any currently
> -  running backup that has no index to scan for. As such, the chunk can be
> -  safely deleted.
> +- Phase one (Mark):
> +
> +  All index files are read, and the access time (``atime``) of the referred

pre-existing, but "referenced" fits better IMHO

> +  chunk files is updated.
> +
> +- Phase two (Sweep):
> +
> +  The task iterates over all chunks and checks their file access time. If it is
> +  older than the cutoff time given by either 24 hours and 5 minutes after the
> +  start time of the garbage collection or the start time of the oldest backup
> +  writer instance, the garbage collection can consider the chunk as neither
> +  referenced by any backup index nor part of any currently running backup.
> +  Therefore, the chunk can be safely deleted.

Should we re-order/simplify this, and first explain/define the cutoff, and
then (in a separate sentence) describe how it is used?

> +
> +  Chunks within the grace period will not be deleted and logged at the end of
> +  the garbage collection task as *Pending removals*.
> +
> +.. note:: The grace period for backup chunk removal is not arbitrary, but stems
> +  from the fact that filesystems are typically mounted with the ``relatime``
> +  option by default. This results in better performance by only updating the
> +  ``atime`` property if a file has been modified since the last access or the
> +  last access has been at least 24 hours ago.
>
>  Manually Starting GC
>  ^^^^^^^^^^^^^^^^^^^^
> --
> 2.39.5
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel