From: Hannes Dürr
Date: Wed, 10 Apr 2024 15:38:23 +0200
To: Proxmox Backup Server development discussion, Fabian Grünbichler
In-Reply-To: <1712567154.6c6yxorn2q.astroid@yuna.none>
References: <20240405130543.259220-1-h.duerr@proxmox.com> <20240405130543.259220-2-h.duerr@proxmox.com> <1712567154.6c6yxorn2q.astroid@yuna.none>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 1/3] docs: centralise and update garbage collection description

On 4/8/24 11:20, Fabian Grünbichler wrote:
> On April 5, 2024 3:05 pm, Hannes Duerr wrote:
>> The "backup client usage" chapter describes a grace period that is 24
>> hours and 5 minutes long, and unconnected to this a cut-off time is
>> mentioned under "maintenance tasks", which leads to confusion. Therefore
>> we summarise the entire description of garbage collection under
>> "maintenance tasks" and link to it in the "backup client usage" chapter
>>
>> Signed-off-by: Hannes Duerr
>> ---
>>  docs/backup-client.rst | 16 +++---------
>>  docs/maintenance.rst   | 57 ++++++++++++++++++++++++++++++------------
>>  2 files changed, 44 insertions(+), 29 deletions(-)
>>
>> diff --git a/docs/backup-client.rst b/docs/backup-client.rst
>> index 00a1abbb..d015b844 100644
>> --- a/docs/backup-client.rst
>> +++ b/docs/backup-client.rst
>> @@ -735,25 +735,15 @@ command. It is recommended to carry out garbage collection on a regular basis.
>>
>>  The garbage collection works in two phases. In the first phase, all
>>  data blocks that are still in use are marked. In the second phase,
>> -unused data blocks are removed.
>> +unused data blocks are removed. A more detailed description of the GC
>> +can be found :ref:`here `.
>> +
>>
>>  .. note:: This command needs to read all existing backup index files
>>     and touches the complete chunk-store. This can take a long time
>>     depending on the number of chunks and the speed of the underlying
>>     disks.
>>
>> -.. note:: The garbage collection will only remove chunks that haven't been used
>> -   for at least one day (exactly 24h 5m). This grace period is necessary because
>> -   chunks in use are marked by touching the chunk which updates the ``atime``
>> -   (access time) property. Filesystems are mounted with the ``relatime`` option
>> -   by default. This results in a better performance by only updating the
>> -   ``atime`` property if the last access has been at least 24 hours ago. The
>> -   downside is that touching a chunk within these 24 hours will not always
>> -   update its ``atime`` property.
>> -
>> -   Chunks in the grace period will be logged at the end of the garbage
>> -   collection task as *Pending removals*.
>> -
>>  .. code-block:: console
>>
>>    # proxmox-backup-client garbage-collect
>> diff --git a/docs/maintenance.rst b/docs/maintenance.rst
>> index 6dbb6941..e25c8f19 100644
>> --- a/docs/maintenance.rst
>> +++ b/docs/maintenance.rst
>> @@ -171,8 +171,8 @@ It's recommended to setup a schedule to ensure that unused space is cleaned up
>>  periodically. For most setups a weekly schedule provides a good interval to
>>  start.
>>
>> -GC Background
>> -^^^^^^^^^^^^^
>> +Overview
>> +^^^^^^^^
>>
>>  In `Proxmox Backup`_ Server, backup data is not saved directly, but rather as
>>  chunks that are referred to by the indexes of each backup snapshot. This
>> @@ -187,26 +187,51 @@ references to the same chunks on every snapshot deletion. Moreover, locking the
>>  entire datastore is not feasible because new backups would be blocked until the deletion
>>  process was complete.
>>
>> -Therefore, Proxmox Backup Server uses a garbage collection (GC) process to
>> +Therefore, Proxmox Backup Server uses a `tracing garbage collection
>> +`_ algorithm to
>>  identify and remove the unused backup chunks that are no longer needed by any
>> -snapshot in the datastore. The GC process is designed to efficiently reclaim
>> +snapshot in the datastore. The GC algorithm is designed to efficiently reclaim
>>  the space occupied by these chunks with low impact on the performance of the
>>  datastore or interfering with other backups.
>>
>> -The garbage collection (GC) process is performed per datastore and is split
>> -into two phases:
>> +The GC is performed per datastore and is split into two phases:
>>
>> -- Phase one: Mark
>> -  All index files are read, and the access time of the referred chunk files is
>> -  updated.
>> +- Phase one - Mark:
>> +
>> +  Read all index files and update the ``atime`` (access time) of the relevant
>> +  chunk files.
> I'd replace "relevant" with "referenced" here, it is more concrete and
> matches the terminology below
>
>> +
>> +- Phase two - Sweep:
>> +
>> +  Iterate over all chunks and check the ``atime`` of the files. If
>> +  the ``atime`` is older than the cut-off time, the chunk was neither
>> +  referenced in a backup index nor is it part of a running backup that
>> +  does not yet have an index to search. As such, safely remove the chunk.
> nor was it recently created as part of a running backup task, but is not
> referenced yet by any finished index file. Such chunks can be safely
> removed since they are no longer needed.
>
> (Safely remove implies that we do some special removing that is safe ;))
>
>> +
>> +
>> +Cut-off Time
>> +^^^^^^^^^^^^
>> +
>> +The GC only clears the chunks that were last accessed before the
> s/clears/removes/
>
>> +cut-off time. The cut-off time is determined by whichever is earlier:
> is determined *at the start of the GC task*
>
> this is an important detail that helps understanding for more
> technically inclined readers
>
>> +
>> +- 24 hours and 5 minutes before the start of the garbage collection
>> +  due to the mounting of the data storage with ``relatime``, or
> "before the start of .. due to" is a bit confusing. maybe:
>
> - 24 hours before the start of the garbage collection (to
> account for the datastore potentially being mounted with ``relatime``).
>
>> +
>> +- the start time of the oldest active backup job that has been running
>> +  for longer than 24 hours and 5 minutes at the beginning of the
>> +  garbage collection. This is necessary because the newly created
>> +  backup could refer to blocks, but the GC would not notice this as
>> +  there is no index of the backup that could be searched.
> the whole "that has been" can be dropped. the cut off is determined by
> whichever is earlier:
> - now - 24h
> - start time of oldest backup writer *
>
> with an extra 5m of safety margin added in any case - not just the 24h
> one!
>
> - the start time of the oldest active backup job (to account for newly
> written chunks that are not yet referenced by any finished snapshot)
>
> is a bit shorter and IMHO conveys the same information
>
>> +
>> +Chunks accessed after the cut-off time are marked as *Pending removals*
>> +by the GC as it cannot be certain whether they are still needed.
> this is rather incomplete and a bit hard to parse as well. I'd replace
> "accessed after" with "with an atime after".
>
> pending is actually:
> - chunks with atime between the cut-off and the oldest writer (if one
> exists)

At this point I am slightly confused, because we defined earlier that
the cut-off is the start of the oldest backup writer* (if one exists).

That would lead to the following:
- chunks with atime between the cut-off (which is the start of the
oldest existing writer) and the oldest writer (if one exists)

which does not make any sense. Where is my mistake?

> - chunks with atime between the cut-off and the start of GC (if no
> writer exists at the start)
>
> this normally means chunks of snapshots which have been recently
> forgotten/pruned. it can also mean freshly uploaded chunks of recently
> aborted backup tasks.
>
>> +
>> +.. Note:: Mounting a volume with ``relatime`` means that the ``atime``
>> +   of the chunk files is not updated every time, but only when the
>> +   data has changed or the ``atime`` was before a certain time,
>> +   which is 24 hours by default.
>>
>> -- Phase two: Sweep
>> -  The task iterates over all chunks, checks their file access time, and if it
>> -  is older than the cutoff time (i.e., the time when GC started, plus some
>> -  headroom for safety and Linux file system behavior), the task knows that the
>> -  chunk was neither referred to in any backup index nor part of any currently
>> -  running backup that has no index to scan for. As such, the chunk can be
>> -  safely deleted.
>>
>>  Manually Starting GC
>>  ^^^^^^^^^^^^^^^^^^^^
>> --
>> 2.39.2
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
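
For reference, the cut-off and "pending" rules as quoted in the review
above can be sketched in a few lines of Python. This is only a sketch
under stated assumptions, not the Proxmox Backup Server implementation:
the names cutoff_time and classify_chunk and the labels
"removed"/"pending"/"kept" are made up for illustration, and the
treatment of chunks at or after the upper bound as "kept" is an
assumption. The 24h window, the 5m margin applied in both cases, and the
two "pending" intervals are taken from the review comments.

```python
from datetime import datetime, timedelta
from typing import Optional

# Values taken from the review comments in this thread.
RELATIME_WINDOW = timedelta(hours=24)
SAFETY_MARGIN = timedelta(minutes=5)


def cutoff_time(gc_start: datetime,
                oldest_writer_start: Optional[datetime] = None) -> datetime:
    """Cut-off as described in the review: the earlier of (GC start - 24h)
    and the start of the oldest backup writer, minus 5 minutes in any case."""
    candidates = [gc_start - RELATIME_WINDOW]
    if oldest_writer_start is not None:
        candidates.append(oldest_writer_start)
    return min(candidates) - SAFETY_MARGIN


def classify_chunk(atime: datetime, gc_start: datetime,
                   oldest_writer_start: Optional[datetime] = None) -> str:
    """Hypothetical classification of one chunk by its atime."""
    cutoff = cutoff_time(gc_start, oldest_writer_start)
    # Upper bound of the "pending" interval: the oldest writer if one
    # exists, otherwise the start of the GC.
    upper = oldest_writer_start if oldest_writer_start is not None else gc_start
    if atime < cutoff:
        return "removed"
    if atime < upper:
        return "pending"
    return "kept"  # assumption: anything newer is neither removed nor pending
```

Under these assumptions the "pending" interval is never empty when a
writer exists, since the cut-off lies at least 5 minutes before the
start of the oldest writer, and 24h 5m before the start of the GC when
the writer is younger than 24 hours.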