From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Wed, 16 Apr 2025 08:12:18 +0200
Message-ID: <b7278ea5-8d67-46b6-bb50-ce696da8c925@proxmox.com>
In-Reply-To: <20250415144930.426824-1-c.ebner@proxmox.com>
Subject: Re: [pbs-devel] [PATCH v3 proxmox-backup] garbage collection: fix rare race in chunk marking phase

On 4/15/25 16:49, Christian Ebner wrote:
> During phase 1 of garbage collection referenced chunks are marked as
> in use by iterating over all index files and updating the atime on
> the chunks referenced by these.
>
> In an edge case for long running garbage collection jobs, where a
> newly added snapshot (created after the start of GC) reused known
> chunks from a previous snapshot, but the previous snapshot index
> referencing them disappeared before the marking phase could reach
> that index (e.g. pruned because only one snapshot is to be kept by the
> retention setting), known chunks from that previous index file might
> not be marked (given that they were not marked via any of the other
> index files).
>
> Since commit 74361da8 ("garbage collection: generate index file list
> via datastore iterators") this is even less likely, as the iteration
> now also reads index files added during phase 1, and therefore either
> the new or the previous index file will account for these chunks (the
> previous backup snapshot can only be pruned after the new one has
> finished, since it is locked). There remains, however, a small race
> window between the reading of the snapshots in the backup group and
> the reading of the actual index files for marking.
>
> Fix this race by:
> 1. checking whether the last snapshot of a group disappeared, and if so
> 2. generating the list again, looking for new index files previously
>    not accounted for,
> 3. locking the group if the snapshot list changed even after the 10th
>    retry, to avoid possible endless looping (this will lead to
>    concurrent operations on this group failing).
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 2:
> - replace needs_retry variable by labeled loop operations
> - check if the lock could actually be acquired in the last resort
>   branch, fail otherwise
> - include catch-all case for retry counter overrun
> - catch the case where snapshot listing fails because the group
>   vanished, only fail if the group still exists on the filesystem
>
>  pbs-datastore/src/datastore.rs | 116 +++++++++++++++++++++++----------
>  1 file changed, 80 insertions(+), 36 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index aa38e2ac1..8ef5c6860 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1143,47 +1143,91 @@ impl DataStore {
>              let namespace = namespace.context("iterating namespaces failed")?;
>              for group in arc_self.iter_backup_groups(namespace)? {
>                  let group = group.context("iterating backup groups failed")?;
> -                let mut snapshots = group.list_backups().context("listing snapshots failed")?;
> -                // Sort by snapshot timestamp to iterate over consecutive snapshots for each image.
> -                BackupInfo::sort_list(&mut snapshots, true);
> -                for snapshot in snapshots {
> -                    for file in snapshot.files {
> -                        worker.check_abort()?;
> -                        worker.fail_on_shutdown()?;
> -
> -                        let mut path = snapshot.backup_dir.full_path();
> -                        path.push(file);
> -
> -                        let index = match self.open_index_reader(&path)? {
> -                            Some(index) => index,
> -                            None => {
> -                                unprocessed_index_list.remove(&path);
> -                                continue;
> +
> +                // Avoid race between listing/marking of snapshots by GC and pruning the last
> +                // snapshot in the group, following a new snapshot creation. Otherwise known chunks
> +                // might only be referenced by the new snapshot, so it must be read as well.
> +                let mut retry_counter = 0;
> +                let mut processed_group_indices = HashSet::new();
> +                'retry: loop {
> +                    let _lock = match retry_counter {
> +                        0..=9 => None,
> +                        10 => Some(
> +                            group
> +                                .lock()
> +                                .context("exhausted retries and failed to lock group")?,
> +                        ),
> +                        _ => bail!("exhausted retries and unexpected counter overrun"),
> +                    };
> +
> +                    let mut snapshots = match group.list_backups() {
> +                        Ok(snapshots) => snapshots,
> +                        Err(err) => {
> +                            if group.exists() {
> +                                return Err(err).context("listing snapshots failed")?;
>                              }
> -                        };
> -                        self.index_mark_used_chunks(
> -                            index,
> -                            &path,
> -                            &mut chunk_lru_cache,
> -                            status,
> -                            worker,
> -                        )?;
> -
> -                        if !unprocessed_index_list.remove(&path) {
> -                            info!("Encountered new index file '{path:?}', increment total index file count");
> -                            index_count += 1;
> +                            break 'retry;
>                          }
> +                    };
> +
> +                    let snapshot_count = snapshots.len();
> +                    BackupInfo::sort_list(&mut snapshots, true);
> +                    for (count, snapshot) in snapshots.into_iter().enumerate() {

This can be further optimized and reduced in complexity by iterating in
reverse order (from newest to oldest). That has the following advantages:
- It reduces the window between generating the snapshot list and reading
  the last snapshot of the group.
- It allows dropping the processed_group_indices set, since the outer
  loop has to be restarted anyway if the last snapshot cannot be
  accessed.
- The subset of chunks shared between consecutive snapshots remains
  unchanged, so the cache used to avoid repeated atime updates will see
  a similar hit/miss ratio.

Will adapt this and send a new version of the patch. A rough sketch of
the reverse-order iteration follows below the quoted diff.

> +                        for file in snapshot.files {
> +                            worker.check_abort()?;
> +                            worker.fail_on_shutdown()?;
> +
> +                            match ArchiveType::from_path(&file) {
> +                                Ok(ArchiveType::FixedIndex) | Ok(ArchiveType::DynamicIndex) => (),
> +                                Ok(ArchiveType::Blob) | Err(_) => continue,
> +                            };
> +
> +                            let mut path = snapshot.backup_dir.full_path();
> +                            path.push(file);
> +
> +                            // Avoid reprocessing of already seen index files on retry
> +                            if retry_counter > 0 && processed_group_indices.contains(&path) {
> +                                continue;
> +                            }
> +
> +                            let index = match self.open_index_reader(&path)? {
> +                                Some(index) => index,
> +                                None => {
> +                                    unprocessed_index_list.remove(&path);
> +                                    if count + 1 == snapshot_count {
> +                                        retry_counter += 1;
> +                                        continue 'retry;
> +                                    }
> +                                    continue;
> +                                }
> +                            };
> +                            self.index_mark_used_chunks(
> +                                index,
> +                                &path,
> +                                &mut chunk_lru_cache,
> +                                status,
> +                                worker,
> +                            )?;
> +                            processed_group_indices.insert(path.clone());
> +
> +                            if !unprocessed_index_list.remove(&path) {
> +                                info!("Encountered new index file '{path:?}', increment total index file count");
> +                                index_count += 1;
>                              }
>
> -                        let percentage = (processed_index_files + 1) * 100 / index_count;
> -                        if percentage > last_percentage {
> -                            info!(
> -                                "marked {percentage}% ({} of {index_count} index files)",
> -                                processed_index_files + 1,
> -                            );
> -                            last_percentage = percentage;
> +                            let percentage = (processed_index_files + 1) * 100 / index_count;
> +                            if percentage > last_percentage {
> +                                info!(
> +                                    "marked {percentage}% ({} of {index_count} index files)",
> +                                    processed_index_files + 1,
> +                                );
> +                                last_percentage = percentage;
> +                            }
> +                            processed_index_files += 1;
>                          }
> -                        processed_index_files += 1;
>                      }
> +
> +                    break;
>                  }
>              }
>          }
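
To make the above a bit more concrete, here is a minimal, self-contained
sketch of the reverse-order iteration. All names in it (Group, Snapshot,
GroupGuard, mark_index, MAX_RETRIES) are hypothetical stand-ins rather
than the actual pbs-datastore API; it only illustrates the intended
control flow, not the final implementation: iterate newest to oldest,
regenerate the list if the newest snapshot vanished, and fall back to
locking the group after repeated retries.

// Standalone sketch of the reverse-order marking loop described above.
// All names used here are hypothetical stand-ins, not the actual
// pbs-datastore API; only the control flow is of interest.

const MAX_RETRIES: usize = 10;

struct Snapshot {
    /// Paths of the index files belonging to this snapshot.
    index_paths: Vec<String>,
}

struct Group;
struct GroupGuard;

impl Group {
    /// Stand-in for list_backups() + BackupInfo::sort_list(), newest first.
    fn list_newest_first(&self) -> Result<Vec<Snapshot>, String> {
        unimplemented!()
    }

    /// Stand-in for BackupGroup::lock().
    fn lock(&self) -> Result<GroupGuard, String> {
        unimplemented!()
    }
}

/// Stand-in for open_index_reader() + index_mark_used_chunks().
/// Returns Ok(false) if the index file vanished in the meantime.
fn mark_index(_path: &str) -> Result<bool, String> {
    unimplemented!()
}

fn mark_group(group: &Group) -> Result<(), String> {
    let mut retries = 0;

    'retry: loop {
        // Last resort: after repeated snapshot list changes, lock the group
        // so it can no longer change underneath us (concurrent operations on
        // the group will then fail instead).
        let _lock = if retries >= MAX_RETRIES {
            Some(group.lock()?)
        } else {
            None
        };

        // Newest first: this minimizes the window between listing the group
        // and reading the most recently added snapshot, which may be the
        // only one still referencing reused chunks.
        let snapshots = group.list_newest_first()?;

        for (pos, snapshot) in snapshots.iter().enumerate() {
            for path in &snapshot.index_paths {
                if !mark_index(path)? && pos == 0 {
                    // The newest snapshot vanished, so an even newer one may
                    // exist by now: regenerate the list and start over. No
                    // bookkeeping of already processed indices is needed; the
                    // chunk LRU cache keeps repeated atime updates cheap.
                    retries += 1;
                    continue 'retry;
                }
            }
        }

        return Ok(());
    }
}

The exact retry limit and how a vanished group is detected are of course
up to the actual follow-up patch; the sketch just mirrors the structure
of v3 minus the processed-indices bookkeeping.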