From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Wed, 16 Apr 2025 08:12:18 +0200
Message-ID: <b7278ea5-8d67-46b6-bb50-ce696da8c925@proxmox.com>
In-Reply-To: <20250415144930.426824-1-c.ebner@proxmox.com>
Subject: Re: [pbs-devel] [PATCH v3 proxmox-backup] garbage collection: fix rare race in chunk marking phase

On 4/15/25 16:49, Christian Ebner wrote:
> During phase 1 of garbage collection referenced chunks are marked as
> in use by iterating over all index files and updating the atime on
> the chunks referenced by these.
>
> In an edge case for long running garbage collection jobs, where a
> newly added snapshot (created after the start of GC) reused known
> chunks from a previous snapshot, but the previous snapshot index
> referencing them disappeared before the marking phase could reach
> that index (e.g. pruned because only one snapshot is to be kept by the
> retention setting), known chunks from that previous index file might
> not be marked (given that they were not marked via any of the other
> index files).
>
> Since commit 74361da8 ("garbage collection: generate index file list
> via datastore iterators") this is even less likely, as the iteration
> now also reads index files added during phase 1, and therefore either
> the new or the previous index file will account for these chunks (the
> previous backup snapshot can only be pruned after the new one has
> finished, since it is locked). There remains, however, a small race
> window between the reading of the snapshots in the backup group and
> the reading of the actual index files for marking.
>
> Fix this race by:
> 1. checking whether the last snapshot of a group disappeared, and if so
> 2. generating the list again, looking for new index files previously
>    not accounted for,
> 3. locking the group if the snapshot list changed even after the 10th
>    retry, to avoid possible endless looping (this will lead to
>    concurrent operations on this group failing).
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 2:
> - replace needs_retry variable by labeled loop operations
> - check if the lock could actually be acquired in the last resort
>   branch, fail otherwise
> - include catch-all case for retry counter overrun
> - catch the case where snapshot listing fails because the group
>   vanished, only fail if the group still exists on the filesystem
>
>  pbs-datastore/src/datastore.rs | 116 +++++++++++++++++++++++----------
>  1 file changed, 80 insertions(+), 36 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index aa38e2ac1..8ef5c6860 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1143,47 +1143,91 @@ impl DataStore {
>              let namespace = namespace.context("iterating namespaces failed")?;
>              for group in arc_self.iter_backup_groups(namespace)? {
>                  let group = group.context("iterating backup groups failed")?;
> -                let mut snapshots = group.list_backups().context("listing snapshots failed")?;
> -                // Sort by snapshot timestamp to iterate over consecutive snapshots for each image.
> -                BackupInfo::sort_list(&mut snapshots, true);
> -                for snapshot in snapshots {
> -                    for file in snapshot.files {
> -                        worker.check_abort()?;
> -                        worker.fail_on_shutdown()?;
> -
> -                        let mut path = snapshot.backup_dir.full_path();
> -                        path.push(file);
> -
> -                        let index = match self.open_index_reader(&path)? {
> -                            Some(index) => index,
> -                            None => {
> -                                unprocessed_index_list.remove(&path);
> -                                continue;
> +
> +                // Avoid race between listing/marking of snapshots by GC and pruning the last
> +                // snapshot in the group, following a new snapshot creation. Otherwise known chunks
> +                // might only be referenced by the new snapshot, so it must be read as well.
> +                let mut retry_counter = 0;
> +                let mut processed_group_indices = HashSet::new();
> +                'retry: loop {
> +                    let _lock = match retry_counter {
> +                        0..=9 => None,
> +                        10 => Some(
> +                            group
> +                                .lock()
> +                                .context("exhausted retries and failed to lock group")?,
> +                        ),
> +                        _ => bail!("exhausted retries and unexpected counter overrun"),
> +                    };
> +
> +                    let mut snapshots = match group.list_backups() {
> +                        Ok(snapshots) => snapshots,
> +                        Err(err) => {
> +                            if group.exists() {
> +                                return Err(err).context("listing snapshots failed")?;
>                              }
> -                        };
> -                        self.index_mark_used_chunks(
> -                            index,
> -                            &path,
> -                            &mut chunk_lru_cache,
> -                            status,
> -                            worker,
> -                        )?;
> -
> -                        if !unprocessed_index_list.remove(&path) {
> -                            info!("Encountered new index file '{path:?}', increment total index file count");
> -                            index_count += 1;
> +                            break 'retry;
>                          }
> +                    };
> +
> +                    let snapshot_count = snapshots.len();
> +                    BackupInfo::sort_list(&mut snapshots, true);
> +                    for (count, snapshot) in snapshots.into_iter().enumerate() {

This can be further optimized and reduced in complexity by iterating in
reverse order (from newest to oldest). That has the following advantages:
- It reduces the window between generating the snapshot list and reading
  the last snapshot of the group.
- It allows dropping the processed_group_indices set, since the outer
  loop has to be restarted anyway if the last snapshot cannot be
  accessed.
- The subset of chunks shared between consecutive snapshots remains
  unchanged, so the cache used to avoid repeated atime updates will see
  a similar hit/miss ratio.

Will adapt this and send a new version of the patch. A rough sketch of
the reverse-order iteration follows below the quoted diff.

> +                        for file in snapshot.files {
> +                            worker.check_abort()?;
> +                            worker.fail_on_shutdown()?;
> +
> +                            match ArchiveType::from_path(&file) {
> +                                Ok(ArchiveType::FixedIndex) | Ok(ArchiveType::DynamicIndex) => (),
> +                                Ok(ArchiveType::Blob) | Err(_) => continue,
> +                            };
> +
> +                            let mut path = snapshot.backup_dir.full_path();
> +                            path.push(file);
> +
> +                            // Avoid reprocessing of already seen index files on retry
> +                            if retry_counter > 0 && processed_group_indices.contains(&path) {
> +                                continue;
> +                            }
> +
> +                            let index = match self.open_index_reader(&path)? {
> +                                Some(index) => index,
> +                                None => {
> +                                    unprocessed_index_list.remove(&path);
> +                                    if count + 1 == snapshot_count {
> +                                        retry_counter += 1;
> +                                        continue 'retry;
> +                                    }
> +                                    continue;
> +                                }
> +                            };
> +                            self.index_mark_used_chunks(
> +                                index,
> +                                &path,
> +                                &mut chunk_lru_cache,
> +                                status,
> +                                worker,
> +                            )?;
> +                            processed_group_indices.insert(path.clone());
> +
> +                            if !unprocessed_index_list.remove(&path) {
> +                                info!("Encountered new index file '{path:?}', increment total index file count");
> +                                index_count += 1;
>                              }
>
> -                        let percentage = (processed_index_files + 1) * 100 / index_count;
> -                        if percentage > last_percentage {
> -                            info!(
> -                                "marked {percentage}% ({} of {index_count} index files)",
> -                                processed_index_files + 1,
> -                            );
> -                            last_percentage = percentage;
> +                            let percentage = (processed_index_files + 1) * 100 / index_count;
> +                            if percentage > last_percentage {
> +                                info!(
> +                                    "marked {percentage}% ({} of {index_count} index files)",
> +                                    processed_index_files + 1,
> +                                );
> +                                last_percentage = percentage;
> +                            }
> +                            processed_index_files += 1;
>                          }
> -                        processed_index_files += 1;
>                      }
> +
> +                    break;
>                  }
>              }
>          }
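
To make the above a bit more concrete, here is a minimal, self-contained
sketch of the reverse-order iteration. All names in it (Group, Snapshot,
GroupGuard, mark_index, MAX_RETRIES) are hypothetical stand-ins rather
than the actual pbs-datastore API; it only illustrates the intended
control flow, not the final implementation: iterate newest to oldest,
regenerate the list if the newest snapshot vanished, and fall back to
locking the group after repeated retries.

// Standalone sketch of the reverse-order marking loop described above.
// All names used here are hypothetical stand-ins, not the actual
// pbs-datastore API; only the control flow is of interest.

const MAX_RETRIES: usize = 10;

struct Snapshot {
    /// Paths of the index files belonging to this snapshot.
    index_paths: Vec<String>,
}

struct Group;
struct GroupGuard;

impl Group {
    /// Stand-in for list_backups() + BackupInfo::sort_list(), newest first.
    fn list_newest_first(&self) -> Result<Vec<Snapshot>, String> {
        unimplemented!()
    }

    /// Stand-in for BackupGroup::lock().
    fn lock(&self) -> Result<GroupGuard, String> {
        unimplemented!()
    }
}

/// Stand-in for open_index_reader() + index_mark_used_chunks().
/// Returns Ok(false) if the index file vanished in the meantime.
fn mark_index(_path: &str) -> Result<bool, String> {
    unimplemented!()
}

fn mark_group(group: &Group) -> Result<(), String> {
    let mut retries = 0;

    'retry: loop {
        // Last resort: after repeated snapshot list changes, lock the group
        // so it can no longer change underneath us (concurrent operations on
        // the group will then fail instead).
        let _lock = if retries >= MAX_RETRIES {
            Some(group.lock()?)
        } else {
            None
        };

        // Newest first: this minimizes the window between listing the group
        // and reading the most recently added snapshot, which may be the
        // only one still referencing reused chunks.
        let snapshots = group.list_newest_first()?;

        for (pos, snapshot) in snapshots.iter().enumerate() {
            for path in &snapshot.index_paths {
                if !mark_index(path)? && pos == 0 {
                    // The newest snapshot vanished, so an even newer one may
                    // exist by now: regenerate the list and start over. No
                    // bookkeeping of already processed indices is needed; the
                    // chunk LRU cache keeps repeated atime updates cheap.
                    retries += 1;
                    continue 'retry;
                }
            }
        }

        return Ok(());
    }
}

The exact retry limit and how a vanished group is detected are of course
up to the actual follow-up patch; the sketch just mirrors the structure
of v3 minus the processed-indices bookkeeping.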