From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup v2 1/3] GC: S3: reduce number of open FDs for to-be-deleted objects
Date: Fri, 21 Nov 2025 11:18:41 +0100	[thread overview]
Message-ID: <20251121101849.463119-2-f.gruenbichler@proxmox.com> (raw)
In-Reply-To: <20251121101849.463119-1-f.gruenbichler@proxmox.com>

listing objects on the S3 side returns batches containing up to 1000
objects. previously, if all of those objects were garbage, phase2 would open
and hold the lock file for each of them before deleting them all with a
single call. this can easily run afoul of the maximum number of open files
allowed by the default process limits, which is 1024.

converting the code to instead delete batches of (at most) 100 objects
should alleviate this issue until bumping the limit is deemed safe, while
causing (in the worst case) 10x the number of delete requests.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Christian Ebner <c.ebner@proxmox.com>
Tested-by: Christian Ebner <c.ebner@proxmox.com>
---

Notes:
    v2: >= LIMIT instead of > LIMIT, thanks Chris
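
    for illustration, a minimal self-contained sketch of the batching
    pattern; delete_batch() and the use of File as a stand-in for the
    per-chunk lock guard are hypothetical, the real types and calls are
    in the diff below:

        const S3_DELETE_BATCH_LIMIT: usize = 100;

        // hypothetical placeholder for s3_client.delete_objects(&keys)
        fn delete_batch(keys: &[String]) -> std::io::Result<()> {
            println!("deleting {} objects", keys.len());
            Ok(())
        }

        fn phase2(candidates: Vec<(String, std::fs::File)>) -> std::io::Result<()> {
            // each queued entry holds the chunk's lock file, i.e. one open FD
            let mut delete_list: Vec<(String, std::fs::File)> =
                Vec::with_capacity(S3_DELETE_BATCH_LIMIT);
            for (key, lock_guard) in candidates {
                delete_list.push((key, lock_guard));
                // flush once the batch is full, so at most
                // S3_DELETE_BATCH_LIMIT lock files are held at any time
                if delete_list.len() >= S3_DELETE_BATCH_LIMIT {
                    let keys: Vec<String> =
                        delete_list.iter().map(|(k, _)| k.clone()).collect();
                    delete_batch(&keys)?;
                    delete_list.clear(); // drops the guards, closing the FDs
                }
            }
            // delete the final partial batch, if any
            if !delete_list.is_empty() {
                let keys: Vec<String> =
                    delete_list.iter().map(|(k, _)| k.clone()).collect();
                delete_batch(&keys)?;
            }
            Ok(())
        }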

 pbs-datastore/src/datastore.rs | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 0a5179230..09ec23fc4 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -58,6 +58,8 @@ pub const S3_DATASTORE_IN_USE_MARKER: &str = ".in-use";
 const NAMESPACE_MARKER_FILENAME: &str = ".namespace";
 // s3 put request times out after upload_size / 1 Kib/s, so about 2.3 hours for 8 MiB
 const CHUNK_LOCK_TIMEOUT: Duration = Duration::from_secs(3 * 60 * 60);
+// s3 deletion batch size to avoid 1024 open files soft limit
+const S3_DELETE_BATCH_LIMIT: usize = 100;
 
 /// checks if auth_id is owner, or, if owner is a token, if
 /// auth_id is the user of the token
@@ -1657,7 +1659,7 @@ impl DataStore {
                 proxmox_async::runtime::block_on(s3_client.list_objects_v2(&prefix, None))
                     .context("failed to list chunk in s3 object store")?;
 
-            let mut delete_list = Vec::with_capacity(1000);
+            let mut delete_list = Vec::with_capacity(S3_DELETE_BATCH_LIMIT);
             loop {
                 for content in list_bucket_result.contents {
                     let (chunk_path, digest, bad) =
@@ -1716,8 +1718,29 @@ impl DataStore {
                     }
 
                     chunk_count += 1;
+
+                    // drop guard because of async S3 call below
+                    drop(_guard);
+
+                    // limit pending deletes to avoid holding too many chunk flocks
+                    if delete_list.len() >= S3_DELETE_BATCH_LIMIT {
+                        let delete_objects_result = proxmox_async::runtime::block_on(
+                            s3_client.delete_objects(
+                                &delete_list
+                                    .iter()
+                                    .map(|(key, _)| key.clone())
+                                    .collect::<Vec<S3ObjectKey>>(),
+                            ),
+                        )?;
+                        if let Some(_err) = delete_objects_result.error {
+                            bail!("failed to delete some objects");
+                        }
+                        // release all chunk guards
+                        delete_list.clear();
+                    }
                 }
 
+                // delete the last batch of objects, if there are any remaining
                 if !delete_list.is_empty() {
                     let delete_objects_result = proxmox_async::runtime::block_on(
                         s3_client.delete_objects(
-- 
2.47.3


