From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <pbs-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 1DDD51FF17F
	for <inbox@lore.proxmox.com>; Mon, 19 May 2025 13:47:40 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 343308723;
	Mon, 19 May 2025 13:47:37 +0200 (CEST)
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Mon, 19 May 2025 13:46:36 +0200
Message-Id: <20250519114640.303640-36-c.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250519114640.303640-1-c.ebner@proxmox.com>
References: <20250519114640.303640-1-c.ebner@proxmox.com>
MIME-Version: 1.0
X-SPAM-LEVEL: Spam detection results: 0
	AWL 0.032 Adjusted score from AWL reputation of From: address
	BAYES_00 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING 0.1 Missing DMARC policy
	KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
	SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS -0.001 SPF: sender matches SPF record
Subject: [pbs-devel] [RFC proxmox-backup 35/39] api: backup: use local
 datastore cache on S3 backend chunk upload
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>,
	<mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>,
	<mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pbs-devel-bounces@lists.proxmox.com
Sender: "pbs-devel" <pbs-devel-bounces@lists.proxmox.com>

Take advantage of the local datastore cache to avoid re-uploading
already known chunks. This not only improves backup/upload speed, but
also avoids additional costs by reducing the number of requests and the
amount of payload data transferred to the S3 object store API.

If the cache is present, look up whether it contains the chunk and skip
the upload altogether if it does. Otherwise, load the chunk data into
memory, upload it to the S3 object store and insert it into the local
datastore cache.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 src/api2/backup/upload_chunk.rs | 47 ++++++++++++++++++++++++++++++---
 1 file changed, 43 insertions(+), 4 deletions(-)

diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 59f9ca558..1d82936e6 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -248,10 +248,49 @@ async fn upload_to_backend(
             UploadChunk::new(req_body, datastore, digest, size, encoded_size).await
         }
         DatastoreBackend::S3(s3_client) => {
-            let is_duplicate = match s3_client.put_object(digest.into(), req_body).await? {
-                PutObjectResponse::PreconditionFailed => true,
-                PutObjectResponse::NeedsRetry => bail!("concurrent operation, reupload required"),
-                PutObjectResponse::Success(_content) => false,
+            if datastore.cache_contains(&digest) {
+                return Ok((digest, size, encoded_size, true));
+            }
+            // TODO: Avoid this altogether? put_object already loads the whole
+            // chunk into memory and also does the hashing and crc32 checksum
+            // calculation for the S3 request.
+            //
+            // Load the chunk data into memory; it needs to be written twice:
+            // to the S3 object store and to the local cache store.
+            let data = req_body
+                .map_err(Error::from)
+                .try_fold(Vec::new(), |mut acc, chunk| {
+                    acc.extend_from_slice(&chunk);
+                    future::ok::<_, Error>(acc)
+                })
+                .await?;
+
+            if encoded_size != data.len() as u32 {
+                bail!(
+                    "got blob with unexpected length ({encoded_size} != {})",
+                    data.len()
+                );
+            }
+
+            let upload_body = hyper::Body::from(data.clone());
+            let upload = s3_client.put_object(digest.into(), upload_body);
+            let cache_insert = tokio::task::spawn_blocking(move || {
+                let chunk = DataBlob::from_raw(data)?;
+                datastore.cache_insert(&digest, &chunk)
+            });
+            let is_duplicate = match futures::join!(upload, cache_insert) {
+                (Ok(upload_response), Ok(Ok(()))) => match upload_response {
+                    PutObjectResponse::PreconditionFailed => true,
+                    PutObjectResponse::NeedsRetry => {
+                        bail!("concurrent operation, reupload required")
+                    }
+                    PutObjectResponse::Success(_content) => false,
+                },
+                (Ok(_), Ok(Err(err))) => return Err(err.context("chunk cache insert failed")),
+                (Ok(_), Err(err)) => {
+                    return Err(Error::from(err).context("chunk cache insert task failed"))
+                }
+                (Err(err), _) => return Err(err.context("chunk upload failed")),
             };
             Ok((digest, size, encoded_size, is_duplicate))
         }
-- 
2.39.5


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel