From: "Lukas Wagner"
To: "Proxmox Backup Server development discussion" <pbs-devel@lists.proxmox.com>, "Christian Ebner" <c.ebner@proxmox.com>
Date: Mon, 21 Jul 2025 17:05:10 +0200
Subject: Re: [pbs-devel] [PATCH proxmox{, -backup} v9 00/49] fix #2943: S3 storage backend for datastores
In-Reply-To: <20250719125035.9926-1-c.ebner@proxmox.com>

Retested these patches on the latest master branch(es): basic backups, sync jobs, verification, GC, pruning, etc. This time I tried to focus more on failure scenarios, e.g. a failing connection to the S3 server during different operations. Here's what I found; most of these issues I already discussed and debugged off-list with @Chris:

1.) When doing an S3 Refresh and PBS cannot connect to S3, a `tmp_xxxxxxx` directory is left over in the local datastore directory. After clearing the S3 Refresh maintenance mode (or doing a successful S3 Refresh), GC jobs will fail because they cannot access this left-over directory (it is owned by root:root). AFAIK Chris has already prepared a fix for this.
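In case anyone else runs into this before the fix lands, the leftover directory can be removed by hand. A minimal sketch, assuming the datastore root is /s3-store as in my test setup and the directory kept its tmp_ prefix:

# locate the leftover root-owned tmp_* directory in the datastore root
find /s3-store -maxdepth 1 -type d -name 'tmp_*' -user root
# remove it once no S3 Refresh task is running anymore
rm -rf /s3-store/tmp_xxxxxxx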
2.) I backed up some VMs to my local MinIO server, which ran out of disk space during the backup. Since even delete operations failed in this scenario, PBS could not clean up the snapshot directory, which was left over after the failed backup. In some instances the snapshot directory was completely empty; in another case two blobs were written, but the fidx files were missing:

root@pbs-s3:/s3-store/ns/pali/vm# ls 160/2025-07-21T12\:51\:44Z/
fw.conf.blob  qemu-server.conf.blob
root@pbs-s3:/s3-store/ns/pali/vm# ls 165/
2025-07-21T12:52:42Z/  owner
root@pbs-s3:/s3-store/ns/pali/vm# ls 165/2025-07-21T12\:52\:42Z/
root@pbs-s3:/s3-store/ns/pali/vm#

I could fix this by doing an "S3 Refresh" and then manually deleting the affected snapshot under the "Content" view - something that could be very annoying if one has hundreds or thousands of snapshots. So I think we need some form of automatic cleanup for fragments from incomplete/failed backups (a rough detection sketch is in the P.S. below). After all, I'm pretty sure one could end up in a similar situation by just cutting the network connection to the S3 server at the right moment.

3.) I cut the connection to my MinIO server during a verification job. The task log was spammed with the following messages:

2025-07-21T16:06:51+02:00: failed to copy corrupt chunk on s3 backend: 747835eb948591da7c4ebe892a9eb28c0daa8978bb80b70350f5b07225a1b9b0
2025-07-21T16:06:51+02:00: corrupted chunk renamed to "/s3-store/.chunks/7478/747835eb948591da7c4ebe892a9eb28c0daa8978bb80b70350f5b07225a1b9b0.0.bad"
2025-07-21T16:06:51+02:00: "can't verify chunk, load failed - client error (Connect)"
2025-07-21T16:06:51+02:00: failed to copy corrupt chunk on s3 backend: 5680458c0dba35dd1b528b5e38d32d410aee285f4d0328bbd8814fb5eb129aaf
2025-07-21T16:06:51+02:00: corrupted chunk renamed to "/s3-store/.chunks/5680/5680458c0dba35dd1b528b5e38d32d410aee285f4d0328bbd8814fb5eb129aaf.0.bad"

While not really catastrophic, since the renamed chunks would just be refetched from S3 on the next access, this should probably be handled more gracefully; in particular, a failed connection does not mean the chunk on S3 is actually corrupt.

One thing that I spotted in the documentation was the following:

proxmox-backup-manager s3 client create my-s3-client --secrets-id my-s3-client ...

The user has to specify the client ID twice, once for the regular config and once for the secrets config. This was implemented this way due to how parameter flattening for API type structs works. I discussed this with @Chris and suggested another approach that works without duplicating the ID, which will hopefully make the UX a bit nicer.

Apart from these issues, everything seemed to work fine.

Tested-by: Lukas Wagner
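P.S. Regarding 2.): until some automatic cleanup exists, leftover fragments could at least be spotted from the shell. A rough sketch, assuming the default datastore layout under /s3-store and that snapshot directories are named after their backup timestamp; an illustration only, not a vetted tool:

# flag snapshot directories that contain neither .fidx nor .didx index
# files, i.e. likely fragments of failed/incomplete backups
# (note: a backup that is still running would show up here as well)
find /s3-store -type d -name '*T*Z' | while read -r snap; do
    if ! ls "$snap"/*.fidx >/dev/null 2>&1 && ! ls "$snap"/*.didx >/dev/null 2>&1; then
        echo "possibly incomplete: $snap"
    fi
done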
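P.P.S. Regarding 3.): to reproduce, the connection cut can be simulated with a firewall rule while the verification job is running. A sketch, assuming MinIO listens on its default port 9000 (adjust host/port for your setup):

# drop outgoing traffic to the MinIO port while the verification job runs
iptables -A OUTPUT -p tcp --dport 9000 -j DROP
# ...watch the task log, then restore the connection...
iptables -D OUTPUT -p tcp --dport 9000 -j DROP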