Subject: Re: [pbs-devel] [PATCH proxmox{,-backup} v9 00/49] fix #2943: S3 storage backend for datastores
Date: Mon, 21 Jul 2025 17:37:30 +0200
From: Christian Ebner
To: Lukas Wagner, Proxmox Backup Server development discussion

On 7/21/25 5:05 PM, Lukas Wagner wrote:
> Retested these patches on the latest master branch(es).
>
> Retested basic backups, sync jobs, verification, GC, pruning, etc.
>
> This time I tried to focus more on different failure scenarios, e.g. a
> failing connection to the S3 server during different operations.
>
> Here's what I found; most of these issues I already discussed and
> debugged off-list with @Chris:
>
> 1.)
>
> When doing an S3 Refresh and PBS cannot connect to S3, a `tmp_xxxxxxx`
> directory is left over in the local datastore directory. After clearing
> the S3 Refresh maintenance mode (or doing a successful S3 Refresh), GC
> jobs will fail because they cannot access this left-over directory (it
> is owned by root:root).
> AFAIK Chris has already prepared a fix for this.

Will be fixed in the next version of the patch series, thanks!
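Until that fix is packaged, affected setups can probably just remove the
left-over directory by hand. A rough, untested sketch - the /s3-store
path is taken from the sessions further down and the tmp_ prefix from
your report, so adjust both to the actual setup:

  # list left-over temporary directories from aborted S3 Refresh runs
  # directly in the datastore root (here /s3-store)
  find /s3-store -maxdepth 1 -type d -name 'tmp_*' -ls
  # once no refresh or GC task is running anymore, remove them as root:
  # find /s3-store -maxdepth 1 -type d -name 'tmp_*' -exec rm -rf {} +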
>
> 2.)
>
> I backed up some VMs to my local MinIO server, which ran out of disk
> space during backup. Since even delete operations failed in this
> scenario, PBS could not clean up the snapshot directory, which was
> left over after this failed backup. In some instances the snapshot
> directory was completely empty; in another case two blobs were
> written, but the fidx files were missing:
>
> root@pbs-s3:/s3-store/ns/pali/vm# ls 160/2025-07-21T12\:51\:44Z/
> fw.conf.blob  qemu-server.conf.blob
> root@pbs-s3:/s3-store/ns/pali/vm# ls 165/
> 2025-07-21T12:52:42Z/  owner
> root@pbs-s3:/s3-store/ns/pali/vm# ls 165/2025-07-21T12\:52\:42Z/
> root@pbs-s3:/s3-store/ns/pali/vm#
>
> I could fix this by doing an "S3 Refresh" and then manually deleting the
> affected snapshot under the "Content" view - something that could be
> very annoying if one has hundreds or thousands of snapshots, so I think
> we need some form of automatic cleanup for fragments from
> incomplete/failed backups. After all, I'm pretty sure that one could end
> up in a similar situation by just cutting the network connection to the
> S3 server at the right moment in time.

As discussed a bit off-list already, this would indeed be nice to have,
but at the moment I see no way of doing it consistently without manual
user interaction. In your tests the cleanup of objects on the S3 backend
failed because the server had run out of disk space, so the user needs to
fix that first anyway. Automatic cleanup of fragments in the S3 store
after a connection loss might be doable during garbage collection or
verification, but I will have to think this through in detail, so best
left for a follow-up. (A rough sketch of how such leftovers could at
least be listed manually in the meantime is appended at the end of this
mail.)

>
> 3.)
>
> Cut the connection to my MinIO server during a verification job.
> The task log was spammed with the following messages:
>
> 2025-07-21T16:06:51+02:00: failed to copy corrupt chunk on s3 backend: 747835eb948591da7c4ebe892a9eb28c0daa8978bb80b70350f5b07225a1b9b0
> 2025-07-21T16:06:51+02:00: corrupted chunk renamed to "/s3-store/.chunks/7478/747835eb948591da7c4ebe892a9eb28c0daa8978bb80b70350f5b07225a1b9b0.0.bad"
> 2025-07-21T16:06:51+02:00: "can't verify chunk, load failed - client error (Connect)"
> 2025-07-21T16:06:51+02:00: failed to copy corrupt chunk on s3 backend: 5680458c0dba35dd1b528b5e38d32d410aee285f4d0328bbd8814fb5eb129aaf
> 2025-07-21T16:06:51+02:00: corrupted chunk renamed to "/s3-store/.chunks/5680/5680458c0dba35dd1b528b5e38d32d410aee285f4d0328bbd8814fb5eb129aaf.0.bad"
>
> While not really catastrophic, since these chunks would then just be
> refetched from S3 on the next access, this should probably be handled
> better/more gracefully.

Fixed this as well already for the upcoming v10 of the patches, thanks!

>
> One thing that I spotted in the documentation was the following:
>
> proxmox-backup-manager s3 client create my-s3-client --secrets-id my-s3-client ...
>
> The user has to specify the client ID twice, once for the regular config
> and once for the secrets config. This was implemented this way due to
> how parameter flattening for API type structs works. I discussed this
> with @Chris and suggested another approach, one that works without
> duplicating the ID, to hopefully make the UX a bit nicer.

Same, this will be fixed with the next iteration.

> Apart from these issues everything seemed to work fine.
>
> Tested-by: Lukas Wagner
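As mentioned above regarding 2.), here is a rough sketch of how left-over
snapshot directories could at least be listed manually until an automatic
cleanup exists. It is untested, assumes the VM snapshot layout from the
sessions above, and relies on a completed snapshot always containing its
manifest (index.json.blob):

  # list VM snapshot directories in the 'pali' namespace that have no
  # backup manifest - likely fragments of failed/interrupted backups,
  # to be double-checked and then removed via the Content view or CLI
  for snap in /s3-store/ns/pali/vm/*/*Z/; do
      [ -d "$snap" ] || continue  # glob did not match anything
      [ -e "${snap}index.json.blob" ] || echo "incomplete snapshot: $snap"
  done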