From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id B00571FF13B for ; Mon, 08 Jun 2026 11:10:06 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id E26E717A11; Mon, 8 Jun 2026 11:10:05 +0200 (CEST) Message-ID: Date: Mon, 8 Jun 2026 11:10:01 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH proxmox-backup] fix #7670: datastore: s3: allow for per-chunk file lock cleanup To: Christian Ebner , pbs-devel@lists.proxmox.com References: <20260604140919.97686-1-c.ebner@proxmox.com> <706c6feb-8f7c-4f07-af09-c0df91f95469@proxmox.com> <3863ce57-42b9-4e43-a6ab-30e297732054@proxmox.com> <906d1390-9fca-4592-8c71-bc15e1ac3844@proxmox.com> Content-Language: en-US, de-AT From: Robert Obkircher In-Reply-To: <906d1390-9fca-4592-8c71-bc15e1ac3844@proxmox.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1780909758322 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.049 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [proxmox.com] Message-ID-Hash: WIWUFEEQ4NDGGJSCKKX23APPOQWBIQT7 X-Message-ID-Hash: WIWUFEEQ4NDGGJSCKKX23APPOQWBIQT7 X-MailFrom: r.obkircher@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox Backup Server development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On 06.06.26 10:41, Christian Ebner wrote: > On 6/5/26 6:20 PM, Robert Obkircher wrote: >> [..] >> >> With a regular cleanup task we could also get rid of the unlink on >> drop, which would probably make repeated lock acquisition a bit >> cheaper. >> >> But I'm not sure if it's worth the complexity. > > Cleanup could be done as part of GC phase 3 on s3 backed datastores > though, by that it should not introduce to much complexity. But > probably not the best way forwards, agreed.  GC might not reach phase 3 once you are out of memory. >> >>>>> [...] >>>>> --- >>>>> Sending this as RFC in case there are ideas for a different >>>>> solution >>>>> to the problem at hand which might be preferable. >> >> Besides shared memory or a coordinator process, we could also just >> store a list of locked files in the active operations file. > > I do not think that will scale very well neither? Not performance > wise nor usage wise, after all this would need to keep the list for > all active operations on each chunk file up-to-date?  Yeah, active operations probably wouldn't work. > >> Flock on the actual chunk file should work as well. We only need to >> coordinate on the local machine, so flock should be fine even if it >> is a remote file system, no? > > There are some caveats with that IIRC, especially with respect to > latency and proper support from e.g. older NFS implementations. > Further, the existence of a chunk file is currently used as marker > for the chunk to being present on the S3 backend (at least one > successful and acknowledge upload). For this to work that would need > to be reworked as well and is probably not backwards compatible.  Right, I somehow thought they use local locks, but that is not the case. > >> Do we actually need the locks at all? Wouldn't it be sufficient if we >> prevent duplicate S3 uploads on a per-process basis? > > The locking must protect against TOCTOU races, not only with upload > but also chunk insertion in the local cache, GC, renaming of bad > chunks for failed verifies, ...  > > The initial implementation did not perform the locking but relied on > atomic operations for local marker files only, causing quite some > churn due to races and resulting in the current implementation [1,2]. > > [1] > https://lore.proxmox.com/pbs-devel/20251114131901.441650-1-c.ebner@proxmox.com/ > [2] > https://lore.proxmox.com/pbs-devel/20251120100523.12147-1-c.ebner@proxmox.com/  Thank you for the links.