From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pbs-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 64D911FF143
	for <inbox@lore.proxmox.com>; Sat, 06 Jun 2026 10:42:27 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 452BA84A;
	Sat,  6 Jun 2026 10:42:26 +0200 (CEST)
Message-ID: <906d1390-9fca-4592-8c71-bc15e1ac3844@proxmox.com>
Date: Sat, 6 Jun 2026 10:42:20 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [RFC PATCH proxmox-backup] fix #7670: datastore: s3: allow for
 per-chunk file lock cleanup
To: Robert Obkircher <r.obkircher@proxmox.com>, pbs-devel@lists.proxmox.com
References: <20260604140919.97686-1-c.ebner@proxmox.com>
 <bc4a82e8-ac16-470f-ba6b-95a936b052be@proxmox.com>
 <706c6feb-8f7c-4f07-af09-c0df91f95469@proxmox.com>
 <3863ce57-42b9-4e43-a6ab-30e297732054@proxmox.com>
Content-Language: en-US, de-DE
From: Christian Ebner <c.ebner@proxmox.com>
In-Reply-To: <3863ce57-42b9-4e43-a6ab-30e297732054@proxmox.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1780735300558
X-SPAM-LEVEL: Spam detection results:  0
	AWL                     0.068 Adjusted score from AWL reputation of From:
 address
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING             0.1 Missing DMARC policy
	KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict
 Alignment
	SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS               -0.001 SPF: sender matches SPF record
	URIBL_BLOCKED           0.001 ADMINISTRATOR NOTICE: The query to URIBL was
 blocked.  See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
 for more information. [proxmox.com]
Message-ID-Hash: UHHFCAWGVRJC4HO7OTJERJFKT4ZT6LLH
X-Message-ID-Hash: UHHFCAWGVRJC4HO7OTJERJFKT4ZT6LLH
X-MailFrom: c.ebner@proxmox.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; emergency; member-moderation; nonmember-moderation;
 administrivia; implicit-dest; max-recipients; max-size; news-moderation;
 no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.10
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Owner: <mailto:pbs-devel-owner@lists.proxmox.com>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Subscribe: <mailto:pbs-devel-join@lists.proxmox.com>
List-Unsubscribe: <mailto:pbs-devel-leave@lists.proxmox.com>

On 6/5/26 6:20 PM, Robert Obkircher wrote:
> 
> On 05.06.26 14:31, Christian Ebner wrote:
>> On 6/5/26 2:11 PM, Robert Obkircher wrote:
>>>
>>> On 04.06.26 16:09, Christian Ebner wrote:
>>>> Per-chunk file locks are located on a tmpfs but never cleaned up to
>>>> avoid TOCTOU race conditions. Therefore lock files can accumulate
>>>> over time, the memory required to store the inodes finally lead to
>>>> OOM conditions if the system is not rebooted for a long time or a
>>>> high number of different chunks is written to the s3 backed
>>>> datastore.
>>>>
>>>> To fix this, use the double stating strategy already implemented by
>>>> pbs_datastore::backup_info::lock_helper(), but adapt it so that the
>>>> lock file is cleaned up before unlocking. Since after file removal
>>>> the lock can be acquired by a different thread/process, the file lock
>>>> must also be dropped immediately without performing any other
>>>> critical operation. To assure this, a ChunkLockGuard is implemented
>>>> which removes the file and drops the file descriptor by implementing
>>>> the Drop trait.
>>> One potential problem is that drop is not guaranteed to run if the
>>> process aborts, so this could still slowly leak files over time.
>>
>> Good catch, maybe such files could be cleaned up by spawning a
>> cleanup task triggered by chunk store instantiation. It could then
>> iterate all the lock direntries and perform the same lock -> cleanup
>> lock logic but without timeout, removing all lockfiles not currently
>> being held at the cost of some IO for that, unfortunately. OTOH the
>> number of lockfiles should be not to dramatic for that case, so IO
>> should not be that dramatic and could be controlled by delays.
>>
>> Other ideas?
>> er ideas?
> 
> With a regular cleanup task we could also get rid of the unlink on
> drop, which would probably make repeated lock acquisition a bit cheaper.
> 
> But I'm not sure if it's worth the complexity.

Cleanup could be done as part of GC phase 3 on s3 backed datastores 
though, by that it should not introduce to much complexity. But probably 
not the best way forwards, agreed.

> 
>>>> [...]
>>>> ---
>>>> Sending this as RFC in case there are ideas for a different solution
>>>> to the problem at hand which might be preferable.
> 
> Besides shared memory or a coordinator process, we could also just store a list of locked files in the active operations file.

I do not think that will scale very well neither? Not performance wise 
nor usage wise, after all this would need to keep the list for all 
active operations on each chunk file up-to-date?

> Flock on the actual chunk file should work as well. We only need to coordinate on the local machine, so flock should be fine even if it is a remote file system, no?

There are some caveats with that IIRC, especially with respect to 
latency and proper support from e.g. older NFS implementations. Further, 
the existence of a chunk file is currently used as marker for the chunk 
to being present on the S3 backend (at least one successful and 
acknowledge upload). For this to work that would need to be reworked as 
well and is probably not backwards compatible.

> Do we actually need the locks at all? Wouldn't it be sufficient if we
> prevent duplicate S3 uploads on a per-process basis?

The locking must protect against TOCTOU races, not only with upload but 
also chunk insertion in the local cache, GC, renaming of bad chunks for 
failed verifies, ...

The initial implementation did not perform the locking but relied on 
atomic operations for local marker files only, causing quite some churn 
due to races and resulting in the current implementation [1,2].

[1] 
https://lore.proxmox.com/pbs-devel/20251114131901.441650-1-c.ebner@proxmox.com/
[2] 
https://lore.proxmox.com/pbs-devel/20251120100523.12147-1-c.ebner@proxmox.com/