From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pbs-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 26C761FF164
	for <inbox@lore.proxmox.com>; Fri,  6 Jun 2025 13:12:40 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 4E60CAACE;
	Fri,  6 Jun 2025 13:13:00 +0200 (CEST)
Message-ID: <cae15312-019a-4a95-8f4c-88c9a25f6b54@proxmox.com>
Date: Fri, 6 Jun 2025 13:12:26 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>, Christian Ebner <c.ebner@proxmox.com>
References: <20250529143207.694497-1-c.ebner@proxmox.com>
Content-Language: de-AT, en-US
From: Lukas Wagner <l.wagner@proxmox.com>
In-Reply-To: <20250529143207.694497-1-c.ebner@proxmox.com>
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.018 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [fontawesome.com]
Subject: Re: [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup
 00/42] S3 storage backend for datastores
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pbs-devel-bounces@lists.proxmox.com
Sender: "pbs-devel" <pbs-devel-bounces@lists.proxmox.com>



On  2025-05-29 16:31, Christian Ebner wrote:
> Disclaimer: These patches are in a development state and are not
> intended for production use.
> 
> This patch series aims to add S3-compatible object stores as storage
> backend for PBS datastores. A PBS local cache store using the regular
> datastore layout is used for faster operation, bypassing requests to
> the S3 API when possible. Further, the local cache store allows
> keeping frequently used chunks and is used to avoid expensive
> metadata updates on the object store, e.g. by using local marker
> files during garbage collection.
> 
> Backups are created by uploading chunks to the corresponding S3
> bucket, while keeping the index files in the local cache store. On
> backup finish, the snapshot metadata is persisted to the S3 storage
> backend.
> 
> Snapshot restores read chunks preferably from the local cache store,
> downloading and inserting them from the S3 object store if not
> present.
> 
> Listing and snapshot metadata operations currently rely solely on
> the local cache store, with the intention to provide a mechanism to
> re-sync and merge with objects stored on the S3 backend if requested.
> 
> Sending this patch series as RFC to get some initial feedback,
> mostly on the S3 client implementation part and the corresponding
> configuration integration with PBS, which is already in an advanced
> stage and warrants initial review and real-world testing.
> 
> Datastore operations on the S3 backend are still work in progress,
> but feedback on that is appreciated very much as well.
> 
> Among the open points still being worked on are:
> - Consistency between local cache and S3 store.
> - Sync and merge of namespace, group snapshot and index files when
>   required or requested.
> - Advanced packing mechanism for chunks to significantly reduce the
>   number of api requests and therefore be more cost effective.
> - Reduction of in-memory copies for chunks/blobs and recalculation of
>   checksums.
> 

I had some off-list discussions with Christian about a couple of aspects of this
version of the series; here is a quick summary:

With regards to the 'Create Datastore' dialog:
In the current version, the S3 bucket can be selected under 'Advanced'. This might be
a bit hard to find for some users, so I suggested revising the dialog in general.
For example, perhaps we could start by having the user select a type
right away (Normal / Removable / Existing / S3-backed), and then show or hide
the required UI elements accordingly. For the S3-backed store specifically,
my intuitive expectation would be to first select the bucket, and then, as a second
step, choose the location for the local cache.

If we still want to keep S3 a bit hidden for now, we could either add a global setting
or an option within the dialog under 'Advanced' to opt into the experimental S3 feature,
or something along those lines.

Also, I mentioned that the 'trash-can' icon - while it does depict a bucket - might
not be the best fit for 'S3 Buckets', because it creates the association of 'trash'
or 'throwing something away'. I suggested fa-cloud-upload [1] instead for now, which
should be quite fitting for a 'syncing something to the cloud' feature.

Furthermore, I suggested that maybe the 'bucket' should be a property of the
datastore config, not of the s3 config. That way, the s3 config contains only the
connection info and credentials, which makes it easy to use the same s3 config for
multiple datastores that use different buckets as backing storage.
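
To illustrate the suggested split, here is a rough sketch of what the two
config files could look like (section names, property names and paths are
hypothetical, not the actual PBS config schema):

```text
# s3.cfg -- connection info and credentials only (hypothetical)
s3-endpoint: my-provider
	endpoint s3.example.com
	access-key EXAMPLEKEY
	secret-key EXAMPLESECRET

# datastore.cfg -- each datastore picks its own bucket (hypothetical)
datastore: store1
	path /mnt/cache/store1
	backend my-provider
	bucket pbs-store1

datastore: store2
	path /mnt/cache/store2
	backend my-provider
	bucket pbs-store2
```

Two datastores could then share one set of credentials while writing to
different buckets.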

Lastly, we should probably encode the name of the datastore into the key
of the S3 object, unless we want a strict 1:1 relationship between bucket and
datastore. Maybe it could even make sense to allow the user to set custom
prefixes for objects, in case they want PBS's objects not to be stored at the
top level of the bucket.
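
As a rough sketch of what such a key layout could look like (the exact
paths and the prefix option are hypothetical):

```text
# datastore name encoded into the object key:
store1/.chunks/0000/<chunk digest>
store1/vm/100/2025-05-29T14:32:07Z/index.json.blob

# with an optional user-defined prefix:
my-prefix/store1/.chunks/0000/<chunk digest>
```

With the datastore name (and optionally a prefix) in the key, multiple
datastores could coexist in one bucket without colliding.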

[1] https://fontawesome.com/v4/icon/cloud-upload

-- 
- Lukas



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel