* [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores
@ 2025-05-29 14:31 Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable 1/42] pbs-api-types: add types for S3 client configs and secrets Christian Ebner
` (43 more replies)
0 siblings, 44 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Disclaimer: These patches are in a development state and are not
intended for production use.
This patch series aims to add S3 compatible object stores as storage
backend for PBS datastores. A PBS local cache store using the regular
datastore layout is used for faster operation, bypassing requests to
the S3 API when possible. Further, the local cache store keeps
frequently used chunks and is used to avoid expensive metadata
updates on the object store, e.g. by using local marker files during
garbage collection.
Backups are created by uploading chunks to the corresponding S3
bucket, while keeping the index files in the local cache store; on
backup finish, the snapshot metadata is persisted to the S3 storage
backend.
Snapshot restores read chunks preferably from the local cache store,
downloading them from the S3 object store and inserting them into the
cache if not present.
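As a conceptual illustration of this cache-first restore path (a
minimal sketch only, all names are hypothetical and not taken from
this series):
```
use std::collections::HashMap;

type Digest = [u8; 32];

// Sketch of the cache-first chunk lookup described above: serve the chunk
// from the local cache if possible, otherwise download it from S3 and
// insert it into the cache for subsequent reads.
fn fetch_chunk(
    cache: &mut HashMap<Digest, Vec<u8>>,
    fetch_from_s3: impl Fn(&Digest) -> Result<Vec<u8>, String>,
    digest: &Digest,
) -> Result<Vec<u8>, String> {
    if let Some(chunk) = cache.get(digest) {
        return Ok(chunk.clone());
    }
    let chunk = fetch_from_s3(digest)?;
    cache.insert(*digest, chunk.clone());
    Ok(chunk)
}
```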
Listing and snapshot metadata operations currently rely solely on the
local cache store, with the intention to provide a mechanism to
re-sync and merge with objects stored on the S3 backend if requested.
Sending this patch series as RFC to get some initial feedback, mostly
on the S3 client implementation part and the corresponding
configuration integration with PBS, which is already in an advanced
stage and warrants initial review and real world testing.
Datastore operations on the S3 backend are still work in progress,
but feedback on that is very much appreciated as well.
Among the open points still being worked on are:
- Consistency between local cache and S3 store.
- Sync and merge of namespace, group, snapshot and index files when
required or requested.
- Advanced packing mechanism for chunks to significantly reduce the
number of api requests and therefore be more cost effective.
- Reduction of in-memory copies for chunks/blobs and recalculation of
checksums.
Testing:
For testing, an S3 compatible object store provided via the Ceph
RADOS gateway can be set up as follows. This was performed on a
pre-existing Ceph Reef 18.2 cluster.
Install radosgw on all the nodes:
```
apt install radosgw
```
On one node, generate the client keyring:
```
ceph-authtool --create-keyring /etc/pve/priv/ceph.client.radosgw.keyring
```
For each node, generate a key and add it to the keyring (adapt the
name accordingly):
```
ceph-authtool /etc/pve/priv/ceph.client.radosgw.keyring -n client.radosgw.pve-c0-n1 --gen-key
```
Set up capabilities for the client keys:
```
ceph-authtool -n client.radosgw.pve-c0-n1 --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/pve/priv/ceph.client.radosgw.keyring
```
Add the keys (repeat for each) to the cluster:
```
ceph -k /etc/pve/priv/ceph.client.admin.keyring auth add client.radosgw.pve-c0-n1 -i /etc/pve/priv/ceph.client.radosgw.keyring
```
For each client, add a config based on the one below to /etc/ceph/ceph.conf
```
[client.radosgw.pve-c0-n1]
host = pve-c0-n1
keyring = /etc/pve/priv/ceph.client.radosgw.keyring
log file = /var/log/ceph/client.radosgw.$host.log
rgw_dns_name = s3.pve-c0-n1.local
```
Restart the service for each node, e.g.
```
systemctl daemon-reload
systemctl restart radosgw.service
```
Set up a new user; the access key and secret key are shown in the
output:
```
radosgw-admin user create --uid=testuser --display-name="TestUser" --email=your@mail.com
```
Since the configuration and keyring are located on the pmxcfs, make
sure the gateway service is only started after pve-cluster by adding
the following override to
`/etc/systemd/system/radosgw.service.d/override.conf`:
```
[Unit]
Documentation=man:systemd-sysv-generator(8)
SourcePath=/etc/init.d/radosgw
Description=LSB: radosgw RESTful rados gateway
After=pve-cluster.service
Wants=pve-cluster.service
```
Since the client enforces TLS, a custom certificate must be added by
extending the config with paths to a custom generated certificate and
key:
```
[client.radosgw.pve-c0-n1]
host = pve-c0-n1
keyring = /etc/pve/priv/ceph.client.radosgw.keyring
log file = /var/log/ceph/client.radosgw.$host.log
rgw_dns_name = s3.pve-c0-n1.local
rgw_frontends = "beast ssl_port=7480 ssl_certificate=/etc/pve/ceph/server-cert.pem ssl_private_key=/etc/pve/ceph/server-key.pem"
```
Finally, a new bucket can be created using the `s3cmd` cli tool after
its initial configuration.
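For reference, the initial configuration and bucket creation with
`s3cmd` could look like the following (the bucket name is just a
placeholder, adapt access key, secret key and endpoint to the radosgw
setup above):
```
# interactive setup of access key, secret key, endpoint, ...
s3cmd --configure
# create a new bucket on the configured endpoint
s3cmd mb s3://pbs-test-bucket
```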
Most notable changes since the previous RFC version 1 [0]:
- Improve and fix various issues with consistency and locking,
especially with respect to backup group/snapshot pruning
- Fix and improve listing and deletion of multiple objects, also taking
S3 API object count limits into account.
- Fix namespace handling, especially with respect to prune.
- Fix pull sync jobs not uploading chunks to S3 object store backend
- Fix permissions for s3 config api endpoints
- Fix issues with hyper::Body not being consumed when skipping cached
chunks, resulting in stream errors on upload
- Use md5 checksums for consistency checks, as crc32 is not yet
implemented and is ignored by many S3 compatible APIs, e.g. the RADOS
gateway.
- Use an ISO 8601 parser for the last modified timestamps instead of the
limited own implementation.
- Rework proxmox-backup-manager s3 command for basic sanity checking
- Smaller bugfixes, code style cleanups and refactoring
[0] https://lore.proxmox.com/pbs-devel/20250519114640.303640-1-c.ebner@proxmox.com/T/
proxmox:
Christian Ebner (2):
pbs-api-types: add types for S3 client configs and secrets
pbs-api-types: extend datastore config by backend config enum
pbs-api-types/src/datastore.rs | 58 +++++++++++++-
pbs-api-types/src/lib.rs | 3 +
pbs-api-types/src/s3.rs | 138 +++++++++++++++++++++++++++++++++
3 files changed, 198 insertions(+), 1 deletion(-)
create mode 100644 pbs-api-types/src/s3.rs
proxmox-backup:
Christian Ebner (40):
api: fix minor formatting issues
bin: sort submodules alphabetically
datastore: ignore missing owner file when removing group directory
verify: refactor verify related functions to be methods of worker
s3 client: add crate for AWS S3 compatible object store client
s3 client: implement AWS signature v4 request authentication
s3 client: add dedicated type for s3 object keys
s3 client: add type for last modified timestamp in responses
s3 client: add helper to parse http date headers
s3 client: implement methods to operate on s3 objects in bucket
config: introduce s3 object store client configuration
api: config: implement endpoints to manipulate and list s3 configs
api: datastore: check S3 backend bucket access on datastore create
api/bin: add endpoint and command to check s3 client connection
datastore: allow to get the backend for a datastore
api: backup: store datastore backend in runtime environment
api: backup: conditionally upload chunks to S3 object store backend
api: backup: conditionally upload blobs to S3 object store backend
api: backup: conditionally upload indices to S3 object store backend
api: backup: conditionally upload manifest to S3 object store backend
sync: pull: conditionally upload content to S3 backend
api: reader: fetch chunks based on datastore backend
datastore: local chunk reader: read chunks based on backend
verify worker: add datastore backed to verify worker
verify: implement chunk verification for stores with s3 backend
datastore: create namespace marker in S3 backend
datastore: create/delete protected marker file on S3 storage backend
datastore: prune groups/snapshots from S3 object store backend
datastore: get and set owner for S3 store backend
datastore: implement garbage collection for s3 backend
ui: add S3 client edit window for configuration create/edit
ui: add S3 client view for configuration
ui: expose the S3 client view in the navigation tree
ui: add s3 bucket selector and allow to set s3 backend
tools: lru cache: add removed callback for evicted cache nodes
tools: async lru cache: implement insert, remove and contains methods
datastore: add local datastore cache for network attached storages
api: backup: use local datastore cache on S3 backend chunk upload
api: reader: use local datastore cache on S3 backend chunk fetching
api: backup: add no-cache flag to bypass local datastore cache
Cargo.toml | 8 +
examples/upload-speed.rs | 1 +
pbs-client/src/backup_writer.rs | 4 +-
pbs-config/src/lib.rs | 1 +
pbs-config/src/s3.rs | 82 ++
pbs-datastore/Cargo.toml | 3 +
pbs-datastore/src/backup_info.rs | 53 +-
pbs-datastore/src/cached_chunk_reader.rs | 6 +-
pbs-datastore/src/datastore.rs | 435 ++++++++-
pbs-datastore/src/dynamic_index.rs | 1 +
pbs-datastore/src/lib.rs | 4 +
pbs-datastore/src/local_chunk_reader.rs | 37 +-
.../src/local_datastore_lru_cache.rs | 116 +++
pbs-s3-client/Cargo.toml | 29 +
pbs-s3-client/src/aws_sign_v4.rs | 140 +++
pbs-s3-client/src/client.rs | 594 ++++++++++++
pbs-s3-client/src/lib.rs | 122 +++
pbs-s3-client/src/object_key.rs | 64 ++
pbs-s3-client/src/response_reader.rs | 343 +++++++
pbs-tools/src/async_lru_cache.rs | 46 +-
pbs-tools/src/lru_cache.rs | 42 +-
proxmox-backup-client/src/benchmark.rs | 1 +
proxmox-backup-client/src/main.rs | 8 +
src/api2/admin/datastore.rs | 52 +-
src/api2/admin/mod.rs | 2 +
src/api2/admin/s3.rs | 72 ++
src/api2/backup/environment.rs | 145 ++-
src/api2/backup/mod.rs | 107 +--
src/api2/backup/upload_chunk.rs | 93 +-
src/api2/config/datastore.rs | 41 +-
src/api2/config/mod.rs | 2 +
src/api2/config/s3.rs | 305 ++++++
src/api2/reader/environment.rs | 12 +-
src/api2/reader/mod.rs | 59 +-
src/backup/verify.rs | 879 +++++++++---------
src/bin/proxmox-backup-manager.rs | 1 +
src/bin/proxmox_backup_manager/mod.rs | 30 +-
src/bin/proxmox_backup_manager/s3.rs | 34 +
src/server/pull.rs | 62 +-
src/server/push.rs | 1 +
src/server/verify_job.rs | 12 +-
www/Makefile | 3 +
www/NavigationTree.js | 6 +
www/config/S3BucketView.js | 144 +++
www/form/S3BucketSelector.js | 40 +
www/window/DataStoreEdit.js | 35 +
www/window/S3BucketEdit.js | 125 +++
47 files changed, 3753 insertions(+), 649 deletions(-)
create mode 100644 pbs-config/src/s3.rs
create mode 100644 pbs-datastore/src/local_datastore_lru_cache.rs
create mode 100644 pbs-s3-client/Cargo.toml
create mode 100644 pbs-s3-client/src/aws_sign_v4.rs
create mode 100644 pbs-s3-client/src/client.rs
create mode 100644 pbs-s3-client/src/lib.rs
create mode 100644 pbs-s3-client/src/object_key.rs
create mode 100644 pbs-s3-client/src/response_reader.rs
create mode 100644 src/api2/admin/s3.rs
create mode 100644 src/api2/config/s3.rs
create mode 100644 src/bin/proxmox_backup_manager/s3.rs
create mode 100644 www/config/S3BucketView.js
create mode 100644 www/form/S3BucketSelector.js
create mode 100644 www/window/S3BucketEdit.js
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [RFC v2 proxmox/bookworm-stable 1/42] pbs-api-types: add types for S3 client configs and secrets
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable 2/42] pbs-api-types: extend datastore config by backend config enum Christian Ebner
` (42 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Adds the new config types `S3ClientConfig` and `S3ClientSecretsConfig`
to configure datastore backends using an S3 compatible object store.
Secrets are stored as a separate config so they are never returned on
api calls, only allowing to set/update the values.
Use a different name (`secrets_id`) for the unique identifier in case
of the secrets type, although the same id should be used for storing
and lookup. This avoids clashing property names when using the
flattened types as api parameters.
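To illustrate the clash that is avoided (a minimal sketch using stub
types, not the actual api schema of this patch): when both halves are
flattened into a single parameter object, the identifier fields must
not share the same property name.
```
use serde::Deserialize;

// Stub types for illustration only; the real S3ClientConfig and
// S3ClientSecretsConfig carry more fields (see the diff below).
#[derive(Deserialize)]
#[serde(rename_all = "kebab-case")]
struct ConfigStub {
    id: String,
    access_key: String,
}

#[derive(Deserialize)]
#[serde(rename_all = "kebab-case")]
struct SecretsStub {
    // a second `id` field here would clash with ConfigStub::id
    secrets_id: String,
    secret_key: String,
}

// Flattening both stubs into one parameter object works because the
// identifier properties have distinct names (`id` vs. `secrets-id`).
#[derive(Deserialize)]
struct CreateParamsStub {
    #[serde(flatten)]
    config: ConfigStub,
    #[serde(flatten)]
    secrets: SecretsStub,
}
```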
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-api-types/src/lib.rs | 3 +
pbs-api-types/src/s3.rs | 138 +++++++++++++++++++++++++++++++++++++++
2 files changed, 141 insertions(+)
create mode 100644 pbs-api-types/src/s3.rs
diff --git a/pbs-api-types/src/lib.rs b/pbs-api-types/src/lib.rs
index 99ec7961..7a5ea11d 100644
--- a/pbs-api-types/src/lib.rs
+++ b/pbs-api-types/src/lib.rs
@@ -147,6 +147,9 @@ pub use remote::*;
mod pathpatterns;
pub use pathpatterns::*;
+mod s3;
+pub use s3::*;
+
mod tape;
pub use tape::*;
diff --git a/pbs-api-types/src/s3.rs b/pbs-api-types/src/s3.rs
new file mode 100644
index 00000000..40c502ba
--- /dev/null
+++ b/pbs-api-types/src/s3.rs
@@ -0,0 +1,138 @@
+use anyhow::bail;
+use serde::{Deserialize, Serialize};
+
+use proxmox_schema::api_types::{
+ CERT_FINGERPRINT_SHA256_SCHEMA, DNS_NAME_OR_IP_SCHEMA, SAFE_ID_FORMAT,
+};
+use proxmox_schema::{api, const_regex, ApiStringFormat, Schema, StringSchema, Updater};
+
+#[rustfmt::skip]
+pub const S3_BUCKET_NAME_REGEX_STR: &str = r"^[a-z0-9]([a-z0-9\-]*[a-z0-9])?$";
+
+const_regex! {
+ /// Regex to match S3 bucket names.
+ ///
+ /// Be as strict as possible following the rules as described here:
+ /// https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html#general-purpose-bucket-names
+ pub S3_BUCKET_NAME_REGEX = r"^[a-z0-9]([a-z0-9\-]*[a-z0-9])?$";
+ /// Regex to match S3 regions.
+ pub S3_REGION_REGEX = r"^[a-z]{2}\-[a-z]{4,}\-[0-9]$";
+}
+
+pub const S3_REGION_FORMAT: ApiStringFormat = ApiStringFormat::Pattern(&S3_REGION_REGEX);
+
+pub const S3_CLIENT_ID_SCHEMA: Schema =
+ StringSchema::new("Unique ID to identify s3 client config.")
+ .format(&SAFE_ID_FORMAT)
+ .min_length(3)
+ .max_length(32)
+ .schema();
+
+pub const S3_REGION_SCHEMA: Schema = StringSchema::new("Region to access S3 object store.")
+ .format(&S3_REGION_FORMAT)
+ .min_length(3)
+ .max_length(32)
+ .schema();
+
+pub const S3_BUCKET_NAME_SCHEMA: Schema = StringSchema::new("Bucket name for S3 object store.")
+ .format(&ApiStringFormat::VerifyFn(|bucket_name| {
+ if !(S3_BUCKET_NAME_REGEX.regex_obj)().is_match(bucket_name) {
+ bail!("Bucket name does not match the regex pattern");
+ }
+
+ // Exclude pre- and postfixes described here:
+ // https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html#general-purpose-bucket-names
+ let forbidden_prefixes = ["xn--", "sthree-", "amzn-s3-demo-"];
+ for prefix in forbidden_prefixes {
+ if bucket_name.starts_with(prefix) {
+ bail!("Bucket name cannot start with '{prefix}'");
+ }
+ }
+
+ let forbidden_postfixes = ["--ol-s3", ".mrap", "--x-s3"];
+ for postfix in forbidden_postfixes {
+ if bucket_name.ends_with(postfix) {
+ bail!("Bucket name cannot end with '{postfix}'");
+ }
+ }
+
+ Ok(())
+ }))
+ .min_length(3)
+ .max_length(63)
+ .schema();
+
+#[api(
+ properties: {
+ id: {
+ schema: S3_CLIENT_ID_SCHEMA,
+ },
+ host: {
+ schema: DNS_NAME_OR_IP_SCHEMA,
+ },
+ bucket: {
+ schema: S3_BUCKET_NAME_SCHEMA,
+ },
+ port: {
+ type: u16,
+ description: "Port to access S3 object store.",
+ optional: true,
+ },
+ region: {
+ schema: S3_REGION_SCHEMA,
+ optional: true,
+ },
+ fingerprint: {
+ schema: CERT_FINGERPRINT_SHA256_SCHEMA,
+ optional: true,
+ },
+ "access-key": {
+ type: String,
+ description: "Access key for S3 object store.",
+ },
+ }
+)]
+#[derive(Serialize, Deserialize, Updater, Clone, PartialEq)]
+#[serde(rename_all = "kebab-case")]
+/// S3 client configuration properties.
+pub struct S3ClientConfig {
+ #[updater(skip)]
+ pub id: String,
+ pub host: String,
+ pub bucket: String,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub port: Option<u16>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub region: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub fingerprint: Option<String>,
+ pub access_key: String,
+}
+
+impl S3ClientConfig {
+ pub fn acl_path(&self) -> Vec<&str> {
+ // Needs permissions on root path
+ Vec::new()
+ }
+}
+
+#[api(
+ properties: {
+ "secrets-id": {
+ type: String,
+ description: "Unique ID to identify s3 client secret config.",
+ },
+ "secret-key": {
+ type: String,
+ description: "Secret key for S3 object store.",
+ },
+ }
+)]
+#[derive(Serialize, Deserialize, Updater, Clone, PartialEq)]
+#[serde(rename_all = "kebab-case")]
+/// S3 client secrets configuration properties.
+pub struct S3ClientSecretsConfig {
+ #[updater(skip)]
+ pub secrets_id: String,
+ pub secret_key: String,
+}
--
2.39.5
* [pbs-devel] [RFC v2 proxmox/bookworm-stable 2/42] pbs-api-types: extend datastore config by backend config enum
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable 1/42] pbs-api-types: add types for S3 client configs and secrets Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 03/42] api: fix minor formatting issues Christian Ebner
` (41 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Allows configuring a backend config variant for a datastore on
creation. The current default `Filesystem` backend variant is
introduced to be compatible with existing storages. A new S3 backend
variant allows to create datastores backed by an S3 compatible object
store instead.
For S3 backends, the id of the corresponding S3 client configuration
is stored. A valid datastore backend configuration for S3 therefore
contains:
```
...
backend s3=<S3_CONFIG_ID>
...
```
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-api-types/src/datastore.rs | 58 +++++++++++++++++++++++++++++++++-
1 file changed, 57 insertions(+), 1 deletion(-)
diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index 5bd953ac..2b983cb2 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -336,7 +336,11 @@ pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore
optional: true,
format: &proxmox_schema::api_types::UUID_FORMAT,
type: String,
- }
+ },
+ backend: {
+ schema: DATASTORE_BACKEND_SCHEMA,
+ optional: true,
+ },
}
)]
#[derive(Serialize, Deserialize, Updater, Clone, PartialEq)]
@@ -389,8 +393,59 @@ pub struct DataStoreConfig {
#[updater(skip)]
#[serde(skip_serializing_if = "Option::is_none")]
pub backing_device: Option<String>,
+
+ /// Backend to be used by datastore
+ #[updater(skip)]
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub backend: Option<String>,
+}
+
+pub const DATASTORE_BACKEND_SCHEMA: Schema = StringSchema::new("Backend config to be used for datastore.")
+ .format(&ApiStringFormat::VerifyFn(verify_datastore_backend))
+ .type_text("<filesystem|s3=S3_CONFIG_ID>")
+ .schema();
+
+fn verify_datastore_backend(input: &str) -> Result<(), Error> {
+ DatastoreBackendConfig::from_str(input).map(|_| ())
+}
+
+#[derive(Clone, Default)]
+/// Available backend configurations for datastores.
+pub enum DatastoreBackendConfig {
+ #[default]
+ Filesystem,
+ S3(String),
}
+impl std::str::FromStr for DatastoreBackendConfig {
+ type Err = Error;
+
+ fn from_str(s: &str) -> Result<Self, Self::Err> {
+ if s == "filesystem" {
+ return Ok(Self::Filesystem);
+ }
+ match s.split_once('=') {
+ Some(("s3", value)) => {
+ let s3_config_id = value.parse()?;
+ Ok(Self::S3(s3_config_id))
+ }
+ _ => bail!("invalid datastore backend configuration"),
+ }
+ }
+}
+
+impl std::fmt::Display for DatastoreBackendConfig {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ match self {
+ Self::Filesystem => write!(f, "filesystem"),
+ Self::S3(s3_config_id) => write!(f, "s3:{s3_config_id}"),
+ }
+ }
+}
+
+proxmox_serde::forward_serialize_to_display!(DatastoreBackendConfig);
+proxmox_serde::forward_deserialize_to_from_str!(DatastoreBackendConfig);
+
#[api]
#[derive(Serialize, Deserialize, Updater, Clone, PartialEq, Default)]
#[serde(rename_all = "kebab-case")]
@@ -424,6 +479,7 @@ impl DataStoreConfig {
tuning: None,
maintenance_mode: None,
backing_device: None,
+ backend: None,
}
}
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 03/42] api: fix minor formatting issues
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable 1/42] pbs-api-types: add types for S3 client configs and secrets Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable 2/42] pbs-api-types: extend datastore config by backend config enum Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 04/42] bin: sort submodules alphabetically Christian Ebner
` (40 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
These are currently not shown by a `cargo fmt --check`.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/backup/mod.rs | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
index 629df933e..567bca3ef 100644
--- a/src/api2/backup/mod.rs
+++ b/src/api2/backup/mod.rs
@@ -166,7 +166,7 @@ fn upgrade_to_backup_protocol(
Ok(None) => {
// no verify state found, treat as valid
Some(info)
- },
+ }
Err(err) => {
warn!("error parsing the snapshot manifest: {err:#}");
Some(info)
@@ -236,7 +236,8 @@ fn upgrade_to_backup_protocol(
.and_then(move |conn| {
env2.debug("protocol upgrade done");
- let mut http = hyper::server::conn::http2::Builder::new(ExecInheritLogContext);
+ let mut http =
+ hyper::server::conn::http2::Builder::new(ExecInheritLogContext);
// increase window size: todo - find optiomal size
let window_size = 32 * 1024 * 1024; // max = (1 << 31) - 2
http.initial_stream_window_size(window_size);
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 04/42] bin: sort submodules alphabetically
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (2 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 03/42] api: fix minor formatting issues Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 05/42] datastore: ignore missing owner file when removing group directory Christian Ebner
` (39 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Makes it easier to find existing entries or insert new modules.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/bin/proxmox_backup_manager/mod.rs | 28 +++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/src/bin/proxmox_backup_manager/mod.rs b/src/bin/proxmox_backup_manager/mod.rs
index 11fb6dd3b..9b5c73e9a 100644
--- a/src/bin/proxmox_backup_manager/mod.rs
+++ b/src/bin/proxmox_backup_manager/mod.rs
@@ -8,31 +8,31 @@ mod cert;
pub use cert::*;
mod datastore;
pub use datastore::*;
+mod disk;
+pub use disk::*;
mod dns;
pub use dns::*;
mod ldap;
pub use ldap::*;
mod network;
pub use network::*;
-mod prune;
-pub use prune::*;
-mod remote;
-pub use remote::*;
-mod sync;
-pub use sync::*;
-mod verify;
-pub use verify::*;
-mod user;
-pub use user::*;
-mod subscription;
-pub use subscription::*;
-mod disk;
-pub use disk::*;
mod node;
pub use node::*;
mod notifications;
pub use notifications::*;
mod openid;
pub use openid::*;
+mod prune;
+pub use prune::*;
+mod remote;
+pub use remote::*;
+mod subscription;
+pub use subscription::*;
+mod sync;
+pub use sync::*;
mod traffic_control;
pub use traffic_control::*;
+mod user;
+pub use user::*;
+mod verify;
+pub use verify::*;
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 05/42] datastore: ignore missing owner file when removing group directory
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (3 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 04/42] bin: sort submodules alphabetically Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 06/42] verify: refactor verify related functions to be methods of worker Christian Ebner
` (38 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Since commit 23be00a4 ("fix #3336: datastore: remove group if the
last snapshot is removed"), a backup group directory is cleaned up,
when the new locking mechanism is in use, once:
- the group is requested to be destroyed and all the snapshots have
been deleted
- the last snapshot of a group has been destroyed
Since then, the owner file is also cleaned up separately.
However, the owner file might already be missing due to the removal
of the group directory executed when removing the last backup
snapshot of the group, making the subsequent call in the backup group
destroy method fail.
Fix this by ignoring a missing owner file and continuing with the
removal of the group directory itself.
Fixes: 23be00a4 ("fix #3336: datastore: remove group if the last snapshot is removed")
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/backup_info.rs | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index d4732fdd9..1422fe865 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -246,9 +246,11 @@ impl BackupGroup {
fn remove_group_dir(&self) -> Result<(), Error> {
let owner_path = self.store.owner_path(&self.ns, &self.group);
- std::fs::remove_file(&owner_path).map_err(|err| {
- format_err!("removing the owner file '{owner_path:?}' failed - {err}")
- })?;
+ if let Err(err) = std::fs::remove_file(&owner_path) {
+ if err.kind() != std::io::ErrorKind::NotFound {
+ bail!("removing the owner file '{owner_path:?}' failed - {err}");
+ }
+ }
let path = self.full_group_path();
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 06/42] verify: refactor verify related functions to be methods of worker
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (4 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 05/42] datastore: ignore missing owner file when removing group directory Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 07/42] s3 client: add crate for AWS S3 compatible object store client Christian Ebner
` (37 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Instead of passing the VerifyWorker state as a reference to the
various verification related functions, implement them as methods or
associated functions of the VerifyWorker. This not only makes their
relation clearer, but also reduces the number of function call
parameters and improves readability.
No functional changes intended.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/admin/datastore.rs | 28 +-
src/api2/backup/environment.rs | 7 +-
src/backup/verify.rs | 830 ++++++++++++++++-----------------
src/server/verify_job.rs | 12 +-
4 files changed, 423 insertions(+), 454 deletions(-)
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 392494488..7dc881ade 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -70,10 +70,7 @@ use proxmox_rest_server::{formatter, WorkerTask};
use crate::api2::backup::optional_ns_param;
use crate::api2::node::rrd::create_value_from_rrd;
-use crate::backup::{
- check_ns_privs_full, verify_all_backups, verify_backup_dir, verify_backup_group, verify_filter,
- ListAccessibleBackupGroups, NS_PRIVS_OK,
-};
+use crate::backup::{check_ns_privs_full, ListAccessibleBackupGroups, VerifyWorker, NS_PRIVS_OK};
use crate::server::jobstate::{compute_schedule_status, Job, JobState};
@@ -896,14 +893,15 @@ pub fn verify(
auth_id.to_string(),
to_stdout,
move |worker| {
- let verify_worker = crate::backup::VerifyWorker::new(worker.clone(), datastore);
+ let verify_worker = VerifyWorker::new(worker.clone(), datastore);
let failed_dirs = if let Some(backup_dir) = backup_dir {
let mut res = Vec::new();
- if !verify_backup_dir(
- &verify_worker,
+ if !verify_worker.verify_backup_dir(
&backup_dir,
worker.upid().clone(),
- Some(&move |manifest| verify_filter(ignore_verified, outdated_after, manifest)),
+ Some(&move |manifest| {
+ VerifyWorker::verify_filter(ignore_verified, outdated_after, manifest)
+ }),
)? {
res.push(print_ns_and_snapshot(
backup_dir.backup_ns(),
@@ -912,12 +910,13 @@ pub fn verify(
}
res
} else if let Some(backup_group) = backup_group {
- verify_backup_group(
- &verify_worker,
+ verify_worker.verify_backup_group(
&backup_group,
&mut StoreProgress::new(1),
worker.upid(),
- Some(&move |manifest| verify_filter(ignore_verified, outdated_after, manifest)),
+ Some(&move |manifest| {
+ VerifyWorker::verify_filter(ignore_verified, outdated_after, manifest)
+ }),
)?
} else {
let owner = if owner_check_required {
@@ -926,13 +925,14 @@ pub fn verify(
None
};
- verify_all_backups(
- &verify_worker,
+ verify_worker.verify_all_backups(
worker.upid(),
ns,
max_depth,
owner,
- Some(&move |manifest| verify_filter(ignore_verified, outdated_after, manifest)),
+ Some(&move |manifest| {
+ VerifyWorker::verify_filter(ignore_verified, outdated_after, manifest)
+ }),
)?
};
if !failed_dirs.is_empty() {
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 3d541b461..6cd29f512 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -18,7 +18,7 @@ use pbs_datastore::fixed_index::FixedIndexWriter;
use pbs_datastore::{DataBlob, DataStore};
use proxmox_rest_server::{formatter::*, WorkerTask};
-use crate::backup::verify_backup_dir_with_lock;
+use crate::backup::VerifyWorker;
use hyper::{Body, Response};
@@ -671,9 +671,8 @@ impl BackupEnvironment {
move |worker| {
worker.log_message("Automatically verifying newly added snapshot");
- let verify_worker = crate::backup::VerifyWorker::new(worker.clone(), datastore);
- if !verify_backup_dir_with_lock(
- &verify_worker,
+ let verify_worker = VerifyWorker::new(worker.clone(), datastore);
+ if !verify_worker.verify_backup_dir_with_lock(
&backup_dir,
worker.upid().clone(),
None,
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index 3d2cba8ac..0b954ae23 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -44,517 +44,491 @@ impl VerifyWorker {
corrupt_chunks: Arc::new(Mutex::new(HashSet::with_capacity(64))),
}
}
-}
-
-fn verify_blob(backup_dir: &BackupDir, info: &FileInfo) -> Result<(), Error> {
- let blob = backup_dir.load_blob(&info.filename)?;
- let raw_size = blob.raw_size();
- if raw_size != info.size {
- bail!("wrong size ({} != {})", info.size, raw_size);
- }
-
- let csum = openssl::sha::sha256(blob.raw_data());
- if csum != info.csum {
- bail!("wrong index checksum");
- }
+ fn verify_blob(backup_dir: &BackupDir, info: &FileInfo) -> Result<(), Error> {
+ let blob = backup_dir.load_blob(&info.filename)?;
- match blob.crypt_mode()? {
- CryptMode::Encrypt => Ok(()),
- CryptMode::None => {
- // digest already verified above
- blob.decode(None, None)?;
- Ok(())
+ let raw_size = blob.raw_size();
+ if raw_size != info.size {
+ bail!("wrong size ({} != {})", info.size, raw_size);
}
- CryptMode::SignOnly => bail!("Invalid CryptMode for blob"),
- }
-}
-
-fn rename_corrupted_chunk(datastore: Arc<DataStore>, digest: &[u8; 32]) {
- let (path, digest_str) = datastore.chunk_path(digest);
- let mut counter = 0;
- let mut new_path = path.clone();
- loop {
- new_path.set_file_name(format!("{}.{}.bad", digest_str, counter));
- if new_path.exists() && counter < 9 {
- counter += 1;
- } else {
- break;
+ let csum = openssl::sha::sha256(blob.raw_data());
+ if csum != info.csum {
+ bail!("wrong index checksum");
}
- }
- match std::fs::rename(&path, &new_path) {
- Ok(_) => {
- info!("corrupted chunk renamed to {:?}", &new_path);
- }
- Err(err) => {
- match err.kind() {
- std::io::ErrorKind::NotFound => { /* ignored */ }
- _ => info!("could not rename corrupted chunk {:?} - {err}", &path),
+ match blob.crypt_mode()? {
+ CryptMode::Encrypt => Ok(()),
+ CryptMode::None => {
+ // digest already verified above
+ blob.decode(None, None)?;
+ Ok(())
}
+ CryptMode::SignOnly => bail!("Invalid CryptMode for blob"),
}
- };
-}
+ }
-fn verify_index_chunks(
- verify_worker: &VerifyWorker,
- index: Box<dyn IndexFile + Send>,
- crypt_mode: CryptMode,
-) -> Result<(), Error> {
- let errors = Arc::new(AtomicUsize::new(0));
+ fn rename_corrupted_chunk(datastore: Arc<DataStore>, digest: &[u8; 32]) {
+ let (path, digest_str) = datastore.chunk_path(digest);
- let start_time = Instant::now();
+ let mut counter = 0;
+ let mut new_path = path.clone();
+ loop {
+ new_path.set_file_name(format!("{}.{}.bad", digest_str, counter));
+ if new_path.exists() && counter < 9 {
+ counter += 1;
+ } else {
+ break;
+ }
+ }
- let mut read_bytes = 0;
- let mut decoded_bytes = 0;
+ match std::fs::rename(&path, &new_path) {
+ Ok(_) => {
+ info!("corrupted chunk renamed to {:?}", &new_path);
+ }
+ Err(err) => {
+ match err.kind() {
+ std::io::ErrorKind::NotFound => { /* ignored */ }
+ _ => info!("could not rename corrupted chunk {:?} - {err}", &path),
+ }
+ }
+ };
+ }
- let datastore2 = Arc::clone(&verify_worker.datastore);
- let corrupt_chunks2 = Arc::clone(&verify_worker.corrupt_chunks);
- let verified_chunks2 = Arc::clone(&verify_worker.verified_chunks);
- let errors2 = Arc::clone(&errors);
+ fn verify_index_chunks(
+ &self,
+ index: Box<dyn IndexFile + Send>,
+ crypt_mode: CryptMode,
+ ) -> Result<(), Error> {
+ let errors = Arc::new(AtomicUsize::new(0));
+
+ let start_time = Instant::now();
+
+ let mut read_bytes = 0;
+ let mut decoded_bytes = 0;
+
+ let datastore2 = Arc::clone(&self.datastore);
+ let corrupt_chunks2 = Arc::clone(&self.corrupt_chunks);
+ let verified_chunks2 = Arc::clone(&self.verified_chunks);
+ let errors2 = Arc::clone(&errors);
+
+ let decoder_pool = ParallelHandler::new(
+ "verify chunk decoder",
+ 4,
+ move |(chunk, digest, size): (DataBlob, [u8; 32], u64)| {
+ let chunk_crypt_mode = match chunk.crypt_mode() {
+ Err(err) => {
+ corrupt_chunks2.lock().unwrap().insert(digest);
+ info!("can't verify chunk, unknown CryptMode - {err}");
+ errors2.fetch_add(1, Ordering::SeqCst);
+ return Ok(());
+ }
+ Ok(mode) => mode,
+ };
+
+ if chunk_crypt_mode != crypt_mode {
+ info!(
+ "chunk CryptMode {chunk_crypt_mode:?} does not match index CryptMode {crypt_mode:?}"
+ );
+ errors2.fetch_add(1, Ordering::SeqCst);
+ }
- let decoder_pool = ParallelHandler::new(
- "verify chunk decoder",
- 4,
- move |(chunk, digest, size): (DataBlob, [u8; 32], u64)| {
- let chunk_crypt_mode = match chunk.crypt_mode() {
- Err(err) => {
+ if let Err(err) = chunk.verify_unencrypted(size as usize, &digest) {
corrupt_chunks2.lock().unwrap().insert(digest);
- info!("can't verify chunk, unknown CryptMode - {err}");
+ info!("{err}");
errors2.fetch_add(1, Ordering::SeqCst);
- return Ok(());
+ Self::rename_corrupted_chunk(datastore2.clone(), &digest);
+ } else {
+ verified_chunks2.lock().unwrap().insert(digest);
}
- Ok(mode) => mode,
- };
- if chunk_crypt_mode != crypt_mode {
- info!(
- "chunk CryptMode {chunk_crypt_mode:?} does not match index CryptMode {crypt_mode:?}"
- );
- errors2.fetch_add(1, Ordering::SeqCst);
- }
+ Ok(())
+ },
+ );
- if let Err(err) = chunk.verify_unencrypted(size as usize, &digest) {
- corrupt_chunks2.lock().unwrap().insert(digest);
- info!("{err}");
- errors2.fetch_add(1, Ordering::SeqCst);
- rename_corrupted_chunk(datastore2.clone(), &digest);
+ let skip_chunk = |digest: &[u8; 32]| -> bool {
+ if self.verified_chunks.lock().unwrap().contains(digest) {
+ true
+ } else if self.corrupt_chunks.lock().unwrap().contains(digest) {
+ let digest_str = hex::encode(digest);
+ info!("chunk {digest_str} was marked as corrupt");
+ errors.fetch_add(1, Ordering::SeqCst);
+ true
} else {
- verified_chunks2.lock().unwrap().insert(digest);
+ false
}
+ };
+ let check_abort = |pos: usize| -> Result<(), Error> {
+ if pos & 1023 == 0 {
+ self.worker.check_abort()?;
+ self.worker.fail_on_shutdown()?;
+ }
Ok(())
- },
- );
-
- let skip_chunk = |digest: &[u8; 32]| -> bool {
- if verify_worker
- .verified_chunks
- .lock()
- .unwrap()
- .contains(digest)
- {
- true
- } else if verify_worker
- .corrupt_chunks
- .lock()
- .unwrap()
- .contains(digest)
- {
- let digest_str = hex::encode(digest);
- info!("chunk {digest_str} was marked as corrupt");
- errors.fetch_add(1, Ordering::SeqCst);
- true
- } else {
- false
- }
- };
-
- let check_abort = |pos: usize| -> Result<(), Error> {
- if pos & 1023 == 0 {
- verify_worker.worker.check_abort()?;
- verify_worker.worker.fail_on_shutdown()?;
- }
- Ok(())
- };
+ };
- let chunk_list =
- verify_worker
+ let chunk_list = self
.datastore
.get_chunks_in_order(&*index, skip_chunk, check_abort)?;
- for (pos, _) in chunk_list {
- verify_worker.worker.check_abort()?;
- verify_worker.worker.fail_on_shutdown()?;
+ for (pos, _) in chunk_list {
+ self.worker.check_abort()?;
+ self.worker.fail_on_shutdown()?;
- let info = index.chunk_info(pos).unwrap();
+ let info = index.chunk_info(pos).unwrap();
- // we must always recheck this here, the parallel worker below alter it!
- if skip_chunk(&info.digest) {
- continue; // already verified or marked corrupt
- }
-
- match verify_worker.datastore.load_chunk(&info.digest) {
- Err(err) => {
- verify_worker
- .corrupt_chunks
- .lock()
- .unwrap()
- .insert(info.digest);
- error!("can't verify chunk, load failed - {err}");
- errors.fetch_add(1, Ordering::SeqCst);
- rename_corrupted_chunk(verify_worker.datastore.clone(), &info.digest);
+ // we must always recheck this here, the parallel worker below alter it!
+ if skip_chunk(&info.digest) {
+ continue; // already verified or marked corrupt
}
- Ok(chunk) => {
- let size = info.size();
- read_bytes += chunk.raw_size();
- decoder_pool.send((chunk, info.digest, size))?;
- decoded_bytes += size;
+
+ match self.datastore.load_chunk(&info.digest) {
+ Err(err) => {
+ self.corrupt_chunks.lock().unwrap().insert(info.digest);
+ error!("can't verify chunk, load failed - {err}");
+ errors.fetch_add(1, Ordering::SeqCst);
+ Self::rename_corrupted_chunk(self.datastore.clone(), &info.digest);
+ }
+ Ok(chunk) => {
+ let size = info.size();
+ read_bytes += chunk.raw_size();
+ decoder_pool.send((chunk, info.digest, size))?;
+ decoded_bytes += size;
+ }
}
}
- }
- decoder_pool.complete()?;
+ decoder_pool.complete()?;
+
+ let elapsed = start_time.elapsed().as_secs_f64();
- let elapsed = start_time.elapsed().as_secs_f64();
+ let read_bytes_mib = (read_bytes as f64) / (1024.0 * 1024.0);
+ let decoded_bytes_mib = (decoded_bytes as f64) / (1024.0 * 1024.0);
- let read_bytes_mib = (read_bytes as f64) / (1024.0 * 1024.0);
- let decoded_bytes_mib = (decoded_bytes as f64) / (1024.0 * 1024.0);
+ let read_speed = read_bytes_mib / elapsed;
+ let decode_speed = decoded_bytes_mib / elapsed;
- let read_speed = read_bytes_mib / elapsed;
- let decode_speed = decoded_bytes_mib / elapsed;
+ let error_count = errors.load(Ordering::SeqCst);
- let error_count = errors.load(Ordering::SeqCst);
+ info!(
+ " verified {read_bytes_mib:.2}/{decoded_bytes_mib:.2} MiB in {elapsed:.2} seconds, speed {read_speed:.2}/{decode_speed:.2} MiB/s ({error_count} errors)"
+ );
- info!(
- " verified {read_bytes_mib:.2}/{decoded_bytes_mib:.2} MiB in {elapsed:.2} seconds, speed {read_speed:.2}/{decode_speed:.2} MiB/s ({error_count} errors)"
- );
+ if errors.load(Ordering::SeqCst) > 0 {
+ bail!("chunks could not be verified");
+ }
- if errors.load(Ordering::SeqCst) > 0 {
- bail!("chunks could not be verified");
+ Ok(())
}
- Ok(())
-}
+ fn verify_fixed_index(&self, backup_dir: &BackupDir, info: &FileInfo) -> Result<(), Error> {
+ let mut path = backup_dir.relative_path();
+ path.push(&info.filename);
-fn verify_fixed_index(
- verify_worker: &VerifyWorker,
- backup_dir: &BackupDir,
- info: &FileInfo,
-) -> Result<(), Error> {
- let mut path = backup_dir.relative_path();
- path.push(&info.filename);
+ let index = self.datastore.open_fixed_reader(&path)?;
- let index = verify_worker.datastore.open_fixed_reader(&path)?;
+ let (csum, size) = index.compute_csum();
+ if size != info.size {
+ bail!("wrong size ({} != {})", info.size, size);
+ }
- let (csum, size) = index.compute_csum();
- if size != info.size {
- bail!("wrong size ({} != {})", info.size, size);
- }
+ if csum != info.csum {
+ bail!("wrong index checksum");
+ }
- if csum != info.csum {
- bail!("wrong index checksum");
+ self.verify_index_chunks(Box::new(index), info.chunk_crypt_mode())
}
- verify_index_chunks(verify_worker, Box::new(index), info.chunk_crypt_mode())
-}
-
-fn verify_dynamic_index(
- verify_worker: &VerifyWorker,
- backup_dir: &BackupDir,
- info: &FileInfo,
-) -> Result<(), Error> {
- let mut path = backup_dir.relative_path();
- path.push(&info.filename);
+ fn verify_dynamic_index(&self, backup_dir: &BackupDir, info: &FileInfo) -> Result<(), Error> {
+ let mut path = backup_dir.relative_path();
+ path.push(&info.filename);
- let index = verify_worker.datastore.open_dynamic_reader(&path)?;
+ let index = self.datastore.open_dynamic_reader(&path)?;
- let (csum, size) = index.compute_csum();
- if size != info.size {
- bail!("wrong size ({} != {})", info.size, size);
- }
-
- if csum != info.csum {
- bail!("wrong index checksum");
- }
+ let (csum, size) = index.compute_csum();
+ if size != info.size {
+ bail!("wrong size ({} != {})", info.size, size);
+ }
- verify_index_chunks(verify_worker, Box::new(index), info.chunk_crypt_mode())
-}
+ if csum != info.csum {
+ bail!("wrong index checksum");
+ }
-/// Verify a single backup snapshot
-///
-/// This checks all archives inside a backup snapshot.
-/// Errors are logged to the worker log.
-///
-/// Returns
-/// - Ok(true) if verify is successful
-/// - Ok(false) if there were verification errors
-/// - Err(_) if task was aborted
-pub fn verify_backup_dir(
- verify_worker: &VerifyWorker,
- backup_dir: &BackupDir,
- upid: UPID,
- filter: Option<&dyn Fn(&BackupManifest) -> bool>,
-) -> Result<bool, Error> {
- if !backup_dir.full_path().exists() {
- info!(
- "SKIPPED: verify {}:{} - snapshot does not exist (anymore).",
- verify_worker.datastore.name(),
- backup_dir.dir(),
- );
- return Ok(true);
+ self.verify_index_chunks(Box::new(index), info.chunk_crypt_mode())
}
- let snap_lock = backup_dir.lock_shared();
-
- match snap_lock {
- Ok(snap_lock) => {
- verify_backup_dir_with_lock(verify_worker, backup_dir, upid, filter, snap_lock)
- }
- Err(err) => {
+ /// Verify a single backup snapshot
+ ///
+ /// This checks all archives inside a backup snapshot.
+ /// Errors are logged to the worker log.
+ ///
+ /// Returns
+ /// - Ok(true) if verify is successful
+ /// - Ok(false) if there were verification errors
+ /// - Err(_) if task was aborted
+ pub fn verify_backup_dir(
+ &self,
+ backup_dir: &BackupDir,
+ upid: UPID,
+ filter: Option<&dyn Fn(&BackupManifest) -> bool>,
+ ) -> Result<bool, Error> {
+ if !backup_dir.full_path().exists() {
info!(
- "SKIPPED: verify {}:{} - could not acquire snapshot lock: {}",
- verify_worker.datastore.name(),
+ "SKIPPED: verify {}:{} - snapshot does not exist (anymore).",
+ self.datastore.name(),
backup_dir.dir(),
- err,
);
- Ok(true)
+ return Ok(true);
}
- }
-}
-/// See verify_backup_dir
-pub fn verify_backup_dir_with_lock(
- verify_worker: &VerifyWorker,
- backup_dir: &BackupDir,
- upid: UPID,
- filter: Option<&dyn Fn(&BackupManifest) -> bool>,
- _snap_lock: BackupLockGuard,
-) -> Result<bool, Error> {
- let datastore_name = verify_worker.datastore.name();
- let backup_dir_name = backup_dir.dir();
-
- let manifest = match backup_dir.load_manifest() {
- Ok((manifest, _)) => manifest,
- Err(err) => {
- info!("verify {datastore_name}:{backup_dir_name} - manifest load error: {err}");
- return Ok(false);
- }
- };
+ let snap_lock = backup_dir.lock_shared();
- if let Some(filter) = filter {
- if !filter(&manifest) {
- info!("SKIPPED: verify {datastore_name}:{backup_dir_name} (recently verified)");
- return Ok(true);
+ match snap_lock {
+ Ok(snap_lock) => self.verify_backup_dir_with_lock(backup_dir, upid, filter, snap_lock),
+ Err(err) => {
+ info!(
+ "SKIPPED: verify {}:{} - could not acquire snapshot lock: {}",
+ self.datastore.name(),
+ backup_dir.dir(),
+ err,
+ );
+ Ok(true)
+ }
}
}
- info!("verify {datastore_name}:{backup_dir_name}");
-
- let mut error_count = 0;
+ /// See verify_backup_dir
+ pub fn verify_backup_dir_with_lock(
+ &self,
+ backup_dir: &BackupDir,
+ upid: UPID,
+ filter: Option<&dyn Fn(&BackupManifest) -> bool>,
+ _snap_lock: BackupLockGuard,
+ ) -> Result<bool, Error> {
+ let datastore_name = self.datastore.name();
+ let backup_dir_name = backup_dir.dir();
+
+ let manifest = match backup_dir.load_manifest() {
+ Ok((manifest, _)) => manifest,
+ Err(err) => {
+ info!("verify {datastore_name}:{backup_dir_name} - manifest load error: {err}");
+ return Ok(false);
+ }
+ };
- let mut verify_result = VerifyState::Ok;
- for info in manifest.files() {
- let result = proxmox_lang::try_block!({
- info!(" check {}", info.filename);
- match ArchiveType::from_path(&info.filename)? {
- ArchiveType::FixedIndex => verify_fixed_index(verify_worker, backup_dir, info),
- ArchiveType::DynamicIndex => verify_dynamic_index(verify_worker, backup_dir, info),
- ArchiveType::Blob => verify_blob(backup_dir, info),
+ if let Some(filter) = filter {
+ if !filter(&manifest) {
+ info!("SKIPPED: verify {datastore_name}:{backup_dir_name} (recently verified)");
+ return Ok(true);
}
- });
+ }
- verify_worker.worker.check_abort()?;
- verify_worker.worker.fail_on_shutdown()?;
+ info!("verify {datastore_name}:{backup_dir_name}");
- if let Err(err) = result {
- info!(
- "verify {datastore_name}:{backup_dir_name}/{file_name} failed: {err}",
- file_name = info.filename,
- );
- error_count += 1;
- verify_result = VerifyState::Failed;
- }
- }
+ let mut error_count = 0;
- let verify_state = SnapshotVerifyState {
- state: verify_result,
- upid,
- };
-
- if let Err(err) = {
- let verify_state = serde_json::to_value(verify_state)?;
- backup_dir.update_manifest(|manifest| {
- manifest.unprotected["verify_state"] = verify_state;
- })
- } {
- info!("verify {datastore_name}:{backup_dir_name} - manifest update error: {err}");
- return Ok(false);
- }
+ let mut verify_result = VerifyState::Ok;
+ for info in manifest.files() {
+ let result = proxmox_lang::try_block!({
+ info!(" check {}", info.filename);
+ match ArchiveType::from_path(&info.filename)? {
+ ArchiveType::FixedIndex => self.verify_fixed_index(backup_dir, info),
+ ArchiveType::DynamicIndex => self.verify_dynamic_index(backup_dir, info),
+ ArchiveType::Blob => Self::verify_blob(backup_dir, info),
+ }
+ });
- Ok(error_count == 0)
-}
+ self.worker.check_abort()?;
+ self.worker.fail_on_shutdown()?;
-/// Verify all backups inside a backup group
-///
-/// Errors are logged to the worker log.
-///
-/// Returns
-/// - Ok((count, failed_dirs)) where failed_dirs had verification errors
-/// - Err(_) if task was aborted
-pub fn verify_backup_group(
- verify_worker: &VerifyWorker,
- group: &BackupGroup,
- progress: &mut StoreProgress,
- upid: &UPID,
- filter: Option<&dyn Fn(&BackupManifest) -> bool>,
-) -> Result<Vec<String>, Error> {
- let mut errors = Vec::new();
- let mut list = match group.list_backups() {
- Ok(list) => list,
- Err(err) => {
- info!(
- "verify {}, group {} - unable to list backups: {}",
- print_store_and_ns(verify_worker.datastore.name(), group.backup_ns()),
- group.group(),
- err,
- );
- return Ok(errors);
- }
- };
-
- let snapshot_count = list.len();
- info!(
- "verify group {}:{} ({} snapshots)",
- verify_worker.datastore.name(),
- group.group(),
- snapshot_count
- );
-
- progress.group_snapshots = snapshot_count as u64;
-
- BackupInfo::sort_list(&mut list, false); // newest first
- for (pos, info) in list.into_iter().enumerate() {
- if !verify_backup_dir(verify_worker, &info.backup_dir, upid.clone(), filter)? {
- errors.push(print_ns_and_snapshot(
- info.backup_dir.backup_ns(),
- info.backup_dir.as_ref(),
- ));
+ if let Err(err) = result {
+ info!(
+ "verify {datastore_name}:{backup_dir_name}/{file_name} failed: {err}",
+ file_name = info.filename,
+ );
+ error_count += 1;
+ verify_result = VerifyState::Failed;
+ }
}
- progress.done_snapshots = pos as u64 + 1;
- info!("percentage done: {progress}");
- }
- Ok(errors)
-}
-/// Verify all (owned) backups inside a datastore
-///
-/// Errors are logged to the worker log.
-///
-/// Returns
-/// - Ok(failed_dirs) where failed_dirs had verification errors
-/// - Err(_) if task was aborted
-pub fn verify_all_backups(
- verify_worker: &VerifyWorker,
- upid: &UPID,
- ns: BackupNamespace,
- max_depth: Option<usize>,
- owner: Option<&Authid>,
- filter: Option<&dyn Fn(&BackupManifest) -> bool>,
-) -> Result<Vec<String>, Error> {
- let mut errors = Vec::new();
-
- info!("verify datastore {}", verify_worker.datastore.name());
-
- let owner_filtered = if let Some(owner) = &owner {
- info!("limiting to backups owned by {owner}");
- true
- } else {
- false
- };
-
- // FIXME: This should probably simply enable recursion (or the call have a recursion parameter)
- let store = &verify_worker.datastore;
- let max_depth = max_depth.unwrap_or(pbs_api_types::MAX_NAMESPACE_DEPTH);
-
- let mut list = match ListAccessibleBackupGroups::new_with_privs(
- store,
- ns.clone(),
- max_depth,
- Some(PRIV_DATASTORE_VERIFY),
- Some(PRIV_DATASTORE_BACKUP),
- owner,
- ) {
- Ok(list) => list
- .filter_map(|group| match group {
- Ok(group) => Some(group),
- Err(err) if owner_filtered => {
- // intentionally not in task log, the user might not see this group!
- println!("error on iterating groups in ns '{ns}' - {err}");
- None
- }
- Err(err) => {
- // we don't filter by owner, but we want to log the error
- info!("error on iterating groups in ns '{ns}' - {err}");
- errors.push(err.to_string());
- None
- }
- })
- .filter(|group| {
- !(group.backup_type() == BackupType::Host && group.backup_id() == "benchmark")
+ let verify_state = SnapshotVerifyState {
+ state: verify_result,
+ upid,
+ };
+
+ if let Err(err) = {
+ let verify_state = serde_json::to_value(verify_state)?;
+ backup_dir.update_manifest(|manifest| {
+ manifest.unprotected["verify_state"] = verify_state;
})
- .collect::<Vec<BackupGroup>>(),
- Err(err) => {
- info!("unable to list backups: {err}");
- return Ok(errors);
+ } {
+ info!("verify {datastore_name}:{backup_dir_name} - manifest update error: {err}");
+ return Ok(false);
}
- };
- list.sort_unstable_by(|a, b| a.group().cmp(b.group()));
+ Ok(error_count == 0)
+ }
- let group_count = list.len();
- info!("found {group_count} groups");
+ /// Verify all backups inside a backup group
+ ///
+ /// Errors are logged to the worker log.
+ ///
+ /// Returns
+ /// - Ok((count, failed_dirs)) where failed_dirs had verification errors
+ /// - Err(_) if task was aborted
+ pub fn verify_backup_group(
+ &self,
+ group: &BackupGroup,
+ progress: &mut StoreProgress,
+ upid: &UPID,
+ filter: Option<&dyn Fn(&BackupManifest) -> bool>,
+ ) -> Result<Vec<String>, Error> {
+ let mut errors = Vec::new();
+ let mut list = match group.list_backups() {
+ Ok(list) => list,
+ Err(err) => {
+ info!(
+ "verify {}, group {} - unable to list backups: {}",
+ print_store_and_ns(self.datastore.name(), group.backup_ns()),
+ group.group(),
+ err,
+ );
+ return Ok(errors);
+ }
+ };
- let mut progress = StoreProgress::new(group_count as u64);
+ let snapshot_count = list.len();
+ info!(
+ "verify group {}:{} ({} snapshots)",
+ self.datastore.name(),
+ group.group(),
+ snapshot_count
+ );
- for (pos, group) in list.into_iter().enumerate() {
- progress.done_groups = pos as u64;
- progress.done_snapshots = 0;
- progress.group_snapshots = 0;
+ progress.group_snapshots = snapshot_count as u64;
- let mut group_errors =
- verify_backup_group(verify_worker, &group, &mut progress, upid, filter)?;
- errors.append(&mut group_errors);
+ BackupInfo::sort_list(&mut list, false); // newest first
+ for (pos, info) in list.into_iter().enumerate() {
+ if !self.verify_backup_dir(&info.backup_dir, upid.clone(), filter)? {
+ errors.push(print_ns_and_snapshot(
+ info.backup_dir.backup_ns(),
+ info.backup_dir.as_ref(),
+ ));
+ }
+ progress.done_snapshots = pos as u64 + 1;
+ info!("percentage done: {progress}");
+ }
+ Ok(errors)
}
- Ok(errors)
-}
+ /// Verify all (owned) backups inside a datastore
+ ///
+ /// Errors are logged to the worker log.
+ ///
+ /// Returns
+ /// - Ok(failed_dirs) where failed_dirs had verification errors
+ /// - Err(_) if task was aborted
+ pub fn verify_all_backups(
+ &self,
+ upid: &UPID,
+ ns: BackupNamespace,
+ max_depth: Option<usize>,
+ owner: Option<&Authid>,
+ filter: Option<&dyn Fn(&BackupManifest) -> bool>,
+ ) -> Result<Vec<String>, Error> {
+ let mut errors = Vec::new();
+
+ info!("verify datastore {}", self.datastore.name());
+
+ let owner_filtered = if let Some(owner) = &owner {
+ info!("limiting to backups owned by {owner}");
+ true
+ } else {
+ false
+ };
+
+ // FIXME: This should probably simply enable recursion (or the call have a recursion parameter)
+ let store = &self.datastore;
+ let max_depth = max_depth.unwrap_or(pbs_api_types::MAX_NAMESPACE_DEPTH);
+
+ let mut list = match ListAccessibleBackupGroups::new_with_privs(
+ store,
+ ns.clone(),
+ max_depth,
+ Some(PRIV_DATASTORE_VERIFY),
+ Some(PRIV_DATASTORE_BACKUP),
+ owner,
+ ) {
+ Ok(list) => list
+ .filter_map(|group| match group {
+ Ok(group) => Some(group),
+ Err(err) if owner_filtered => {
+ // intentionally not in task log, the user might not see this group!
+ println!("error on iterating groups in ns '{ns}' - {err}");
+ None
+ }
+ Err(err) => {
+ // we don't filter by owner, but we want to log the error
+ info!("error on iterating groups in ns '{ns}' - {err}");
+ errors.push(err.to_string());
+ None
+ }
+ })
+ .filter(|group| {
+ !(group.backup_type() == BackupType::Host && group.backup_id() == "benchmark")
+ })
+ .collect::<Vec<BackupGroup>>(),
+ Err(err) => {
+ info!("unable to list backups: {err}");
+ return Ok(errors);
+ }
+ };
+
+ list.sort_unstable_by(|a, b| a.group().cmp(b.group()));
+
+ let group_count = list.len();
+ info!("found {group_count} groups");
-/// Filter out any snapshot from being (re-)verified where this fn returns false.
-pub fn verify_filter(
- ignore_verified_snapshots: bool,
- outdated_after: Option<i64>,
- manifest: &BackupManifest,
-) -> bool {
- if !ignore_verified_snapshots {
- return true;
+ let mut progress = StoreProgress::new(group_count as u64);
+
+ for (pos, group) in list.into_iter().enumerate() {
+ progress.done_groups = pos as u64;
+ progress.done_snapshots = 0;
+ progress.group_snapshots = 0;
+
+ let mut group_errors = self.verify_backup_group(&group, &mut progress, upid, filter)?;
+ errors.append(&mut group_errors);
+ }
+
+ Ok(errors)
}
- match manifest.verify_state() {
- Err(err) => {
- warn!("error reading manifest: {err:#}");
- true
+ /// Filter out any snapshot from being (re-)verified where this fn returns false.
+ pub fn verify_filter(
+ ignore_verified_snapshots: bool,
+ outdated_after: Option<i64>,
+ manifest: &BackupManifest,
+ ) -> bool {
+ if !ignore_verified_snapshots {
+ return true;
}
- Ok(None) => true, // no last verification, always include
- Ok(Some(last_verify)) => {
- match outdated_after {
- None => false, // never re-verify if ignored and no max age
- Some(max_age) => {
- let now = proxmox_time::epoch_i64();
- let days_since_last_verify = (now - last_verify.upid.starttime) / 86400;
-
- days_since_last_verify > max_age
+
+ match manifest.verify_state() {
+ Err(err) => {
+ warn!("error reading manifest: {err:#}");
+ true
+ }
+ Ok(None) => true, // no last verification, always include
+ Ok(Some(last_verify)) => {
+ match outdated_after {
+ None => false, // never re-verify if ignored and no max age
+ Some(max_age) => {
+ let now = proxmox_time::epoch_i64();
+ let days_since_last_verify = (now - last_verify.upid.starttime) / 86400;
+
+ days_since_last_verify > max_age
+ }
}
}
}
diff --git a/src/server/verify_job.rs b/src/server/verify_job.rs
index a15a257da..95a7b2a9b 100644
--- a/src/server/verify_job.rs
+++ b/src/server/verify_job.rs
@@ -5,10 +5,7 @@ use pbs_api_types::{Authid, Operation, VerificationJobConfig};
use pbs_datastore::DataStore;
use proxmox_rest_server::WorkerTask;
-use crate::{
- backup::{verify_all_backups, verify_filter},
- server::jobstate::Job,
-};
+use crate::{backup::VerifyWorker, server::jobstate::Job};
/// Runs a verification job.
pub fn do_verification_job(
@@ -44,15 +41,14 @@ pub fn do_verification_job(
None => Default::default(),
};
- let verify_worker = crate::backup::VerifyWorker::new(worker.clone(), datastore);
- let result = verify_all_backups(
- &verify_worker,
+ let verify_worker = VerifyWorker::new(worker.clone(), datastore);
+ let result = verify_worker.verify_all_backups(
worker.upid(),
ns,
verification_job.max_depth,
None,
Some(&move |manifest| {
- verify_filter(ignore_verified_snapshots, outdated_after, manifest)
+ VerifyWorker::verify_filter(ignore_verified_snapshots, outdated_after, manifest)
}),
);
let job_result = match result {
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 07/42] s3 client: add crate for AWS S3 compatible object store client
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (5 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 06/42] verify: refactor verify related functions to be methods of worker Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 08/42] s3 client: implement AWS signature v4 request authentication Christian Ebner
` (36 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Adds the client to connect to an AWS S3 compatible object store API.
Force the use of a TLS encrypted connection as the communication
with the object store will contain sensitive information.
For self-signed certificates, check the fingerprint against the one
configured. This follows along the lines of the PBS client, used to
connect to the PBS server API.
The `S3Client` stores the client state and has to be configured upon
instantiation by providing `S3ClientOptions`.
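For illustration, a minimal sketch of how the client could be instantiated
(all option values below are placeholders, not part of this patch):
```
use pbs_s3_client::{S3Client, S3ClientOptions};

fn connect() -> Result<S3Client, anyhow::Error> {
    // All values are illustrative placeholders.
    let options = S3ClientOptions {
        host: "s3.pve-c0-n1.local".to_string(),
        // `None` falls back to the default HTTPS port.
        port: None,
        bucket: "pbs-test-store".to_string(),
        access_key: "<access key>".to_string(),
        secret_key: "<secret key>".to_string(),
        region: "us-west-1".to_string(),
        // Colon separated sha256 fingerprint of the self-signed certificate.
        fingerprint: Some("<sha256 fingerprint>".to_string()),
    };
    S3Client::new(options)
}
```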
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Cargo.toml | 3 +
pbs-s3-client/Cargo.toml | 16 +++++
pbs-s3-client/src/client.rs | 131 ++++++++++++++++++++++++++++++++++++
pbs-s3-client/src/lib.rs | 2 +
4 files changed, 152 insertions(+)
create mode 100644 pbs-s3-client/Cargo.toml
create mode 100644 pbs-s3-client/src/client.rs
create mode 100644 pbs-s3-client/src/lib.rs
diff --git a/Cargo.toml b/Cargo.toml
index 6de6a6527..c2b0029ac 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -36,6 +36,7 @@ members = [
"pbs-fuse-loop",
"pbs-key-config",
"pbs-pxar-fuse",
+ "pbs-s3-client",
"pbs-tape",
"pbs-tools",
@@ -105,6 +106,7 @@ pbs-datastore = { path = "pbs-datastore" }
pbs-fuse-loop = { path = "pbs-fuse-loop" }
pbs-key-config = { path = "pbs-key-config" }
pbs-pxar-fuse = { path = "pbs-pxar-fuse" }
+pbs-s3-client = { path = "pbs-s3-client" }
pbs-tape = { path = "pbs-tape" }
pbs-tools = { path = "pbs-tools" }
@@ -245,6 +247,7 @@ pbs-client.workspace = true
pbs-config.workspace = true
pbs-datastore.workspace = true
pbs-key-config.workspace = true
+pbs-s3-client.workspace = true
pbs-tape.workspace = true
pbs-tools.workspace = true
proxmox-rrd.workspace = true
diff --git a/pbs-s3-client/Cargo.toml b/pbs-s3-client/Cargo.toml
new file mode 100644
index 000000000..1999c3323
--- /dev/null
+++ b/pbs-s3-client/Cargo.toml
@@ -0,0 +1,16 @@
+[package]
+name = "pbs-s3-client"
+version = "0.1.0"
+authors.workspace = true
+edition.workspace = true
+description = "low level client for AWS S3 compatible object stores"
+rust-version.workspace = true
+
+[dependencies]
+anyhow.workspace = true
+hex = { workspace = true, features = [ "serde" ] }
+hyper.workspace = true
+openssl.workspace = true
+tracing.workspace = true
+
+proxmox-http.workspace = true
diff --git a/pbs-s3-client/src/client.rs b/pbs-s3-client/src/client.rs
new file mode 100644
index 000000000..e001cc7b0
--- /dev/null
+++ b/pbs-s3-client/src/client.rs
@@ -0,0 +1,131 @@
+use std::sync::{Arc, Mutex};
+use std::time::Duration;
+
+use anyhow::{bail, format_err, Context, Error};
+use hyper::client::{Client, HttpConnector};
+use hyper::http::uri::Authority;
+use hyper::Body;
+use openssl::hash::MessageDigest;
+use openssl::ssl::{SslConnector, SslMethod, SslVerifyMode};
+use openssl::x509::X509StoreContextRef;
+use tracing::error;
+
+use proxmox_http::client::HttpsConnector;
+
+const S3_HTTP_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
+const S3_TCP_KEEPALIVE_TIME: u32 = 120;
+
+/// Configuration options for client
+pub struct S3ClientOptions {
+ pub host: String,
+ pub port: Option<u16>,
+ pub bucket: String,
+ pub secret_key: String,
+ pub access_key: String,
+ pub region: String,
+ pub fingerprint: Option<String>,
+}
+
+/// S3 client for object stores compatible with the AWS S3 API
+pub struct S3Client {
+ client: Client<HttpsConnector>,
+ options: S3ClientOptions,
+ authority: Authority,
+}
+
+impl S3Client {
+ pub fn new(options: S3ClientOptions) -> Result<Self, Error> {
+ let expected_fingerprint = options.fingerprint.clone();
+ let verified_fingerprint = Arc::new(Mutex::new(None));
+ let trust_openssl_valid = Arc::new(Mutex::new(true));
+ let mut ssl_connector_builder = SslConnector::builder(SslMethod::tls())?;
+ ssl_connector_builder.set_verify_callback(
+ SslVerifyMode::PEER,
+ move |openssl_valid, context| match Self::verify_certificate_fingerprint(
+ openssl_valid,
+ context,
+ expected_fingerprint.clone(),
+ trust_openssl_valid.clone(),
+ ) {
+ Ok(None) => true,
+ Ok(Some(fingerprint)) => {
+ *verified_fingerprint.lock().unwrap() = Some(fingerprint);
+ true
+ }
+ Err(err) => {
+ error!("certificate validation failed {err:#}");
+ false
+ }
+ },
+ );
+
+ let mut http_connector = HttpConnector::new();
+ // want communication to object store backend api to always use https
+ http_connector.enforce_http(false);
+ http_connector.set_connect_timeout(Some(S3_HTTP_CONNECT_TIMEOUT));
+ let https_connector = HttpsConnector::with_connector(
+ http_connector,
+ ssl_connector_builder.build(),
+ S3_TCP_KEEPALIVE_TIME,
+ );
+ let client = Client::builder().build::<_, Body>(https_connector);
+ let authority = if let Some(port) = options.port {
+ format!("{}:{port}", options.host)
+ } else {
+ options.host.clone()
+ };
+ let authority = Authority::try_from(authority)?;
+
+ Ok(Self {
+ client,
+ options,
+ authority,
+ })
+ }
+
+ fn verify_certificate_fingerprint(
+ openssl_valid: bool,
+ context: &mut X509StoreContextRef,
+ expected_fingerprint: Option<String>,
+ trust_openssl: Arc<Mutex<bool>>,
+ ) -> Result<Option<String>, Error> {
+ let mut trust_openssl_valid = trust_openssl.lock().unwrap();
+
+ // only rely on openssl prevalidation if it was not invalidated earlier
+ if openssl_valid && *trust_openssl_valid {
+ return Ok(None);
+ }
+
+ let certificate = match context.current_cert() {
+ Some(certificate) => certificate,
+ None => bail!("context lacks current certificate."),
+ };
+
+ if context.error_depth() > 0 {
+ *trust_openssl_valid = false;
+ return Ok(None);
+ }
+
+ let certificate_digest = certificate
+ .digest(MessageDigest::sha256())
+ .context("failed to calculate certificate digest")?;
+ let certificate_fingerprint = hex::encode(certificate_digest);
+ let certificate_fingerprint = certificate_fingerprint
+ .as_bytes()
+ .chunks(2)
+ .map(|v| std::str::from_utf8(v).unwrap())
+ .collect::<Vec<&str>>()
+ .join(":");
+
+ if let Some(expected_fingerprint) = expected_fingerprint {
+ let expected_fingerprint = expected_fingerprint.to_lowercase();
+ if expected_fingerprint == certificate_fingerprint {
+ return Ok(Some(certificate_fingerprint));
+ }
+ }
+
+ Err(format_err!(
+ "unexpected certificate fingerprint {certificate_fingerprint}"
+ ))
+ }
+}
diff --git a/pbs-s3-client/src/lib.rs b/pbs-s3-client/src/lib.rs
new file mode 100644
index 000000000..533ceab8e
--- /dev/null
+++ b/pbs-s3-client/src/lib.rs
@@ -0,0 +1,2 @@
+mod client;
+pub use client::{S3Client, S3ClientOptions};
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 08/42] s3 client: implement AWS signature v4 request authentication
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (6 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 07/42] s3 client: add crate for AWS S3 compatible object store client Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 09/42] s3 client: add dedicated type for s3 object keys Christian Ebner
` (35 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
The S3 API authenticates client requests by checking the
authentication signature provided by the request's `Authorization`
header. The latest signature version, AWS signature v4, is required
for the newest AWS regions [0] and is the most widely adopted one
[1-4], so rely solely on that and do not implement older versions.
Adds helper methods to sign client requests. This includes encoding
and normalization of the headers, digest calculation of the request
body (if any) and signature generation.
[0] https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html
[1] https://docs.ceph.com/en/reef/radosgw/s3/authentication/#aws-signature-v4
[2] https://cloud.google.com/storage/docs/interoperability
[3] https://docs.wasabi.com/v1/docs/how-do-i-use-aws-signature-version-4-with-wasabi
[4] https://min.io/product/s3-compatibility
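To make the derivation chain explicit, the following standalone sketch mirrors
how the helper derives the signing key and the final signature (assuming the
`openssl` and `hex` crates already used by the new crate; error handling is
reduced to `expect` for brevity):
```
use openssl::hash::MessageDigest;
use openssl::pkey::PKey;
use openssl::sign::Signer;

// HMAC-SHA256 helper, equivalent to the one added by this patch.
fn hmac_sha256(key: &[u8], data: &[u8]) -> Vec<u8> {
    let key = PKey::hmac(key).expect("create hmac key");
    let mut signer = Signer::new(MessageDigest::sha256(), &key).expect("create signer");
    signer.update(data).expect("feed data");
    signer.sign_to_vec().expect("calculate hmac")
}

// DateKey -> DateRegionKey -> ServiceKey -> SigningKey -> Signature
fn derive_signature(secret_key: &str, date: &str, region: &str, string_to_sign: &str) -> String {
    let date_key = hmac_sha256(format!("AWS4{secret_key}").as_bytes(), date.as_bytes());
    let region_key = hmac_sha256(&date_key, region.as_bytes());
    let service_key = hmac_sha256(&region_key, b"s3");
    let signing_key = hmac_sha256(&service_key, b"aws4_request");
    hex::encode(hmac_sha256(&signing_key, string_to_sign.as_bytes()))
}
```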
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-s3-client/Cargo.toml | 2 +
pbs-s3-client/src/aws_sign_v4.rs | 140 +++++++++++++++++++++++++++++++
pbs-s3-client/src/lib.rs | 1 +
3 files changed, 143 insertions(+)
create mode 100644 pbs-s3-client/src/aws_sign_v4.rs
diff --git a/pbs-s3-client/Cargo.toml b/pbs-s3-client/Cargo.toml
index 1999c3323..11189ea50 100644
--- a/pbs-s3-client/Cargo.toml
+++ b/pbs-s3-client/Cargo.toml
@@ -12,5 +12,7 @@ hex = { workspace = true, features = [ "serde" ] }
hyper.workspace = true
openssl.workspace = true
tracing.workspace = true
+url.workspace = true
proxmox-http.workspace = true
+proxmox-time.workspace = true
diff --git a/pbs-s3-client/src/aws_sign_v4.rs b/pbs-s3-client/src/aws_sign_v4.rs
new file mode 100644
index 000000000..8a538e868
--- /dev/null
+++ b/pbs-s3-client/src/aws_sign_v4.rs
@@ -0,0 +1,140 @@
+//! Helpers for request authentication using AWS signature version 4
+
+use anyhow::Error;
+use hyper::Body;
+use hyper::Request;
+use openssl::hash::MessageDigest;
+use openssl::pkey::{PKey, Private};
+use openssl::sha::sha256;
+use openssl::sign::Signer;
+use url::Url;
+
+use super::client::S3ClientOptions;
+
+pub(crate) const AWS_SIGN_V4_DATETIME_FORMAT: &str = "%Y%m%dT%H%M%SZ";
+
+const AWS_SIGN_V4_DATE_FORMAT: &str = "%Y%m%d";
+const AWS_SIGN_V4_SERVICE_S3: &str = "s3";
+const AWS_SIGN_V4_REQUEST_POSTFIX: &str = "aws4_request";
+
+/// Generate signature for S3 request authentication using AWS signature version 4.
+/// See: https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html
+pub(crate) fn aws_sign_v4_signature(
+ request: &Request<Body>,
+ options: &S3ClientOptions,
+ epoch: i64,
+ payload_digest: &str,
+) -> Result<String, Error> {
+ // Include all headers in signature calculation since the reference docs note:
+ // "For the purpose of calculating an authorization signature, only the 'host' and any 'x-amz-*'
+ // headers are required. however, in order to prevent data tampering, you should consider
+ // including all the headers in the signature calculation."
+ // See https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html
+ let mut canonical_headers = Vec::new();
+ let mut signed_headers = Vec::new();
+ for (key, value) in request.headers() {
+ canonical_headers.push(format!(
+ "{}:{}",
+ // Header name has to be lower case, key.as_str() does guarantee that, see
+ // https://docs.rs/http/0.2.0/http/header/struct.HeaderName.html
+ key.as_str(),
+ // No need to trim since `HeaderValue` only allows visible UTF8 chars, see
+ // https://docs.rs/http/0.2.0/http/header/struct.HeaderValue.html
+ value.to_str()?,
+ ));
+ signed_headers.push(key.as_str());
+ }
+ canonical_headers.sort();
+ signed_headers.sort();
+ let signed_headers_string = signed_headers.join(";");
+
+ let mut canonical_queries = Url::parse(&request.uri().to_string())?
+ .query_pairs()
+ .map(|(key, value)| {
+ format!(
+ "{}={}",
+ aws_sign_v4_uri_encode(&key, false),
+ aws_sign_v4_uri_encode(&value, false),
+ )
+ })
+ .collect::<Vec<String>>();
+ canonical_queries.sort();
+
+ let canonical_request = format!(
+ "{}\n{}\n{}\n{}\n\n{}\n{}",
+ request.method().as_str(),
+ request.uri().path(),
+ canonical_queries.join("&"),
+ canonical_headers.join("\n"),
+ signed_headers_string,
+ payload_digest,
+ );
+
+ let date = proxmox_time::strftime_utc(AWS_SIGN_V4_DATE_FORMAT, epoch)?;
+ let datetime = proxmox_time::strftime_utc(AWS_SIGN_V4_DATETIME_FORMAT, epoch)?;
+
+ let credential_scope = format!(
+ "{date}/{}/{AWS_SIGN_V4_SERVICE_S3}/{AWS_SIGN_V4_REQUEST_POSTFIX}",
+ options.region,
+ );
+ let canonical_request_hash = hex::encode(sha256(canonical_request.as_bytes()));
+ let string_to_sign =
+ format!("AWS4-HMAC-SHA256\n{datetime}\n{credential_scope}\n{canonical_request_hash}");
+
+ let date_sign_key = PKey::hmac(format!("AWS4{}", options.secret_key).as_bytes())?;
+ let date_tag = hmac_sha256(&date_sign_key, date.as_bytes())?;
+
+ let region_sign_key = PKey::hmac(&date_tag)?;
+ let region_tag = hmac_sha256(&region_sign_key, options.region.as_bytes())?;
+
+ let service_sign_key = PKey::hmac(&region_tag)?;
+ let service_tag = hmac_sha256(&service_sign_key, AWS_SIGN_V4_SERVICE_S3.as_bytes())?;
+
+ let signing_key = PKey::hmac(&service_tag)?;
+ let signing_tag = hmac_sha256(&signing_key, AWS_SIGN_V4_REQUEST_POSTFIX.as_bytes())?;
+
+ let signature_key = PKey::hmac(&signing_tag)?;
+ let signature = hmac_sha256(&signature_key, string_to_sign.as_bytes())?;
+ let signature = hex::encode(&signature);
+
+ Ok(format!(
+ "AWS4-HMAC-SHA256 Credential={}/{credential_scope},SignedHeaders={signed_headers_string},Signature={signature}",
+ options.access_key,
+ ))
+}
+// Custom `uri_encode` implementation as recommended by the AWS docs, since generic uri encoding
+// libraries might not be compatible with the required encoding.
+// See: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html
+pub(crate) fn aws_sign_v4_uri_encode(input: &str, is_object_key_name: bool) -> String {
+ // Assume up to 2 bytes per char max in output
+ let mut accumulator = String::with_capacity(2 * input.len());
+
+ input.chars().for_each(|char| {
+ match char {
+ // Unreserved characters, do not uri encode these bytes
+ 'A'..='Z' | 'a'..='z' | '0'..='9' | '-' | '.' | '_' | '~' => accumulator.push(char),
+ // Space character is reserved, must be encoded as '%20', not '+'
+ ' ' => accumulator.push_str("%20"),
+ // Encode the forward slash character, '/', everywhere except in the object key name
+ '/' if !is_object_key_name => accumulator.push_str("%2F"),
+ '/' if is_object_key_name => accumulator.push(char),
+ // URI encoded byte is formed by a '%' and the two-digit hexadecimal value of the byte
+ // Letters in the hexadecimal value must be uppercase
+ _ => {
+ for byte in char.to_string().as_bytes() {
+ accumulator.push_str(&format!("%{byte:02X}"));
+ }
+ }
+ }
+ });
+
+ accumulator
+}
+
+// Helper for hmac sha256 calculation
+fn hmac_sha256(key: &PKey<Private>, data: &[u8]) -> Result<Vec<u8>, Error> {
+ let mut signer = Signer::new(MessageDigest::sha256(), key)?;
+ signer.update(data)?;
+ let hmac = signer.sign_to_vec()?;
+ Ok(hmac)
+}
diff --git a/pbs-s3-client/src/lib.rs b/pbs-s3-client/src/lib.rs
index 533ceab8e..5a60b92ec 100644
--- a/pbs-s3-client/src/lib.rs
+++ b/pbs-s3-client/src/lib.rs
@@ -1,2 +1,3 @@
+mod aws_sign_v4;
mod client;
pub use client::{S3Client, S3ClientOptions};
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 09/42] s3 client: add dedicated type for s3 object keys
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (7 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 08/42] s3 client: implement AWS signature v4 request authentication Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 10/42] s3 client: add type for last modified timestamp in responses Christian Ebner
` (34 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
S3 objects are uniquely identified within a bucket by their object
key [0].
Implements conversion and utility traits to easily convert and encode
a string or a chunk digest into the corresponding object key for the
S3 storage backend. Adds type checking for S3 client operations
requiring an object key.
[0] https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
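As a short illustration of the resulting key layout (the snapshot path is a
hypothetical example):
```
use pbs_s3_client::S3ObjectKey;

fn main() {
    // Regular (non-digest) keys get the `.content` prefix and are uri encoded.
    let index_key = S3ObjectKey::from("vm/100/2025-05-29T14:31:00Z/index.json.blob");
    // -> "/.content/vm/100/2025-05-29T14%3A31%3A00Z/index.json.blob"
    println!("{index_key}");

    // Chunk digests follow the regular PBS datastore layout with a 4 hex digit prefix.
    let digest = [0u8; 32];
    let chunk_key = S3ObjectKey::from(&digest);
    // -> "/.chunks/0000/<64 hex digits>"
    println!("{chunk_key}");
}
```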
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-s3-client/src/lib.rs | 2 ++
pbs-s3-client/src/object_key.rs | 64 +++++++++++++++++++++++++++++++++
2 files changed, 66 insertions(+)
create mode 100644 pbs-s3-client/src/object_key.rs
diff --git a/pbs-s3-client/src/lib.rs b/pbs-s3-client/src/lib.rs
index 5a60b92ec..a4081df15 100644
--- a/pbs-s3-client/src/lib.rs
+++ b/pbs-s3-client/src/lib.rs
@@ -1,3 +1,5 @@
mod aws_sign_v4;
mod client;
pub use client::{S3Client, S3ClientOptions};
+mod object_key;
+pub use object_key::{S3ObjectKey, S3_CONTENT_PREFIX};
diff --git a/pbs-s3-client/src/object_key.rs b/pbs-s3-client/src/object_key.rs
new file mode 100644
index 000000000..362c0cd55
--- /dev/null
+++ b/pbs-s3-client/src/object_key.rs
@@ -0,0 +1,64 @@
+use crate::aws_sign_v4::aws_sign_v4_uri_encode;
+
+pub const S3_CONTENT_PREFIX: &str = ".content";
+
+#[derive(Clone)]
+pub struct S3ObjectKey {
+ object_key: String,
+}
+
+// All regular keys (non-digests) get prefixed by `/.content/`, so that
+// content listing without all the chunks can be done by that prefix.
+impl core::convert::From<&str> for S3ObjectKey {
+ fn from(object_key: &str) -> Self {
+ let object_key = object_key.strip_prefix("/").unwrap_or(object_key);
+ let object_key = format!(
+ "/{S3_CONTENT_PREFIX}/{object_key}",
+ object_key = aws_sign_v4_uri_encode(object_key, true)
+ );
+
+ Self { object_key }
+ }
+}
+
+impl core::convert::From<&[u8; 32]> for S3ObjectKey {
+ fn from(digest: &[u8; 32]) -> Self {
+ // Use the same layout as on regular PBS datastores, including the 4 hex digit prefix
+ let object_key = hex::encode(digest);
+ let prefix = &object_key[..4];
+ Self {
+ object_key: format!("/.chunks/{prefix}/{object_key}"),
+ }
+ }
+}
+
+impl core::convert::From<[u8; 32]> for S3ObjectKey {
+ fn from(digest: [u8; 32]) -> Self {
+ Self::from(&digest)
+ }
+}
+
+impl std::ops::Deref for S3ObjectKey {
+ type Target = str;
+
+ fn deref(&self) -> &Self::Target {
+ &self.object_key
+ }
+}
+
+impl std::fmt::Display for S3ObjectKey {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ write!(f, "{}", self.object_key)
+ }
+}
+
+impl S3ObjectKey {
+ /// Generate source key for copy object operations given the source bucket.
+ pub fn to_copy_source_key(&self, source_bucket: &str) -> Self {
+ Self {
+ // object key already contains the required separator slash in-between source bucket
+ // and source object key.
+ object_key: format!("{source_bucket}{}", self.object_key),
+ }
+ }
+}
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 10/42] s3 client: add type for last modified timestamp in responses
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (8 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 09/42] s3 client: add dedicated type for s3 object keys Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 11/42] s3 client: add helper to parse http date headers Christian Ebner
` (33 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Adds a helper to parse last modified timestamps as encountered in the
S3 list objects v2 and copy object API calls.
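A minimal usage sketch; the timestamp value is only an example of what S3
returns in these response bodies:
```
use pbs_s3_client::LastModifiedTimestamp;

fn main() -> Result<(), anyhow::Error> {
    // Parsed via the `FromStr` impl, which also backs the serde deserializer.
    let timestamp: LastModifiedTimestamp = "2025-05-29T14:31:00.000Z".parse()?;
    println!("{timestamp:?}");
    Ok(())
}
```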
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Cargo.toml | 2 ++
pbs-s3-client/Cargo.toml | 3 +++
pbs-s3-client/src/lib.rs | 20 ++++++++++++++++++++
3 files changed, 25 insertions(+)
diff --git a/Cargo.toml b/Cargo.toml
index c2b0029ac..aaa79c2aa 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -131,6 +131,7 @@ handlebars = "3.0"
hex = "0.4.3"
hickory-resolver = { version = "0.24.1", default-features = false, features = [ "system-config", "tokio-runtime" ] }
hyper = { version = "0.14", features = [ "backports", "deprecated", "full" ] }
+iso8601 = "0.4.1"
libc = "0.2"
log = "0.4.17"
nix = "0.26.1"
@@ -144,6 +145,7 @@ regex = "1.5.5"
rustyline = "9"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
+serde_plain = "1.0"
siphasher = "0.3"
syslog = "6"
tar = "0.4"
diff --git a/pbs-s3-client/Cargo.toml b/pbs-s3-client/Cargo.toml
index 11189ea50..3261a32bb 100644
--- a/pbs-s3-client/Cargo.toml
+++ b/pbs-s3-client/Cargo.toml
@@ -10,7 +10,10 @@ rust-version.workspace = true
anyhow.workspace = true
hex = { workspace = true, features = [ "serde" ] }
hyper.workspace = true
+iso8601.workspace = true
openssl.workspace = true
+serde.workspace = true
+serde_plain.workspace = true
tracing.workspace = true
url.workspace = true
diff --git a/pbs-s3-client/src/lib.rs b/pbs-s3-client/src/lib.rs
index a4081df15..dbe4bebcc 100644
--- a/pbs-s3-client/src/lib.rs
+++ b/pbs-s3-client/src/lib.rs
@@ -3,3 +3,23 @@ mod client;
pub use client::{S3Client, S3ClientOptions};
mod object_key;
pub use object_key::{S3ObjectKey, S3_CONTENT_PREFIX};
+
+use std::time::Duration;
+
+use anyhow::{anyhow, bail, Error};
+
+#[derive(Debug)]
+pub struct LastModifiedTimestamp {
+ _datetime: iso8601::DateTime,
+}
+
+impl std::str::FromStr for LastModifiedTimestamp {
+ type Err = Error;
+
+ fn from_str(timestamp: &str) -> Result<Self, Self::Err> {
+ let _datetime = iso8601::datetime(timestamp).map_err(|err| anyhow!(err))?;
+ Ok(Self { _datetime })
+ }
+}
+
+serde_plain::derive_deserialize_from_fromstr!(LastModifiedTimestamp, "last modified timestamp");
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 11/42] s3 client: add helper to parse http date headers
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (9 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 10/42] s3 client: add type for last modified timestamp in responses Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 12/42] s3 client: implement methods to operate on s3 objects in bucket Christian Ebner
` (32 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Adds a helper to parse the preferred date/time format for HTTP `Date`
headers as specified in RFC 2616 [0], which is a fixed-length subset
of the format specified in RFC 1123 [1], itself being a followup to
RFC 822 [2]. Does not implement the format as described in the
obsolete RFC 850 [3].
This allows parsing the `Date` and `Last-Modified` headers of S3 API
responses.
[0] https://datatracker.ietf.org/doc/html/rfc2616#section-3.3
[1] https://datatracker.ietf.org/doc/html/rfc1123#section-5.2.14
[2] https://datatracker.ietf.org/doc/html/rfc822#section-5
[3] https://datatracker.ietf.org/doc/html/rfc850
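A minimal usage sketch of the new parser; the header value is an example of
the fixed-length RFC 2616 format:
```
use pbs_s3_client::HttpDate;

fn main() -> Result<(), anyhow::Error> {
    // Exactly 29 bytes, as required by the fixed-length format.
    let date: HttpDate = "Thu, 29 May 2025 14:31:00 GMT".parse()?;
    // Seconds since the unix epoch, exposed as a `Duration`.
    println!("{:?}", date.to_duration()?);
    Ok(())
}
```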
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-s3-client/src/lib.rs | 97 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 96 insertions(+), 1 deletion(-)
diff --git a/pbs-s3-client/src/lib.rs b/pbs-s3-client/src/lib.rs
index dbe4bebcc..b3e539bdd 100644
--- a/pbs-s3-client/src/lib.rs
+++ b/pbs-s3-client/src/lib.rs
@@ -6,7 +6,12 @@ pub use object_key::{S3ObjectKey, S3_CONTENT_PREFIX};
use std::time::Duration;
-use anyhow::{anyhow, bail, Error};
+use anyhow::{anyhow, bail, Context, Error};
+
+const VALID_DAYS_OF_WEEK: [&str; 7] = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"];
+const VALID_MONTHS: [&str; 12] = [
+ "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
+];
#[derive(Debug)]
pub struct LastModifiedTimestamp {
@@ -23,3 +28,93 @@ impl std::str::FromStr for LastModifiedTimestamp {
}
serde_plain::derive_deserialize_from_fromstr!(LastModifiedTimestamp, "last modified timestamp");
+
+/// Preferred date format specified by RFC2616, given as fixed-length
+/// subset of RFC1123, which itself is a followup to RFC822.
+///
+/// https://datatracker.ietf.org/doc/html/rfc2616#section-3.3
+/// https://datatracker.ietf.org/doc/html/rfc1123#section-5.2.14
+/// https://datatracker.ietf.org/doc/html/rfc822#section-5
+#[derive(Debug)]
+pub struct HttpDate {
+ epoch: i64,
+}
+
+impl HttpDate {
+ pub fn to_duration(&self) -> Result<Duration, Error> {
+ let seconds = u64::try_from(self.epoch)?;
+ Ok(Duration::from_secs(seconds))
+ }
+}
+
+impl std::str::FromStr for HttpDate {
+ type Err = Error;
+
+ fn from_str(timestamp: &str) -> Result<Self, Self::Err> {
+ let input = timestamp.as_bytes();
+ if input.len() != 29 {
+ bail!("unexpected length: got {}", input.len());
+ }
+
+ let expect = |pos: usize, c: u8| {
+ if input[pos] != c {
+ bail!("unexpected char at pos {pos}");
+ }
+ Ok(())
+ };
+
+ let digit = |pos: usize| -> Result<i32, Error> {
+ let digit = input[pos] as i32;
+ if !(48..=57).contains(&digit) {
+ bail!("unexpected char at pos {pos}");
+ }
+ Ok(digit - 48)
+ };
+
+ fn check_max(i: i32, max: i32) -> Result<i32, Error> {
+ if i > max {
+ bail!("value too large ({i} > {max})");
+ }
+ Ok(i)
+ }
+
+ let mut tm = proxmox_time::TmEditor::new(true);
+
+ if !VALID_DAYS_OF_WEEK
+ .iter()
+ .any(|valid| valid.as_bytes() == &input[0..3])
+ {
+ bail!("unexpected day of week, got {:?}", &input[0..3]);
+ }
+
+ expect(3, b',').context("unexpected separator after day of week")?;
+ expect(4, b' ').context("missing space after day of week separator")?;
+ tm.set_mday(check_max(digit(5)? * 10 + digit(6)?, 31)?)?;
+ expect(7, b' ').context("unexpected separator after day")?;
+ if let Some(month) = VALID_MONTHS
+ .iter()
+ .position(|month| month.as_bytes() == &input[8..11])
+ {
+ // valid conversion to i32, position stems from fixed size array of 12 months.
+ tm.set_mon(check_max(month as i32 + 1, 12)?)?;
+ } else {
+ bail!("invalid month");
+ }
+ expect(11, b' ').context("unexpected separator after month")?;
+ tm.set_year(digit(12)? * 1000 + digit(13)? * 100 + digit(14)? * 10 + digit(15)?)?;
+ expect(16, b' ').context("unexpected separator after year")?;
+ tm.set_hour(check_max(digit(17)? * 10 + digit(18)?, 23)?)?;
+ expect(19, b':').context("unexpected separator after hour")?;
+ tm.set_min(check_max(digit(20)? * 10 + digit(21)?, 59)?)?;
+ expect(22, b':').context("unexpected separator after minute")?;
+ tm.set_sec(check_max(digit(23)? * 10 + digit(24)?, 60)?)?;
+ expect(25, b' ').context("unexpected separator after second")?;
+ if !input.ends_with(b"GMT") {
+ bail!("unexpected timezone");
+ }
+
+ let epoch = tm.into_epoch()?;
+
+ Ok(Self { epoch })
+ }
+}
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 12/42] s3 client: implement methods to operate on s3 objects in bucket
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (10 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 11/42] s3 client: add helper to parse http date headers Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 13/42] config: introduce s3 object store client configuration Christian Ebner
` (31 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Adds the basic implementation of the client to use S3 object stores
as a backend for PBS datastores.
This implements the basic client actions on a bucket and on objects
stored within a given bucket.
This is not feature complete and is intended to be extended in an
on-demand fashion rather than implementing the whole client at once.
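For orientation, a rough sketch of how the basic operations fit together (the
`roundtrip` helper, chunk content and digest are placeholders, and error
handling is kept minimal):
```
use hyper::Body;
use pbs_s3_client::{S3Client, S3ClientOptions, S3ObjectKey};

async fn roundtrip(options: S3ClientOptions) -> Result<(), anyhow::Error> {
    let client = S3Client::new(options)?;

    // Bail out early if the bucket is missing or not accessible.
    client.head_bucket().await?;

    // Upload a (placeholder) chunk under its digest-derived key.
    let digest = [0u8; 32];
    let key = S3ObjectKey::from(&digest);
    client.put_object(key.clone(), Body::from(vec![0u8; 16])).await?;

    // Fetch it back; `None` signals that the object does not exist.
    if let Some(response) = client.get_object(key.clone()).await? {
        println!("got {} bytes back", response.content_length);
    }

    // Clean up again.
    client.delete_object(key).await?;
    Ok(())
}
```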
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Cargo.toml | 3 +
pbs-s3-client/Cargo.toml | 8 +
pbs-s3-client/src/client.rs | 463 +++++++++++++++++++++++++++
pbs-s3-client/src/lib.rs | 2 +
pbs-s3-client/src/response_reader.rs | 343 ++++++++++++++++++++
5 files changed, 819 insertions(+)
create mode 100644 pbs-s3-client/src/response_reader.rs
diff --git a/Cargo.toml b/Cargo.toml
index aaa79c2aa..1bc3bb88b 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -137,15 +137,18 @@ log = "0.4.17"
nix = "0.26.1"
nom = "7"
num-traits = "0.2"
+md5 = "0.7.0"
once_cell = "1.3.1"
openssl = "0.10.40"
percent-encoding = "2.1"
pin-project-lite = "0.2"
+quick-xml = "0.26"
regex = "1.5.5"
rustyline = "9"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_plain = "1.0"
+serde-xml-rs = "0.5"
siphasher = "0.3"
syslog = "6"
tar = "0.4"
diff --git a/pbs-s3-client/Cargo.toml b/pbs-s3-client/Cargo.toml
index 3261a32bb..d20cd8f38 100644
--- a/pbs-s3-client/Cargo.toml
+++ b/pbs-s3-client/Cargo.toml
@@ -8,12 +8,20 @@ rust-version.workspace = true
[dependencies]
anyhow.workspace = true
+base64.workspace = true
+bytes.workspace = true
+futures.workspace = true
hex = { workspace = true, features = [ "serde" ] }
hyper.workspace = true
iso8601.workspace = true
+md5.workspace = true
openssl.workspace = true
+quick-xml = { workspace = true, features = ["async-tokio"] }
serde.workspace = true
serde_plain.workspace = true
+serde-xml-rs.workspace = true
+tokio = { workspace = true, features = [] }
+tokio-util = { workspace = true, features = ["compat"] }
tracing.workspace = true
url.workspace = true
diff --git a/pbs-s3-client/src/client.rs b/pbs-s3-client/src/client.rs
index e001cc7b0..b7ca4e298 100644
--- a/pbs-s3-client/src/client.rs
+++ b/pbs-s3-client/src/client.rs
@@ -1,17 +1,36 @@
+use std::collections::HashMap;
+use std::io::Cursor;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use anyhow::{bail, format_err, Context, Error};
+use bytes::{Bytes, BytesMut};
+use hyper::body::HttpBody;
use hyper::client::{Client, HttpConnector};
+use hyper::http::method::Method;
use hyper::http::uri::Authority;
+use hyper::http::StatusCode;
+use hyper::http::{header, HeaderValue, Uri};
use hyper::Body;
+use hyper::{Request, Response};
use openssl::hash::MessageDigest;
+use openssl::sha::Sha256;
use openssl::ssl::{SslConnector, SslMethod, SslVerifyMode};
use openssl::x509::X509StoreContextRef;
+use quick_xml::events::BytesText;
+use quick_xml::writer::Writer;
use tracing::error;
use proxmox_http::client::HttpsConnector;
+use crate::aws_sign_v4::aws_sign_v4_signature;
+use crate::aws_sign_v4::AWS_SIGN_V4_DATETIME_FORMAT;
+use crate::object_key::S3ObjectKey;
+use crate::response_reader::{
+ CopyObjectResponse, DeleteObjectsResponse, GetObjectResponse, HeadObjectResponse,
+ ListObjectsV2Response, PutObjectResponse, ResponseReader,
+};
+
const S3_HTTP_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
const S3_TCP_KEEPALIVE_TIME: u32 = 120;
@@ -128,4 +147,448 @@ impl S3Client {
"unexpected certificate fingerprint {certificate_fingerprint}"
))
}
+
+ async fn prepare(&self, mut request: Request<Body>) -> Result<Request<Body>, Error> {
+ let host_header = request
+ .uri()
+ .authority()
+ .ok_or_else(|| format_err!("request missing authority"))?
+ .to_string();
+
+ // Content verification for aws s3 signature
+ let mut hasher = Sha256::new();
+ // Load payload into memory, needed as the hash and checksum have to be calculated a-priori
+ let buffer: Bytes = {
+ let body = request.body_mut();
+ let mut buf = BytesMut::with_capacity(body.size_hint().lower() as usize);
+ while let Some(chunk) = body.data().await {
+ let chunk = chunk?;
+ hasher.update(&chunk);
+ buf.extend_from_slice(&chunk);
+ }
+ buf.freeze()
+ };
+ // Use MD5 as upload integrity check, as other methods are not supported by all S3 object
+ // store providers and might be ignored and this is recommended by AWS as described in
+ // https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html#API_PutObject_RequestSyntax
+ let payload_md5 = md5::compute(&buffer);
+ let payload_digest = hex::encode(hasher.finish());
+ let payload_len = buffer.len();
+ *request.body_mut() = Body::from(buffer);
+
+ let epoch = proxmox_time::epoch_i64();
+ let datetime = proxmox_time::strftime_utc(AWS_SIGN_V4_DATETIME_FORMAT, epoch)?;
+
+ request
+ .headers_mut()
+ .insert("x-amz-date", HeaderValue::from_str(&datetime)?);
+ request
+ .headers_mut()
+ .insert("host", HeaderValue::from_str(&host_header)?);
+ request.headers_mut().insert(
+ "x-amz-content-sha256",
+ HeaderValue::from_str(&payload_digest)?,
+ );
+ request.headers_mut().insert(
+ header::CONTENT_LENGTH,
+ HeaderValue::from_str(&payload_len.to_string())?,
+ );
+ if payload_len > 0 {
+ let md5_digest = base64::encode(*payload_md5);
+ request
+ .headers_mut()
+ .insert("Content-MD5", HeaderValue::from_str(&md5_digest)?);
+ }
+
+ let signature = aws_sign_v4_signature(&request, &self.options, epoch, &payload_digest)?;
+
+ request
+ .headers_mut()
+ .insert(header::AUTHORIZATION, HeaderValue::from_str(&signature)?);
+
+ Ok(request)
+ }
+
+ pub async fn send(&self, request: Request<Body>) -> Result<Response<Body>, Error> {
+ let request = self.prepare(request).await?;
+ let response = tokio::time::timeout(S3_HTTP_CONNECT_TIMEOUT, self.client.request(request))
+ .await
+ .context("request timeout")??;
+ Ok(response)
+ }
+
+ /// Check if bucket exists and got permissions to access it.
+ /// See reference docs: https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadBucket.html
+ pub async fn head_bucket(&self) -> Result<(), Error> {
+ let request = Request::builder()
+ .method(Method::HEAD)
+ .uri(self.uri_builder("/")?)
+ .body(Body::empty())?;
+ let response = self.send(request).await?;
+ let (parts, _body) = response.into_parts();
+
+ match parts.status {
+ StatusCode::OK => (),
+ StatusCode::BAD_REQUEST | StatusCode::FORBIDDEN | StatusCode::NOT_FOUND => {
+ bail!("bucket does not exist or no permission to access it")
+ }
+ status_code => bail!("unexpected status code {status_code}"),
+ }
+
+ Ok(())
+ }
+
+ /// Fetch metadata from an object without returning the object itself.
+ /// See reference docs: https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html
+ pub async fn head_object(
+ &self,
+ object_key: S3ObjectKey,
+ ) -> Result<Option<HeadObjectResponse>, Error> {
+ let request = Request::builder()
+ .method(Method::HEAD)
+ .uri(self.uri_builder(&object_key)?)
+ .body(Body::empty())?;
+ let response = self.send(request).await?;
+ let response_reader = ResponseReader::new(response);
+ response_reader.head_object_response().await
+ }
+
+ /// Fetch an object from object store.
+ /// See reference docs: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html
+ pub async fn get_object(
+ &self,
+ object_key: S3ObjectKey,
+ ) -> Result<Option<GetObjectResponse>, Error> {
+ let request = Request::builder()
+ .method(Method::GET)
+ .uri(self.uri_builder(&object_key)?)
+ .body(Body::empty())?;
+
+ let response = self.send(request).await?;
+ let response_reader = ResponseReader::new(response);
+ response_reader.get_object_response().await
+ }
+
+ /// Returns some or all (up to 1,000) of the objects in a bucket with each request.
+ /// See reference docs: https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
+ pub async fn list_objects_v2(
+ &self,
+ prefix: Option<&str>,
+ max_keys: Option<u64>,
+ continuation_token: Option<&str>,
+ ) -> Result<ListObjectsV2Response, Error> {
+ let mut path_and_query = String::from("/?list-type=2");
+ if let Some(prefix) = prefix {
+ path_and_query.push_str("&prefix=");
+ path_and_query.push_str(prefix);
+ }
+ if let Some(max_keys) = max_keys {
+ path_and_query.push_str("&max-keys=");
+ path_and_query.push_str(&max_keys.to_string());
+ }
+ if let Some(token) = continuation_token {
+ path_and_query.push_str("&continuation-token=");
+ path_and_query.push_str(token);
+ }
+ let request = Request::builder()
+ .method(Method::GET)
+ .uri(self.uri_builder(&path_and_query)?)
+ .body(Body::empty())?;
+
+ let response = self.send(request).await?;
+ let response_reader = ResponseReader::new(response);
+ response_reader.list_objects_v2_response().await
+ }
+
+ /// Add a new object to a bucket.
+ ///
+ /// Do not reupload if an object with matching key already exists in the bucket.
+ /// See reference docs: https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html
+ pub async fn put_object(
+ &self,
+ object_key: S3ObjectKey,
+ object_data: Body,
+ ) -> Result<PutObjectResponse, Error> {
+ let request = Request::builder()
+ .method(Method::PUT)
+ .uri(self.uri_builder(&object_key)?)
+ .header(header::CONTENT_TYPE, "binary/octet")
+ // Never overwrite pre-existing objects with the same key.
+ //.header(header::IF_NONE_MATCH, "*")
+ .body(object_data)?;
+
+ let response = self.send(request).await?;
+ let response_reader = ResponseReader::new(response);
+ response_reader.put_object_response().await
+ }
+
+ /// Sets the supplied tag-set to an object that already exists in a bucket. A tag is a key-value pair.
+ /// See reference docs: https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectTagging.html
+ pub async fn put_object_tagging(
+ &self,
+ object_key: S3ObjectKey,
+ tagset: &HashMap<String, String>,
+ ) -> Result<bool, Error> {
+ let mut writer = Writer::new(Cursor::new(Vec::new()));
+ writer
+ .create_element("Tagging")
+ .with_attribute(("xmlns", "http://s3.amazonaws.com/doc/2006-03-01/"))
+ .write_inner_content(|writer| {
+ writer
+ .create_element("TagSet")
+ .write_inner_content(|writer| {
+ for (key, value) in tagset.iter() {
+ writer.create_element("Tag").write_inner_content(|writer| {
+ writer
+ .create_element("Key")
+ .write_text_content(BytesText::new(key))?;
+ writer
+ .create_element("Value")
+ .write_text_content(BytesText::new(value))?;
+ Ok(())
+ })?;
+ }
+ Ok(())
+ })?;
+ Ok(())
+ })?;
+
+ let body: Body = writer.into_inner().into_inner().into();
+ let request = Request::builder()
+ .method(Method::PUT)
+ .uri(self.uri_builder(&format!("{object_key}?tagging"))?)
+ .body(body)?;
+
+ let response = self.send(request).await?;
+ Ok(response.status().is_success())
+ }
+
+ /// Sets the supplied tag to an object that already exists in a bucket. A tag is a key-value pair.
+ /// Optimized version of the `put_object_tagging` to only set a single tag.
+ pub async fn put_object_tag(
+ &self,
+ object_key: S3ObjectKey,
+ tag_key: &str,
+ tag_value: &str,
+ ) -> Result<bool, Error> {
+ let body: Body = format!(
+ r#"<Tagging xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
+ <TagSet>
+ <Tag>
+ <Key>{tag_key}</Key>
+ <Value>{tag_value}</Value>
+ </Tag>
+ </TagSet>
+ </Tagging>"#
+ )
+ .into();
+
+ let request = Request::builder()
+ .method(Method::PUT)
+ .uri(self.uri_builder(&format!("{object_key}?tagging"))?)
+ .body(body)?;
+
+ let response = self.send(request).await?;
+ //TODO: Response and error handling!
+ Ok(response.status().is_success())
+ }
+
+ /// Creates a copy of an object that is already stored in Amazon S3.
+ /// Uses the `x-amz-metadata-directive` set to `REPLACE`, therefore resulting in updated metadata.
+ /// See reference docs: https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html
+ pub async fn copy_object(
+ &self,
+ destination_key: S3ObjectKey,
+ source_bucket: &str,
+ source_key: S3ObjectKey,
+ ) -> Result<CopyObjectResponse, Error> {
+ let copy_source = source_key.to_copy_source_key(source_bucket);
+ let request = Request::builder()
+ .method(Method::PUT)
+ .uri(self.uri_builder(&destination_key)?)
+ .header("x-amz-copy-source", HeaderValue::from_str(©_source)?)
+ .header(
+ "x-amz-metadata-directive",
+ HeaderValue::from_str("REPLACE")?,
+ )
+ .body(Body::empty())?;
+
+ let response = self.send(request).await?;
+ let response_reader = ResponseReader::new(response);
+ response_reader.copy_object_response().await
+ }
+
+ /// Helper to update the metadata for an object by copying it to itself. This will not cause
+ /// any additional costs other than the request cost itself.
+ ///
+ /// Note: This will actually create a new object for buckets with versioning enabled.
+ /// Return with error if that is the case, detected by checking the presence of the
+ /// `x-amz-version-id` header in the response.
+ /// See reference docs: https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html
+ pub async fn update_object_metadata(
+ &self,
+ object_key: S3ObjectKey,
+ ) -> Result<CopyObjectResponse, Error> {
+ let response = self
+ .copy_object(object_key.clone(), &self.options.bucket, object_key)
+ .await?;
+ if response.x_amz_version_id.is_some() {
+ // Return an error if the response contains an `x-amz-version-id`, indicating that the
+ // bucket has versioning enabled, as that will bloat the bucket size and therefore cost.
+ bail!("Failed to update object metadata as versioning is enabled");
+ }
+ Ok(response)
+ }
+
+ /// Removes an object from a bucket.
+ /// See reference docs: https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObject.html
+ pub async fn delete_object(&self, object_key: S3ObjectKey) -> Result<(), Error> {
+ let request = Request::builder()
+ .method(Method::DELETE)
+ .uri(self.uri_builder(&object_key)?)
+ .body(Body::empty())?;
+
+ let response = self.send(request).await?;
+ let response_reader = ResponseReader::new(response);
+ response_reader.delete_object_response().await
+ }
+
+ /// Delete multiple objects from a bucket using a single HTTP request.
+ /// See reference docs: https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
+ pub async fn delete_objects(
+ &self,
+ object_keys: &[String],
+ ) -> Result<DeleteObjectsResponse, Error> {
+ let mut body = String::from(r#"<Delete xmlns="http://s3.amazonaws.com/doc/2006-03-01/">"#);
+ for object_key in object_keys {
+ let object = format!("<Object><Key>{object_key}</Key></Object>");
+ body.push_str(&object);
+ }
+ body.push_str("</Delete>");
+ let request = Request::builder()
+ .method(Method::POST)
+ .uri(self.uri_builder("/?delete")?)
+ .body(Body::from(body))?;
+
+ let response = self.send(request).await?;
+ let response_reader = ResponseReader::new(response);
+ response_reader.delete_objects_response().await
+ }
+
+ /// Delete objects by given key prefix.
+ /// Requires at least 2 api calls.
+ pub async fn delete_objects_by_prefix(&self, prefix: &str) -> Result<bool, Error> {
+ // S3 API does not provide a convenient way to delete objects by key prefix.
+ // List all objects with given group prefix and delete all objects found, so this
+ // requires at least 2 API calls.
+ let mut next_continuation_token: Option<String> = None;
+ let mut delete_errors = false;
+ loop {
+ let list_objects_result = self
+ .list_objects_v2(Some(prefix), None, next_continuation_token.as_deref())
+ .await?;
+ let objects_to_delete: Vec<String> = list_objects_result
+ .contents
+ .into_iter()
+ .map(|item| item.key)
+ .collect();
+ let response = self.delete_objects(&objects_to_delete).await?;
+ if response.error.is_some() {
+ delete_errors = true;
+ }
+
+ if list_objects_result.is_truncated {
+ next_continuation_token = list_objects_result
+ .next_continuation_token
+ .as_ref()
+ .cloned();
+ continue;
+ }
+ break;
+ }
+ Ok(delete_errors)
+ }
+
+ /// Delete objects by given key prefix, but exclude items pre-filter based on suffix
+ /// (including the parent component of the matched suffix). E.g. do not remove items in a
+ /// snapshot directory, by matching based on the protected file marker (given as suffix).
+ ///
+ /// Requires at least 2 api calls.
+ pub async fn delete_objects_by_prefix_with_suffix_filter(
+ &self,
+ prefix: &str,
+ suffix: &str,
+ ) -> Result<bool, Error> {
+ // S3 API does not provide a convenient way to delete objects by key prefix.
+ // List all objects with given group prefix and delete all objects found, so this
+ // requires at least 2 API calls.
+ let mut next_continuation_token: Option<String> = None;
+ let mut delete_errors = false;
+ let mut prefix_filters = Vec::new();
+ let mut list_objects = Vec::new();
+ loop {
+ let list_objects_result = self
+ .list_objects_v2(Some(prefix), None, next_continuation_token.as_deref())
+ .await?;
+ let mut prefixes: Vec<String> = list_objects_result
+ .contents
+ .iter()
+ .filter_map(|item| {
+ let prefix_filter = item
+ .key
+ .strip_suffix(suffix)
+ .map(|prefix| prefix.to_string());
+ if prefix_filter.is_none() {
+ list_objects.push(item.key.clone());
+ }
+ prefix_filter
+ })
+ .collect();
+ prefix_filters.append(&mut prefixes);
+
+ if list_objects_result.is_truncated {
+ next_continuation_token = list_objects_result
+ .next_continuation_token
+ .as_ref()
+ .cloned();
+ continue;
+ }
+ break;
+ }
+
+ // Re-filter in case the 1000 items per request boundary lead to the prefix not being
+ // filtered for some items
+ let objects_to_delete: Vec<String> = list_objects
+ .into_iter()
+ .filter_map(|item| {
+ for prefix in &prefix_filters {
+ if item.strip_prefix(prefix).is_some() {
+ return None;
+ }
+ }
+ Some(item)
+ })
+ .collect();
+
+ for objects in objects_to_delete.chunks(1000) {
+ let result = self.delete_objects(objects).await?;
+ if result.error.is_some() {
+ delete_errors = true;
+ }
+ }
+
+ Ok(delete_errors)
+ }
+
+ #[inline(always)]
+ /// Helper to generate [`Uri`] instance with common properties based on given path and query
+ /// string
+ fn uri_builder(&self, path_and_query: &str) -> Result<Uri, Error> {
+ Uri::builder()
+ .scheme("https")
+ .authority(self.authority.clone())
+ .path_and_query(path_and_query)
+ .build()
+ .context("failed to build uri")
+ }
}
diff --git a/pbs-s3-client/src/lib.rs b/pbs-s3-client/src/lib.rs
index b3e539bdd..b4e7eb497 100644
--- a/pbs-s3-client/src/lib.rs
+++ b/pbs-s3-client/src/lib.rs
@@ -3,6 +3,8 @@ mod client;
pub use client::{S3Client, S3ClientOptions};
mod object_key;
pub use object_key::{S3ObjectKey, S3_CONTENT_PREFIX};
+mod response_reader;
+pub use response_reader::PutObjectResponse;
use std::time::Duration;
diff --git a/pbs-s3-client/src/response_reader.rs b/pbs-s3-client/src/response_reader.rs
new file mode 100644
index 000000000..8c553ce8f
--- /dev/null
+++ b/pbs-s3-client/src/response_reader.rs
@@ -0,0 +1,343 @@
+use std::str::FromStr;
+
+use anyhow::{anyhow, bail, Context, Error};
+use hyper::body::HttpBody;
+use hyper::header::HeaderName;
+use hyper::http::header;
+use hyper::http::StatusCode;
+use hyper::{Body, HeaderMap, Response};
+use serde::Deserialize;
+
+use crate::{HttpDate, LastModifiedTimestamp};
+
+pub(crate) struct ResponseReader {
+ response: Response<Body>,
+}
+
+#[derive(Debug)]
+pub struct ListObjectsV2Response {
+ pub date: HttpDate,
+ pub name: String,
+ pub prefix: String,
+ pub key_count: u64,
+ pub max_keys: u64,
+ pub is_truncated: bool,
+ pub continuation_token: Option<String>,
+ pub next_continuation_token: Option<String>,
+ pub contents: Vec<ListObjectsV2Contents>,
+}
+
+#[derive(Deserialize, Debug)]
+#[serde(rename_all = "PascalCase")]
+struct ListObjectsV2ResponseBody {
+ pub name: String,
+ pub prefix: String,
+ pub key_count: u64,
+ pub max_keys: u64,
+ pub is_truncated: bool,
+ pub continuation_token: Option<String>,
+ pub next_continuation_token: Option<String>,
+ pub contents: Option<Vec<ListObjectsV2Contents>>,
+}
+
+impl ListObjectsV2ResponseBody {
+ fn with_date(self, date: HttpDate) -> ListObjectsV2Response {
+ ListObjectsV2Response {
+ date,
+ name: self.name,
+ prefix: self.prefix,
+ key_count: self.key_count,
+ max_keys: self.max_keys,
+ is_truncated: self.is_truncated,
+ continuation_token: self.continuation_token,
+ next_continuation_token: self.next_continuation_token,
+ contents: self.contents.unwrap_or_default(),
+ }
+ }
+}
+
+#[derive(Deserialize, Debug)]
+#[serde(rename_all = "PascalCase")]
+pub struct ListObjectsV2Contents {
+ pub key: String,
+ pub last_modified: LastModifiedTimestamp,
+ pub e_tag: String,
+ pub size: u64,
+ pub storage_class: String,
+}
+
+#[derive(Debug)]
+/// Subset of the head object response (headers only, there is no body)
+/// See https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html#API_HeadObject_ResponseSyntax
+pub struct HeadObjectResponse {
+ pub content_length: u64,
+ pub content_type: String,
+ pub date: HttpDate,
+ pub e_tag: String,
+ pub last_modified: HttpDate,
+}
+
+#[derive(Debug)]
+/// Subset of the get object response
+/// https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html#API_GetObject_ResponseSyntax
+pub struct GetObjectResponse {
+ pub content_length: u64,
+ pub content_type: String,
+ pub date: HttpDate,
+ pub e_tag: String,
+ pub last_modified: HttpDate,
+ pub content: Body,
+}
+
+#[derive(Debug)]
+pub struct CopyObjectResponse {
+ pub copy_object_result: CopyObjectResult,
+ pub x_amz_version_id: Option<String>,
+}
+
+#[derive(Deserialize, Debug)]
+#[serde(rename_all = "PascalCase")]
+pub struct CopyObjectResult {
+ pub e_tag: String,
+ pub last_modified: LastModifiedTimestamp,
+}
+
+/// Subset of the put object response
+/// https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html#API_PutObject_ResponseSyntax
+#[derive(Debug)]
+pub enum PutObjectResponse {
+ NeedsRetry,
+ PreconditionFailed,
+ Success(String),
+}
+
+/// Subset of the delete objects response
+/// https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html#API_DeleteObjects_ResponseElements
+#[derive(Deserialize, Debug)]
+#[serde(rename_all = "PascalCase")]
+pub struct DeleteObjectsResponse {
+ pub deleted: Option<Vec<DeletedObject>>,
+ pub error: Option<Vec<DeleteObjectError>>,
+}
+
+/// https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeletedObject.html
+#[derive(Deserialize, Debug)]
+#[serde(rename_all = "PascalCase")]
+pub struct DeletedObject {
+ pub delete_marker: Option<bool>,
+ pub delete_marker_version_id: Option<String>,
+ pub key: Option<String>,
+ pub version_id: Option<String>,
+}
+
+/// https://docs.aws.amazon.com/AmazonS3/latest/API/API_Error.html
+#[derive(Deserialize, Debug)]
+#[serde(rename_all = "PascalCase")]
+pub struct DeleteObjectError {
+ pub code: Option<String>,
+ pub key: Option<String>,
+ pub message: Option<String>,
+ pub version_id: Option<String>,
+}
+
+impl ResponseReader {
+ pub(crate) fn new(response: Response<Body>) -> Self {
+ Self { response }
+ }
+
+ pub(crate) async fn list_objects_v2_response(self) -> Result<ListObjectsV2Response, Error> {
+ let (parts, body) = self.response.into_parts();
+
+ match parts.status {
+ StatusCode::OK => (),
+ StatusCode::NOT_FOUND => bail!("bucket does not exist"),
+ status_code => bail!("unexpected status code {status_code}"),
+ }
+
+ let body = body.collect().await?.to_bytes();
+ let body = String::from_utf8(body.to_vec())?;
+
+ let date: HttpDate = Self::parse_header(header::DATE, &parts.headers)?;
+
+ let response: ListObjectsV2ResponseBody =
+ serde_xml_rs::from_str(&body).context("failed to parse response body")?;
+
+ Ok(response.with_date(date))
+ }
+
+ pub(crate) async fn head_object_response(self) -> Result<Option<HeadObjectResponse>, Error> {
+ let (parts, body) = self.response.into_parts();
+
+ match parts.status {
+ StatusCode::OK => (),
+ StatusCode::NOT_FOUND => return Ok(None),
+ status_code => bail!("unexpected status code {status_code}"),
+ }
+ let body = body.collect().await?.to_bytes();
+ if !body.is_empty() {
+ bail!("got unexpected non-empty response body");
+ }
+ println!("Headers {:?}", parts.headers);
+
+ let content_length: u64 = Self::parse_header(header::CONTENT_LENGTH, &parts.headers)?;
+ let content_type = Self::parse_header(header::CONTENT_TYPE, &parts.headers)?;
+ let e_tag = Self::parse_header(header::ETAG, &parts.headers)?;
+ let date = Self::parse_header(header::DATE, &parts.headers)?;
+ let last_modified = Self::parse_header(header::LAST_MODIFIED, &parts.headers)?;
+
+ Ok(Some(HeadObjectResponse {
+ content_length,
+ content_type,
+ date,
+ e_tag,
+ last_modified,
+ }))
+ }
+
+ pub(crate) async fn get_object_response(self) -> Result<Option<GetObjectResponse>, Error> {
+ let (parts, content) = self.response.into_parts();
+
+ match parts.status {
+ StatusCode::OK => (),
+ StatusCode::NOT_FOUND => return Ok(None),
+ StatusCode::FORBIDDEN => bail!("object is archived and inaccessible until restored"),
+ status_code => bail!("unexpected status code {status_code}"),
+ }
+
+ let content_length: u64 = Self::parse_header(header::CONTENT_LENGTH, &parts.headers)?;
+ let content_type = Self::parse_header(header::CONTENT_TYPE, &parts.headers)?;
+ let e_tag = Self::parse_header(header::ETAG, &parts.headers)?;
+ let date = Self::parse_header(header::DATE, &parts.headers)?;
+ let last_modified = Self::parse_header(header::LAST_MODIFIED, &parts.headers)?;
+
+ Ok(Some(GetObjectResponse {
+ content_length,
+ content_type,
+ date,
+ e_tag,
+ last_modified,
+ content,
+ }))
+ }
+
+ pub(crate) async fn copy_object_response(self) -> Result<CopyObjectResponse, Error> {
+ let (parts, content) = self.response.into_parts();
+
+ match parts.status {
+ StatusCode::OK => (),
+ StatusCode::NOT_FOUND => bail!("object not found"),
+ StatusCode::FORBIDDEN => bail!("the source object is not in the active tier"),
+ status_code => bail!("unexpected status code {status_code}"),
+ }
+
+ let body = content.collect().await?.to_bytes();
+ let body = String::from_utf8(body.to_vec())?;
+
+ let x_amz_version_id = match parts.headers.get("x-amz-version-id") {
+ Some(version_id) => Some(
+ version_id
+ .to_str()
+ .context("failed to parse version id header")?
+ .to_owned(),
+ ),
+ None => None,
+ };
+
+ let copy_object_result: CopyObjectResult =
+ serde_xml_rs::from_str(&body).context("failed to parse response body")?;
+
+ Ok(CopyObjectResponse {
+ copy_object_result,
+ x_amz_version_id,
+ })
+ }
+
+ pub(crate) async fn put_object_response(self) -> Result<PutObjectResponse, Error> {
+ let (parts, body) = self.response.into_parts();
+
+ match parts.status {
+ StatusCode::OK => (),
+ // If-None-Match precondition failed, an object with the same key is already present.
+ // FIXME: Should this be dropped in favor of re-uploading and relying on the local
+ // cache to detect duplicates, to increase data safety guarantees?
+ StatusCode::PRECONDITION_FAILED => return Ok(PutObjectResponse::PreconditionFailed),
+ StatusCode::CONFLICT => return Ok(PutObjectResponse::NeedsRetry),
+ StatusCode::BAD_REQUEST => bail!("invalid request: {body:?}"),
+ status_code => bail!("unexpected status code {status_code}"),
+ };
+
+ let body = body.collect().await?.to_bytes();
+ if !body.is_empty() {
+ bail!("got unexpected non-empty response body");
+ }
+
+ let e_tag = Self::parse_header(header::ETAG, &parts.headers)?;
+
+ Ok(PutObjectResponse::Success(e_tag))
+ }
+
+ pub(crate) async fn delete_object_response(self) -> Result<(), Error> {
+ let (parts, _body) = self.response.into_parts();
+
+ match parts.status {
+ StatusCode::NO_CONTENT => (),
+ status_code => bail!("unexpected status code {status_code}"),
+ };
+
+ Ok(())
+ }
+
+ pub(crate) async fn delete_objects_response(self) -> Result<DeleteObjectsResponse, Error> {
+ let (parts, body) = self.response.into_parts();
+
+ match parts.status {
+ StatusCode::OK => (),
+ StatusCode::BAD_REQUEST => bail!("invalid request: {body:?}"),
+ status_code => bail!("unexpected status code {status_code}"),
+ };
+
+ let body = body.collect().await?.to_bytes();
+ let body = String::from_utf8(body.to_vec())?;
+
+ let delete_objects_response: DeleteObjectsResponse =
+ serde_xml_rs::from_str(&body).context("failed to parse response body")?;
+
+ Ok(delete_objects_response)
+ }
+
+ fn parse_header<T: FromStr>(name: HeaderName, headers: &HeaderMap) -> Result<T, Error>
+ where
+ <T as FromStr>::Err: Send + Sync + 'static,
+ Result<T, <T as FromStr>::Err>: Context<T, <T as FromStr>::Err>,
+ {
+ let header_value = headers
+ .get(&name)
+ .ok_or_else(|| anyhow!("missing header '{name}'"))?;
+ let header_str = header_value
+ .to_str()
+ .with_context(|| format!("non UTF-8 header '{name}'"))?;
+ let value = header_str
+ .parse()
+ .with_context(|| format!("failed to parse header '{name}'"))?;
+ Ok(value)
+ }
+
+ // TODO: Integrity checks via CRC32 or SHA256 currently cannot be performed, since they
+ // are not supported by all S3 object store providers.
+ // See also:
+ // https://tracker.ceph.com/issues/63951
+ // https://tracker.ceph.com/issues/69105
+ // https://www.backblaze.com/docs/cloud-storage-s3-compatible-api
+ fn parse_x_amz_checksum_crc32_header(headers: &HeaderMap) -> Result<Option<u32>, Error> {
+ let x_amz_checksum_crc32 = match headers.get("x-amz-checksum-crc32") {
+ Some(x_amz_checksum_crc32) => x_amz_checksum_crc32,
+ None => return Ok(None),
+ };
+ let x_amz_checksum_crc32 = base64::decode(x_amz_checksum_crc32.to_str()?)?;
+ let x_amz_checksum_crc32: [u8; 4] = x_amz_checksum_crc32
+ .try_into()
+ .map_err(|_e| anyhow!("failed to convert x-amz-checksum-crc32 header"))?;
+ let x_amz_checksum_crc32 = u32::from_be_bytes(x_amz_checksum_crc32);
+ Ok(Some(x_amz_checksum_crc32))
+ }
+}
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 13/42] config: introduce s3 object store client configuration
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (11 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 12/42] s3 client: implement methods to operate on s3 objects in bucket Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 14/42] api: config: implement endpoints to manipulate and list s3 configs Christian Ebner
` (30 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Adds the client configuration for s3 object stores as dedicated
configuration files, with secrets stored separately from the regular
configuration and excluded from api responses for security reasons.
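For illustration, the two resulting section config files might look roughly
like the following sketch (the id `my-s3-client` and the exact property names
are assumptions derived from the api types, not taken verbatim from this
patch):
```
# /etc/proxmox-backup/s3.cfg
s3client: my-s3-client
    host s3.example.com
    bucket pbs-bucket
    access-key EXAMPLEACCESSKEY

# /etc/proxmox-backup/s3-secrets.cfg
s3secrets: my-s3-client
    secret-key EXAMPLESECRETKEY
```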
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-config/src/lib.rs | 1 +
pbs-config/src/s3.rs | 82 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 83 insertions(+)
create mode 100644 pbs-config/src/s3.rs
diff --git a/pbs-config/src/lib.rs b/pbs-config/src/lib.rs
index 9c4d77c24..d03c079ab 100644
--- a/pbs-config/src/lib.rs
+++ b/pbs-config/src/lib.rs
@@ -10,6 +10,7 @@ pub mod network;
pub mod notifications;
pub mod prune;
pub mod remote;
+pub mod s3;
pub mod sync;
pub mod tape_job;
pub mod token_shadow;
diff --git a/pbs-config/src/s3.rs b/pbs-config/src/s3.rs
new file mode 100644
index 000000000..5fce5034d
--- /dev/null
+++ b/pbs-config/src/s3.rs
@@ -0,0 +1,82 @@
+use std::collections::HashMap;
+use std::sync::LazyLock;
+
+use anyhow::Error;
+
+use proxmox_schema::*;
+use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
+
+use pbs_api_types::{S3ClientConfig, S3ClientSecretsConfig, JOB_ID_SCHEMA};
+
+use crate::{open_backup_lockfile, replace_backup_config, BackupLockGuard};
+
+pub static CONFIG: LazyLock<SectionConfig> = LazyLock::new(init);
+
+fn init() -> SectionConfig {
+ let obj_schema = match S3ClientConfig::API_SCHEMA {
+ Schema::Object(ref obj_schema) => obj_schema,
+ _ => unreachable!(),
+ };
+ let secrets_obj_schema = match S3ClientSecretsConfig::API_SCHEMA {
+ Schema::Object(ref obj_schema) => obj_schema,
+ _ => unreachable!(),
+ };
+
+ let plugin =
+ SectionConfigPlugin::new("s3client".to_string(), Some(String::from("id")), obj_schema);
+ let secrets_plugin = SectionConfigPlugin::new(
+ "s3secrets".to_string(),
+ Some(String::from("secrets-id")),
+ secrets_obj_schema,
+ );
+ let mut config = SectionConfig::new(&JOB_ID_SCHEMA);
+ config.register_plugin(plugin);
+ config.register_plugin(secrets_plugin);
+
+ config
+}
+
+pub const S3_CFG_FILENAME: &str = "/etc/proxmox-backup/s3.cfg";
+pub const S3_SECRETS_CFG_FILENAME: &str = "/etc/proxmox-backup/s3-secrets.cfg";
+pub const S3_CFG_LOCKFILE: &str = "/etc/proxmox-backup/.s3.lck";
+
+/// Get exclusive lock
+pub fn lock_config() -> Result<BackupLockGuard, Error> {
+ open_backup_lockfile(S3_CFG_LOCKFILE, None, true)
+}
+
+pub fn config() -> Result<(SectionConfigData, [u8; 32]), Error> {
+ parse_config(S3_CFG_FILENAME)
+}
+
+pub fn secrets_config() -> Result<(SectionConfigData, [u8; 32]), Error> {
+ parse_config(S3_SECRETS_CFG_FILENAME)
+}
+
+pub fn save_config(config: &SectionConfigData, secrets: &SectionConfigData) -> Result<(), Error> {
+ let raw = CONFIG.write(S3_CFG_FILENAME, config)?;
+ replace_backup_config(S3_CFG_FILENAME, raw.as_bytes())?;
+
+ let secrets_raw = CONFIG.write(S3_SECRETS_CFG_FILENAME, secrets)?;
+ // Secrets are stored with `backup` permissions to allow reading from
+ // non-protected api endpoints as well.
+ replace_backup_config(S3_SECRETS_CFG_FILENAME, secrets_raw.as_bytes())?;
+
+ Ok(())
+}
+
+// shell completion helper
+pub fn complete_s3_client_id(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
+ match config() {
+ Ok((data, _digest)) => data.sections.keys().map(|id| id.to_string()).collect(),
+ Err(_) => Vec::new(),
+ }
+}
+
+fn parse_config(path: &str) -> Result<(SectionConfigData, [u8; 32]), Error> {
+ let content = proxmox_sys::fs::file_read_optional_string(path)?;
+ let content = content.unwrap_or_default();
+ let digest = openssl::sha::sha256(content.as_bytes());
+ let data = CONFIG.parse(path, &content)?;
+ Ok((data, digest))
+}
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 14/42] api: config: implement endpoints to manipulate and list s3 configs
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (12 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 13/42] config: introduce s3 object store client configuration Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 15/42] api: datastore: check S3 backend bucket access on datastore create Christian Ebner
` (29 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Allows creating, listing, modifying and deleting s3 client
configurations via the api.
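With the router below wired into the config api tree, the resulting endpoints
are roughly the following (the /api2/json prefix is the usual access path and
is only shown here for illustration):
```
GET    /api2/json/config/s3          list all s3 client configurations
POST   /api2/json/config/s3          create a new configuration (incl. secrets)
GET    /api2/json/config/s3/{id}     read a single configuration (without secrets)
PUT    /api2/json/config/s3/{id}     update configuration and/or secrets
DELETE /api2/json/config/s3/{id}     remove configuration and secrets
```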
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/config/mod.rs | 2 +
src/api2/config/s3.rs | 305 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 307 insertions(+)
create mode 100644 src/api2/config/s3.rs
diff --git a/src/api2/config/mod.rs b/src/api2/config/mod.rs
index 15dc5db92..1cd9ead76 100644
--- a/src/api2/config/mod.rs
+++ b/src/api2/config/mod.rs
@@ -14,6 +14,7 @@ pub mod metrics;
pub mod notifications;
pub mod prune;
pub mod remote;
+pub mod s3;
pub mod sync;
pub mod tape_backup_job;
pub mod tape_encryption_keys;
@@ -32,6 +33,7 @@ const SUBDIRS: SubdirMap = &sorted!([
("notifications", ¬ifications::ROUTER),
("prune", &prune::ROUTER),
("remote", &remote::ROUTER),
+ ("s3", &s3::ROUTER),
("sync", &sync::ROUTER),
("tape-backup-job", &tape_backup_job::ROUTER),
("tape-encryption-keys", &tape_encryption_keys::ROUTER),
diff --git a/src/api2/config/s3.rs b/src/api2/config/s3.rs
new file mode 100644
index 000000000..aa6d0fa81
--- /dev/null
+++ b/src/api2/config/s3.rs
@@ -0,0 +1,305 @@
+use ::serde::{Deserialize, Serialize};
+use anyhow::Error;
+use hex::FromHex;
+use serde_json::Value;
+
+use proxmox_router::{http_bail, Permission, Router, RpcEnvironment};
+use proxmox_schema::{api, param_bail};
+
+use pbs_api_types::{
+ S3ClientConfig, S3ClientConfigUpdater, S3ClientSecretsConfig, S3ClientSecretsConfigUpdater,
+ JOB_ID_SCHEMA, PRIV_SYS_AUDIT, PRIV_SYS_MODIFY, PROXMOX_CONFIG_DIGEST_SCHEMA,
+};
+use pbs_config::s3;
+
+#[api(
+ input: {
+ properties: {},
+ },
+ returns: {
+ description: "List configured s3 clients.",
+ type: Array,
+ items: { type: S3ClientConfig },
+ },
+ access: {
+ permission: &Permission::Privilege(&[], PRIV_SYS_AUDIT, false),
+ },
+)]
+/// List all s3 client configurations.
+pub fn list_s3_client_config(
+ _param: Value,
+ rpcenv: &mut dyn RpcEnvironment,
+) -> Result<Vec<S3ClientConfig>, Error> {
+ let (config, digest) = s3::config()?;
+ let list = config.convert_to_typed_array("s3client")?;
+
+ let (_secrets, secrets_digest) = s3::secrets_config()?;
+ let digest = digest_with_secrets(&digest, &secrets_digest);
+ rpcenv["digest"] = hex::encode(digest).into();
+
+ Ok(list)
+}
+
+#[api(
+ protected: true,
+ input: {
+ properties: {
+ config: {
+ type: S3ClientConfig,
+ flatten: true,
+ },
+ secrets: {
+ type: S3ClientSecretsConfig,
+ flatten: true,
+ },
+ },
+ },
+ access: {
+ permission: &Permission::Privilege(&[], PRIV_SYS_MODIFY, false),
+ },
+)]
+/// Create a new s3 client configuration.
+pub fn create_s3_client_config(
+ config: S3ClientConfig,
+ secrets: S3ClientSecretsConfig,
+ _rpcenv: &mut dyn RpcEnvironment,
+) -> Result<(), Error> {
+ // Ensure that both config and secrets are referenced by the same `id`
+ if config.id != secrets.secrets_id {
+ param_bail!(
+ "id",
+ "config and secrets must use the same id ({} != {})",
+ config.id,
+ secrets.secrets_id
+ );
+ }
+
+ let _lock = s3::lock_config()?;
+ let (mut section_config, _digest) = s3::config()?;
+ if section_config.sections.contains_key(&config.id) {
+ param_bail!("id", "s3 client config '{}' already exists.", config.id);
+ }
+
+ let (mut section_secrets, _secrets_digest) = s3::secrets_config()?;
+ if section_secrets.sections.contains_key(&config.id) {
+ param_bail!("id", "s3 secrets config '{}' already exists.", config.id);
+ }
+
+ section_config.set_data(&config.id, "s3client", &config)?;
+ section_secrets.set_data(&config.id, "s3secrets", &secrets)?;
+ s3::save_config(§ion_config, §ion_secrets)?;
+
+ Ok(())
+}
+
+#[api(
+ input: {
+ properties: {
+ id: {
+ schema: JOB_ID_SCHEMA,
+ },
+ },
+ },
+ returns: { type: S3ClientConfig },
+ access: {
+ permission: &Permission::Privilege(&[], PRIV_SYS_AUDIT, false),
+ },
+)]
+/// Read an s3 client configuration.
+pub fn read_s3_client_config(
+ id: String,
+ rpcenv: &mut dyn RpcEnvironment,
+) -> Result<S3ClientConfig, Error> {
+ let (config, digest) = s3::config()?;
+ let s3_client_config: S3ClientConfig = config.lookup("s3client", &id)?;
+
+ let (_secrets, secrets_digest) = s3::secrets_config()?;
+ let digest = digest_with_secrets(&digest, &secrets_digest);
+ rpcenv["digest"] = hex::encode(digest).into();
+
+ Ok(s3_client_config)
+}
+
+#[api()]
+#[derive(Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]
+/// Deletable property name
+pub enum DeletableProperty {
+ /// Delete the port property.
+ Port,
+ /// Delete the region property.
+ Region,
+ /// Delete the fingerprint property.
+ Fingerprint,
+}
+
+#[api(
+ protected: true,
+ input: {
+ properties: {
+ id: {
+ schema: JOB_ID_SCHEMA,
+ },
+ update: {
+ type: S3ClientConfigUpdater,
+ flatten: true,
+ },
+ "update-secrets": {
+ type: S3ClientSecretsConfigUpdater,
+ flatten: true,
+ },
+ delete: {
+ description: "List of properties to delete.",
+ type: Array,
+ optional: true,
+ items: {
+ type: DeletableProperty,
+ }
+ },
+ digest: {
+ optional: true,
+ schema: PROXMOX_CONFIG_DIGEST_SCHEMA,
+ },
+ },
+ },
+ access: {
+ permission: &Permission::Privilege(&[], PRIV_SYS_MODIFY, false),
+ },
+)]
+/// Update an s3 client configuration.
+#[allow(clippy::too_many_arguments)]
+pub fn update_s3_client_config(
+ id: String,
+ update: S3ClientConfigUpdater,
+ update_secrets: S3ClientSecretsConfigUpdater,
+ delete: Option<Vec<DeletableProperty>>,
+ digest: Option<String>,
+ _rpcenv: &mut dyn RpcEnvironment,
+) -> Result<(), Error> {
+ let _lock = s3::lock_config()?;
+ let (mut config, expected_digest) = s3::config()?;
+ let (mut secrets, secrets_digest) = s3::secrets_config()?;
+ let expected_digest = digest_with_secrets(&expected_digest, &secrets_digest);
+
+ // Secrets are not part of the config digest itself; the combined digest is used so
+ // concurrent changes to the secrets are detected as well.
+ if let Some(ref digest) = digest {
+ let digest = <[u8; 32]>::from_hex(digest)?;
+ crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
+ }
+
+ let mut data: S3ClientConfig = config.lookup("s3client", &id)?;
+
+ if let Some(delete) = delete {
+ for delete_prop in delete {
+ match delete_prop {
+ DeletableProperty::Port => {
+ data.port = None;
+ }
+ DeletableProperty::Region => {
+ data.region = None;
+ }
+ DeletableProperty::Fingerprint => {
+ data.fingerprint = None;
+ }
+ }
+ }
+ }
+
+ if let Some(host) = update.host {
+ data.host = host;
+ }
+ if let Some(bucket) = update.bucket {
+ data.bucket = bucket;
+ }
+ if let Some(port) = update.port {
+ data.port = Some(port);
+ }
+ if let Some(region) = update.region {
+ data.region = Some(region);
+ }
+ if let Some(access_key) = update.access_key {
+ data.access_key = access_key;
+ }
+ if let Some(fingerprint) = update.fingerprint {
+ data.fingerprint = Some(fingerprint);
+ }
+
+ let mut secrets_data: S3ClientSecretsConfig = secrets.lookup("s3secrets", &id)?;
+ if let Some(secret_key) = update_secrets.secret_key {
+ secrets_data.secret_key = secret_key;
+ }
+
+ config.set_data(&id, "s3client", &data)?;
+ secrets.set_data(&id, "s3secrets", &secrets_data)?;
+ s3::save_config(&config, &secrets)?;
+
+ Ok(())
+}
+
+#[api(
+ protected: true,
+ input: {
+ properties: {
+ id: {
+ schema: JOB_ID_SCHEMA,
+ },
+ digest: {
+ optional: true,
+ schema: PROXMOX_CONFIG_DIGEST_SCHEMA,
+ },
+ },
+ },
+ access: {
+ permission: &Permission::Privilege(&[], PRIV_SYS_MODIFY, false),
+ },
+)]
+/// Remove an s3 client configuration.
+pub fn delete_s3_client_config(
+ id: String,
+ digest: Option<String>,
+ _rpcenv: &mut dyn RpcEnvironment,
+) -> Result<(), Error> {
+ let _lock = s3::lock_config()?;
+ let (mut config, expected_digest) = s3::config()?;
+ let (mut secrets, secrets_digest) = s3::secrets_config()?;
+ let expected_digest = digest_with_secrets(&expected_digest, &secrets_digest);
+
+ if let Some(ref digest) = digest {
+ let digest = <[u8; 32]>::from_hex(digest)?;
+ crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
+ }
+
+ match (config.sections.remove(&id), secrets.sections.remove(&id)) {
+ (Some(_), Some(_)) => {}
+ (None, None) => http_bail!(
+ NOT_FOUND,
+ "s3 client config and secrets '{id}' do not exist."
+ ),
+ (Some(_), None) => http_bail!(
+ NOT_FOUND,
+ "removed s3 client config, but no secrets for '{id}' found."
+ ),
+ (None, Some(_)) => http_bail!(
+ NOT_FOUND,
+ "removed s3 client secrets, but no config for '{id}' found."
+ ),
+ }
+ s3::save_config(&config, &secrets)
+}
+
+// Calculate the digest based on the digest of config and secrets to detect changes for both
+fn digest_with_secrets(digest: &[u8; 32], secrets_digest: &[u8; 32]) -> [u8; 32] {
+ let mut digest = digest.to_vec();
+ digest.append(&mut secrets_digest.to_vec());
+ openssl::sha::sha256(&digest)
+}
+
+const ITEM_ROUTER: Router = Router::new()
+ .get(&API_METHOD_READ_S3_CLIENT_CONFIG)
+ .put(&API_METHOD_UPDATE_S3_CLIENT_CONFIG)
+ .delete(&API_METHOD_DELETE_S3_CLIENT_CONFIG);
+
+pub const ROUTER: Router = Router::new()
+ .get(&API_METHOD_LIST_S3_CLIENT_CONFIG)
+ .post(&API_METHOD_CREATE_S3_CLIENT_CONFIG)
+ .match_all("id", &ITEM_ROUTER);
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 15/42] api: datastore: check S3 backend bucket access on datastore create
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (13 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 14/42] api: config: implement endpoints to manipulate and list s3 configs Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 16/42] api/bin: add endpoint and command to check s3 client connection Christian Ebner
` (28 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Check whether the configured S3 object store backend can be reached and
whether the provided secrets grant access to the bucket.
Perform the check before creating the chunk store, so it is not left
behind if the bucket cannot be reached.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/config/datastore.rs | 41 ++++++++++++++++++++++++++++++++----
1 file changed, 37 insertions(+), 4 deletions(-)
diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
index b133be707..19b08b7e4 100644
--- a/src/api2/config/datastore.rs
+++ b/src/api2/config/datastore.rs
@@ -3,6 +3,7 @@ use std::path::{Path, PathBuf};
use ::serde::{Deserialize, Serialize};
use anyhow::{bail, Context, Error};
use hex::FromHex;
+use pbs_s3_client::{S3Client, S3ClientOptions};
use serde_json::Value;
use tracing::{info, warn};
@@ -12,10 +13,10 @@ use proxmox_section_config::SectionConfigData;
use proxmox_uuid::Uuid;
use pbs_api_types::{
- Authid, DataStoreConfig, DataStoreConfigUpdater, DatastoreNotify, DatastoreTuning, KeepOptions,
- MaintenanceMode, PruneJobConfig, PruneJobOptions, DATASTORE_SCHEMA, PRIV_DATASTORE_ALLOCATE,
- PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_MODIFY, PRIV_SYS_MODIFY, PROXMOX_CONFIG_DIGEST_SCHEMA,
- UPID_SCHEMA,
+ Authid, DataStoreConfig, DataStoreConfigUpdater, DatastoreBackendConfig, DatastoreNotify,
+ DatastoreTuning, KeepOptions, MaintenanceMode, PruneJobConfig, PruneJobOptions, S3ClientConfig,
+ S3ClientSecretsConfig, DATASTORE_SCHEMA, PRIV_DATASTORE_ALLOCATE, PRIV_DATASTORE_AUDIT,
+ PRIV_DATASTORE_MODIFY, PRIV_SYS_MODIFY, PROXMOX_CONFIG_DIGEST_SCHEMA, UPID_SCHEMA,
};
use pbs_config::BackupLockGuard;
use pbs_datastore::chunk_store::ChunkStore;
@@ -116,6 +117,38 @@ pub(crate) fn do_create_datastore(
.parse_property_string(datastore.tuning.as_deref().unwrap_or(""))?,
)?;
+ if let Some(ref backend_config) = datastore.backend {
+ let backend_config: DatastoreBackendConfig = backend_config.parse()?;
+ match backend_config {
+ DatastoreBackendConfig::Filesystem => (),
+ DatastoreBackendConfig::S3(ref s3_client_id) => {
+ let (config, _config_digest) =
+ pbs_config::s3::config().context("failed to get s3 config")?;
+ let (secrets, _secrets_digest) =
+ pbs_config::s3::secrets_config().context("failed to get s3 secrets")?;
+ let config: S3ClientConfig = config
+ .lookup("s3client", s3_client_id)
+ .with_context(|| format!("no '{s3_client_id}' in config"))?;
+ let secrets: S3ClientSecretsConfig = secrets
+ .lookup("s3secrets", s3_client_id)
+ .with_context(|| format!("no '{s3_client_id}' in secrets"))?;
+ let options = S3ClientOptions {
+ host: config.host,
+ port: config.port,
+ bucket: config.bucket,
+ region: config.region.unwrap_or("us-west-1".to_string()),
+ fingerprint: config.fingerprint,
+ access_key: config.access_key,
+ secret_key: secrets.secret_key,
+ };
+ let s3_client = S3Client::new(options).context("failed to create s3 client")?;
+ // Fine to block since this runs in worker task
+ proxmox_async::runtime::block_on(s3_client.head_bucket())
+ .context("failed to access bucket")?;
+ }
+ }
+ }
+
let unmount_guard = if datastore.backing_device.is_some() {
do_mount_device(datastore.clone())?;
UnmountGuard::new(Some(path.clone()))
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 16/42] api/bin: add endpoint and command to check s3 client connection
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (14 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 15/42] api: datastore: check S3 backend bucket access on datastore create Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 17/42] datastore: allow to get the backend for a datastore Christian Ebner
` (27 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Adds a dedicated api endpoint and a proxmox-backup-manager command to
check if the configured S3 client can reach the bucket.
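With this applied, a configured client can be checked from the command line,
for example (assuming an s3 client config with id `my-s3-client` exists):
```
proxmox-backup-manager s3 check my-s3-client
```
The check performs a head-bucket request followed by a put/get/delete cycle on
a small test object, so it catches missing permissions as well as plain
connectivity problems.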
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/admin/mod.rs | 2 +
src/api2/admin/s3.rs | 72 +++++++++++++++++++++++++++
src/bin/proxmox-backup-manager.rs | 1 +
src/bin/proxmox_backup_manager/mod.rs | 2 +
src/bin/proxmox_backup_manager/s3.rs | 34 +++++++++++++
5 files changed, 111 insertions(+)
create mode 100644 src/api2/admin/s3.rs
create mode 100644 src/bin/proxmox_backup_manager/s3.rs
diff --git a/src/api2/admin/mod.rs b/src/api2/admin/mod.rs
index a1c49f8e2..7694de4b9 100644
--- a/src/api2/admin/mod.rs
+++ b/src/api2/admin/mod.rs
@@ -9,6 +9,7 @@ pub mod gc;
pub mod metrics;
pub mod namespace;
pub mod prune;
+pub mod s3;
pub mod sync;
pub mod traffic_control;
pub mod verify;
@@ -19,6 +20,7 @@ const SUBDIRS: SubdirMap = &sorted!([
("metrics", &metrics::ROUTER),
("prune", &prune::ROUTER),
("gc", &gc::ROUTER),
+ ("s3", &s3::ROUTER),
("sync", &sync::ROUTER),
("traffic-control", &traffic_control::ROUTER),
("verify", &verify::ROUTER),
diff --git a/src/api2/admin/s3.rs b/src/api2/admin/s3.rs
new file mode 100644
index 000000000..229bcc535
--- /dev/null
+++ b/src/api2/admin/s3.rs
@@ -0,0 +1,72 @@
+//! S3 bucket operations
+
+use anyhow::{Context, Error};
+use hyper::Body;
+use serde_json::Value;
+
+use proxmox_router::{list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap};
+use proxmox_schema::*;
+use proxmox_sortable_macro::sortable;
+
+use pbs_api_types::{S3ClientConfig, S3ClientSecretsConfig, PRIV_SYS_MODIFY, S3_CLIENT_ID_SCHEMA};
+
+#[api(
+ input: {
+ properties: {
+ "s3-client-id": {
+ schema: S3_CLIENT_ID_SCHEMA,
+ },
+ },
+ },
+ access: {
+ permission: &Permission::Privilege(&[], PRIV_SYS_MODIFY, false),
+ },
+)]
+/// Perform a basic sanity check for the given s3 client configuration
+pub async fn check(s3_client_id: String, _rpcenv: &mut dyn RpcEnvironment) -> Result<Value, Error> {
+ let (config, _digest) = pbs_config::s3::config()?;
+ let config: S3ClientConfig = config
+ .lookup("s3client", &s3_client_id)
+ .context("config lookup failed")?;
+ let (secrets, _secrets_digest) = pbs_config::s3::secrets_config()?;
+ let secrets: S3ClientSecretsConfig = secrets
+ .lookup("s3secrets", &s3_client_id)
+ .context("secrets lookup failed")?;
+
+ let options = pbs_s3_client::S3ClientOptions {
+ host: config.host,
+ port: config.port,
+ bucket: config.bucket,
+ region: config.region.unwrap_or_default(),
+ fingerprint: config.fingerprint,
+ access_key: config.access_key,
+ secret_key: secrets.secret_key,
+ };
+
+ let test_object_key = ".s3-client-test";
+ let client = pbs_s3_client::S3Client::new(options).context("client creation failed")?;
+ client.head_bucket().await.context("head bucket failed")?;
+ client
+ .put_object(test_object_key.into(), Body::empty())
+ .await
+ .context("put object failed")?;
+ client
+ .get_object(test_object_key.into())
+ .await
+ .context("get object failed")?;
+ client
+ .delete_object(test_object_key.into())
+ .await
+ .context("delete object failed")?;
+
+ Ok(Value::Null)
+}
+
+#[sortable]
+const S3_OPERATION_SUBDIRS: SubdirMap = &[("check", &Router::new().get(&API_METHOD_CHECK))];
+
+const S3_OPERATION_ROUTER: Router = Router::new()
+ .get(&list_subdirs_api_method!(S3_OPERATION_SUBDIRS))
+ .subdirs(S3_OPERATION_SUBDIRS);
+
+pub const ROUTER: Router = Router::new().match_all("s3-client-id", &S3_OPERATION_ROUTER);
diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index d4363e717..68d87c676 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -677,6 +677,7 @@ async fn run() -> Result<(), Error> {
.insert("garbage-collection", garbage_collection_commands())
.insert("acme", acme_mgmt_cli())
.insert("cert", cert_mgmt_cli())
+ .insert("s3", s3_commands())
.insert("subscription", subscription_commands())
.insert("sync-job", sync_job_commands())
.insert("verify-job", verify_job_commands())
diff --git a/src/bin/proxmox_backup_manager/mod.rs b/src/bin/proxmox_backup_manager/mod.rs
index 9b5c73e9a..312a6db6b 100644
--- a/src/bin/proxmox_backup_manager/mod.rs
+++ b/src/bin/proxmox_backup_manager/mod.rs
@@ -26,6 +26,8 @@ mod prune;
pub use prune::*;
mod remote;
pub use remote::*;
+mod s3;
+pub use s3::*;
mod subscription;
pub use subscription::*;
mod sync;
diff --git a/src/bin/proxmox_backup_manager/s3.rs b/src/bin/proxmox_backup_manager/s3.rs
new file mode 100644
index 000000000..a92d3d1b2
--- /dev/null
+++ b/src/bin/proxmox_backup_manager/s3.rs
@@ -0,0 +1,34 @@
+use pbs_api_types::S3_CLIENT_ID_SCHEMA;
+use proxmox_router::{cli::*, RpcEnvironment};
+use proxmox_schema::api;
+
+use proxmox_backup::api2;
+
+use anyhow::Error;
+use serde_json::Value;
+
+#[api(
+ input: {
+ properties: {
+ "s3-client-id": {
+ schema: S3_CLIENT_ID_SCHEMA,
+ },
+ },
+ },
+)]
+/// Perform basic sanity checks for the given S3 client configuration
+async fn check(s3_client_id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Value, Error> {
+ api2::admin::s3::check(s3_client_id, rpcenv).await?;
+ Ok(Value::Null)
+}
+
+pub fn s3_commands() -> CommandLineInterface {
+ let cmd_def = CliCommandMap::new().insert(
+ "check",
+ CliCommand::new(&API_METHOD_CHECK)
+ .arg_param(&["s3-client-id"])
+ .completion_cb("s3-client-id", pbs_config::s3::complete_s3_client_id),
+ );
+
+ cmd_def.into()
+}
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 17/42] datastore: allow to get the backend for a datastore
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (15 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 16/42] api/bin: add endpoint and command to check s3 client connection Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 18/42] api: backup: store datastore backend in runtime environment Christian Ebner
` (26 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Implements an enum with variants Filesystem and S3 to distinguish
between the available backends. Filesystem is used as the default if no
backend is configured in the datastore's configuration. If the
datastore has an s3 backend configured, the backend method instantiates
an s3 client and returns it wrapped in the S3 variant.
This allows instantiating the client once, keeping and reusing the same
open connection to the api for the lifetime of a task or job, e.g. in
the backup writer/reader runtime environments.
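As a minimal sketch (not part of this patch), call sites are expected to
dispatch on the returned variant like this; the function and its body are
purely illustrative:
```
use anyhow::Error;
use pbs_datastore::{DataStore, DatastoreBackend};

fn example_dispatch(datastore: &DataStore) -> Result<(), Error> {
    match datastore.backend()? {
        DatastoreBackend::Filesystem => {
            // operate on the local chunk store as before
        }
        DatastoreBackend::S3(_s3_client) => {
            // reuse the shared client to talk to the configured bucket
        }
    }
    Ok(())
}
```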
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 46 ++++++++++++++++++++++++++++++++--
pbs-datastore/src/lib.rs | 1 +
3 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
index 7623adc28..3ee06c9bb 100644
--- a/pbs-datastore/Cargo.toml
+++ b/pbs-datastore/Cargo.toml
@@ -44,4 +44,5 @@ pbs-api-types.workspace = true
pbs-buildcfg.workspace = true
pbs-config.workspace = true
pbs-key-config.workspace = true
+pbs-s3-client.workspace = true
pbs-tools.workspace = true
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index cbf78ecb6..42d27d249 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -8,6 +8,7 @@ use std::time::Duration;
use anyhow::{bail, format_err, Context, Error};
use nix::unistd::{unlinkat, UnlinkatFlags};
+use pbs_s3_client::{S3Client, S3ClientOptions};
use pbs_tools::lru_cache::LruCache;
use tracing::{info, warn};
@@ -23,8 +24,9 @@ use proxmox_worker_task::WorkerTaskContext;
use pbs_api_types::{
ArchiveType, Authid, BackupGroupDeleteStats, BackupNamespace, BackupType, ChunkOrder,
- DataStoreConfig, DatastoreFSyncLevel, DatastoreTuning, GarbageCollectionStatus,
- MaintenanceMode, MaintenanceType, Operation, UPID,
+ DataStoreConfig, DatastoreBackendConfig, DatastoreFSyncLevel, DatastoreTuning,
+ GarbageCollectionStatus, MaintenanceMode, MaintenanceType, Operation, S3ClientConfig,
+ S3ClientSecretsConfig, UPID,
};
use pbs_config::BackupLockGuard;
@@ -125,6 +127,7 @@ pub struct DataStoreImpl {
chunk_order: ChunkOrder,
last_digest: Option<[u8; 32]>,
sync_level: DatastoreFSyncLevel,
+ backend_config: DatastoreBackendConfig,
}
impl DataStoreImpl {
@@ -139,6 +142,7 @@ impl DataStoreImpl {
chunk_order: Default::default(),
last_digest: None,
sync_level: Default::default(),
+ backend_config: Default::default(),
})
}
}
@@ -194,6 +198,12 @@ impl Drop for DataStore {
}
}
+#[derive(Clone)]
+pub enum DatastoreBackend {
+ Filesystem,
+ S3(Arc<S3Client>),
+}
+
impl DataStore {
// This one just panics on everything
#[doc(hidden)]
@@ -204,6 +214,32 @@ impl DataStore {
})
}
+ /// Get the backend for this datastore based on its configuration
+ pub fn backend(&self) -> Result<DatastoreBackend, Error> {
+ let backend_type = match self.inner.backend_config {
+ DatastoreBackendConfig::Filesystem => DatastoreBackend::Filesystem,
+ DatastoreBackendConfig::S3(ref s3_client_id) => {
+ let (config, _config_digest) = pbs_config::s3::config()?;
+ let (secrets, _secrets_digest) = pbs_config::s3::secrets_config()?;
+ let config: S3ClientConfig = config.lookup("s3client", s3_client_id)?;
+ let secrets: S3ClientSecretsConfig = secrets.lookup("s3secrets", s3_client_id)?;
+ let options = S3ClientOptions {
+ host: config.host,
+ port: config.port,
+ bucket: config.bucket,
+ region: config.region.unwrap_or("us-west-1".to_string()),
+ fingerprint: config.fingerprint,
+ access_key: config.access_key,
+ secret_key: secrets.secret_key,
+ };
+ let s3_client = S3Client::new(options)?;
+ DatastoreBackend::S3(Arc::new(s3_client))
+ }
+ };
+
+ Ok(backend_type)
+ }
+
pub fn lookup_datastore(
name: &str,
operation: Option<Operation>,
@@ -381,6 +417,11 @@ impl DataStore {
.parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
)?;
+ let backend_config = match config.backend {
+ Some(config) => config.parse()?,
+ None => Default::default(),
+ };
+
Ok(DataStoreImpl {
chunk_store,
gc_mutex: Mutex::new(()),
@@ -389,6 +430,7 @@ impl DataStore {
chunk_order: tuning.chunk_order.unwrap_or_default(),
last_digest,
sync_level: tuning.sync_level.unwrap_or_default(),
+ backend_config,
})
}
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index 5014b6c09..e6f65575b 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -203,6 +203,7 @@ pub use store_progress::StoreProgress;
mod datastore;
pub use datastore::{
check_backup_owner, ensure_datastore_is_mounted, get_datastore_mount_status, DataStore,
+ DatastoreBackend,
};
mod hierarchy;
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 18/42] api: backup: store datastore backend in runtime environment
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (16 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 17/42] datastore: allow to get the backend for a datastore Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 19/42] api: backup: conditionally upload chunks to S3 object store backend Christian Ebner
` (25 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Get and store the datastore's backend during creation of the backup
runtime environment, and upload the chunks to either the local
filesystem or the s3 object store based on the backend variant.
By storing the backend variant in the environment, the s3 client is
instantiated only once and reused for all api calls within the same
backup http/2 connection.
Refactor the upgrade method by moving all logic into the async block,
so that the error now possible on backup environment creation is
propagated to the thread spawn call site.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/backup/environment.rs | 12 +++--
src/api2/backup/mod.rs | 99 +++++++++++++++++-----------------
2 files changed, 57 insertions(+), 54 deletions(-)
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 6cd29f512..8919b919a 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -15,7 +15,8 @@ use pbs_api_types::Authid;
use pbs_datastore::backup_info::{BackupDir, BackupInfo};
use pbs_datastore::dynamic_index::DynamicIndexWriter;
use pbs_datastore::fixed_index::FixedIndexWriter;
-use pbs_datastore::{DataBlob, DataStore};
+use pbs_datastore::{DataBlob, DataStore, DatastoreBackend};
+use pbs_s3_client::PutObjectResponse;
use proxmox_rest_server::{formatter::*, WorkerTask};
use crate::backup::VerifyWorker;
@@ -115,6 +116,7 @@ pub struct BackupEnvironment {
pub datastore: Arc<DataStore>,
pub backup_dir: BackupDir,
pub last_backup: Option<BackupInfo>,
+ pub backend: DatastoreBackend,
state: Arc<Mutex<SharedBackupState>>,
}
@@ -125,7 +127,7 @@ impl BackupEnvironment {
worker: Arc<WorkerTask>,
datastore: Arc<DataStore>,
backup_dir: BackupDir,
- ) -> Self {
+ ) -> Result<Self, Error> {
let state = SharedBackupState {
finished: false,
uid_counter: 0,
@@ -137,7 +139,8 @@ impl BackupEnvironment {
backup_stat: UploadStatistic::new(),
};
- Self {
+ let backend = datastore.backend()?;
+ Ok(Self {
result_attributes: json!({}),
env_type,
auth_id,
@@ -147,8 +150,9 @@ impl BackupEnvironment {
formatter: JSON_FORMATTER,
backup_dir,
last_backup: None,
+ backend,
state: Arc::new(Mutex::new(state)),
- }
+ })
}
/// Register a Chunk with associated length.
diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
index 567bca3ef..2c6afca41 100644
--- a/src/api2/backup/mod.rs
+++ b/src/api2/backup/mod.rs
@@ -185,7 +185,8 @@ fn upgrade_to_backup_protocol(
}
// lock last snapshot to prevent forgetting/pruning it during backup
- let guard = last.backup_dir
+ let guard = last
+ .backup_dir
.lock_shared()
.with_context(|| format!("while locking last snapshot during backup '{last:?}'"))?;
Some(guard)
@@ -204,14 +205,14 @@ fn upgrade_to_backup_protocol(
Some(worker_id),
auth_id.to_string(),
true,
- move |worker| {
+ move |worker| async move {
let mut env = BackupEnvironment::new(
env_type,
auth_id,
worker.clone(),
datastore,
backup_dir,
- );
+ )?;
env.debug = debug;
env.last_backup = last_backup;
@@ -264,55 +265,53 @@ fn upgrade_to_backup_protocol(
});
let mut abort_future = abort_future.map(|_| Err(format_err!("task aborted")));
- async move {
- // keep flock until task ends
- let _group_guard = _group_guard;
- let snap_guard = snap_guard;
- let _last_guard = _last_guard;
-
- let res = select! {
- req = req_fut => req,
- abrt = abort_future => abrt,
- };
- if benchmark {
- env.log("benchmark finished successfully");
- proxmox_async::runtime::block_in_place(|| env.remove_backup())?;
- return Ok(());
+ // keep flock until task ends
+ let _group_guard = _group_guard;
+ let snap_guard = snap_guard;
+ let _last_guard = _last_guard;
+
+ let res = select! {
+ req = req_fut => req,
+ abrt = abort_future => abrt,
+ };
+ if benchmark {
+ env.log("benchmark finished successfully");
+ proxmox_async::runtime::block_in_place(|| env.remove_backup())?;
+ return Ok(());
+ }
+
+ let verify = |env: BackupEnvironment| {
+ if let Err(err) = env.verify_after_complete(snap_guard) {
+ env.log(format!(
+ "backup finished, but starting the requested verify task failed: {}",
+ err
+ ));
}
+ };
- let verify = |env: BackupEnvironment| {
- if let Err(err) = env.verify_after_complete(snap_guard) {
- env.log(format!(
- "backup finished, but starting the requested verify task failed: {}",
- err
- ));
- }
- };
-
- match (res, env.ensure_finished()) {
- (Ok(_), Ok(())) => {
- env.log("backup finished successfully");
- verify(env);
- Ok(())
- }
- (Err(err), Ok(())) => {
- // ignore errors after finish
- env.log(format!("backup had errors but finished: {}", err));
- verify(env);
- Ok(())
- }
- (Ok(_), Err(err)) => {
- env.log(format!("backup ended and finish failed: {}", err));
- env.log("removing unfinished backup");
- proxmox_async::runtime::block_in_place(|| env.remove_backup())?;
- Err(err)
- }
- (Err(err), Err(_)) => {
- env.log(format!("backup failed: {}", err));
- env.log("removing failed backup");
- proxmox_async::runtime::block_in_place(|| env.remove_backup())?;
- Err(err)
- }
+ match (res, env.ensure_finished()) {
+ (Ok(_), Ok(())) => {
+ env.log("backup finished successfully");
+ verify(env);
+ Ok(())
+ }
+ (Err(err), Ok(())) => {
+ // ignore errors after finish
+ env.log(format!("backup had errors but finished: {}", err));
+ verify(env);
+ Ok(())
+ }
+ (Ok(_), Err(err)) => {
+ env.log(format!("backup ended and finish failed: {}", err));
+ env.log("removing unfinished backup");
+ proxmox_async::runtime::block_in_place(|| env.remove_backup())?;
+ Err(err)
+ }
+ (Err(err), Err(_)) => {
+ env.log(format!("backup failed: {}", err));
+ env.log("removing failed backup");
+ proxmox_async::runtime::block_in_place(|| env.remove_backup())?;
+ Err(err)
}
}
},
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 19/42] api: backup: conditionally upload chunks to S3 object store backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (17 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 18/42] api: backup: store datastore backend in runtime environment Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 20/42] api: backup: conditionally upload blobs " Christian Ebner
` (24 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Upload fixed and dynamically sized chunks to either the filesystem or
the S3 object store, depending on the configured backend.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/backup/upload_chunk.rs | 44 ++++++++++++++++++++++-----------
1 file changed, 29 insertions(+), 15 deletions(-)
diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 20259660a..838eec1fa 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -15,7 +15,8 @@ use proxmox_sortable_macro::sortable;
use pbs_api_types::{BACKUP_ARCHIVE_NAME_SCHEMA, CHUNK_DIGEST_SCHEMA};
use pbs_datastore::file_formats::{DataBlobHeader, EncryptedDataBlobHeader};
-use pbs_datastore::{DataBlob, DataStore};
+use pbs_datastore::{DataBlob, DataStore, DatastoreBackend};
+use pbs_s3_client::PutObjectResponse;
use pbs_tools::json::{required_integer_param, required_string_param};
use super::environment::*;
@@ -153,16 +154,10 @@ fn upload_fixed_chunk(
) -> ApiResponseFuture {
async move {
let wid = required_integer_param(¶m, "wid")? as usize;
- let size = required_integer_param(¶m, "size")? as u32;
- let encoded_size = required_integer_param(¶m, "encoded-size")? as u32;
-
- let digest_str = required_string_param(¶m, "digest")?;
- let digest = <[u8; 32]>::from_hex(digest_str)?;
-
let env: &BackupEnvironment = rpcenv.as_ref();
let (digest, size, compressed_size, is_duplicate) =
- UploadChunk::new(req_body, env.datastore.clone(), digest, size, encoded_size).await?;
+ upload_to_backend(req_body, param, env).await?;
env.register_fixed_chunk(wid, digest, size, compressed_size, is_duplicate)?;
let digest_str = hex::encode(digest);
@@ -222,16 +217,10 @@ fn upload_dynamic_chunk(
) -> ApiResponseFuture {
async move {
let wid = required_integer_param(¶m, "wid")? as usize;
- let size = required_integer_param(¶m, "size")? as u32;
- let encoded_size = required_integer_param(¶m, "encoded-size")? as u32;
-
- let digest_str = required_string_param(¶m, "digest")?;
- let digest = <[u8; 32]>::from_hex(digest_str)?;
-
let env: &BackupEnvironment = rpcenv.as_ref();
let (digest, size, compressed_size, is_duplicate) =
- UploadChunk::new(req_body, env.datastore.clone(), digest, size, encoded_size).await?;
+ upload_to_backend(req_body, param, env).await?;
env.register_dynamic_chunk(wid, digest, size, compressed_size, is_duplicate)?;
let digest_str = hex::encode(digest);
@@ -243,6 +232,31 @@ fn upload_dynamic_chunk(
.boxed()
}
+async fn upload_to_backend(
+ req_body: Body,
+ param: Value,
+ env: &BackupEnvironment,
+) -> Result<([u8; 32], u32, u32, bool), Error> {
+ let size = required_integer_param(¶m, "size")? as u32;
+ let encoded_size = required_integer_param(¶m, "encoded-size")? as u32;
+ let digest_str = required_string_param(¶m, "digest")?;
+ let digest = <[u8; 32]>::from_hex(digest_str)?;
+
+ match &env.backend {
+ DatastoreBackend::Filesystem => {
+ UploadChunk::new(req_body, env.datastore.clone(), digest, size, encoded_size).await
+ }
+ DatastoreBackend::S3(s3_client) => {
+ let is_duplicate = match s3_client.put_object(digest.into(), req_body).await? {
+ PutObjectResponse::PreconditionFailed => true,
+ PutObjectResponse::NeedsRetry => bail!("concurrent operation, reupload required"),
+ PutObjectResponse::Success(_content) => false,
+ };
+ Ok((digest, size, encoded_size, is_duplicate))
+ }
+ }
+}
+
pub const API_METHOD_UPLOAD_SPEEDTEST: ApiMethod = ApiMethod::new(
&ApiHandler::AsyncHttp(&upload_speedtest),
&ObjectSchema::new("Test upload speed.", &[]),
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 20/42] api: backup: conditionally upload blobs to S3 object store backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (18 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 19/42] api: backup: conditionally upload chunks to S3 object store backend Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 21/42] api: backup: conditionally upload indices " Christian Ebner
` (23 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Upload blobs to both the local datastore cache and the S3 object
store if s3 is configured as the backend.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/backup/environment.rs | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 8919b919a..393a8351d 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -581,6 +581,31 @@ impl BackupEnvironment {
let blob = DataBlob::load_from_reader(&mut &data[..])?;
let raw_data = blob.raw_data();
+ if let DatastoreBackend::S3(s3_client) = &self.backend {
+ let data = Body::from(raw_data.to_vec());
+ let mut object_key = self.backup_dir.relative_path();
+ object_key.push(file_name);
+ let object_key = object_key
+ .as_os_str()
+ .to_str()
+ .ok_or_else(|| format_err!("invalid path"))?;
+ match proxmox_async::runtime::block_on(s3_client.put_object(object_key.into(), data))? {
+ PutObjectResponse::PreconditionFailed => {
+ self.log(format!(
+ "Upload of blob failed, object {object_key} already present."
+ ));
+ bail!("upload of blob failed");
+ }
+ PutObjectResponse::NeedsRetry => {
+ self.log("Upload of blob failed, reupload required.");
+ bail!("concurrent operation, reupload required");
+ }
+ PutObjectResponse::Success(_content) => {
+ self.log(format!("Uploaded blob to object store: {object_key}"))
+ }
+ }
+ }
+
replace_file(&path, raw_data, CreateOptions::new(), false)?;
self.log(format!(
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 21/42] api: backup: conditionally upload indices to S3 object store backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (19 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 20/42] api: backup: conditionally upload blobs " Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 22/42] api: backup: conditionally upload manifest " Christian Ebner
` (22 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
If the datastore is backed by an S3 compatible object store, upload
the dynamic or fixed index files to the object store after closing
them. The local index files are kept in the local caching datastore
to allow for fast and efficient content lookups, avoiding expensive
(as in monetary cost and IO latency) requests.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/backup/environment.rs | 65 ++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 393a8351d..72e369bcf 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -2,6 +2,7 @@ use anyhow::{bail, format_err, Context, Error};
use pbs_config::BackupLockGuard;
use std::collections::HashMap;
+use std::io::Read;
use std::sync::{Arc, Mutex};
use tracing::info;
@@ -479,6 +480,38 @@ impl BackupEnvironment {
);
}
+ // For S3 backends, upload the index file to the object store after closing
+ if let DatastoreBackend::S3(s3_client) = &self.backend {
+ let mut object_key = self.backup_dir.relative_path();
+ object_key.push(&data.name);
+ let object_key = object_key
+ .as_os_str()
+ .to_str()
+ .ok_or_else(|| format_err!("invalid file name"))?;
+
+ let mut full_path = self.datastore.base_path();
+ full_path.push(object_key);
+ let mut file = std::fs::File::open(&full_path)?;
+ let mut buffer = Vec::new();
+ file.read_to_end(&mut buffer)?;
+ let data = Body::from(buffer);
+ match proxmox_async::runtime::block_on(s3_client.put_object(object_key.into(), data))? {
+ PutObjectResponse::PreconditionFailed => {
+ self.log(format!(
+ "Upload of dynamic index failed, object key {object_key} already present"
+ ));
+ bail!("Upload of dynamic index failed");
+ }
+ PutObjectResponse::NeedsRetry => {
+ self.log("Upload of dynamic index failed, reupload required");
+ bail!("concurrent operation, reupload required");
+ }
+ PutObjectResponse::Success(_content) => {
+ self.log(format!("Uploaded index file to object store: {object_key}"))
+ }
+ }
+ }
+
self.log_upload_stat(
&data.name,
&csum,
@@ -553,6 +586,38 @@ impl BackupEnvironment {
);
}
+ // For S3 backends, upload the index file to the object store after closing
+ if let DatastoreBackend::S3(s3_client) = &self.backend {
+ let mut object_key = self.backup_dir.relative_path();
+ object_key.push(&data.name);
+ let object_key = object_key
+ .as_os_str()
+ .to_str()
+ .ok_or_else(|| format_err!("invalid file name"))?;
+
+ let mut full_path = self.datastore.base_path();
+ full_path.push(object_key);
+ let mut file = std::fs::File::open(&full_path)?;
+ let mut buffer = Vec::new();
+ file.read_to_end(&mut buffer)?;
+ let data = Body::from(buffer);
+ match proxmox_async::runtime::block_on(s3_client.put_object(object_key.into(), data))? {
+ PutObjectResponse::PreconditionFailed => {
+ self.log(format!(
+ "Upload of fixed index failed, object {object_key} already present."
+ ));
+ bail!("upload of fixed index failed");
+ }
+ PutObjectResponse::NeedsRetry => {
+ self.log("Upload of fixed index failed, reupload required.");
+ bail!("concurrent operation, reupload required");
+ }
+ PutObjectResponse::Success(_content) => {
+ self.log(format!("Uploaded index file to object store: {object_key}"))
+ }
+ }
+ }
+
self.log_upload_stat(
&data.name,
&expected_csum,
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 22/42] api: backup: conditionally upload manifest to S3 object store backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (20 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 21/42] api: backup: conditionally upload indices " Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 23/42] sync: pull: conditionally upload content to S3 backend Christian Ebner
` (21 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Upload the manifest to the S3 object store backend after it has been
finalized in the backup api call handler, if s3 is configured as the
backend. Also keep the locally cached version for fast and efficient
listing of contents without the need to perform expensive (in terms of
monetary cost and IO latency) requests.
The datastore's metadata contents will be synced from the S3 backend
when the datastore is opened.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/backup/environment.rs | 33 ++++++++++++++++++++++++++++++++-
1 file changed, 32 insertions(+), 1 deletion(-)
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 72e369bcf..685b78e89 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -12,7 +12,7 @@ use serde_json::{json, Value};
use proxmox_router::{RpcEnvironment, RpcEnvironmentType};
use proxmox_sys::fs::{replace_file, CreateOptions};
-use pbs_api_types::Authid;
+use pbs_api_types::{Authid, MANIFEST_BLOB_NAME};
use pbs_datastore::backup_info::{BackupDir, BackupInfo};
use pbs_datastore::dynamic_index::DynamicIndexWriter;
use pbs_datastore::fixed_index::FixedIndexWriter;
@@ -719,6 +719,37 @@ impl BackupEnvironment {
}
}
+ if let DatastoreBackend::S3(s3_client) = &self.backend {
+ // Upload manifest to S3 object store
+ let mut object_key = self.backup_dir.relative_path();
+ object_key.push(MANIFEST_BLOB_NAME.as_ref());
+ let mut path = self.datastore.base_path();
+ path.push(&object_key);
+ let mut manifest = std::fs::File::open(&path)?;
+ let mut buffer = Vec::new();
+ manifest.read_to_end(&mut buffer)?;
+ let data = Body::from(buffer);
+ let object_key = object_key
+ .as_os_str()
+ .to_str()
+ .ok_or_else(|| format_err!("invalid path"))?;
+ match proxmox_async::runtime::block_on(s3_client.put_object(object_key.into(), data))? {
+ PutObjectResponse::PreconditionFailed => {
+ self.log(format!(
+ "Upload of manifest failed, object {object_key} already present."
+ ));
+ bail!("upload of manifest failed");
+ }
+ PutObjectResponse::NeedsRetry => {
+ self.log("Upload of manifest failed, reupload required.");
+ bail!("concurrent operation, reupload required");
+ }
+ PutObjectResponse::Success(_content) => {
+ self.log(format!("Uploaded manifest to object store: {object_key}"))
+ }
+ }
+ }
+
self.datastore.try_ensure_sync_level()?;
// marks the backup as successful
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 23/42] sync: pull: conditionally upload content to S3 backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (21 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 22/42] api: backup: conditionally upload manifest " Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 24/42] api: reader: fetch chunks based on datastore backend Christian Ebner
` (20 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
If the datastore is backed by an S3 object store, not only insert the
pulled contents into the local cache store, but also upload them to the
S3 backend.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/server/pull.rs | 58 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 56 insertions(+), 2 deletions(-)
diff --git a/src/server/pull.rs b/src/server/pull.rs
index b1724c142..f36efd7c8 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -8,6 +8,7 @@ use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use proxmox_human_byte::HumanByte;
+use tokio::io::AsyncReadExt;
use tracing::info;
use pbs_api_types::{
@@ -24,7 +25,7 @@ use pbs_datastore::fixed_index::FixedIndexReader;
use pbs_datastore::index::IndexFile;
use pbs_datastore::manifest::{BackupManifest, FileInfo};
use pbs_datastore::read_chunk::AsyncReadChunk;
-use pbs_datastore::{check_backup_owner, DataStore, StoreProgress};
+use pbs_datastore::{check_backup_owner, DataStore, DatastoreBackend, StoreProgress};
use pbs_tools::sha::sha256;
use super::sync::{
@@ -167,7 +168,18 @@ async fn pull_index_chunks<I: IndexFile>(
move |(chunk, digest, size): (DataBlob, [u8; 32], u64)| {
// println!("verify and write {}", hex::encode(&digest));
chunk.verify_unencrypted(size as usize, &digest)?;
- target2.insert_chunk(&chunk, &digest)?;
+ match target2.backend()? {
+ DatastoreBackend::Filesystem => {
+ target2.insert_chunk(&chunk, &digest)?;
+ }
+ DatastoreBackend::S3(s3_client) => {
+ let data = chunk.raw_data().to_vec();
+ let upload_body = hyper::Body::from(data);
+ proxmox_async::runtime::block_on(
+ s3_client.put_object(digest.into(), upload_body),
+ )?;
+ }
+ }
Ok(())
},
);
@@ -331,6 +343,20 @@ async fn pull_single_archive<'a>(
if let Err(err) = std::fs::rename(&tmp_path, &path) {
bail!("Atomic rename file {:?} failed - {}", path, err);
}
+ if let DatastoreBackend::S3(s3_client) = snapshot.datastore().backend()? {
+ let archive_path = snapshot.relative_path().join(archive_name);
+ let object_key = archive_path
+ .as_os_str()
+ .to_str()
+ .ok_or_else(|| format_err!("invalid archive path"))?;
+
+ let archive = tokio::fs::File::open(&path).await?;
+ let mut reader = tokio::io::BufReader::new(archive);
+ let mut contents = Vec::new();
+ reader.read_to_end(&mut contents).await?;
+ let data = hyper::body::Body::from(contents);
+ s3_client.put_object(object_key.into(), data).await?;
+ }
Ok(sync_stats)
}
@@ -401,6 +427,7 @@ async fn pull_snapshot<'a>(
}
}
+ let manifest_data = tmp_manifest_blob.raw_data().to_vec();
let manifest = BackupManifest::try_from(tmp_manifest_blob)?;
if ignore_not_verified_or_encrypted(
@@ -467,9 +494,36 @@ async fn pull_snapshot<'a>(
if let Err(err) = std::fs::rename(&tmp_manifest_name, &manifest_name) {
bail!("Atomic rename file {:?} failed - {}", manifest_name, err);
}
+ if let DatastoreBackend::S3(s3_client) = snapshot.datastore().backend()? {
+ let object_path = snapshot.relative_path().join(MANIFEST_BLOB_NAME.as_ref());
+ let object_key = object_path
+ .as_os_str()
+ .to_str()
+ .ok_or_else(|| format_err!("invalid archive path"))?;
+
+ let data = hyper::body::Body::from(manifest_data);
+ s3_client.put_object(object_key.into(), data).await?;
+ }
if !client_log_name.exists() {
reader.try_download_client_log(&client_log_name).await?;
+ if client_log_name.exists() {
+ if let DatastoreBackend::S3(s3_client) = snapshot.datastore().backend()? {
+ let object_path = snapshot.relative_path().join(CLIENT_LOG_BLOB_NAME.as_ref());
+ let object_key = object_path
+ .as_os_str()
+ .to_str()
+ .ok_or_else(|| format_err!("invalid archive path"))?;
+
+ let log_file = tokio::fs::File::open(&client_log_name).await?;
+ let mut reader = tokio::io::BufReader::new(log_file);
+ let mut contents = Vec::new();
+ reader.read_to_end(&mut contents).await?;
+
+ let data = hyper::body::Body::from(contents);
+ s3_client.put_object(object_key.into(), data).await?;
+ }
+ }
};
snapshot
.cleanup_unreferenced_files(&manifest)
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 24/42] api: reader: fetch chunks based on datastore backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (22 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 23/42] sync: pull: conditionally upload content to S3 backend Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 25/42] datastore: local chunk reader: read chunks based on backend Christian Ebner
` (19 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Read the chunk based on the datastore's backend, either reading it from
the local filesystem or fetching it from the S3 object store.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/reader/environment.rs | 12 +++++++----
src/api2/reader/mod.rs | 38 ++++++++++++++++++++++------------
2 files changed, 33 insertions(+), 17 deletions(-)
diff --git a/src/api2/reader/environment.rs b/src/api2/reader/environment.rs
index 3b2f06f43..8924352b0 100644
--- a/src/api2/reader/environment.rs
+++ b/src/api2/reader/environment.rs
@@ -1,13 +1,14 @@
use std::collections::HashSet;
use std::sync::{Arc, RwLock};
+use anyhow::Error;
use serde_json::{json, Value};
use proxmox_router::{RpcEnvironment, RpcEnvironmentType};
use pbs_api_types::Authid;
use pbs_datastore::backup_info::BackupDir;
-use pbs_datastore::DataStore;
+use pbs_datastore::{DataStore, DatastoreBackend};
use proxmox_rest_server::formatter::*;
use proxmox_rest_server::WorkerTask;
use tracing::info;
@@ -23,6 +24,7 @@ pub struct ReaderEnvironment {
pub worker: Arc<WorkerTask>,
pub datastore: Arc<DataStore>,
pub backup_dir: BackupDir,
+ pub backend: DatastoreBackend,
allowed_chunks: Arc<RwLock<HashSet<[u8; 32]>>>,
}
@@ -33,8 +35,9 @@ impl ReaderEnvironment {
worker: Arc<WorkerTask>,
datastore: Arc<DataStore>,
backup_dir: BackupDir,
- ) -> Self {
- Self {
+ ) -> Result<Self, Error> {
+ let backend = datastore.backend()?;
+ Ok(Self {
result_attributes: json!({}),
env_type,
auth_id,
@@ -43,8 +46,9 @@ impl ReaderEnvironment {
debug: tracing::enabled!(tracing::Level::DEBUG),
formatter: JSON_FORMATTER,
backup_dir,
+ backend,
allowed_chunks: Arc::new(RwLock::new(HashSet::new())),
- }
+ })
}
pub fn log<S: AsRef<str>>(&self, msg: S) {
diff --git a/src/api2/reader/mod.rs b/src/api2/reader/mod.rs
index cc791299c..3417f49be 100644
--- a/src/api2/reader/mod.rs
+++ b/src/api2/reader/mod.rs
@@ -24,7 +24,8 @@ use pbs_api_types::{
};
use pbs_config::CachedUserInfo;
use pbs_datastore::index::IndexFile;
-use pbs_datastore::{DataStore, PROXMOX_BACKUP_READER_PROTOCOL_ID_V1};
+use pbs_datastore::{DataStore, DatastoreBackend, PROXMOX_BACKUP_READER_PROTOCOL_ID_V1};
+use pbs_s3_client::S3Client;
use pbs_tools::json::required_string_param;
use crate::api2::backup::optional_ns_param;
@@ -159,7 +160,7 @@ fn upgrade_to_backup_reader_protocol(
worker.clone(),
datastore,
backup_dir,
- );
+ )?;
env.debug = debug;
@@ -320,17 +321,10 @@ fn download_chunk(
));
}
- let (path, _) = env.datastore.chunk_path(&digest);
- let path2 = path.clone();
-
- env.debug(format!("download chunk {:?}", path));
-
- let data =
- proxmox_async::runtime::block_in_place(|| std::fs::read(path)).map_err(move |err| {
- http_err!(BAD_REQUEST, "reading file {:?} failed: {}", path2, err)
- })?;
-
- let body = Body::from(data);
+ let body = match &env.backend {
+ DatastoreBackend::Filesystem => load_from_filesystem(env, &digest)?,
+ DatastoreBackend::S3(s3_client) => fetch_from_object_store(s3_client, &digest).await?,
+ };
// fixme: set other headers ?
Ok(Response::builder()
@@ -342,6 +336,24 @@ fn download_chunk(
.boxed()
}
+async fn fetch_from_object_store(s3_client: &S3Client, digest: &[u8; 32]) -> Result<Body, Error> {
+ if let Some(response) = s3_client.get_object(digest.into()).await? {
+ return Ok(response.content);
+ }
+ bail!("cannot find chunk with digest {}", hex::encode(digest));
+}
+
+fn load_from_filesystem(env: &ReaderEnvironment, digest: &[u8; 32]) -> Result<Body, Error> {
+ let (path, _) = env.datastore.chunk_path(digest);
+ let path2 = path.clone();
+
+ env.debug(format!("download chunk {path:?}"));
+
+ let data = proxmox_async::runtime::block_in_place(|| std::fs::read(path))
+ .map_err(move |err| http_err!(BAD_REQUEST, "reading file {path2:?} failed: {err}"))?;
+ Ok(Body::from(data))
+}
+
/* this is too slow
fn download_chunk_old(
_parts: Parts,
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 25/42] datastore: local chunk reader: read chunks based on backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (23 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 24/42] api: reader: fetch chunks based on datastore backend Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 26/42] verify worker: add datastore backend to verify worker Christian Ebner
` (18 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Get and store the datastore's backend on local chunk reader
instantiation and fetch chunks, depending on the backend variant, from
either the filesystem or the S3 object store.
By storing the backend variant, the S3 client is instantiated only
once and reused until the local chunk reader instance is dropped.
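A condensed sketch of the resulting read path (simplified from the hunks below; `fetch()` is the helper added by this patch):
```
// read_raw_chunk() dispatches on the backend stored at construction time,
// so the S3 client is created once and reused for all reads of this reader
let chunk = match &self.backend {
    DatastoreBackend::Filesystem => self.store.load_chunk(digest)?,
    DatastoreBackend::S3(s3_client) => {
        proxmox_async::runtime::block_on(fetch(s3_client.clone(), digest))?
    }
};
```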
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/Cargo.toml | 2 ++
pbs-datastore/src/local_chunk_reader.rs | 37 +++++++++++++++++++++----
2 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
index 3ee06c9bb..323f5e270 100644
--- a/pbs-datastore/Cargo.toml
+++ b/pbs-datastore/Cargo.toml
@@ -13,6 +13,7 @@ crc32fast.workspace = true
endian_trait.workspace = true
futures.workspace = true
hex = { workspace = true, features = [ "serde" ] }
+hyper.workspace = true
libc.workspace = true
log.workspace = true
nix.workspace = true
@@ -28,6 +29,7 @@ zstd-safe.workspace = true
pathpatterns.workspace = true
pxar.workspace = true
+proxmox-async.workspace = true
proxmox-borrow.workspace = true
proxmox-human-byte.workspace = true
proxmox-io.workspace = true
diff --git a/pbs-datastore/src/local_chunk_reader.rs b/pbs-datastore/src/local_chunk_reader.rs
index 05a70c068..a363059a1 100644
--- a/pbs-datastore/src/local_chunk_reader.rs
+++ b/pbs-datastore/src/local_chunk_reader.rs
@@ -3,17 +3,21 @@ use std::pin::Pin;
use std::sync::Arc;
use anyhow::{bail, Error};
+use hyper::body::HttpBody;
use pbs_api_types::CryptMode;
+use pbs_s3_client::S3Client;
use pbs_tools::crypt_config::CryptConfig;
use crate::data_blob::DataBlob;
+use crate::datastore::DatastoreBackend;
use crate::read_chunk::{AsyncReadChunk, ReadChunk};
use crate::DataStore;
#[derive(Clone)]
pub struct LocalChunkReader {
store: Arc<DataStore>,
+ backend: DatastoreBackend,
crypt_config: Option<Arc<CryptConfig>>,
crypt_mode: CryptMode,
}
@@ -24,8 +28,11 @@ impl LocalChunkReader {
crypt_config: Option<Arc<CryptConfig>>,
crypt_mode: CryptMode,
) -> Self {
+ // TODO: Error handling!
+ let backend = store.backend().unwrap();
Self {
store,
+ backend,
crypt_config,
crypt_mode,
}
@@ -47,10 +54,25 @@ impl LocalChunkReader {
}
}
+async fn fetch(s3_client: Arc<S3Client>, digest: &[u8; 32]) -> Result<DataBlob, Error> {
+ if let Some(response) = s3_client.get_object(digest.into()).await? {
+ let bytes = response.content.collect().await?.to_bytes();
+ DataBlob::from_raw(bytes.to_vec())
+ } else {
+ bail!("no object with digest {}", hex::encode(digest));
+ }
+}
+
impl ReadChunk for LocalChunkReader {
fn read_raw_chunk(&self, digest: &[u8; 32]) -> Result<DataBlob, Error> {
- let chunk = self.store.load_chunk(digest)?;
+ let chunk = match &self.backend {
+ DatastoreBackend::Filesystem => self.store.load_chunk(digest)?,
+ DatastoreBackend::S3(s3_client) => {
+ proxmox_async::runtime::block_on(fetch(s3_client.clone(), digest))?
+ }
+ };
self.ensure_crypt_mode(chunk.crypt_mode()?)?;
+
Ok(chunk)
}
@@ -69,11 +91,14 @@ impl AsyncReadChunk for LocalChunkReader {
digest: &'a [u8; 32],
) -> Pin<Box<dyn Future<Output = Result<DataBlob, Error>> + Send + 'a>> {
Box::pin(async move {
- let (path, _) = self.store.chunk_path(digest);
-
- let raw_data = tokio::fs::read(&path).await?;
-
- let chunk = DataBlob::load_from_reader(&mut &raw_data[..])?;
+ let chunk = match &self.backend {
+ DatastoreBackend::Filesystem => {
+ let (path, _) = self.store.chunk_path(digest);
+ let raw_data = tokio::fs::read(&path).await?;
+ DataBlob::load_from_reader(&mut &raw_data[..])?
+ }
+ DatastoreBackend::S3(s3_client) => fetch(s3_client.clone(), digest).await?,
+ };
self.ensure_crypt_mode(chunk.crypt_mode()?)?;
Ok(chunk)
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 26/42] verify worker: add datastore backend to verify worker
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (24 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 25/42] datastore: local chunk reader: read chunks based on backend Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 27/42] verify: implement chunk verification for stores with s3 backend Christian Ebner
` (17 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
In order to fetch chunks from an S3 compatible object store,
instantiate and store the S3 client in the verify worker by storing
the datastore's backend. This allows reusing the same client instance
for the whole verification task.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/admin/datastore.rs | 2 +-
src/api2/backup/environment.rs | 2 +-
src/backup/verify.rs | 14 ++++++++++----
src/server/verify_job.rs | 2 +-
4 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 7dc881ade..7b7f79b22 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -893,7 +893,7 @@ pub fn verify(
auth_id.to_string(),
to_stdout,
move |worker| {
- let verify_worker = VerifyWorker::new(worker.clone(), datastore);
+ let verify_worker = VerifyWorker::new(worker.clone(), datastore)?;
let failed_dirs = if let Some(backup_dir) = backup_dir {
let mut res = Vec::new();
if !verify_worker.verify_backup_dir(
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 685b78e89..384e8a73f 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -796,7 +796,7 @@ impl BackupEnvironment {
move |worker| {
worker.log_message("Automatically verifying newly added snapshot");
- let verify_worker = VerifyWorker::new(worker.clone(), datastore);
+ let verify_worker = VerifyWorker::new(worker.clone(), datastore)?;
if !verify_worker.verify_backup_dir_with_lock(
&backup_dir,
worker.upid().clone(),
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index 0b954ae23..a01ddcca3 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -17,7 +17,7 @@ use pbs_api_types::{
use pbs_datastore::backup_info::{BackupDir, BackupGroup, BackupInfo};
use pbs_datastore::index::IndexFile;
use pbs_datastore::manifest::{BackupManifest, FileInfo};
-use pbs_datastore::{DataBlob, DataStore, StoreProgress};
+use pbs_datastore::{DataBlob, DataStore, DatastoreBackend, StoreProgress};
use crate::tools::parallel_handler::ParallelHandler;
@@ -30,19 +30,25 @@ pub struct VerifyWorker {
datastore: Arc<DataStore>,
verified_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
corrupt_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
+ backend: DatastoreBackend,
}
impl VerifyWorker {
/// Creates a new VerifyWorker for a given task worker and datastore.
- pub fn new(worker: Arc<dyn WorkerTaskContext>, datastore: Arc<DataStore>) -> Self {
- Self {
+ pub fn new(
+ worker: Arc<dyn WorkerTaskContext>,
+ datastore: Arc<DataStore>,
+ ) -> Result<Self, Error> {
+ let backend = datastore.backend()?;
+ Ok(Self {
worker,
datastore,
// start with 16k chunks == up to 64G data
verified_chunks: Arc::new(Mutex::new(HashSet::with_capacity(16 * 1024))),
// start with 64 chunks since we assume there are few corrupt ones
corrupt_chunks: Arc::new(Mutex::new(HashSet::with_capacity(64))),
- }
+ backend,
+ })
}
fn verify_blob(backup_dir: &BackupDir, info: &FileInfo) -> Result<(), Error> {
diff --git a/src/server/verify_job.rs b/src/server/verify_job.rs
index 95a7b2a9b..c8792174b 100644
--- a/src/server/verify_job.rs
+++ b/src/server/verify_job.rs
@@ -41,7 +41,7 @@ pub fn do_verification_job(
None => Default::default(),
};
- let verify_worker = VerifyWorker::new(worker.clone(), datastore);
+ let verify_worker = VerifyWorker::new(worker.clone(), datastore)?;
let result = verify_worker.verify_all_backups(
worker.upid(),
ns,
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 27/42] verify: implement chunk verification for stores with s3 backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (25 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 26/42] verify worker: add datastore backend to verify worker Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 28/42] datastore: create namespace marker in S3 backend Christian Ebner
` (16 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
For datastores backed by an S3 compatible object store, rather than
reading the chunks to be verified from the local filesystem, fetch
them via the s3 client from the configured bucket.
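A condensed sketch of the S3 verification path (simplified from the hunk below; the error handling of the request itself and the bad-chunk bookkeeping are omitted, `HttpBody` is in scope for `collect()` as in the hunk):
```
// fetch the chunk object by digest and feed it to the decoder pool, mirroring
// the filesystem load_chunk() path; a missing object is counted as corrupt
match proxmox_async::runtime::block_on(s3_client.get_object(info.digest.into()))? {
    Some(response) => {
        let bytes = proxmox_async::runtime::block_on(response.content.collect())?.to_bytes();
        let chunk = DataBlob::from_raw(bytes.to_vec())?;
        decoder_pool.send((chunk, info.digest, info.size()))?;
    }
    None => {
        self.corrupt_chunks.lock().unwrap().insert(info.digest);
    }
}
```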
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/backup/verify.rs | 59 +++++++++++++++++++++++++++++++++++---------
1 file changed, 47 insertions(+), 12 deletions(-)
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index a01ddcca3..2c28c6af5 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -5,6 +5,7 @@ use std::sync::{Arc, Mutex};
use std::time::Instant;
use anyhow::{bail, Error};
+use hyper::body::HttpBody;
use tracing::{error, info, warn};
use proxmox_worker_task::WorkerTaskContext;
@@ -189,18 +190,52 @@ impl VerifyWorker {
continue; // already verified or marked corrupt
}
- match self.datastore.load_chunk(&info.digest) {
- Err(err) => {
- self.corrupt_chunks.lock().unwrap().insert(info.digest);
- error!("can't verify chunk, load failed - {err}");
- errors.fetch_add(1, Ordering::SeqCst);
- Self::rename_corrupted_chunk(self.datastore.clone(), &info.digest);
- }
- Ok(chunk) => {
- let size = info.size();
- read_bytes += chunk.raw_size();
- decoder_pool.send((chunk, info.digest, size))?;
- decoded_bytes += size;
+ match &self.backend {
+ DatastoreBackend::Filesystem => match self.datastore.load_chunk(&info.digest) {
+ Err(err) => {
+ self.corrupt_chunks.lock().unwrap().insert(info.digest);
+ error!("can't verify chunk, load failed - {err}");
+ errors.fetch_add(1, Ordering::SeqCst);
+ Self::rename_corrupted_chunk(self.datastore.clone(), &info.digest);
+ }
+ Ok(chunk) => {
+ let size = info.size();
+ read_bytes += chunk.raw_size();
+ decoder_pool.send((chunk, info.digest, size))?;
+ decoded_bytes += size;
+ }
+ },
+ DatastoreBackend::S3(s3_client) => {
+ //TODO: How to avoid all these requests? Does the AWS api offer other means
+ // to verify the contents/integrity of objects?
+ match proxmox_async::runtime::block_on(s3_client.get_object(info.digest.into()))
+ {
+ Ok(Some(response)) => {
+ let bytes =
+ proxmox_async::runtime::block_on(response.content.collect())?
+ .to_bytes();
+ let chunk = DataBlob::from_raw(bytes.to_vec())?;
+ let size = info.size();
+ read_bytes += chunk.raw_size();
+ decoder_pool.send((chunk, info.digest, size))?;
+ decoded_bytes += size;
+ }
+ Ok(None) => {
+ self.corrupt_chunks.lock().unwrap().insert(info.digest);
+ error!(
+ "can't verify missing chunk with digest {}",
+ hex::encode(info.digest)
+ );
+ errors.fetch_add(1, Ordering::SeqCst);
+ }
+ Err(err) => {
+ self.corrupt_chunks.lock().unwrap().insert(info.digest);
+ error!("can't verify chunk, load failed - {err}");
+ errors.fetch_add(1, Ordering::SeqCst);
+ //TODO: How to handle corrupt chunks for S3 store?
+ //Self::rename_corrupted_chunk(self.datastore.clone(), &info.digest);
+ }
+ }
}
}
}
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 28/42] datastore: create namespace marker in S3 backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (26 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 27/42] verify: implement chunk verification for stores with s3 backend Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 29/42] datastore: create/delete protected marker file on S3 storage backend Christian Ebner
` (15 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
The S3 object store only allows storing objects, referenced by their
key. Datastores, however, use directories for backup namespaces, so
namespaces cannot be represented by a one-to-one mapping.
Instead, create an empty marker file for each namespace and operate
based on that.
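A condensed sketch of the marker creation (simplified from the hunk below):
```
// a namespace is represented on S3 by an empty marker object, keyed by the
// namespace path joined with NAMESPACE_MARKER_FILENAME (".namespace")
if let DatastoreBackend::S3(s3_client) = self.backend()? {
    let marker = ns.path().join(NAMESPACE_MARKER_FILENAME);
    let key = marker
        .to_str()
        .ok_or_else(|| format_err!("unexpected namespace path"))?;
    let response = proxmox_async::runtime::block_on(
        s3_client.put_object(key.into(), hyper::body::Body::empty()),
    )?;
    if !matches!(response, PutObjectResponse::Success(_)) {
        bail!("failed to create namespace marker");
    }
}
```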
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 42d27d249..ab5c22501 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -8,7 +8,7 @@ use std::time::Duration;
use anyhow::{bail, format_err, Context, Error};
use nix::unistd::{unlinkat, UnlinkatFlags};
-use pbs_s3_client::{S3Client, S3ClientOptions};
+use pbs_s3_client::{PutObjectResponse, S3Client, S3ClientOptions};
use pbs_tools::lru_cache::LruCache;
use tracing::{info, warn};
@@ -42,6 +42,8 @@ use crate::DataBlob;
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
+const NAMESPACE_MARKER_FILENAME: &str = ".namespace";
+
/// checks if auth_id is owner, or, if owner is a token, if
/// auth_id is the user of the token
pub fn check_backup_owner(owner: &Authid, auth_id: &Authid) -> Result<(), Error> {
@@ -590,6 +592,24 @@ impl DataStore {
// construct ns before mkdir to enforce max-depth and name validity
let ns = BackupNamespace::from_parent_ns(parent, name)?;
+ if let DatastoreBackend::S3(s3_client) = self.backend()? {
+ let marker = ns.path().join(NAMESPACE_MARKER_FILENAME);
+ let namespace_marker = marker
+ .to_str()
+ .ok_or_else(|| format_err!("unexpected namespace path"))?;
+
+ let response = proxmox_async::runtime::block_on(
+ s3_client.put_object(namespace_marker.into(), hyper::body::Body::empty()),
+ )?;
+ match response {
+ PutObjectResponse::NeedsRetry => bail!("failed to create namespace, needs retry"),
+ PutObjectResponse::PreconditionFailed => {
+ bail!("failed to create namespace, precondition failed")
+ }
+ PutObjectResponse::Success(_) => (),
+ }
+ }
+
let mut ns_full_path = self.base_path();
ns_full_path.push(ns.path());
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 29/42] datastore: create/delete protected marker file on S3 storage backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (27 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 28/42] datastore: create namespace marker in S3 backend Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 30/42] datastore: prune groups/snapshots from S3 object store backend Christian Ebner
` (14 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Commit 8292d3d2 ("api2/admin/datastore: add get/set_protection")
introduced the protected flag for backup snapshots, considering
snapshots as protected based on the presence/absence of the
`.protected` marker file in the corresponding snapshot directory.
To allow independent recovery of a datastore backed by an S3 bucket,
also create/delete the marker file on the object store backend. For
actual checks, still rely on the marker as encountered in the local
cache store.
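A condensed sketch of the marker handling (simplified from the hunks below; the rollback of the local file when the S3 request fails is omitted here):
```
// mirror the local ".protected" marker as an empty object on the S3 backend;
// protection checks themselves still only consult the local cache store
let marker = backup_dir.relative_path().join(".protected");
let key = marker
    .to_str()
    .ok_or_else(|| format_err!("unexpected protected marker path"))?;
if protection {
    proxmox_async::runtime::block_on(
        s3_client.put_object(key.into(), hyper::body::Body::empty()),
    )?;
} else {
    proxmox_async::runtime::block_on(s3_client.delete_object(key.into()))?;
}
```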
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 45 ++++++++++++++++++++++++++++++----
1 file changed, 40 insertions(+), 5 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index ab5c22501..5c8b49947 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1539,12 +1539,47 @@ impl DataStore {
let protected_path = backup_dir.protected_file();
if protection {
- std::fs::File::create(protected_path)
+ std::fs::File::create(&protected_path)
.map_err(|err| format_err!("could not create protection file: {}", err))?;
- } else if let Err(err) = std::fs::remove_file(protected_path) {
- // ignore error for non-existing file
- if err.kind() != std::io::ErrorKind::NotFound {
- bail!("could not remove protection file: {}", err);
+ if let DatastoreBackend::S3(s3_client) = self.backend()? {
+ let marker = backup_dir.relative_path().join(".protected");
+ let protected_marker = marker
+ .to_str()
+ .ok_or_else(|| format_err!("unexpected protected marker path"))?;
+ let response = proxmox_async::runtime::block_on(
+ s3_client.put_object(protected_marker.into(), hyper::body::Body::empty()),
+ )?;
+ match response {
+ PutObjectResponse::NeedsRetry => {
+ let _ = std::fs::remove_file(protected_path);
+ bail!("failed to mark snapshot as protected, needs retry")
+ }
+ PutObjectResponse::PreconditionFailed => {
+ let _ = std::fs::remove_file(protected_path);
+ bail!("failed to mark snapshot as protected, precondition failed")
+ }
+ PutObjectResponse::Success(_) => (),
+ }
+ }
+ } else {
+ if let Err(err) = std::fs::remove_file(&protected_path) {
+ // ignore error for non-existing file
+ if err.kind() != std::io::ErrorKind::NotFound {
+ bail!("could not remove protection file: {err}");
+ }
+ }
+ if let DatastoreBackend::S3(s3_client) = self.backend()? {
+ let marker = backup_dir.relative_path().join(".protected");
+ let protected_marker = marker
+ .to_str()
+ .ok_or_else(|| format_err!("unexpected protected marker path"))?;
+ if let Err(err) = proxmox_async::runtime::block_on(
+ s3_client.delete_object(protected_marker.into()),
+ ) {
+ std::fs::File::create(&protected_path)
+ .map_err(|err| format_err!("could not re-create protection file: {err}"))?;
+ return Err(err);
+ }
}
}
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 30/42] datastore: prune groups/snapshots from S3 object store backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (28 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 29/42] datastore: create/delete protected marker file on S3 storage backend Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 31/42] datastore: get and set owner for S3 " Christian Ebner
` (13 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
When pruning a backup group or a backup snapshot for a datastore with
an S3 object store backend, also remove the associated objects, based
on their key prefix.
In order to exclude protected contents, filter the objects to delete
based on the presence of the protected marker.
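A condensed sketch of the group removal on the S3 backend (simplified from the hunk below; the suffix filter keeps objects of protected snapshots in place):
```
// delete all objects below the group's key prefix, skipping snapshots that
// still carry a ".protected" marker object
let path = self.relative_group_path();
let group_prefix = path
    .to_str()
    .ok_or_else(|| format_err!("invalid group path prefix"))?;
let prefix = format!("{S3_CONTENT_PREFIX}/{group_prefix}");
let delete_objects_error = proxmox_async::runtime::block_on(
    s3_client.delete_objects_by_prefix_with_suffix_filter(&prefix, ".protected"),
)?;
if delete_objects_error {
    bail!("deleting objects failed");
}
```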
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/backup_info.rs | 45 +++++++++++++++++++++++++++++---
pbs-datastore/src/datastore.rs | 34 +++++++++++++++++++++---
src/api2/admin/datastore.rs | 24 +++++++++++------
3 files changed, 88 insertions(+), 15 deletions(-)
diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index 1422fe865..b9ac286ad 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -8,6 +8,7 @@ use std::time::Duration;
use anyhow::{bail, format_err, Context, Error};
+use pbs_s3_client::S3_CONTENT_PREFIX;
use proxmox_sys::fs::{lock_dir_noblock, lock_dir_noblock_shared, replace_file, CreateOptions};
use proxmox_systemd::escape_unit;
@@ -18,7 +19,7 @@ use pbs_api_types::{
use pbs_config::{open_backup_lockfile, BackupLockGuard};
use crate::manifest::{BackupManifest, MANIFEST_LOCK_NAME};
-use crate::{DataBlob, DataStore};
+use crate::{DataBlob, DataStore, DatastoreBackend};
pub const DATASTORE_LOCKS_DIR: &str = "/run/proxmox-backup/locks";
@@ -214,7 +215,7 @@ impl BackupGroup {
///
/// Returns `BackupGroupDeleteStats`, containing the number of deleted snapshots
/// and number of protected snaphsots, which therefore were not removed.
- pub fn destroy(&self) -> Result<BackupGroupDeleteStats, Error> {
+ pub fn destroy(&self, backend: &DatastoreBackend) -> Result<BackupGroupDeleteStats, Error> {
let _guard = self
.lock()
.with_context(|| format!("while destroying group '{self:?}'"))?;
@@ -228,10 +229,26 @@ impl BackupGroup {
delete_stats.increment_protected_snapshots();
continue;
}
- snap.destroy(false)?;
+ // also for S3 cleanup local only, the actual S3 objects will be removed below,
+ // reducing the number of required API calls.
+ snap.destroy(false, &DatastoreBackend::Filesystem)?;
delete_stats.increment_removed_snapshots();
}
+ if let DatastoreBackend::S3(s3_client) = backend {
+ let path = self.relative_group_path();
+ let group_prefix = path
+ .to_str()
+ .ok_or_else(|| format_err!("invalid group path prefix"))?;
+ let prefix = format!("{S3_CONTENT_PREFIX}/{group_prefix}");
+ let delete_objects_error = proxmox_async::runtime::block_on(
+ s3_client.delete_objects_by_prefix_with_suffix_filter(&prefix, ".protected"),
+ )?;
+ if delete_objects_error {
+ bail!("deleting objects failed");
+ }
+ }
+
// Note: make sure the old locking mechanism isn't used as `remove_dir_all` is not safe in
// that case
if delete_stats.all_removed() && !*OLD_LOCKING {
@@ -577,7 +594,7 @@ impl BackupDir {
/// Destroy the whole snapshot, bails if it's protected
///
/// Setting `force` to true skips locking and thus ignores if the backup is currently in use.
- pub fn destroy(&self, force: bool) -> Result<(), Error> {
+ pub fn destroy(&self, force: bool, backend: &DatastoreBackend) -> Result<(), Error> {
let (_guard, _manifest_guard);
if !force {
_guard = self
@@ -590,6 +607,19 @@ impl BackupDir {
bail!("cannot remove protected snapshot"); // use special error type?
}
+ if let DatastoreBackend::S3(s3_client) = backend {
+ let path = self.relative_path();
+ let snapshot_prefix = path
+ .to_str()
+ .ok_or_else(|| format_err!("invalid snapshot path"))?;
+ let prefix = format!("{S3_CONTENT_PREFIX}/{snapshot_prefix}");
+ let delete_objects_error =
+ proxmox_async::runtime::block_on(s3_client.delete_objects_by_prefix(&prefix))?;
+ if delete_objects_error {
+ bail!("deleting objects failed");
+ }
+ }
+
let full_path = self.full_path();
log::info!("removing backup snapshot {:?}", full_path);
std::fs::remove_dir_all(&full_path).map_err(|err| {
@@ -619,6 +649,13 @@ impl BackupDir {
// do to rectify the situation.
if guard.is_ok() && group.list_backups()?.is_empty() && !*OLD_LOCKING {
group.remove_group_dir()?;
+ if let DatastoreBackend::S3(s3_client) = backend {
+ let path = group.relative_group_path().join("owner");
+ let owner_key = path
+ .to_str()
+ .ok_or_else(|| format_err!("invalid group path prefix"))?;
+ proxmox_async::runtime::block_on(s3_client.delete_object(owner_key.into()))?;
+ }
} else if let Err(err) = guard {
log::debug!("{err:#}");
}
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 5c8b49947..d016e2139 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -29,6 +29,7 @@ use pbs_api_types::{
S3ClientSecretsConfig, UPID,
};
use pbs_config::BackupLockGuard;
+use pbs_s3_client::S3_CONTENT_PREFIX;
use crate::backup_info::{BackupDir, BackupGroup, BackupInfo, OLD_LOCKING};
use crate::chunk_store::ChunkStore;
@@ -643,7 +644,9 @@ impl DataStore {
let mut stats = BackupGroupDeleteStats::default();
for group in self.iter_backup_groups(ns.to_owned())? {
- let delete_stats = group?.destroy()?;
+ let group = group?;
+ let backend = self.backend()?;
+ let delete_stats = group.destroy(&backend)?;
stats.add(&delete_stats);
removed_all_groups = removed_all_groups && delete_stats.all_removed();
}
@@ -677,6 +680,8 @@ impl DataStore {
let store = self.name();
let mut removed_all_requested = true;
let mut stats = BackupGroupDeleteStats::default();
+ let backend = self.backend()?;
+
if delete_groups {
log::info!("removing whole namespace recursively below {store}:/{ns}",);
for ns in self.recursive_iter_backup_ns(ns.to_owned())? {
@@ -684,6 +689,20 @@ impl DataStore {
stats.add(&delete_stats);
removed_all_requested = removed_all_requested && removed_ns_groups;
}
+
+ if let DatastoreBackend::S3(s3_client) = &backend {
+ let ns_dir = ns.path();
+ let ns_prefix = ns_dir
+ .to_str()
+ .ok_or_else(|| format_err!("invalid namespace path prefix"))?;
+ let prefix = format!("{S3_CONTENT_PREFIX}/{ns_prefix}");
+ let delete_objects_error = proxmox_async::runtime::block_on(
+ s3_client.delete_objects_by_prefix_with_suffix_filter(&prefix, ".protected"),
+ )?;
+ if delete_objects_error {
+ bail!("deleting objects failed");
+ }
+ }
} else {
log::info!("pruning empty namespace recursively below {store}:/{ns}");
}
@@ -719,6 +738,15 @@ impl DataStore {
log::warn!("failed to remove namespace {ns} - {err}")
}
}
+ if let DatastoreBackend::S3(s3_client) = &backend {
+ // Only remove the namespace marker, if it was empty,
+ // than this is the same as the namespace being removed.
+ let ns_dir = ns.path().join(NAMESPACE_MARKER_FILENAME);
+ let ns_key = ns_dir
+ .to_str()
+ .ok_or_else(|| format_err!("invalid namespace path"))?;
+ proxmox_async::runtime::block_on(s3_client.delete_object(ns_key.into()))?;
+ }
}
}
@@ -736,7 +764,7 @@ impl DataStore {
) -> Result<BackupGroupDeleteStats, Error> {
let backup_group = self.backup_group(ns.clone(), backup_group.clone());
- backup_group.destroy()
+ backup_group.destroy(&self.backend()?)
}
/// Remove a backup directory including all content
@@ -748,7 +776,7 @@ impl DataStore {
) -> Result<(), Error> {
let backup_dir = self.backup_dir(ns.clone(), backup_dir.clone())?;
- backup_dir.destroy(force)
+ backup_dir.destroy(force, &self.backend()?)
}
/// Returns the time of the last successful backup
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 7b7f79b22..c62b980d1 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -432,7 +432,7 @@ pub async fn delete_snapshot(
let snapshot = datastore.backup_dir(ns, backup_dir)?;
- snapshot.destroy(false)?;
+ snapshot.destroy(false, &datastore.backend()?)?;
Ok(Value::Null)
})
@@ -1098,13 +1098,21 @@ pub fn prune(
});
if !keep {
- if let Err(err) = backup_dir.destroy(false) {
- warn!(
- "failed to remove dir {:?}: {}",
- backup_dir.relative_path(),
- err,
- );
- }
+ match datastore.backend() {
+ Ok(backend) => {
+ if let Err(err) = backup_dir.destroy(false, &backend) {
+ warn!(
+ "failed to remove dir {:?}: {}",
+ backup_dir.relative_path(),
+ err,
+ );
+ }
+ }
+ Err(err) => warn!(
+ "failed to remove dir {:?}: {err}",
+ backup_dir.relative_path()
+ ),
+ };
}
}
prune_result
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 31/42] datastore: get and set owner for S3 store backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (29 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 30/42] datastore: prune groups/snapshots from S3 object store backend Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 32/42] datastore: implement garbage collection for s3 backend Christian Ebner
` (12 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Read or write the ownership information from/to the corresponding
object in the S3 object store. This keeps that information available
if the bucket is reused as a datastore.
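A condensed sketch of the owner handling (simplified from the hunks below; both directions use the same object key):
```
// the owner is stored as a small "<namespace path>/<group>/owner" object
let object_key = format!(
    "{}/{backup_group}/owner",
    ns.path()
        .to_str()
        .ok_or_else(|| format_err!("unexpected owner path"))?,
);
// set_owner(): upload "<auth_id>\n"
let data = hyper::body::Body::from(format!("{auth_id}\n"));
proxmox_async::runtime::block_on(s3_client.put_object(object_key.as_str().into(), data))?;
// get_owner(): download, trim and parse the content back into an Authid
let response =
    proxmox_async::runtime::block_on(s3_client.get_object(object_key.as_str().into()))?
        .ok_or_else(|| format_err!("fetching owner failed"))?;
let content = proxmox_async::runtime::block_on(hyper::body::HttpBody::collect(response.content))?;
let owner: Authid = String::from_utf8(content.to_bytes().trim_ascii_end().to_vec())?
    .parse()
    .map_err(|err| format_err!("parsing owner failed: {err}"))?;
```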
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 39 ++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index d016e2139..52ec8218e 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -816,6 +816,25 @@ impl DataStore {
backup_group: &pbs_api_types::BackupGroup,
) -> Result<Authid, Error> {
let full_path = self.owner_path(ns, backup_group);
+
+ if let DatastoreBackend::S3(s3_client) = self.backend()? {
+ let object_key = format!(
+ "{}/{backup_group}/owner",
+ ns.path()
+ .to_str()
+ .ok_or_else(|| format_err!("unexpected owner path"))?,
+ );
+ let response =
+ proxmox_async::runtime::block_on(s3_client.get_object(object_key.as_str().into()))?
+ .ok_or_else(|| format_err!("fetching owner failed"))?;
+ let content =
+ proxmox_async::runtime::block_on(hyper::body::HttpBody::collect(response.content))?;
+ let owner = String::from_utf8(content.to_bytes().trim_ascii_end().to_vec())?;
+ return owner
+ .parse()
+ .map_err(|err| format_err!("parsing owner for {backup_group} failed: {err}"));
+ }
+
let owner = proxmox_sys::fs::file_read_firstline(full_path)?;
owner
.trim_end() // remove trailing newline
@@ -844,6 +863,26 @@ impl DataStore {
) -> Result<(), Error> {
let path = self.owner_path(ns, backup_group);
+ if let DatastoreBackend::S3(s3_client) = self.backend()? {
+ let object_key = format!(
+ "{}/{backup_group}/owner",
+ ns.path()
+ .to_str()
+ .ok_or_else(|| format_err!("unexpected owner path"))?,
+ );
+ let data = hyper::body::Body::from(format!("{auth_id}\n"));
+ let response = proxmox_async::runtime::block_on(
+ s3_client.put_object(object_key.as_str().into(), data),
+ )?;
+ match response {
+ PutObjectResponse::NeedsRetry => bail!("failed to set owner, needs retry"),
+ PutObjectResponse::PreconditionFailed => {
+ bail!("failed to set owner, precondition failed")
+ }
+ PutObjectResponse::Success(_) => (),
+ }
+ }
+
let mut open_options = std::fs::OpenOptions::new();
open_options.write(true);
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 32/42] datastore: implement garbage collection for s3 backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (30 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 31/42] datastore: get and set owner for S3 " Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 33/42] ui: add S3 client edit window for configuration create/edit Christian Ebner
` (11 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Implements the garbage collection for datastores backed by an S3
object store.
Take advantage of the local datastore by placing marker files in the
chunk store during phase 1 of the garbage collection, updating their
atime if already present. This avoids expensive API calls that would
otherwise be needed to update the object metadata (only possible via
a copy object operation).
Phase 2 is implemented by fetching a list of all chunks via the
ListObjectsV2 API call, filtered by the chunk folder prefix.
This operation has to be performed in batches of at most 1000 objects,
given the API's response limits.
For each object key, look up the local marker file and decide, based
on the marker's existence and its atime, whether the chunk object
needs to be removed. Deletion happens via the delete objects
operation, allowing multiple chunks to be deleted with a single
request.
This allows to efficiently identify chunks which are no longer in use,
while remaining performant and cost effective.
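A condensed sketch of the per-object decision in phase 2 (taken from the hunk below; `atime` is read from the local marker file and falls back to the unix epoch if the marker is missing, the bad-chunk counters are omitted here):
```
if atime < min_atime {
    // not referenced by any index within the cutoff window: delete from S3
    delete_list.push(content.key);
    gc_status.removed_bytes += content.size;
} else if atime < oldest_writer {
    // might still be referenced by an in-flight backup writer: keep as pending
    gc_status.pending_bytes += content.size;
} else {
    // chunk is in use: account it as regular disk usage
    gc_status.disk_bytes += content.size;
}
```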
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 203 +++++++++++++++++++++++++++++----
1 file changed, 178 insertions(+), 25 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 52ec8218e..c940c935e 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -4,7 +4,7 @@ use std::os::unix::ffi::OsStrExt;
use std::os::unix::io::AsRawFd;
use std::path::{Path, PathBuf};
use std::sync::{Arc, LazyLock, Mutex};
-use std::time::Duration;
+use std::time::{Duration, SystemTime};
use anyhow::{bail, format_err, Context, Error};
use nix::unistd::{unlinkat, UnlinkatFlags};
@@ -1204,6 +1204,7 @@ impl DataStore {
chunk_lru_cache: &mut LruCache<[u8; 32], ()>,
status: &mut GarbageCollectionStatus,
worker: &dyn WorkerTaskContext,
+ s3_client: Option<Arc<S3Client>>,
) -> Result<(), Error> {
status.index_file_count += 1;
status.index_data_bytes += index.index_bytes();
@@ -1218,21 +1219,41 @@ impl DataStore {
continue;
}
- if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
- let hex = hex::encode(digest);
- warn!(
- "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
- );
-
- // touch any corresponding .bad files to keep them around, meaning if a chunk is
- // rewritten correctly they will be removed automatically, as well as if no index
- // file requires the chunk anymore (won't get to this loop then)
- for i in 0..=9 {
- let bad_ext = format!("{}.bad", i);
- let mut bad_path = PathBuf::new();
- bad_path.push(self.chunk_path(digest).0);
- bad_path.set_extension(bad_ext);
- self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
+ match s3_client {
+ None => {
+ // Filesystem backend
+ if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
+ let hex = hex::encode(digest);
+ warn!(
+ "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
+ );
+
+ // touch any corresponding .bad files to keep them around, meaning if a chunk is
+ // rewritten correctly they will be removed automatically, as well as if no index
+ // file requires the chunk anymore (won't get to this loop then)
+ for i in 0..=9 {
+ let bad_ext = format!("{}.bad", i);
+ let mut bad_path = PathBuf::new();
+ bad_path.push(self.chunk_path(digest).0);
+ bad_path.set_extension(bad_ext);
+ self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
+ }
+ }
+ }
+ Some(ref _s3_client) => {
+ // Update atime on local cache marker files.
+ if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
+ let (chunk_path, _digest) = self.chunk_path(digest);
+ // Insert empty file as marker to tell GC phase2 that this is
+ // a chunk still in-use, so to keep in the S3 object store.
+ std::fs::File::options()
+ .write(true)
+ .create_new(true)
+ .open(chunk_path)
+ .with_context(|| {
+ format!("failed to create marker for chunk {}", hex::encode(digest))
+ })?;
+ }
}
}
}
@@ -1244,6 +1265,7 @@ impl DataStore {
status: &mut GarbageCollectionStatus,
worker: &dyn WorkerTaskContext,
cache_capacity: usize,
+ s3_client: Option<Arc<S3Client>>,
) -> Result<(), Error> {
// Iterate twice over the datastore to fetch index files, even if this comes with an
// additional runtime cost:
@@ -1333,6 +1355,7 @@ impl DataStore {
&mut chunk_lru_cache,
status,
worker,
+ s3_client.as_ref().cloned(),
)?;
if !unprocessed_index_list.remove(&path) {
@@ -1367,7 +1390,14 @@ impl DataStore {
continue;
}
};
- self.index_mark_used_chunks(index, &path, &mut chunk_lru_cache, status, worker)?;
+ self.index_mark_used_chunks(
+ index,
+ &path,
+ &mut chunk_lru_cache,
+ status,
+ worker,
+ s3_client.as_ref().cloned(),
+ )?;
warn!("Marked chunks for unexpected index file at '{path:?}'");
}
if strange_paths_count > 0 {
@@ -1465,18 +1495,141 @@ impl DataStore {
1024 * 1024
};
- info!("Start GC phase1 (mark used chunks)");
+ let s3_client = match self.backend()? {
+ DatastoreBackend::Filesystem => None,
+ DatastoreBackend::S3(s3_client) => {
+ proxmox_async::runtime::block_on(s3_client.head_bucket())
+ .context("failed to reach bucket")?;
+ Some(s3_client)
+ }
+ };
- self.mark_used_chunks(&mut gc_status, worker, gc_cache_capacity)
- .context("marking used chunks failed")?;
+ info!("Start GC phase1 (mark used chunks)");
- info!("Start GC phase2 (sweep unused chunks)");
- self.inner.chunk_store.sweep_unused_chunks(
- oldest_writer,
- min_atime,
+ self.mark_used_chunks(
&mut gc_status,
worker,
- )?;
+ gc_cache_capacity,
+ s3_client.as_ref().cloned(),
+ )
+ .context("marking used chunks failed")?;
+
+ info!("Start GC phase2 (sweep unused chunks)");
+
+ if let Some(ref s3_client) = s3_client {
+ let mut chunk_count = 0;
+ let prefix = Some(".chunks/");
+ // Operates in batches of 1000 objects max per request
+ let mut list_bucket_result = proxmox_async::runtime::block_on(
+ s3_client.list_objects_v2(prefix, None, None),
+ )?;
+
+ let mut delete_list = Vec::with_capacity(1000);
+ loop {
+ for content in list_bucket_result.contents {
+ // Check object is actually a chunk
+ let digest = match Path::new(&content.key).file_name() {
+ Some(file_name) => file_name,
+ // should never be the case as objects will have a filename
+ None => continue,
+ };
+ let bytes = digest.as_bytes();
+ if bytes.len() != 64 && bytes.len() != 64 + ".0.bad".len() {
+ continue;
+ }
+ if !bytes.iter().take(64).all(u8::is_ascii_hexdigit) {
+ continue;
+ }
+
+ let bad = bytes.ends_with(b".bad");
+
+ // Check local markers (created or atime updated during phase1) and
+ // keep or delete chunk based on that.
+
+ let mut chunk_path = self.base_path();
+ chunk_path.push(&content.key);
+ let atime = match std::fs::metadata(chunk_path) {
+ Ok(stat) => stat.accessed()?,
+ Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
+ // File not found, delete by setting atime to unix epoch
+ info!("Not found, mark for deletion: {}", content.key);
+ SystemTime::UNIX_EPOCH
+ }
+ Err(err) => return Err(err.into()),
+ };
+ let atime = atime.duration_since(SystemTime::UNIX_EPOCH)?.as_secs() as i64;
+
+ chunk_count += 1;
+
+ if atime < min_atime {
+ delete_list.push(content.key);
+ if bad {
+ gc_status.removed_bad += 1;
+ } else {
+ gc_status.removed_chunks += 1;
+ }
+ gc_status.removed_bytes += content.size;
+ } else if atime < oldest_writer {
+ if bad {
+ gc_status.still_bad += 1;
+ } else {
+ gc_status.pending_chunks += 1;
+ }
+ gc_status.pending_bytes += content.size;
+ } else {
+ if !bad {
+ gc_status.disk_chunks += 1;
+ }
+ gc_status.disk_bytes += content.size;
+ }
+ }
+
+ if !delete_list.is_empty() {
+ let delete_objects_result = proxmox_async::runtime::block_on(
+ s3_client.delete_objects(&delete_list),
+ )?;
+ if let Some(_err) = delete_objects_result.error {
+ bail!("failed to delete some objects");
+ }
+ delete_list.clear();
+ }
+
+ // Process next batch of chunks if there is more
+ if list_bucket_result.is_truncated {
+ list_bucket_result =
+ proxmox_async::runtime::block_on(s3_client.list_objects_v2(
+ prefix,
+ None,
+ list_bucket_result.next_continuation_token.as_deref(),
+ ))?;
+ continue;
+ }
+
+ break;
+ }
+ info!("processed {chunk_count} total chunks");
+
+ // Phase 2 GC of Filesystem backed storage is phase 3 for S3 backed GC
+ info!("Start GC phase3 (sweep unused chunk markers)");
+
+ let mut tmp_gc_status = GarbageCollectionStatus {
+ upid: Some(upid.to_string()),
+ ..Default::default()
+ };
+ self.inner.chunk_store.sweep_unused_chunks(
+ oldest_writer,
+ min_atime,
+ &mut tmp_gc_status,
+ worker,
+ )?;
+ } else {
+ self.inner.chunk_store.sweep_unused_chunks(
+ oldest_writer,
+ min_atime,
+ &mut gc_status,
+ worker,
+ )?;
+ }
info!(
"Removed garbage: {}",
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 33/42] ui: add S3 client edit window for configuration create/edit
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (31 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 32/42] datastore: implement garbage collection for s3 backend Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 34/42] ui: add S3 client view for configuration Christian Ebner
` (10 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Adds an edit window for creating or editing S3 client configurations.
It is loosely based on the edit window for the remote configuration.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
www/window/S3BucketEdit.js | 125 +++++++++++++++++++++++++++++++++++++
1 file changed, 125 insertions(+)
create mode 100644 www/window/S3BucketEdit.js
diff --git a/www/window/S3BucketEdit.js b/www/window/S3BucketEdit.js
new file mode 100644
index 000000000..1491ddbe5
--- /dev/null
+++ b/www/window/S3BucketEdit.js
@@ -0,0 +1,125 @@
+Ext.define('PBS.window.S3BucketEdit', {
+ extend: 'Proxmox.window.Edit',
+ alias: 'widget.pbsS3BucketEdit',
+ mixins: ['Proxmox.Mixin.CBind'],
+
+ onlineHelp: 'backup_s3bucket',
+
+ isAdd: true,
+
+ subject: gettext('S3 Bucket'),
+
+ fieldDefaults: { labelWidth: 120 },
+
+ cbindData: function(initialConfig) {
+ let me = this;
+
+ let baseurl = '/api2/extjs/config/s3';
+ let id = initialConfig.id;
+
+ me.isCreate = !id;
+ me.url = id ? `${baseurl}/${id}` : baseurl;
+ me.method = id ? 'PUT' : 'POST';
+ me.autoLoad = !!id;
+ return {
+ passwordEmptyText: me.isCreate ? '' : gettext('Unchanged'),
+ };
+ },
+
+ items: {
+ xtype: 'inputpanel',
+ column1: [
+ {
+ xtype: 'pmxDisplayEditField',
+ name: 'id',
+ fieldLabel: gettext('Unique Identifier'),
+ renderer: Ext.htmlEncode,
+ allowBlank: false,
+ minLength: 4,
+ cbind: {
+ editable: '{isCreate}',
+ },
+ },
+ {
+ xtype: 'proxmoxtextfield',
+ name: 'host',
+ fieldLabel: gettext('Host'),
+ allowBlank: false,
+ emptyText: gettext('FQDN or IP-address'),
+ },
+ {
+ xtype: 'proxmoxtextfield',
+ name: 'port',
+ fieldLabel: gettext('Port'),
+ emptyText: gettext("default"),
+ cbind: {
+ deleteEmpty: '{!isCreate}',
+ },
+ },
+ ],
+
+ column2: [
+ {
+ xtype: 'proxmoxtextfield',
+ name: 'bucket',
+ fieldLabel: gettext('Bucket'),
+ allowBlank: false,
+ },
+ {
+ xtype: 'proxmoxtextfield',
+ name: 'region',
+ fieldLabel: gettext('Region'),
+ emptyText: gettext("default"),
+ },
+ {
+ xtype: 'proxmoxtextfield',
+ name: 'access-key',
+ fieldLabel: gettext('Access Key'),
+ cbind: {
+ emptyText: '{passwordEmptyText}',
+ allowBlank: '{!isCreate}',
+ },
+ },
+ {
+ xtype: 'textfield',
+ name: 'secret-key',
+ inputType: 'password',
+ fieldLabel: gettext('Secret Key'),
+ cbind: {
+ emptyText: '{passwordEmptyText}',
+ allowBlank: '{!isCreate}',
+ },
+ },
+ ],
+
+ columnB: [
+ {
+ xtype: 'proxmoxtextfield',
+ name: 'fingerprint',
+ fieldLabel: gettext('Fingerprint'),
+ emptyText: gettext("Server certificate's SHA-256 fingerprint, required for self-signed certificates"),
+ cbind: {
+ deleteEmpty: '{!isCreate}',
+ },
+ },
+ ],
+ },
+
+ getValues: function() {
+ let me = this;
+ let values = me.callParent(arguments);
+
+ if (me.isCreate) {
+ /// Secrets are stored into separate config, but set the same id for both configs
+ values['secrets-id'] = values.id;
+ }
+ if (values['access-key'] === '') {
+ delete values['access-key']
+ }
+ if (values['secret-key'] === '') {
+ delete values['secret-key']
+ }
+
+ return values;
+ },
+});
--
2.39.5
* [pbs-devel] [RFC v2 proxmox-backup 34/42] ui: add S3 client view for configuration
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (32 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 33/42] ui: add S3 client edit window for configuration create/edit Christian Ebner
@ 2025-05-29 14:31 ` Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 35/42] ui: expose the S3 client view in the navigation tree Christian Ebner
` (9 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:31 UTC (permalink / raw)
To: pbs-devel
Adds the view to configure S3 clients in the Configuration section of
the UI.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
www/config/S3BucketView.js | 144 +++++++++++++++++++++++++++++++++++++
1 file changed, 144 insertions(+)
create mode 100644 www/config/S3BucketView.js
diff --git a/www/config/S3BucketView.js b/www/config/S3BucketView.js
new file mode 100644
index 000000000..85ac6c49c
--- /dev/null
+++ b/www/config/S3BucketView.js
@@ -0,0 +1,144 @@
+Ext.define('pmx-s3bucket', {
+ extend: 'Ext.data.Model',
+ fields: ['id', 'host', 'bucket', 'port', 'access-key', 'secret-key', 'region', 'fingerprint'],
+ idProperty: 'id',
+ proxy: {
+ type: 'proxmox',
+ url: '/api2/json/config/s3',
+ },
+});
+
+Ext.define('PBS.config.S3BucketView', {
+ extend: 'Ext.grid.GridPanel',
+ alias: 'widget.pbsS3BucketView',
+
+ title: gettext('S3 Buckets'),
+
+ stateful: true,
+ stateId: 'grid-s3buckets',
+ tools: [PBS.Utils.get_help_tool("backup-s3-bucket")],
+
+ controller: {
+ xclass: 'Ext.app.ViewController',
+
+ addS3Bucket: function() {
+ let me = this;
+ Ext.create('PBS.window.S3BucketEdit', {
+ listeners: {
+ destroy: function() {
+ me.reload();
+ },
+ },
+ }).show();
+ },
+
+ editS3Bucket: function() {
+ let me = this;
+ let view = me.getView();
+ let selection = view.getSelection();
+ if (selection.length < 1) return;
+
+ Ext.create('PBS.window.S3BucketEdit', {
+ id: selection[0].data.id,
+ listeners: {
+ destroy: function() {
+ me.reload();
+ },
+ },
+ }).show();
+ },
+
+ reload: function() { this.getView().getStore().rstore.load(); },
+
+ init: function(view) {
+ Proxmox.Utils.monStoreErrors(view, view.getStore().rstore);
+ },
+ },
+
+ listeners: {
+ activate: 'reload',
+ itemdblclick: 'editS3Bucket',
+ },
+
+ store: {
+ type: 'diff',
+ autoDestroy: true,
+ autoDestroyRstore: true,
+ sorters: 'id',
+ rstore: {
+ type: 'update',
+ storeid: 'pmx-s3bucket',
+ model: 'pmx-s3bucket',
+ autoStart: true,
+ interval: 5000,
+ },
+ },
+
+ tbar: [
+ {
+ xtype: 'proxmoxButton',
+ text: gettext('Add'),
+ handler: 'addS3Bucket',
+ selModel: false,
+ },
+ {
+ xtype: 'proxmoxButton',
+ text: gettext('Edit'),
+ handler: 'editS3Bucket',
+ disabled: true,
+ },
+ {
+ xtype: 'proxmoxStdRemoveButton',
+ baseurl: '/config/s3',
+ callback: 'reload',
+ },
+ ],
+
+ viewConfig: {
+ trackOver: false,
+ },
+
+ columns: [
+ {
+ dataIndex: 'id',
+ header: gettext('Unique Identifier'),
+ renderer: Ext.String.htmlEncode,
+ sortable: true,
+ width: 200,
+ },
+ {
+ dataIndex: 'bucket',
+ header: gettext('Bucket'),
+ renderer: Ext.String.htmlEncode,
+ sortable: true,
+ width: 200,
+ },
+ {
+ dataIndex: 'host',
+ header: gettext('Host'),
+ sortable: true,
+ width: 200,
+ },
+ {
+ dataIndex: 'port',
+ header: gettext('Port'),
+ renderer: Ext.String.htmlEncode,
+ sortable: true,
+ width: 100,
+ },
+ {
+ dataIndex: 'region',
+ header: gettext('Region'),
+ renderer: Ext.String.htmlEncode,
+ sortable: true,
+ width: 100,
+ },
+ {
+ dataIndex: 'fingerprint',
+ header: gettext('Fingerprint'),
+ renderer: Ext.String.htmlEncode,
+ sortable: false,
+ flex: 1,
+ },
+ ],
+});
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 35/42] ui: expose the S3 client view in the navigation tree
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (33 preceding siblings ...)
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 34/42] ui: add S3 client view for configuration Christian Ebner
@ 2025-05-29 14:32 ` Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 36/42] ui: add s3 bucket selector and allow to set s3 backend Christian Ebner
` (8 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:32 UTC (permalink / raw)
To: pbs-devel
Add an `S3 Clients` item to the navigation tree to allow accessing the
S3 client configuration view and edit windows.
Also add the required source files to the Makefile.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
www/Makefile | 2 ++
www/NavigationTree.js | 6 ++++++
2 files changed, 8 insertions(+)
diff --git a/www/Makefile b/www/Makefile
index 44c5fa133..ca4683941 100644
--- a/www/Makefile
+++ b/www/Makefile
@@ -61,6 +61,7 @@ JSSRC= \
config/RemoteView.js \
config/TrafficControlView.js \
config/ACLView.js \
+ config/S3BucketView.js \
config/SyncView.js \
config/VerifyView.js \
config/PruneView.js \
@@ -85,6 +86,7 @@ JSSRC= \
window/PruneJobEdit.js \
window/GCJobEdit.js \
window/UserEdit.js \
+ window/S3BucketEdit.js \
window/Settings.js \
window/TokenEdit.js \
window/VerifyJobEdit.js \
diff --git a/www/NavigationTree.js b/www/NavigationTree.js
index f10b0cd63..c79797d79 100644
--- a/www/NavigationTree.js
+++ b/www/NavigationTree.js
@@ -80,6 +80,12 @@ Ext.define('PBS.store.NavigationStore', {
path: 'pbsSubscription',
leaf: true,
},
+ {
+ text: gettext('S3 Buckets'),
+ iconCls: 'fa fa-trash',
+ path: 'pbsS3BucketView',
+ leaf: true,
+ },
],
},
{
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 36/42] ui: add s3 bucket selector and allow to set s3 backend
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (34 preceding siblings ...)
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 35/42] ui: expose the S3 client view in the navigation tree Christian Ebner
@ 2025-05-29 14:32 ` Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 37/42] tools: lru cache: add removed callback for evicted cache nodes Christian Ebner
` (7 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:32 UTC (permalink / raw)
To: pbs-devel
In order to be able to create datastores with an S3 object store
backend, implement a bucket selector and expose it in the advanced
options of the datastore edit window.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
www/Makefile | 1 +
www/form/S3BucketSelector.js | 40 ++++++++++++++++++++++++++++++++++++
www/window/DataStoreEdit.js | 35 +++++++++++++++++++++++++++++++
3 files changed, 76 insertions(+)
create mode 100644 www/form/S3BucketSelector.js
diff --git a/www/Makefile b/www/Makefile
index ca4683941..41deeee00 100644
--- a/www/Makefile
+++ b/www/Makefile
@@ -42,6 +42,7 @@ JSSRC= \
Schema.js \
form/TokenSelector.js \
form/AuthidSelector.js \
+ form/S3BucketSelector.js \
form/RemoteSelector.js \
form/RemoteTargetSelector.js \
form/DataStoreSelector.js \
diff --git a/www/form/S3BucketSelector.js b/www/form/S3BucketSelector.js
new file mode 100644
index 000000000..c9905feb9
--- /dev/null
+++ b/www/form/S3BucketSelector.js
@@ -0,0 +1,40 @@
+Ext.define('PBS.form.S3BucketSelector', {
+ extend: 'Proxmox.form.ComboGrid',
+ alias: 'widget.pbsS3BucketSelector',
+
+ allowBlank: false,
+ autoSelect: false,
+ valueField: 'id',
+ displayField: 'id',
+
+ store: {
+ model: 'pmx-s3bucket',
+ autoLoad: true,
+ sorters: 'id',
+ },
+
+ listConfig: {
+ columns: [
+ {
+ header: gettext('S3 Bucket ID'),
+ sortable: true,
+ dataIndex: 'id',
+ renderer: Ext.String.htmlEncode,
+ flex: 1,
+ },
+ {
+ header: gettext('Bucket'),
+ sortable: true,
+ dataIndex: 'bucket',
+ renderer: Ext.String.htmlEncode,
+ flex: 1,
+ },
+ {
+ header: gettext('Host'),
+ sortable: true,
+ dataIndex: 'host',
+ flex: 1,
+ },
+ ],
+ },
+});
diff --git a/www/window/DataStoreEdit.js b/www/window/DataStoreEdit.js
index 4a0b8d819..dffd2b2e0 100644
--- a/www/window/DataStoreEdit.js
+++ b/www/window/DataStoreEdit.js
@@ -101,6 +101,7 @@ Ext.define('PBS.DataStoreEdit', {
columnB: [
{
xtype: 'checkbox',
+ name: 'removable-datastore',
boxLabel: gettext('Removable datastore'),
submitValue: false,
listeners: {
@@ -135,6 +136,37 @@ Ext.define('PBS.DataStoreEdit', {
fieldLabel: gettext('Reuse existing datastore'),
},
],
+ advancedColumn2: [
+ {
+ xtype: 'checkbox',
+ boxLabel: gettext('With S3 object store'),
+ submitValue: false,
+ listeners: {
+ change: function(checkbox, withS3Backend) {
+ let inputPanel = checkbox.up('inputpanel');
+
+ let bucketSelector = inputPanel.down('[name=backend]');
+ bucketSelector.setDisabled(!withS3Backend);
+ bucketSelector.allowBlank = !withS3Backend;
+ bucketSelector.setValue('');
+
+ let removableDatastore = inputPanel.down('[name=removable-datastore]');
+ removableDatastore.setDisabled(withS3Backend);
+ removableDatastore.allowBlank = withS3Backend;
+ removableDatastore.setValue('');
+ },
+ },
+ },
+ {
+ xtype: 'pbsS3BucketSelector',
+ name: 'backend',
+ fieldLabel: gettext('S3 Bucket ID'),
+ disabled: true,
+ cbind: {
+ editable: '{isCreate}',
+ },
+ },
+ ],
onGetValues: function(values) {
let me = this;
@@ -143,6 +175,9 @@ Ext.define('PBS.DataStoreEdit', {
// New datastores default to using the notification system
values['notification-mode'] = 'notification-system';
}
+ if (values.backend) {
+ values.backend = PBS.Utils.printPropertyString({ 's3': values.backend });
+ }
return values;
},
},
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 37/42] tools: lru cache: add removed callback for evicted cache nodes
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (35 preceding siblings ...)
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 36/42] ui: add s3 bucket selector and allow to set s3 backend Christian Ebner
@ 2025-05-29 14:32 ` Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 38/42] tools: async lru cache: implement insert, remove and contains methods Christian Ebner
` (6 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:32 UTC (permalink / raw)
To: pbs-devel
Add a callback function to be executed on evicted cache nodes. The
callback gets the key of the removed node, allowing callers to act on
that value externally.
Since the callback might fail, extend the current LRU cache API to
return an error on insert, propagating any error from the `removed`
callback.
The async LRU cache, call sites and tests are adapted to include the
additional callback parameter accordingly.
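As an illustration of the extended API, here is a minimal, hypothetical calling
sketch (not part of the patch); it assumes the `pbs_tools::lru_cache::LruCache`
type as modified here and simply logs the evicted key:
```rust
use anyhow::Error;
use pbs_tools::lru_cache::LruCache;

// Track chunk digests; the eviction callback runs whenever the capacity is
// exceeded, and its error, if any, is now propagated through `insert`.
fn track_digest(cache: &mut LruCache<[u8; 32], ()>, digest: [u8; 32]) -> Result<bool, Error> {
    cache.insert(digest, (), |evicted| {
        // act on the evicted key, e.g. clean up an on-disk marker (illustrative only)
        eprintln!("evicted chunk {:02x?}", &evicted[..4]);
        Ok(())
    })
}
```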
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/cached_chunk_reader.rs | 6 +++-
pbs-datastore/src/datastore.rs | 2 +-
pbs-datastore/src/dynamic_index.rs | 1 +
pbs-tools/src/async_lru_cache.rs | 23 +++++++++----
pbs-tools/src/lru_cache.rs | 42 +++++++++++++++---------
5 files changed, 50 insertions(+), 24 deletions(-)
diff --git a/pbs-datastore/src/cached_chunk_reader.rs b/pbs-datastore/src/cached_chunk_reader.rs
index be7f2a1e2..95ac23a54 100644
--- a/pbs-datastore/src/cached_chunk_reader.rs
+++ b/pbs-datastore/src/cached_chunk_reader.rs
@@ -81,7 +81,11 @@ impl<I: IndexFile, R: AsyncReadChunk + Send + Sync + 'static> CachedChunkReader<
let info = self.index.chunk_info(chunk.0).unwrap();
// will never be None, see AsyncChunkCacher
- let data = self.cache.access(info.digest, &self.cacher).await?.unwrap();
+ let data = self
+ .cache
+ .access(info.digest, &self.cacher, |_| Ok(()))
+ .await?
+ .unwrap();
let want_bytes = ((info.range.end - cur_offset) as usize).min(size - read);
let slice = &mut buf[read..(read + want_bytes)];
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index c940c935e..c3ac63b32 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1215,7 +1215,7 @@ impl DataStore {
let digest = index.index_digest(pos).unwrap();
// Avoid multiple expensive atime updates by utimensat
- if chunk_lru_cache.insert(*digest, ()) {
+ if chunk_lru_cache.insert(*digest, (), |_| Ok(()))? {
continue;
}
diff --git a/pbs-datastore/src/dynamic_index.rs b/pbs-datastore/src/dynamic_index.rs
index 8e9cb1163..e9d28c7de 100644
--- a/pbs-datastore/src/dynamic_index.rs
+++ b/pbs-datastore/src/dynamic_index.rs
@@ -599,6 +599,7 @@ impl<S: ReadChunk> BufferedDynamicReader<S> {
store: &mut self.store,
index: &self.index,
},
+ |_| Ok(()),
)?
.ok_or_else(|| format_err!("chunk not found by cacher"))?;
diff --git a/pbs-tools/src/async_lru_cache.rs b/pbs-tools/src/async_lru_cache.rs
index c43b87717..141114933 100644
--- a/pbs-tools/src/async_lru_cache.rs
+++ b/pbs-tools/src/async_lru_cache.rs
@@ -42,7 +42,16 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V: Clone + Send + 'static> AsyncL
/// Access an item either via the cache or by calling cacher.fetch. A return value of Ok(None)
/// means the item requested has no representation, Err(_) means a call to fetch() failed,
/// regardless of whether it was initiated by this call or a previous one.
- pub async fn access(&self, key: K, cacher: &dyn AsyncCacher<K, V>) -> Result<Option<V>, Error> {
+ /// Calls the removed callback on the evicted item, if any.
+ pub async fn access<F>(
+ &self,
+ key: K,
+ cacher: &dyn AsyncCacher<K, V>,
+ removed: F,
+ ) -> Result<Option<V>, Error>
+ where
+ F: Fn(K) -> Result<(), Error>,
+ {
let (owner, result_fut) = {
// check if already requested
let mut maps = self.maps.lock().unwrap();
@@ -71,7 +80,7 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V: Clone + Send + 'static> AsyncL
// this call was the one initiating the request, put into LRU and remove from map
let mut maps = self.maps.lock().unwrap();
if let Ok(Some(ref value)) = result {
- maps.0.insert(key, value.clone());
+ maps.0.insert(key, value.clone(), removed)?;
}
maps.1.remove(&key);
}
@@ -106,15 +115,15 @@ mod test {
let cache: AsyncLruCache<i32, String> = AsyncLruCache::new(2);
assert_eq!(
- cache.access(10, &cacher).await.unwrap(),
+ cache.access(10, &cacher, |_| Ok(())).await.unwrap(),
Some("x10".to_string())
);
assert_eq!(
- cache.access(20, &cacher).await.unwrap(),
+ cache.access(20, &cacher, |_| Ok(())).await.unwrap(),
Some("x20".to_string())
);
assert_eq!(
- cache.access(30, &cacher).await.unwrap(),
+ cache.access(30, &cacher, |_| Ok(())).await.unwrap(),
Some("x30".to_string())
);
@@ -123,14 +132,14 @@ mod test {
tokio::spawn(async move {
let cacher = TestAsyncCacher { prefix: "y" };
assert_eq!(
- c.access(40, &cacher).await.unwrap(),
+ c.access(40, &cacher, |_| Ok(())).await.unwrap(),
Some("y40".to_string())
);
});
}
assert_eq!(
- cache.access(20, &cacher).await.unwrap(),
+ cache.access(20, &cacher, |_| Ok(())).await.unwrap(),
Some("x20".to_string())
);
});
diff --git a/pbs-tools/src/lru_cache.rs b/pbs-tools/src/lru_cache.rs
index 9e0112647..53b84ec41 100644
--- a/pbs-tools/src/lru_cache.rs
+++ b/pbs-tools/src/lru_cache.rs
@@ -60,10 +60,10 @@ impl<K, V> CacheNode<K, V> {
/// assert_eq!(cache.get_mut(1), None);
/// assert_eq!(cache.len(), 0);
///
-/// cache.insert(1, 1);
-/// cache.insert(2, 2);
-/// cache.insert(3, 3);
-/// cache.insert(4, 4);
+/// cache.insert(1, 1, |_| Ok(()));
+/// cache.insert(2, 2, |_| Ok(()));
+/// cache.insert(3, 3, |_| Ok(()));
+/// cache.insert(4, 4, |_| Ok(()));
/// assert_eq!(cache.len(), 3);
///
/// assert_eq!(cache.get_mut(1), None);
@@ -77,9 +77,9 @@ impl<K, V> CacheNode<K, V> {
/// assert_eq!(cache.len(), 0);
/// assert_eq!(cache.get_mut(2), None);
/// // access will fill in missing cache entry by fetching from LruCacher
-/// assert_eq!(cache.access(2, &mut LruCacher {}).unwrap(), Some(&mut 2));
+/// assert_eq!(cache.access(2, &mut LruCacher {}, |_| Ok(())).unwrap(), Some(&mut 2));
///
-/// cache.insert(1, 1);
+/// cache.insert(1, 1, |_| Ok(()));
/// assert_eq!(cache.get_mut(1), Some(&mut 1));
///
/// cache.clear();
@@ -133,7 +133,10 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V> LruCache<K, V> {
/// Insert or update an entry identified by `key` with the given `value`.
/// This entry is placed as the most recently used node at the head.
- pub fn insert(&mut self, key: K, value: V) -> bool {
+ pub fn insert<F>(&mut self, key: K, value: V, removed: F) -> Result<bool, anyhow::Error>
+ where
+ F: Fn(K) -> Result<(), anyhow::Error>,
+ {
match self.map.entry(key) {
Entry::Occupied(mut o) => {
// Node present, update value
@@ -142,7 +145,7 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V> LruCache<K, V> {
let mut node = unsafe { Box::from_raw(node_ptr) };
node.value = value;
let _node_ptr = Box::into_raw(node);
- true
+ Ok(true)
}
Entry::Vacant(v) => {
// Node not present, insert a new one
@@ -158,9 +161,11 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V> LruCache<K, V> {
// avoid borrow conflict. This means there are temporarily
// self.capacity + 1 cache nodes.
if self.map.len() > self.capacity {
- self.pop_tail();
+ if let Some(removed_node) = self.pop_tail() {
+ removed(removed_node)?;
+ }
}
- false
+ Ok(false)
}
}
}
@@ -174,11 +179,12 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V> LruCache<K, V> {
}
/// Remove the least recently used node from the cache.
- fn pop_tail(&mut self) {
+ fn pop_tail(&mut self) -> Option<K> {
if let Some(old_tail) = self.list.pop_tail() {
// Remove HashMap entry for old tail
- self.map.remove(&old_tail.key);
+ return self.map.remove(&old_tail.key).map(|_| old_tail.key);
}
+ None
}
/// Get a mutable reference to the value identified by `key`.
@@ -206,11 +212,15 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V> LruCache<K, V> {
/// value.
/// If fetch returns a value, it is inserted as the most recently used entry
/// in the cache.
- pub fn access<'a>(
+ pub fn access<'a, F>(
&'a mut self,
key: K,
cacher: &mut dyn Cacher<K, V>,
- ) -> Result<Option<&'a mut V>, anyhow::Error> {
+ removed: F,
+ ) -> Result<Option<&'a mut V>, anyhow::Error>
+ where
+ F: Fn(K) -> Result<(), anyhow::Error>,
+ {
match self.map.entry(key) {
Entry::Occupied(mut o) => {
// Cache hit, birng node to front of list
@@ -234,7 +244,9 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V> LruCache<K, V> {
// avoid borrow conflict. This means there are temporarily
// self.capacity + 1 cache nodes.
if self.map.len() > self.capacity {
- self.pop_tail();
+ if let Some(removed_node) = self.pop_tail() {
+ removed(removed_node)?;
+ }
}
}
}
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 38/42] tools: async lru cache: implement insert, remove and contains methods
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (36 preceding siblings ...)
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 37/42] tools: lru cache: add removed callback for evicted cache nodes Christian Ebner
@ 2025-05-29 14:32 ` Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 39/42] datastore: add local datastore cache for network attached storages Christian Ebner
` (5 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:32 UTC (permalink / raw)
To: pbs-devel
Add methods to insert new cache entries without using the cacher,
remove cache entries given their key, and check whether the cache
contains a key, marking it as the most recently used one if it does.
These methods will be used to implement the local datastore cache,
which stores the values (chunks) on the filesystem rather than keeping
them in memory. The LRU cache is then only used for fast lookups and
to keep track of the access order.
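As a rough sketch (hypothetical caller, not taken from the patch) of how these
methods might be combined, assuming the `AsyncLruCache` from
`pbs_tools::async_lru_cache`:
```rust
use anyhow::Error;
use pbs_tools::async_lru_cache::AsyncLruCache;

// Track a digest whose payload lives on disk; the cache only stores `()` and
// the access order. The eviction callback receives the evicted key.
fn remember(cache: &AsyncLruCache<[u8; 32], ()>, digest: [u8; 32]) -> Result<(), Error> {
    if cache.contains(digest) {
        // already tracked; `contains` has marked it as most recently used
        return Ok(());
    }
    cache.insert(digest, (), |evicted| {
        // illustrative cleanup hook for the evicted key
        eprintln!("evicted {:02x?}", &evicted[..4]);
        Ok(())
    })
}
```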
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-tools/src/async_lru_cache.rs | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/pbs-tools/src/async_lru_cache.rs b/pbs-tools/src/async_lru_cache.rs
index 141114933..3a975de32 100644
--- a/pbs-tools/src/async_lru_cache.rs
+++ b/pbs-tools/src/async_lru_cache.rs
@@ -87,6 +87,29 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V: Clone + Send + 'static> AsyncL
result
}
+
+ /// Insert an item as the most recently used one into the cache, calling the removed callback
+ /// on the evicted cache item, if any.
+ pub fn insert<F>(&self, key: K, value: V, removed: F) -> Result<(), Error>
+ where
+ F: Fn(K) -> Result<(), Error>,
+ {
+ let mut maps = self.maps.lock().unwrap();
+ maps.0.insert(key, value.clone(), removed)?;
+ Ok(())
+ }
+
+ /// Check if the item exists and if so, mark it as the most recently used one.
+ pub fn contains(&self, key: K) -> bool {
+ let mut maps = self.maps.lock().unwrap();
+ maps.0.get_mut(key).is_some()
+ }
+
+ /// Remove the item from the cache.
+ pub fn remove(&self, key: K) {
+ let mut maps = self.maps.lock().unwrap();
+ maps.0.remove(key);
+ }
}
#[cfg(test)]
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 39/42] datastore: add local datastore cache for network attached storages
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (37 preceding siblings ...)
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 38/42] tools: async lru cache: implement insert, remove and contains methods Christian Ebner
@ 2025-05-29 14:32 ` Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 40/42] api: backup: use local datastore cache on S3 backend chunk upload Christian Ebner
` (4 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:32 UTC (permalink / raw)
To: pbs-devel
Use a local datastore as cache with an LRU replacement policy for
operations on a datastore backed by network storage, e.g. an S3 object
store backend. The goal is to reduce the number of requests to the
backend and thereby save costs (monetary as well as time).
The cacher allows fetching cache items on cache misses via the access
method.
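For context, a hypothetical read path using the new types might look roughly
like this (simplified sketch, not part of the patch; it assumes the
`cache()`/`cacher()` accessors introduced on `DataStore` below):
```rust
use anyhow::{format_err, Error};
use pbs_datastore::{DataBlob, DataStore};

// Read a chunk through the local cache; on a miss the S3Cacher downloads the
// object from the backend and inserts it into the local chunk store.
async fn read_cached_chunk(datastore: &DataStore, digest: &[u8; 32]) -> Result<DataBlob, Error> {
    let cache = datastore
        .cache()
        .ok_or_else(|| format_err!("datastore has no local cache"))?;
    let mut cacher = datastore
        .cacher()?
        .ok_or_else(|| format_err!("datastore backend is not S3"))?;
    cache
        .access(digest, &mut cacher)
        .await?
        .ok_or_else(|| format_err!("chunk not found in cache or on backend"))
}
```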
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 46 ++++++-
pbs-datastore/src/lib.rs | 3 +
.../src/local_datastore_lru_cache.rs | 116 ++++++++++++++++++
3 files changed, 164 insertions(+), 1 deletion(-)
create mode 100644 pbs-datastore/src/local_datastore_lru_cache.rs
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index c3ac63b32..409aec74c 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -37,8 +37,9 @@ use crate::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
use crate::fixed_index::{FixedIndexReader, FixedIndexWriter};
use crate::hierarchy::{ListGroups, ListGroupsType, ListNamespaces, ListNamespacesRecursive};
use crate::index::IndexFile;
+use crate::local_datastore_lru_cache::S3Cacher;
use crate::task_tracking::{self, update_active_operations};
-use crate::DataBlob;
+use crate::{DataBlob, LocalDatastoreLruCache};
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
@@ -131,6 +132,7 @@ pub struct DataStoreImpl {
last_digest: Option<[u8; 32]>,
sync_level: DatastoreFSyncLevel,
backend_config: DatastoreBackendConfig,
+ lru_store_caching: Option<LocalDatastoreLruCache>,
}
impl DataStoreImpl {
@@ -146,6 +148,7 @@ impl DataStoreImpl {
last_digest: None,
sync_level: Default::default(),
backend_config: Default::default(),
+ lru_store_caching: None,
})
}
}
@@ -243,6 +246,37 @@ impl DataStore {
Ok(backend_type)
}
+ pub fn cache(&self) -> Option<&LocalDatastoreLruCache> {
+ self.inner.lru_store_caching.as_ref()
+ }
+
+ /// Check if the digest is present in the local datastore cache.
+ /// Always returns false if there is no cache configured for this datastore.
+ pub fn cache_contains(&self, digest: &[u8; 32]) -> bool {
+ if let Some(cache) = self.inner.lru_store_caching.as_ref() {
+ return cache.contains(digest);
+ }
+ false
+ }
+
+ /// Insert digest as most recently used one in the cache.
+ /// Returns with success if there is no cache configured for this datastore.
+ pub fn cache_insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
+ if let Some(cache) = self.inner.lru_store_caching.as_ref() {
+ return cache.insert(digest, chunk);
+ }
+ Ok(())
+ }
+
+ pub fn cacher(&self) -> Result<Option<S3Cacher>, Error> {
+ self.backend().map(|backend| match backend {
+ DatastoreBackend::S3(s3_client) => {
+ Some(S3Cacher::new(s3_client, self.inner.chunk_store.clone()))
+ }
+ DatastoreBackend::Filesystem => None,
+ })
+ }
+
pub fn lookup_datastore(
name: &str,
operation: Option<Operation>,
@@ -425,6 +459,15 @@ impl DataStore {
None => Default::default(),
};
+ const LOCAL_DATASTORE_CACHE_SIZE: usize = 10_000_000;
+ let lru_store_caching = if let DatastoreBackendConfig::S3(_) = backend_config {
+ let cache =
+ LocalDatastoreLruCache::new(LOCAL_DATASTORE_CACHE_SIZE, chunk_store.clone());
+ Some(cache)
+ } else {
+ None
+ };
+
Ok(DataStoreImpl {
chunk_store,
gc_mutex: Mutex::new(()),
@@ -434,6 +477,7 @@ impl DataStore {
last_digest,
sync_level: tuning.sync_level.unwrap_or_default(),
backend_config,
+ lru_store_caching,
})
}
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index e6f65575b..f1ad3d4c2 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -216,3 +216,6 @@ pub use snapshot_reader::SnapshotReader;
mod local_chunk_reader;
pub use local_chunk_reader::LocalChunkReader;
+
+mod local_datastore_lru_cache;
+pub use local_datastore_lru_cache::LocalDatastoreLruCache;
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
new file mode 100644
index 000000000..c711c5208
--- /dev/null
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -0,0 +1,116 @@
+//! Use a local datastore as cache for operations on a datastore attached via
+//! a network layer (e.g. via the S3 backend).
+
+use std::future::Future;
+use std::sync::Arc;
+
+use anyhow::{bail, Error};
+use hyper::body::HttpBody;
+
+use pbs_s3_client::S3Client;
+use pbs_tools::async_lru_cache::{AsyncCacher, AsyncLruCache};
+
+use crate::ChunkStore;
+use crate::DataBlob;
+
+#[derive(Clone)]
+pub struct S3Cacher {
+ client: Arc<S3Client>,
+ store: Arc<ChunkStore>,
+}
+
+impl AsyncCacher<[u8; 32], ()> for S3Cacher {
+ fn fetch(
+ &self,
+ key: [u8; 32],
+ ) -> Box<dyn Future<Output = Result<Option<()>, Error>> + Send + 'static> {
+ let client = self.client.clone();
+ let store = self.store.clone();
+ Box::new(async move {
+ match client.get_object(key.into()).await? {
+ None => bail!("could not fetch object with key {}", hex::encode(key)),
+ Some(response) => {
+ let bytes = response.content.collect().await?.to_bytes();
+ let chunk = DataBlob::from_raw(bytes.to_vec())?;
+ store.insert_chunk(&chunk, &key)?;
+ Ok(Some(()))
+ }
+ }
+ })
+ }
+}
+
+impl S3Cacher {
+ pub fn new(client: Arc<S3Client>, store: Arc<ChunkStore>) -> Self {
+ Self { client, store }
+ }
+}
+
+/// LRU cache using local datastore for caching chunks
+///
+/// Uses a LRU cache, but without storing the values in-memory but rather
+/// on the filesystem
+pub struct LocalDatastoreLruCache {
+ cache: AsyncLruCache<[u8; 32], ()>,
+ store: Arc<ChunkStore>,
+}
+
+impl LocalDatastoreLruCache {
+ pub fn new(capacity: usize, store: Arc<ChunkStore>) -> Self {
+ Self {
+ cache: AsyncLruCache::new(capacity),
+ store,
+ }
+ }
+
+ /// Insert a new chunk into the local datastore cache.
+ ///
+ /// Fails if the chunk cannot be inserted successfully.
+ pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
+ self.store.insert_chunk(chunk, digest)?;
+ self.cache.insert(*digest, (), |digest| {
+ let (path, _digest_str) = self.store.chunk_path(&digest);
+ // Truncate to free up space but keep the inode around, since that
+ // is used as marker for chunks in use by garbage collection.
+ nix::unistd::truncate(&path, 0).map_err(Error::from)
+ })
+ }
+
+ /// Remove a chunk from the local datastore cache.
+ ///
+ /// Fails if the chunk cannot be deleted successfully.
+ pub fn remove(&self, digest: &[u8; 32]) -> Result<(), Error> {
+ self.cache.remove(*digest);
+ let (path, _digest_str) = self.store.chunk_path(digest);
+ std::fs::remove_file(path).map_err(Error::from)
+ }
+
+ pub async fn access(
+ &self,
+ digest: &[u8; 32],
+ cacher: &mut S3Cacher,
+ ) -> Result<Option<DataBlob>, Error> {
+ if self
+ .cache
+ .access(*digest, cacher, |digest| {
+ let (path, _digest_str) = self.store.chunk_path(&digest);
+ // Truncate to free up space but keep the inode around, since that
+ // is used as marker for chunks in use by garbage collection.
+ nix::unistd::truncate(&path, 0).map_err(Error::from)
+ })
+ .await?
+ .is_some()
+ {
+ let (path, _digest_str) = self.store.chunk_path(digest);
+ let mut file = std::fs::File::open(&path)?;
+ let chunk = DataBlob::load_from_reader(&mut file)?;
+ Ok(Some(chunk))
+ } else {
+ Ok(None)
+ }
+ }
+
+ pub fn contains(&self, digest: &[u8; 32]) -> bool {
+ self.cache.contains(*digest)
+ }
+}
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 40/42] api: backup: use local datastore cache on S3 backend chunk upload
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (38 preceding siblings ...)
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 39/42] datastore: add local datastore cache for network attached storages Christian Ebner
@ 2025-05-29 14:32 ` Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 41/42] api: reader: use local datastore cache on S3 backend chunk fetching Christian Ebner
` (3 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:32 UTC (permalink / raw)
To: pbs-devel
Take advantage of the local datastore cache to avoid re-uploading
already known chunks. This not only helps improve backup/upload
speeds, but also avoids additional costs by reducing the number of
requests and the amount of payload data transferred to the S3 object
store api.
If the cache is present, look up whether it contains the chunk and
skip the upload altogether if it does. Otherwise, load the chunk into
memory, upload it to the S3 object store api and insert it into the
local datastore cache.
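Condensed into a hypothetical helper (simplified from the patch below, which
additionally evaluates the `PutObjectResponse` variants and runs the cache
insert on a blocking task), the control flow is roughly:
```rust
use anyhow::Error;
use pbs_datastore::{DataBlob, DataStore};
use pbs_s3_client::S3Client;

// Returns true if the chunk was already known (duplicate), false if it was
// uploaded to the S3 backend and inserted into the local cache.
async fn upload_chunk_s3(
    datastore: &DataStore,
    s3_client: &S3Client,
    digest: [u8; 32],
    data: Vec<u8>,
) -> Result<bool, Error> {
    if datastore.cache_contains(&digest) {
        // known chunk: skip the S3 request entirely
        return Ok(true);
    }
    let chunk = DataBlob::from_raw(data.clone())?;
    // upload the raw payload to the backend ...
    s3_client
        .put_object(digest.into(), hyper::Body::from(data))
        .await?;
    // ... and keep a copy in the local datastore cache
    datastore.cache_insert(&digest, &chunk)?;
    Ok(false)
}
```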
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/backup/upload_chunk.rs | 46 ++++++++++++++++++++++++++++++---
src/server/pull.rs | 4 +++
2 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 838eec1fa..7a80fd0eb 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -247,10 +247,48 @@ async fn upload_to_backend(
UploadChunk::new(req_body, env.datastore.clone(), digest, size, encoded_size).await
}
DatastoreBackend::S3(s3_client) => {
- let is_duplicate = match s3_client.put_object(digest.into(), req_body).await? {
- PutObjectResponse::PreconditionFailed => true,
- PutObjectResponse::NeedsRetry => bail!("concurrent operation, reupload required"),
- PutObjectResponse::Success(_content) => false,
+ // Load chunk data into memory, need to write it twice, to S3 object store and
+ // local cache store. Further, body needs to be consumed also if chunks insert
+ // can be skipped since cached.
+ let data = req_body
+ .map_err(Error::from)
+ .try_fold(Vec::new(), |mut acc, chunk| {
+ acc.extend_from_slice(&chunk);
+ future::ok::<_, Error>(acc)
+ })
+ .await?;
+
+ if encoded_size != data.len() as u32 {
+ bail!(
+ "got blob with unexpected length ({encoded_size} != {})",
+ data.len()
+ );
+ }
+
+ if env.datastore.cache_contains(&digest) {
+ return Ok((digest, size, encoded_size, true));
+ }
+
+ let datastore = env.datastore.clone();
+ let upload_body = hyper::Body::from(data.clone());
+ let upload = s3_client.put_object(digest.into(), upload_body);
+ let cache_insert = tokio::task::spawn_blocking(move || {
+ let chunk = DataBlob::from_raw(data)?;
+ datastore.cache_insert(&digest, &chunk)
+ });
+ let is_duplicate = match futures::join!(upload, cache_insert) {
+ (Ok(upload_response), Ok(Ok(()))) => match upload_response {
+ PutObjectResponse::PreconditionFailed => true,
+ PutObjectResponse::NeedsRetry => {
+ bail!("concurrent operation, reupload required")
+ }
+ PutObjectResponse::Success(_content) => false,
+ },
+ (Ok(_), Ok(Err(err))) => return Err(err.context("chunk cache insert failed")),
+ (Ok(_), Err(err)) => {
+ return Err(Error::from(err).context("chunk cache insert task failed"))
+ }
+ (Err(err), _) => return Err(err.context("chunk upload failed")),
};
Ok((digest, size, encoded_size, is_duplicate))
}
diff --git a/src/server/pull.rs b/src/server/pull.rs
index f36efd7c8..85d3154eb 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -173,6 +173,10 @@ async fn pull_index_chunks<I: IndexFile>(
target2.insert_chunk(&chunk, &digest)?;
}
DatastoreBackend::S3(s3_client) => {
+ if target2.cache_contains(&digest) {
+ return Ok(());
+ }
+ target2.cache_insert(&digest, &chunk)?;
let data = chunk.raw_data().to_vec();
let upload_body = hyper::Body::from(data);
proxmox_async::runtime::block_on(
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 41/42] api: reader: use local datastore cache on S3 backend chunk fetching
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (39 preceding siblings ...)
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 40/42] api: backup: use local datastore cache on S3 backend chunk upload Christian Ebner
@ 2025-05-29 14:32 ` Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 42/42] api: backup: add no-cache flag to bypass local datastore cache Christian Ebner
` (2 subsequent siblings)
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:32 UTC (permalink / raw)
To: pbs-devel
Take advantage of the local datastore filesystem cache for datastores
backed by an S3 object store in order to reduce the number of requests
and the latency, and to increase throughput.
Also, reducing the number of requests is cost-beneficial for S3 object
stores that charge for fetching objects.
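As a rough sketch of the decision this patch implements (hypothetical helper,
simplified; names follow the patch, and the direct fetch mirrors the existing
`fetch_from_object_store` helper):
```rust
use anyhow::{format_err, Error};
use hyper::body::HttpBody;
use pbs_datastore::DataStore;
use pbs_s3_client::S3Client;

// Serve a chunk either straight from the S3 backend (no cache configured) or
// through the local datastore cache, which downloads and inserts on a miss.
async fn chunk_body(
    datastore: &DataStore,
    s3_client: &S3Client,
    digest: [u8; 32],
) -> Result<hyper::Body, Error> {
    match datastore.cache() {
        None => {
            let response = s3_client
                .get_object(digest.into())
                .await?
                .ok_or_else(|| format_err!("chunk not found on backend"))?;
            let bytes = response.content.collect().await?.to_bytes();
            Ok(hyper::Body::from(bytes))
        }
        Some(cache) => {
            let mut cacher = datastore
                .cacher()?
                .ok_or_else(|| format_err!("no cacher for datastore"))?;
            let chunk = cache
                .access(&digest, &mut cacher)
                .await?
                .ok_or_else(|| format_err!("unable to access chunk"))?;
            Ok(hyper::Body::from(chunk.raw_data().to_owned()))
        }
    }
}
```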
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/reader/mod.rs | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/src/api2/reader/mod.rs b/src/api2/reader/mod.rs
index 3417f49be..24962a136 100644
--- a/src/api2/reader/mod.rs
+++ b/src/api2/reader/mod.rs
@@ -323,7 +323,28 @@ fn download_chunk(
let body = match &env.backend {
DatastoreBackend::Filesystem => load_from_filesystem(env, &digest)?,
- DatastoreBackend::S3(s3_client) => fetch_from_object_store(s3_client, &digest).await?,
+ DatastoreBackend::S3(s3_client) => {
+ match env.datastore.cache() {
+ None => fetch_from_object_store(s3_client, &digest).await?,
+ Some(cache) => {
+ let mut cacher = env
+ .datastore
+ .cacher()?
+ .ok_or(format_err!("no cacher for datastore"))?;
+ // Download from object store, insert to local cache store and read from
+ // file. Can this be optimized?
+ let chunk =
+ cache
+ .access(&digest, &mut cacher)
+ .await?
+ .ok_or(format_err!(
+ "unable to access chunk with digest {}",
+ hex::encode(digest)
+ ))?;
+ Body::from(chunk.raw_data().to_owned())
+ }
+ }
+ }
};
// fixme: set other headers ?
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [pbs-devel] [RFC v2 proxmox-backup 42/42] api: backup: add no-cache flag to bypass local datastore cache
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (40 preceding siblings ...)
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 41/42] api: reader: use local datastore cache on S3 backend chunk fetching Christian Ebner
@ 2025-05-29 14:32 ` Christian Ebner
2025-06-04 11:58 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Lukas Wagner
2025-06-06 11:12 ` Lukas Wagner
43 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-05-29 14:32 UTC (permalink / raw)
To: pbs-devel
Adds the `no-cache` flag so the client can request to bypass the
local datastore cache for chunk uploads. This is mainly intended for
debugging and benchmarking, but can also be used in cases where
caching is known to be ineffective (no possible deduplication).
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
examples/upload-speed.rs | 1 +
pbs-client/src/backup_writer.rs | 4 +++-
proxmox-backup-client/src/benchmark.rs | 1 +
proxmox-backup-client/src/main.rs | 8 ++++++++
src/api2/backup/environment.rs | 3 +++
src/api2/backup/mod.rs | 3 +++
src/api2/backup/upload_chunk.rs | 11 +++++++++++
src/server/push.rs | 1 +
8 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/examples/upload-speed.rs b/examples/upload-speed.rs
index e4b570ec5..8a6594a47 100644
--- a/examples/upload-speed.rs
+++ b/examples/upload-speed.rs
@@ -25,6 +25,7 @@ async fn upload_speed() -> Result<f64, Error> {
&(BackupType::Host, "speedtest".to_string(), backup_time).into(),
false,
true,
+ false,
)
.await?;
diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index 325425069..a91880720 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -82,6 +82,7 @@ impl BackupWriter {
backup: &BackupDir,
debug: bool,
benchmark: bool,
+ no_cache: bool,
) -> Result<Arc<BackupWriter>, Error> {
let mut param = json!({
"backup-type": backup.ty(),
@@ -89,7 +90,8 @@ impl BackupWriter {
"backup-time": backup.time,
"store": datastore,
"debug": debug,
- "benchmark": benchmark
+ "benchmark": benchmark,
+ "no-cache": no_cache,
});
if !ns.is_root() {
diff --git a/proxmox-backup-client/src/benchmark.rs b/proxmox-backup-client/src/benchmark.rs
index a6f24d745..ed21c7a91 100644
--- a/proxmox-backup-client/src/benchmark.rs
+++ b/proxmox-backup-client/src/benchmark.rs
@@ -236,6 +236,7 @@ async fn test_upload_speed(
&(BackupType::Host, "benchmark".to_string(), backup_time).into(),
false,
true,
+ true,
)
.await?;
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 44f4f5db5..83fc9309a 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -742,6 +742,12 @@ fn spawn_catalog_upload(
optional: true,
default: false,
},
+ "no-cache": {
+ type: Boolean,
+ description: "Bypass local datastore cache for network storages.",
+ optional: true,
+ default: false,
+ },
}
}
)]
@@ -754,6 +760,7 @@ async fn create_backup(
change_detection_mode: Option<BackupDetectionMode>,
dry_run: bool,
skip_e2big_xattr: bool,
+ no_cache: bool,
limit: ClientRateLimitConfig,
_info: &ApiMethod,
_rpcenv: &mut dyn RpcEnvironment,
@@ -960,6 +967,7 @@ async fn create_backup(
&snapshot,
true,
false,
+ no_cache,
)
.await?;
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 384e8a73f..874f0c44d 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -112,6 +112,7 @@ pub struct BackupEnvironment {
result_attributes: Value,
auth_id: Authid,
pub debug: bool,
+ pub no_cache: bool,
pub formatter: &'static dyn OutputFormatter,
pub worker: Arc<WorkerTask>,
pub datastore: Arc<DataStore>,
@@ -128,6 +129,7 @@ impl BackupEnvironment {
worker: Arc<WorkerTask>,
datastore: Arc<DataStore>,
backup_dir: BackupDir,
+ no_cache: bool,
) -> Result<Self, Error> {
let state = SharedBackupState {
finished: false,
@@ -148,6 +150,7 @@ impl BackupEnvironment {
worker,
datastore,
debug: tracing::enabled!(tracing::Level::DEBUG),
+ no_cache,
formatter: JSON_FORMATTER,
backup_dir,
last_backup: None,
diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
index 2c6afca41..0913d4264 100644
--- a/src/api2/backup/mod.rs
+++ b/src/api2/backup/mod.rs
@@ -51,6 +51,7 @@ pub const API_METHOD_UPGRADE_BACKUP: ApiMethod = ApiMethod::new(
("backup-time", false, &BACKUP_TIME_SCHEMA),
("debug", true, &BooleanSchema::new("Enable verbose debug logging.").schema()),
("benchmark", true, &BooleanSchema::new("Job is a benchmark (do not keep data).").schema()),
+ ("no-cache", true, &BooleanSchema::new("Disable local datastore cache for network storages").schema()),
]),
)
).access(
@@ -77,6 +78,7 @@ fn upgrade_to_backup_protocol(
async move {
let debug = param["debug"].as_bool().unwrap_or(false);
let benchmark = param["benchmark"].as_bool().unwrap_or(false);
+ let no_cache = param["no-cache"].as_bool().unwrap_or(false);
let auth_id: Authid = rpcenv.get_auth_id().unwrap().parse()?;
@@ -212,6 +214,7 @@ fn upgrade_to_backup_protocol(
worker.clone(),
datastore,
backup_dir,
+ no_cache,
)?;
env.debug = debug;
diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 7a80fd0eb..4e949a073 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -247,6 +247,17 @@ async fn upload_to_backend(
UploadChunk::new(req_body, env.datastore.clone(), digest, size, encoded_size).await
}
DatastoreBackend::S3(s3_client) => {
+ if env.no_cache {
+ let is_duplicate = match s3_client.put_object(digest.into(), req_body).await? {
+ PutObjectResponse::PreconditionFailed => true,
+ PutObjectResponse::NeedsRetry => {
+ bail!("concurrent operation, reupload required")
+ }
+ PutObjectResponse::Success(_content) => false,
+ };
+ return Ok((digest, size, encoded_size, is_duplicate));
+ }
+
// Load chunk data into memory, need to write it twice, to S3 object store and
// local cache store. Further, body needs to be consumed also if chunks insert
// can be skipped since cached.
diff --git a/src/server/push.rs b/src/server/push.rs
index e71012ed8..6a31d2abe 100644
--- a/src/server/push.rs
+++ b/src/server/push.rs
@@ -828,6 +828,7 @@ pub(crate) async fn push_snapshot(
snapshot,
false,
false,
+ false,
)
.await?;
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (41 preceding siblings ...)
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 42/42] api: backup: add no-cache flag to bypass local datastore cache Christian Ebner
@ 2025-06-04 11:58 ` Lukas Wagner
2025-06-06 7:40 ` Christian Ebner
2025-06-06 11:12 ` Lukas Wagner
43 siblings, 1 reply; 46+ messages in thread
From: Lukas Wagner @ 2025-06-04 11:58 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Christian Ebner
On 2025-05-29 16:31, Christian Ebner wrote:
> Testing:
> For testing, an S3 compatible object store provided via Ceph RADOS
> gateway can be used by the following setup. This was performed on a
> pre-existing Ceph Reef 18.2 cluster.
>
For further reference, here are the steps needed to set up a local MinIO [1] server.
Took me a bit of trial and error to get it to work, so I thought I'd share
my notes. Christian, feel free to include/reference them in upcoming revisions
of this patch series.
# Setting up a local MinIO server for testing PBS's S3 feature.
Download latest server, client and cert tool
```
wget https://dl.min.io/server/minio/release/linux-amd64/minio
wget https://dl.min.io/client/mc/release/linux-amd64/mc
wget https://github.com/minio/certgen/releases/latest/download/certgen-linux-amd64
chmod +x certgen-linux-amd64 mc minio
```
Next, create the HTTPS cert. You can also use `openssl` to create one, if you don't want
to use minio's tool.
```
mkdir certs && cd certs
../certgen-linux-amd64 -host "localhost,s3.example.com"
cd ../
```
Start minio server:
```
MINIO_DOMAIN="s3.example.com" MINIO_ROOT_USER=admin MINIO_ROOT_PASSWORD=<admin-password> ./minio server ./data --console-address ":9001" --certs-dir ./certs
```
Create an alias for the local server in the client tool:
```
./mc alias set 'local' 'https://localhost:9000' 'admin' '<admin-password>'
```
For some reason you have to run this command twice. At first, it asks you to
confirm the certificate fingerprint but still fails with an error ('certificate
signed by an unknown authority'); if you run it a second time, it works.
Next, verify that the client connection works:
```
./mc ping local
```
After that, let's create the `pbs` bucket (mb = make bucket):
```
./mc mb local/pbs
```
After that, you need to create an entry in `/etc/hosts` on the PBS host.
S3 encodes the name of the bucket in the domain, so you have to make sure
that PBS can resolve the IP properly.
```
172.25.0.xxx pbs.s3.example.com
```
Finally, get the SHA256 fingerprint of the certificate so that you can use it in PBS later.
```
openssl x509 -noout -fingerprint -sha256 -inform pem -in certs/public.crt
```
When adding the S3 bucket in PBS, use the following values:
```
Host: pbs.s3.example.com
Port: 9000
Bucket: pbs
Access Key: admin
Secret Key: <admin-password>
Fingerprint: SHA256 from the previous command
```
[1] https://github.com/minio/minio
--
- Lukas
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores
2025-06-04 11:58 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Lukas Wagner
@ 2025-06-06 7:40 ` Christian Ebner
0 siblings, 0 replies; 46+ messages in thread
From: Christian Ebner @ 2025-06-06 7:40 UTC (permalink / raw)
To: Lukas Wagner, Proxmox Backup Server development discussion
On 6/4/25 13:58, Lukas Wagner wrote:
> On 2025-05-29 16:31, Christian Ebner wrote:
>> Testing:
>> For testing, an S3 compatible object store provided via Ceph RADOS
>> gateway can be used by the following setup. This was performed on a
>> pre-existing Ceph Reef 18.2 cluster.
>>
>
> For further reference, here are the steps needed to set up a local MinIO [1] server.
> Took me a bit of trial and error to get it to work, so I thought I'd share
> my notes. Christian, feel free to include/reference them in upcoming revisions
> of this patch series.
Thanks a lot for the write-up, I will also create a dedicated wiki
article outlining the steps for setting up a self-hosted S3 object
store, so I will include this there as well.
Given the issues with not supporting path-style requests for bucket
operations, as you reported off-list, I extended the PBS S3 client to
also allow for these. Thanks for your feedback on that as well!
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
` (42 preceding siblings ...)
2025-06-04 11:58 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Lukas Wagner
@ 2025-06-06 11:12 ` Lukas Wagner
43 siblings, 0 replies; 46+ messages in thread
From: Lukas Wagner @ 2025-06-06 11:12 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Christian Ebner
On 2025-05-29 16:31, Christian Ebner wrote:
> Disclaimer: These patches are in a development state and are not
> intended for production use.
>
> This patch series aims to add S3 compatible object stores as storage
> backend for PBS datastores. A PBS local cache store using the regular
> datastore layout is used for faster operation, bypassing requests to
> the S3 api when possible. Further, the local cache store allows to
> keep frequently used chunks and is used to avoid expensive metadata
> updates on the object store, e.g. by using local marker file during
> garbage collection.
>
> Backups are created by upload chunks to the corresponding S3 bucket,
> while keeping the index files in the local cache store, on backup
> finish, the snapshot metadata are persisted to the S3 storage backend.
>
> Snapshot restores read chunks preferably from the local cache store,
> downloading and insterting them if not present from the S3 object
> store.
>
> Listing and snapsoht metadata operation currently rely soly on the
> local cache store, with the intention to provide a mechanism to
> re-sync and merge with object stored on the S3 backend if requested.
>
> Sending this patch series as RFC to get some initial feedback, mostly
> on the S3 client implementation part and the corresponding
> configuration integration with PBS, which is already in an advanced
> stage and warants initial review and real world testing.
>
> Datastore operations on the S3 backend are still work in progress,
> but feedback on that is appreciated very much as well.
>
> Among the open points still being worked on are:
> - Consistency between local cache and S3 store.
> - Sync and merge of namespace, group snapshot and index files when
> required or requested.
> - Advanced packing mechanism for chunks to significantly reduce the
> number of api requests and therefore be more cost effective.
> - Reduction of in-memory copies for chunks/blobs and recalculation of
> checksums.
>
Had some off-list discussions with Christian about a couple of aspects of this
version of the series, here is a quick summary:
With regards to the 'Create Datastore' dialog:
In the current version, the S3 bucket can be selected under 'Advanced'. This might be
a bit hard to find for some users, so I suggested revising the dialog in general.
For example, perhaps we could start by having the user select a type
right away (Normal / Removable / Existing / S3-backed), and then show or hide
the required UI elements accordingly. For the S3-backed store specifically,
my intuitive expectation would be to first select the bucket, and then, as a second
step, choose the location for the local cache.
If we still want to keep S3 a bit hidden for now, we could either add a global setting
or an option within the dialog under 'Advanced' to opt into the experimental S3 feature,
or something along those lines.
Also I mentioned that the 'trash-can' icon - albeit being a bucket - might not be the best
fit for 'S3 Buckets', because it creates the association of 'trash' or 'throwing something away'.
I suggested fa-cloud-upload [1] instead for now, which should be quite fitting for a 'syncing
something to the cloud' feature.
Furthermore, I suggested that maybe the 'bucket' should be a property of the
datastore config, not of the s3 config. That way, the s3 config contains only the
connection info and credentials, which makes it easy to use the same s3 config for
multiple datastores which use different buckets as a backing storage.
Last, we probably should encode the name of the datastore into the key
of the S3 object, unless we want a strict 1:1 relationship between bucket and
datastore. Maybe it could even make sense to allow the user to set custom
prefixes for objects, in case they want PBS's objects not to be stored at the
top level of the bucket.
[1] https://fontawesome.com/v4/icon/cloud-upload
--
- Lukas
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
end of thread, other threads:[~2025-06-06 11:12 UTC | newest]
Thread overview: 46+ messages
2025-05-29 14:31 [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable 1/42] pbs-api-types: add types for S3 client configs and secrets Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable 2/42] pbs-api-types: extend datastore config by backend config enum Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 03/42] api: fix minor formatting issues Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 04/42] bin: sort submodules alphabetically Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 05/42] datastore: ignore missing owner file when removing group directory Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 06/42] verify: refactor verify related functions to be methods of worker Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 07/42] s3 client: add crate for AWS S3 compatible object store client Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 08/42] s3 client: implement AWS signature v4 request authentication Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 09/42] s3 client: add dedicated type for s3 object keys Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 10/42] s3 client: add type for last modified timestamp in responses Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 11/42] s3 client: add helper to parse http date headers Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 12/42] s3 client: implement methods to operate on s3 objects in bucket Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 13/42] config: introduce s3 object store client configuration Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 14/42] api: config: implement endpoints to manipulate and list s3 configs Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 15/42] api: datastore: check S3 backend bucket access on datastore create Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 16/42] api/bin: add endpoint and command to check s3 client connection Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 17/42] datastore: allow to get the backend for a datastore Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 18/42] api: backup: store datastore backend in runtime environment Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 19/42] api: backup: conditionally upload chunks to S3 object store backend Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 20/42] api: backup: conditionally upload blobs " Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 21/42] api: backup: conditionally upload indices " Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 22/42] api: backup: conditionally upload manifest " Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 23/42] sync: pull: conditionally upload content to S3 backend Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 24/42] api: reader: fetch chunks based on datastore backend Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 25/42] datastore: local chunk reader: read chunks based on backend Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 26/42] verify worker: add datastore backed to verify worker Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 27/42] verify: implement chunk verification for stores with s3 backend Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 28/42] datastore: create namespace marker in S3 backend Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 29/42] datastore: create/delete protected marker file on S3 storage backend Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 30/42] datastore: prune groups/snapshots from S3 object store backend Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 31/42] datastore: get and set owner for S3 " Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 32/42] datastore: implement garbage collection for s3 backend Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 33/42] ui: add S3 client edit window for configuration create/edit Christian Ebner
2025-05-29 14:31 ` [pbs-devel] [RFC v2 proxmox-backup 34/42] ui: add S3 client view for configuration Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 35/42] ui: expose the S3 client view in the navigation tree Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 36/42] ui: add s3 bucket selector and allow to set s3 backend Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 37/42] tools: lru cache: add removed callback for evicted cache nodes Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 38/42] tools: async lru cache: implement insert, remove and contains methods Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 39/42] datastore: add local datastore cache for network attached storages Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 40/42] api: backup: use local datastore cache on S3 backend chunk upload Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 41/42] api: reader: use local datastore cache on S3 backend chunk fetching Christian Ebner
2025-05-29 14:32 ` [pbs-devel] [RFC v2 proxmox-backup 42/42] api: backup: add no-cache flag to bypass local datastore cache Christian Ebner
2025-06-04 11:58 ` [pbs-devel] [RFC v2 proxmox/bookworm-stable proxmox-backup 00/42] S3 storage backend for datastores Lukas Wagner
2025-06-06 7:40 ` Christian Ebner
2025-06-06 11:12 ` Lukas Wagner