* [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages
@ 2024-11-21 15:43 Christian Ebner
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 1/4] server: push: fix needless borrow clippy warning Christian Ebner
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Christian Ebner @ 2024-11-21 15:43 UTC (permalink / raw)
To: pbs-devel
This is a small series of patches with the intend to improve the log
messages for the sync job in push direction, mainly adding context to
error messages from the remote when the error stems from an api call,
limiting line length and improving usage of consistent terms for better
readablility and easier understanding.
Christian Ebner (4):
server: push: fix needless borrow clippy warning
server: push: consistently use remote over target for error messages
server: push: add error context to all target api calls
server: push: various smaller improvements to error messages
src/server/push.rs | 79 ++++++++++++++++++++++++++++------------------
1 file changed, 49 insertions(+), 30 deletions(-)
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* [pbs-devel] [PATCH proxmox-backup 1/4] server: push: fix needless borrow clippy warning
2024-11-21 15:43 [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Christian Ebner
@ 2024-11-21 15:43 ` Christian Ebner
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 2/4] server: push: consistently use remote over target for error messages Christian Ebner
` (4 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Christian Ebner @ 2024-11-21 15:43 UTC (permalink / raw)
To: pbs-devel
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/server/push.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/server/push.rs b/src/server/push.rs
index 4a222627b..2181634c6 100644
--- a/src/server/push.rs
+++ b/src/server/push.rs
@@ -422,7 +422,7 @@ pub(crate) async fn push_store(mut params: PushParameters) -> Result<SyncStats,
params
.target
.ns
- .contains(&target_namespace)
+ .contains(target_namespace)
.map(|sub_depth| sub_depth <= max_depth)
.unwrap_or(false)
})
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* [pbs-devel] [PATCH proxmox-backup 2/4] server: push: consistently use remote over target for error messages
2024-11-21 15:43 [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Christian Ebner
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 1/4] server: push: fix needless borrow clippy warning Christian Ebner
@ 2024-11-21 15:43 ` Christian Ebner
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 3/4] server: push: add error context to all target api calls Christian Ebner
` (3 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Christian Ebner @ 2024-11-21 15:43 UTC (permalink / raw)
To: pbs-devel
Mixing of terms only makes the errors harder to understand.
In order to make error messages more intuitive, always refer to the
sync push target as remote, mention the remote explicitly and/or
improve messages where needed.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/server/push.rs | 35 +++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/src/server/push.rs b/src/server/push.rs
index 2181634c6..86cef5520 100644
--- a/src/server/push.rs
+++ b/src/server/push.rs
@@ -197,7 +197,7 @@ async fn remove_target_namespace(
target_namespace: &BackupNamespace,
) -> Result<BackupGroupDeleteStats, Error> {
if target_namespace.is_root() {
- bail!("cannot remove root namespace from target");
+ bail!("Cannot remove root namespace from remote");
}
check_ns_remote_datastore_privs(params, target_namespace, PRIV_REMOTE_DATASTORE_MODIFY)
@@ -295,7 +295,7 @@ async fn check_or_create_target_namespace(
// Sub-namespaces have to be created by creating parent components first.
check_ns_remote_datastore_privs(params, target_namespace, PRIV_REMOTE_DATASTORE_MODIFY)
- .map_err(|err| format_err!("Creating namespace not allowed - {err}"))?;
+ .map_err(|err| format_err!("Creating remote namespace not allowed - {err}"))?;
let mut parent = BackupNamespace::root();
for component in target_namespace.components() {
@@ -311,7 +311,7 @@ async fn check_or_create_target_namespace(
args["parent"] = serde_json::to_value(parent.clone())?;
}
match params.target.client.post(&api_path, Some(args)).await {
- Ok(_) => info!("Created new namespace on target: {current}"),
+ Ok(_) => info!("Successfully created new namespace {current} on remote"),
Err(err) => {
bail!("Remote creation of namespace {current} failed, remote returned: {err}")
}
@@ -445,18 +445,19 @@ pub(crate) async fn push_store(mut params: PushParameters) -> Result<SyncStats,
}));
if delete_stats.protected_snapshots() > 0 {
warn!(
- "kept {protected_count} protected snapshots of namespace '{target_namespace}'",
+ "Kept {protected_count} protected snapshots of remote namespace {target_namespace}",
protected_count = delete_stats.protected_snapshots(),
);
continue;
}
}
Err(err) => {
- warn!("failed to remove vanished namespace {target_namespace} - {err}");
+ warn!("Encountered errors: {err}");
+ warn!("Failed to remove vanished namespace {target_namespace} from remote!");
continue;
}
}
- info!("removed vanished namespace {target_namespace}");
+ info!("Successfully removed vanished namespace {target_namespace} from remote");
}
if !params.target.supports_prune_delete_stats {
@@ -481,7 +482,7 @@ pub(crate) async fn push_namespace(
let target_namespace = params.map_to_target(namespace)?;
// Check if user is allowed to perform backups on remote datastore
check_ns_remote_datastore_privs(params, &target_namespace, PRIV_REMOTE_DATASTORE_BACKUP)
- .map_err(|err| format_err!("Pushing to remote not allowed - {err}"))?;
+ .map_err(|err| format_err!("Pushing to remote namespace not allowed - {err}"))?;
let mut list: Vec<BackupGroup> = params
.source
@@ -527,7 +528,8 @@ pub(crate) async fn push_namespace(
match push_group(params, namespace, &group, &mut progress).await {
Ok(sync_stats) => stats.add(sync_stats),
Err(err) => {
- warn!("sync group '{group}' failed - {err}");
+ warn!("Encountered errors: {err}");
+ warn!("Failed to push group {group} to remote!");
errors = true;
}
}
@@ -543,13 +545,13 @@ pub(crate) async fn push_namespace(
continue;
}
- info!("delete vanished group '{target_group}'");
+ info!("Removed vanished group {target_group} from remote");
match remove_target_group(params, &target_namespace, &target_group).await {
Ok(delete_stats) => {
if delete_stats.protected_snapshots() > 0 {
warn!(
- "kept {protected_count} protected snapshots of group '{target_group}'",
+ "Kept {protected_count} protected snapshots of group {target_group} on remote",
protected_count = delete_stats.protected_snapshots(),
);
}
@@ -560,7 +562,8 @@ pub(crate) async fn push_namespace(
}));
}
Err(err) => {
- warn!("failed to delete vanished group - {err}");
+ warn!("Encountered errors: {err}");
+ warn!("Failed to remove vanished group {target_group} from remote!");
errors = true;
continue;
}
@@ -693,7 +696,7 @@ pub(crate) async fn push_group(
}
if snapshot.protected {
info!(
- "don't delete vanished snapshot {name} (protected)",
+ "Kept protected snapshot {name} on remote",
name = snapshot.backup
);
continue;
@@ -701,12 +704,16 @@ pub(crate) async fn push_group(
if let Err(err) =
forget_target_snapshot(params, &target_namespace, &snapshot.backup).await
{
+ info!("Encountered errors: {err}");
info!(
- "could not delete vanished snapshot {name} - {err}",
+ "Failed to remove vanished snapshot {name} from remote!",
name = snapshot.backup
);
}
- info!("delete vanished snapshot {name}", name = snapshot.backup);
+ info!(
+ "Removed vanished snapshot {name} from remote",
+ name = snapshot.backup
+ );
stats.add(SyncStats::from(RemovedVanishedStats {
snapshots: 1,
groups: 0,
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* [pbs-devel] [PATCH proxmox-backup 3/4] server: push: add error context to all target api calls
2024-11-21 15:43 [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Christian Ebner
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 1/4] server: push: fix needless borrow clippy warning Christian Ebner
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 2/4] server: push: consistently use remote over target for error messages Christian Ebner
@ 2024-11-21 15:43 ` Christian Ebner
2024-11-22 9:01 ` Fabian Grünbichler
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 4/4] server: push: various smaller improvements to error messages Christian Ebner
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Christian Ebner @ 2024-11-21 15:43 UTC (permalink / raw)
To: pbs-devel
Make it clear from the context that these error messages stem from
the response of an api call rather than a local error.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/server/push.rs | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/src/server/push.rs b/src/server/push.rs
index 86cef5520..fe2e11220 100644
--- a/src/server/push.rs
+++ b/src/server/push.rs
@@ -219,7 +219,9 @@ async fn remove_target_namespace(
if params.target.supports_prune_delete_stats {
let data = result["data"].take();
serde_json::from_value(data).map_err(|err| {
- format_err!("removing target namespace {target_namespace} failed - {err}")
+ format_err!(
+ "Failed to remove remote namespace {target_namespace}, remote returned: {err}"
+ )
})
} else {
Ok(BackupGroupDeleteStats::default())
@@ -236,7 +238,8 @@ async fn fetch_target_groups(
let args = Some(serde_json::json!({ "ns": target_namespace.name() }));
let mut result = params.target.client.get(&api_path, args).await?;
- let groups: Vec<GroupListItem> = serde_json::from_value(result["data"].take())?;
+ let groups: Vec<GroupListItem> = serde_json::from_value(result["data"].take())
+ .map_err(|err| format_err!("Failed to fetch remote groups, remote returned: {err}"))?;
let (mut owned, not_owned) = groups.into_iter().fold(
(Vec::new(), HashSet::new()),
@@ -277,8 +280,9 @@ async fn remove_target_group(
if params.target.supports_prune_delete_stats {
let data = result["data"].take();
- serde_json::from_value(data)
- .map_err(|err| format_err!("removing target group {backup_group} failed - {err}"))
+ serde_json::from_value(data).map_err(|err| {
+ format_err!("Failed to remove remote group {backup_group}, remote returned: {err}")
+ })
} else {
Ok(BackupGroupDeleteStats::default())
}
@@ -313,7 +317,7 @@ async fn check_or_create_target_namespace(
match params.target.client.post(&api_path, Some(args)).await {
Ok(_) => info!("Successfully created new namespace {current} on remote"),
Err(err) => {
- bail!("Remote creation of namespace {current} failed, remote returned: {err}")
+ bail!("Creation of remote namespace {current} failed, remote returned: {err}")
}
}
existing_target_namespaces.push(current.clone());
@@ -585,7 +589,8 @@ async fn fetch_target_snapshots(
args["ns"] = serde_json::to_value(target_namespace)?;
}
let mut result = params.target.client.get(&api_path, Some(args)).await?;
- let snapshots: Vec<SnapshotListItem> = serde_json::from_value(result["data"].take())?;
+ let snapshots: Vec<SnapshotListItem> = serde_json::from_value(result["data"].take())
+ .map_err(|err| format_err!("Failed to fetch remote snapshots, remote returned: {err}"))?;
Ok(snapshots)
}
@@ -603,7 +608,12 @@ async fn forget_target_snapshot(
if !target_namespace.is_root() {
args["ns"] = serde_json::to_value(target_namespace)?;
}
- params.target.client.delete(&api_path, Some(args)).await?;
+ params
+ .target
+ .client
+ .delete(&api_path, Some(args))
+ .await
+ .map_err(|err| format_err!("Failed to remove remote snapshot, remote returned: {err}"))?;
Ok(())
}
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* [pbs-devel] [PATCH proxmox-backup 4/4] server: push: various smaller improvements to error messages
2024-11-21 15:43 [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Christian Ebner
` (2 preceding siblings ...)
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 3/4] server: push: add error context to all target api calls Christian Ebner
@ 2024-11-21 15:43 ` Christian Ebner
2024-11-21 16:06 ` [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Gabriel Goller
2024-11-22 9:04 ` [pbs-devel] partially-applied: " Fabian Grünbichler
5 siblings, 0 replies; 14+ messages in thread
From: Christian Ebner @ 2024-11-21 15:43 UTC (permalink / raw)
To: pbs-devel
Various smaller adaptions such as capitalization of the start of
sentences, expansion of abbreviations and shortening of to long
error messages.
To improve consistency with the rest of the error messages for the
sync job in push direction.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/server/push.rs | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/src/server/push.rs b/src/server/push.rs
index fe2e11220..7a4e21f37 100644
--- a/src/server/push.rs
+++ b/src/server/push.rs
@@ -129,7 +129,7 @@ impl PushParameters {
// push assumes namespace support on the remote side, fail early if missing
if api_version.major < 2 || (api_version.major == 2 && api_version.minor < 2) {
- bail!("unsupported remote api version, minimum v2.2 required");
+ bail!("Unsupported remote api version, minimum v2.2 required");
}
let supports_prune_delete_stats = api_version.major > 3
@@ -408,7 +408,8 @@ pub(crate) async fn push_store(mut params: PushParameters) -> Result<SyncStats,
}
Err(err) => {
errors = true;
- info!("Encountered errors while syncing namespace {source_namespace} - {err}");
+ info!("Encountered errors: {err}");
+ info!("Failed to sync {source_store_and_ns} into {target_store_and_ns}!");
}
}
}
@@ -465,12 +466,12 @@ pub(crate) async fn push_store(mut params: PushParameters) -> Result<SyncStats,
}
if !params.target.supports_prune_delete_stats {
- info!("Older api version on remote detected, delete stats might be incomplete");
+ info!("Older api version on remote detected, deletion statistics might be incomplete");
}
}
if errors {
- bail!("sync failed with some errors.");
+ bail!("Sync failed with some errors!");
}
Ok(stats)
@@ -502,7 +503,7 @@ pub(crate) async fn push_namespace(
.collect();
info!(
- "found {filtered} groups to sync (out of {total} total)",
+ "Found {filtered} groups to sync (out of {total} total)",
filtered = list.len()
);
@@ -522,7 +523,7 @@ pub(crate) async fn push_namespace(
if not_owned_target_groups.contains(&group) {
warn!(
- "group '{group}' not owned by remote user '{}' on target, skip",
+ "Group '{group}' not owned by remote user '{}' on target, aborting upload",
params.target.remote_user(),
);
continue;
@@ -692,7 +693,7 @@ pub(crate) async fn push_group(
fetch_previous_manifest = true;
progress.done_snapshots = pos as u64 + 1;
- info!("percentage done: {progress}");
+ info!("Percentage done: {progress}");
// stop on error
let sync_stats = result?;
@@ -762,7 +763,8 @@ pub(crate) async fn push_snapshot(
Ok((manifest, _raw_size)) => manifest,
Err(err) => {
// No manifest in snapshot or failed to read, warn and skip
- log::warn!("failed to load manifest - {err}");
+ log::warn!("Encountered errors: {err}");
+ log::warn!("Failed to load manifest for '{snapshot}'!");
return Ok(stats);
}
};
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages
2024-11-21 15:43 [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Christian Ebner
` (3 preceding siblings ...)
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 4/4] server: push: various smaller improvements to error messages Christian Ebner
@ 2024-11-21 16:06 ` Gabriel Goller
2024-11-21 16:26 ` Christian Ebner
2024-11-22 12:28 ` Christian Ebner
2024-11-22 9:04 ` [pbs-devel] partially-applied: " Fabian Grünbichler
5 siblings, 2 replies; 14+ messages in thread
From: Gabriel Goller @ 2024-11-21 16:06 UTC (permalink / raw)
To: Christian Ebner; +Cc: pbs-devel
I think I'd be nice if we use `.context()` instead of `format_err` and
debug print ("{err:?}") instead of default ("{}") here. As these
messages are all errors they shouldn't appear too often in the log and
if an error happens, you get much more information.
What do you think?
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages
2024-11-21 16:06 ` [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Gabriel Goller
@ 2024-11-21 16:26 ` Christian Ebner
2024-11-21 17:04 ` Gabriel Goller
2024-11-22 12:28 ` Christian Ebner
1 sibling, 1 reply; 14+ messages in thread
From: Christian Ebner @ 2024-11-21 16:26 UTC (permalink / raw)
To: Gabriel Goller; +Cc: pbs-devel
On 11/21/24 17:06, Gabriel Goller wrote:
> I think I'd be nice if we use `.context()` instead of `format_err` and
> debug print ("{err:?}") instead of default ("{}") here. As these
> messages are all errors they shouldn't appear too often in the log and
> if an error happens, you get much more information.
>
> What do you think?
Hmm, can check that out, but wouldn't that disrupt the single line
character of the log entries, prefixed by the time stamp?
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages
2024-11-21 16:26 ` Christian Ebner
@ 2024-11-21 17:04 ` Gabriel Goller
2024-11-21 19:32 ` Fabian Grünbichler
0 siblings, 1 reply; 14+ messages in thread
From: Gabriel Goller @ 2024-11-21 17:04 UTC (permalink / raw)
To: Christian Ebner; +Cc: pbs-devel
On 21.11.2024 17:26, Christian Ebner wrote:
>On 11/21/24 17:06, Gabriel Goller wrote:
>>I think I'd be nice if we use `.context()` instead of `format_err` and
>>debug print ("{err:?}") instead of default ("{}") here. As these
>>messages are all errors they shouldn't appear too often in the log and
>>if an error happens, you get much more information.
>>
>>What do you think?
>
>Hmm, can check that out, but wouldn't that disrupt the single line
>character of the log entries, prefixed by the time stamp?
We could use "{err:#}", which will print everything in one line?
Although on the other hand, I'd understand if you want to have strict
control over what is displayed to the users, so no hard feelings on
this one.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages
2024-11-21 17:04 ` Gabriel Goller
@ 2024-11-21 19:32 ` Fabian Grünbichler
2024-11-22 8:41 ` Christian Ebner
0 siblings, 1 reply; 14+ messages in thread
From: Fabian Grünbichler @ 2024-11-21 19:32 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Gabriel Goller,
Christian Ebner
> Gabriel Goller <g.goller@proxmox.com> hat am 21.11.2024 18:04 CET geschrieben:
>
>
> On 21.11.2024 17:26, Christian Ebner wrote:
> >On 11/21/24 17:06, Gabriel Goller wrote:
> >>I think I'd be nice if we use `.context()` instead of `format_err` and
> >>debug print ("{err:?}") instead of default ("{}") here. As these
> >>messages are all errors they shouldn't appear too often in the log and
> >>if an error happens, you get much more information.
> >>
> >>What do you think?
> >
> >Hmm, can check that out, but wouldn't that disrupt the single line
> >character of the log entries, prefixed by the time stamp?
>
> We could use "{err:#}", which will print everything in one line?
>
> Although on the other hand, I'd understand if you want to have strict
> control over what is displayed to the users, so no hard feelings on
> this one.
big tasks/job like the sync ones are a bit special w.r.t. error handling because they "catch" errors often at intermediate levels, log them, record that an error occur, but then proceed with a sensible next unit of work. so it's not possible to just add a lot of context to an error like with simple(r) API calls, where many/most errors can be treated as fatal, maybe requiring a bit of cleanup before bailing completely.
I think a good compromise is to treat those units of work as a sort of error scope, and
- add context where it is missing (e.g., low level file or network access, where just bubbling up errors might be meaningless)
- and then when logging the error, think about formatting
e.g., for syncs we have groups as lowest unit of work - if something fails within a group, we abort that group, but proceed with the next one. but if we just log "syncing group X failed - permission denied" that doesn't help whatsoever. it might be better to have three lines of warnings if something goes wrong, if those three lines contain appropriate information like
- request X / file access Y /.. failed with error Z (root cause)
- while processing snapshot A in group B (important context, because the request or path of the root cause might not tell us this)
- syncing group B failed (result of the error chain)
achieving that requires carefully analyzing all error sources/chains though. when in doubt, I'd rather have a bit too much context initially (that usually means too much is added at lower levels and we can optimize there) than too little, but within reason ;) we can't travel back in time and add needed information to debug issues to our or users' logs, a stray duplicate context can easily be filtered out though. of course that doesn't mean every error is allowed to grow 10 lines long by default, balance is still important even when doing a first pass/implementation of improved error messages!
and of course, the general rule applies - if we use errors with proper context, it should be done fairly consistently, so that the next level up can actually use it. it might be easier to use format_err or explicit additional `warn!` invocations to improve the situation immediately, and postpone adding proper contexts to do it for all of sync without time pressure.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages
2024-11-21 19:32 ` Fabian Grünbichler
@ 2024-11-22 8:41 ` Christian Ebner
0 siblings, 0 replies; 14+ messages in thread
From: Christian Ebner @ 2024-11-22 8:41 UTC (permalink / raw)
To: Fabian Grünbichler,
Proxmox Backup Server development discussion, Gabriel Goller
On 11/21/24 20:32, Fabian Grünbichler wrote:
>
>> Gabriel Goller <g.goller@proxmox.com> hat am 21.11.2024 18:04 CET geschrieben:
>>
>>
>> On 21.11.2024 17:26, Christian Ebner wrote:
>>> On 11/21/24 17:06, Gabriel Goller wrote:
>>>> I think I'd be nice if we use `.context()` instead of `format_err` and
>>>> debug print ("{err:?}") instead of default ("{}") here. As these
>>>> messages are all errors they shouldn't appear too often in the log and
>>>> if an error happens, you get much more information.
>>>>
>>>> What do you think?
>>>
>>> Hmm, can check that out, but wouldn't that disrupt the single line
>>> character of the log entries, prefixed by the time stamp?
>>
>> We could use "{err:#}", which will print everything in one line?
>>
>> Although on the other hand, I'd understand if you want to have strict
>> control over what is displayed to the users, so no hard feelings on
>> this one.
>
> big tasks/job like the sync ones are a bit special w.r.t. error handling because they "catch" errors often at intermediate levels, log them, record that an error occur, but then proceed with a sensible next unit of work. so it's not possible to just add a lot of context to an error like with simple(r) API calls, where many/most errors can be treated as fatal, maybe requiring a bit of cleanup before bailing completely.
>
> I think a good compromise is to treat those units of work as a sort of error scope, and
> - add context where it is missing (e.g., low level file or network access, where just bubbling up errors might be meaningless)
> - and then when logging the error, think about formatting
The current series already tries to follow your lead here, following
along the lines of you followup patches to the original series.
I will however double check where the suggested adding of an anyhow
Error context makes sense and how to best log soft errors without
disrupting the logs to much, increasing the verbosity for now as you
suggested below.
>
> e.g., for syncs we have groups as lowest unit of work - if something fails within a group, we abort that group, but proceed with the next one. but if we just log "syncing group X failed - permission denied" that doesn't help whatsoever. it might be better to have three lines of warnings if something goes wrong, if those three lines contain appropriate information like
> - request X / file access Y /.. failed with error Z (root cause)
> - while processing snapshot A in group B (important context, because the request or path of the root cause might not tell us this)
> - syncing group B failed (result of the error chain)
>
> achieving that requires carefully analyzing all error sources/chains though. when in doubt, I'd rather have a bit too much context initially (that usually means too much is added at lower levels and we can optimize there) than too little, but within reason ;) we can't travel back in time and add needed information to debug issues to our or users' logs, a stray duplicate context can easily be filtered out though. of course that doesn't mean every error is allowed to grow 10 lines long by default, balance is still important even when doing a first pass/implementation of improved error messages!
>
> and of course, the general rule applies - if we use errors with proper context, it should be done fairly consistently, so that the next level up can actually use it. it might be easier to use format_err or explicit additional `warn!` invocations to improve the situation immediately, and postpone adding proper contexts to do it for all of sync without time pressure.
Yes, aligning the sync job logs for push and pull would of course also
be of interest here, but given that the pull job also logs information
not controlled by the job directly, but also indirectly by e.g. logs of
operations on the datastore, that will require some more refactoring and
reorganization of the logic to get it right, without breaking other logging.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 3/4] server: push: add error context to all target api calls
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 3/4] server: push: add error context to all target api calls Christian Ebner
@ 2024-11-22 9:01 ` Fabian Grünbichler
2024-11-22 10:11 ` Christian Ebner
0 siblings, 1 reply; 14+ messages in thread
From: Fabian Grünbichler @ 2024-11-22 9:01 UTC (permalink / raw)
To: Christian Ebner, pbs-devel
Quoting Christian Ebner (2024-11-21 16:43:36)
> Make it clear from the context that these error messages stem from
> the response of an api call rather than a local error.
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> src/server/push.rs | 24 +++++++++++++++++-------
> 1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/src/server/push.rs b/src/server/push.rs
> index 86cef5520..fe2e11220 100644
> --- a/src/server/push.rs
> +++ b/src/server/push.rs
> @@ -219,7 +219,9 @@ async fn remove_target_namespace(
> if params.target.supports_prune_delete_stats {
> let data = result["data"].take();
> serde_json::from_value(data).map_err(|err| {
> - format_err!("removing target namespace {target_namespace} failed - {err}")
> + format_err!(
> + "Failed to remove remote namespace {target_namespace}, remote returned: {err}"
> + )
this is attached to the wrong error - it should be attached to the client.delete call right above..
this here should instead add the context that we failed to parse the returned
value (which should never happen, that means we missed some API breakage..)
> })
> } else {
> Ok(BackupGroupDeleteStats::default())
> @@ -236,7 +238,8 @@ async fn fetch_target_groups(
> let args = Some(serde_json::json!({ "ns": target_namespace.name() }));
>
> let mut result = params.target.client.get(&api_path, args).await?;
> - let groups: Vec<GroupListItem> = serde_json::from_value(result["data"].take())?;
> + let groups: Vec<GroupListItem> = serde_json::from_value(result["data"].take())
> + .map_err(|err| format_err!("Failed to fetch remote groups, remote returned: {err}"))?;
same here, just with get instead of delete ;)
>
> let (mut owned, not_owned) = groups.into_iter().fold(
> (Vec::new(), HashSet::new()),
> @@ -277,8 +280,9 @@ async fn remove_target_group(
>
> if params.target.supports_prune_delete_stats {
> let data = result["data"].take();
> - serde_json::from_value(data)
> - .map_err(|err| format_err!("removing target group {backup_group} failed - {err}"))
> + serde_json::from_value(data).map_err(|err| {
> + format_err!("Failed to remove remote group {backup_group}, remote returned: {err}")
> + })
here as well
> } else {
> Ok(BackupGroupDeleteStats::default())
> }
> @@ -313,7 +317,7 @@ async fn check_or_create_target_namespace(
> match params.target.client.post(&api_path, Some(args)).await {
> Ok(_) => info!("Successfully created new namespace {current} on remote"),
> Err(err) => {
> - bail!("Remote creation of namespace {current} failed, remote returned: {err}")
> + bail!("Creation of remote namespace {current} failed, remote returned: {err}")
> }
> }
> existing_target_namespaces.push(current.clone());
> @@ -585,7 +589,8 @@ async fn fetch_target_snapshots(
> args["ns"] = serde_json::to_value(target_namespace)?;
> }
> let mut result = params.target.client.get(&api_path, Some(args)).await?;
> - let snapshots: Vec<SnapshotListItem> = serde_json::from_value(result["data"].take())?;
> + let snapshots: Vec<SnapshotListItem> = serde_json::from_value(result["data"].take())
> + .map_err(|err| format_err!("Failed to fetch remote snapshots, remote returned: {err}"))?;
here as well
>
> Ok(snapshots)
> }
> @@ -603,7 +608,12 @@ async fn forget_target_snapshot(
> if !target_namespace.is_root() {
> args["ns"] = serde_json::to_value(target_namespace)?;
> }
> - params.target.client.delete(&api_path, Some(args)).await?;
> + params
> + .target
> + .client
> + .delete(&api_path, Some(args))
> + .await
> + .map_err(|err| format_err!("Failed to remove remote snapshot, remote returned: {err}"))?;
this should probably be just "Request to remote returned {err}", since the call
site already logs the snapshot name and the fact that this is removal failing
;)
>
> Ok(())
> }
> --
> 2.39.5
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* [pbs-devel] partially-applied: [PATCH proxmox-backup 0/4] improve push sync job log messages
2024-11-21 15:43 [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Christian Ebner
` (4 preceding siblings ...)
2024-11-21 16:06 ` [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Gabriel Goller
@ 2024-11-22 9:04 ` Fabian Grünbichler
5 siblings, 0 replies; 14+ messages in thread
From: Fabian Grünbichler @ 2024-11-22 9:04 UTC (permalink / raw)
To: Christian Ebner, pbs-devel
applied patchs 1, 2 and 4 with some small follow-ups:
- moved the remove snapshot/group success logging into the corresponding code
path, so that failures don't log twice/wrong information
- replaced the abort for non-owned groups in the last patch with a skip, we
haven't started uploading yet at that point, so we can't abort anything.
comments for patch 3 in a separate mail.
Quoting Christian Ebner (2024-11-21 16:43:33)
> This is a small series of patches with the intend to improve the log
> messages for the sync job in push direction, mainly adding context to
> error messages from the remote when the error stems from an api call,
> limiting line length and improving usage of consistent terms for better
> readablility and easier understanding.
>
> Christian Ebner (4):
> server: push: fix needless borrow clippy warning
> server: push: consistently use remote over target for error messages
> server: push: add error context to all target api calls
> server: push: various smaller improvements to error messages
>
> src/server/push.rs | 79 ++++++++++++++++++++++++++++------------------
> 1 file changed, 49 insertions(+), 30 deletions(-)
>
> --
> 2.39.5
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 3/4] server: push: add error context to all target api calls
2024-11-22 9:01 ` Fabian Grünbichler
@ 2024-11-22 10:11 ` Christian Ebner
0 siblings, 0 replies; 14+ messages in thread
From: Christian Ebner @ 2024-11-22 10:11 UTC (permalink / raw)
To: Fabian Grünbichler, pbs-devel
On 11/22/24 10:01, Fabian Grünbichler wrote:
> Quoting Christian Ebner (2024-11-21 16:43:36)
>> Make it clear from the context that these error messages stem from
>> the response of an api call rather than a local error.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> src/server/push.rs | 24 +++++++++++++++++-------
>> 1 file changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/src/server/push.rs b/src/server/push.rs
>> index 86cef5520..fe2e11220 100644
>> --- a/src/server/push.rs
>> +++ b/src/server/push.rs
>> @@ -219,7 +219,9 @@ async fn remove_target_namespace(
>> if params.target.supports_prune_delete_stats {
>> let data = result["data"].take();
>> serde_json::from_value(data).map_err(|err| {
>> - format_err!("removing target namespace {target_namespace} failed - {err}")
>> + format_err!(
>> + "Failed to remove remote namespace {target_namespace}, remote returned: {err}"
>> + )
>
> this is attached to the wrong error - it should be attached to the client.delete call right above..
Oof, correct, will fix that. Thanks for catching this!
>
> this here should instead add the context that we failed to parse the returned
> value (which should never happen, that means we missed some API breakage..)
>
>> })
>> } else {
>> Ok(BackupGroupDeleteStats::default())
>> @@ -236,7 +238,8 @@ async fn fetch_target_groups(
>> let args = Some(serde_json::json!({ "ns": target_namespace.name() }));
>>
>> let mut result = params.target.client.get(&api_path, args).await?;
>> - let groups: Vec<GroupListItem> = serde_json::from_value(result["data"].take())?;
>> + let groups: Vec<GroupListItem> = serde_json::from_value(result["data"].take())
>> + .map_err(|err| format_err!("Failed to fetch remote groups, remote returned: {err}"))?;
>
> same here, just with get instead of delete ;)
>
>>
>> let (mut owned, not_owned) = groups.into_iter().fold(
>> (Vec::new(), HashSet::new()),
>> @@ -277,8 +280,9 @@ async fn remove_target_group(
>>
>> if params.target.supports_prune_delete_stats {
>> let data = result["data"].take();
>> - serde_json::from_value(data)
>> - .map_err(|err| format_err!("removing target group {backup_group} failed - {err}"))
>> + serde_json::from_value(data).map_err(|err| {
>> + format_err!("Failed to remove remote group {backup_group}, remote returned: {err}")
>> + })
>
> here as well
>
>> } else {
>> Ok(BackupGroupDeleteStats::default())
>> }
>> @@ -313,7 +317,7 @@ async fn check_or_create_target_namespace(
>> match params.target.client.post(&api_path, Some(args)).await {
>> Ok(_) => info!("Successfully created new namespace {current} on remote"),
>> Err(err) => {
>> - bail!("Remote creation of namespace {current} failed, remote returned: {err}")
>> + bail!("Creation of remote namespace {current} failed, remote returned: {err}")
>> }
>> }
>> existing_target_namespaces.push(current.clone());
>> @@ -585,7 +589,8 @@ async fn fetch_target_snapshots(
>> args["ns"] = serde_json::to_value(target_namespace)?;
>> }
>> let mut result = params.target.client.get(&api_path, Some(args)).await?;
>> - let snapshots: Vec<SnapshotListItem> = serde_json::from_value(result["data"].take())?;
>> + let snapshots: Vec<SnapshotListItem> = serde_json::from_value(result["data"].take())
>> + .map_err(|err| format_err!("Failed to fetch remote snapshots, remote returned: {err}"))?;
>
> here as well
>
>>
>> Ok(snapshots)
>> }
>> @@ -603,7 +608,12 @@ async fn forget_target_snapshot(
>> if !target_namespace.is_root() {
>> args["ns"] = serde_json::to_value(target_namespace)?;
>> }
>> - params.target.client.delete(&api_path, Some(args)).await?;
>> + params
>> + .target
>> + .client
>> + .delete(&api_path, Some(args))
>> + .await
>> + .map_err(|err| format_err!("Failed to remove remote snapshot, remote returned: {err}"))?;
>
> this should probably be just "Request to remote returned {err}", since the call
> site already logs the snapshot name and the fact that this is removal failing
> ;)
>
>>
>> Ok(())
>> }
>> --
>> 2.39.5
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages
2024-11-21 16:06 ` [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Gabriel Goller
2024-11-21 16:26 ` Christian Ebner
@ 2024-11-22 12:28 ` Christian Ebner
1 sibling, 0 replies; 14+ messages in thread
From: Christian Ebner @ 2024-11-22 12:28 UTC (permalink / raw)
To: Gabriel Goller; +Cc: pbs-devel
superseded-by version 2:
https://lore.proxmox.com/pbs-devel/20241122122604.335901-1-c.ebner@proxmox.com/T
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-11-22 12:29 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-21 15:43 [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Christian Ebner
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 1/4] server: push: fix needless borrow clippy warning Christian Ebner
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 2/4] server: push: consistently use remote over target for error messages Christian Ebner
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 3/4] server: push: add error context to all target api calls Christian Ebner
2024-11-22 9:01 ` Fabian Grünbichler
2024-11-22 10:11 ` Christian Ebner
2024-11-21 15:43 ` [pbs-devel] [PATCH proxmox-backup 4/4] server: push: various smaller improvements to error messages Christian Ebner
2024-11-21 16:06 ` [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages Gabriel Goller
2024-11-21 16:26 ` Christian Ebner
2024-11-21 17:04 ` Gabriel Goller
2024-11-21 19:32 ` Fabian Grünbichler
2024-11-22 8:41 ` Christian Ebner
2024-11-22 12:28 ` Christian Ebner
2024-11-22 9:04 ` [pbs-devel] partially-applied: " Fabian Grünbichler
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox