* [pbs-devel] [PATCH 1/2] rrd: relay error to update database to caller
@ 2024-11-14 13:29 Thomas Lamprecht
2024-11-14 13:29 ` [pbs-devel] [PATCH 2/2] rrd: clamp future last_update time on load Thomas Lamprecht
2025-02-10 11:39 ` [pbs-devel] [PATCH 1/2] rrd: relay error to update database to caller Wolfgang Bumiller
0 siblings, 2 replies; 3+ messages in thread
From: Thomas Lamprecht @ 2024-11-14 13:29 UTC (permalink / raw)
To: pbs-devel
It does not make much sense to just log here, especially as the update
fn has no context about what RRD series it's operating on.
I.e., logged message previously:
> rrd update failed: time in past (...)
vs logged message now:
> rrd::update_value 'host/cpu' failed - time in past (...)
The callers of the Database::update fn in the RRD Cache map can
already handle errors. One side effect is that a freshly loaded RRD is
no longer inserted into the map if the update fails, but any later load
will still insert it.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
It might be slightly nicer to factor out the common call to update to
happen after getting/creating the RRD, but it's not trivial to do so as
efficiently due to ownership handover when inserting the RRD in the map.
proxmox-rrd/src/cache/rrd_map.rs | 4 ++--
proxmox-rrd/src/rrd.rs | 14 ++++++--------
2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/proxmox-rrd/src/cache/rrd_map.rs b/proxmox-rrd/src/cache/rrd_map.rs
index 0ef61cfa..4bcedade 100644
--- a/proxmox-rrd/src/cache/rrd_map.rs
+++ b/proxmox-rrd/src/cache/rrd_map.rs
@@ -42,7 +42,7 @@ impl RRDMap {
) -> Result<(), Error> {
if let Some(rrd) = self.map.get_mut(rel_path) {
if !new_only || time > rrd.last_update() {
- rrd.update(time, value);
+ rrd.update(time, value)?;
}
} else {
let mut path = self.config.basedir.clone();
@@ -61,7 +61,7 @@ impl RRDMap {
};
if !new_only || time > rrd.last_update() {
- rrd.update(time, value);
+ rrd.update(time, value)?;
}
self.map.insert(rel_path.to_string(), rrd);
}
diff --git a/proxmox-rrd/src/rrd.rs b/proxmox-rrd/src/rrd.rs
index 440abe06..4bf4f01b 100644
--- a/proxmox-rrd/src/rrd.rs
+++ b/proxmox-rrd/src/rrd.rs
@@ -469,14 +469,10 @@ impl Database {
/// Update the value (in memory)
///
/// Note: This does not call [Self::save].
- pub fn update(&mut self, time: f64, value: f64) {
- let value = match self.source.compute_new_value(time, value) {
- Ok(value) => value,
- Err(err) => {
- log::error!("rrd update failed: {}", err);
- return;
- }
- };
+ pub fn update(&mut self, time: f64, value: f64) -> Result<(), Error> {
+ let value = self
+ .source
+ .compute_new_value(time, value)?;
let last_update = self.source.last_update;
self.source.last_update = time;
@@ -485,6 +481,8 @@ impl Database {
rra.delete_old_slots(time, last_update);
rra.compute_new_value(time, last_update, value);
}
+
+ Ok(())
}
/// Extract data from the archive
--
2.39.5
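
To illustrate the idea behind this patch, here is a minimal sketch of relaying the error to a caller that knows the series name. It is not the proxmox-rrd code: `UpdateError`, the simplified `Database` struct and the free function `update_value` are hypothetical stand-ins, and the "time in past" check only approximates what `compute_new_value` might reject.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type; the real crate uses its own `Error`.
#[derive(Debug, PartialEq)]
pub struct UpdateError(String);

impl fmt::Display for UpdateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Simplified stand-in for `Database`: it only tracks the last update time.
pub struct Database {
    last_update: f64,
}

impl Database {
    /// Returns the error instead of logging it, as it has no series context.
    pub fn update(&mut self, time: f64, _value: f64) -> Result<(), UpdateError> {
        if time <= self.last_update {
            return Err(UpdateError(format!(
                "time in past ({} <= {})",
                time, self.last_update
            )));
        }
        self.last_update = time;
        Ok(())
    }
}

/// Hypothetical caller: it knows `rel_path` and adds it to the error message.
pub fn update_value(
    map: &mut HashMap<String, Database>,
    rel_path: &str,
    time: f64,
    value: f64,
) -> Result<(), UpdateError> {
    let rrd = map
        .entry(rel_path.to_string())
        .or_insert(Database { last_update: 0.0 });
    rrd.update(time, value).map_err(|err| {
        UpdateError(format!("rrd::update_value '{}' failed - {}", rel_path, err))
    })
}
```

Unlike the patched `RRDMap`, this sketch inserts the entry into the map before updating it, so a failed update still leaves the entry in place.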
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH 2/2] rrd: clamp future last_update time on load
2024-11-14 13:29 [pbs-devel] [PATCH 1/2] rrd: relay error to update database to caller Thomas Lamprecht
@ 2024-11-14 13:29 ` Thomas Lamprecht
2025-02-10 11:39 ` [pbs-devel] [PATCH 1/2] rrd: relay error to update database to caller Wolfgang Bumiller
1 sibling, 0 replies; 3+ messages in thread
From: Thomas Lamprecht @ 2024-11-14 13:29 UTC (permalink / raw)
To: pbs-devel
We have already had reports of systems where the BIOS clock was set
rather far in the future, so anything that relies on time ordering
might fail if it was initialised before an NTP client managed to sync
the clock again.
RRD updates are one such thing, so as a stop-gap just clamp the
last_update time on load.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
It might be nicer to clamp when saving the file, as by then an NTP
client is more likely to have run, which would also avoid an error in
the other direction, i.e., when the system is booted with its time in
the past. So feel free to take this over and rework it for that case;
I'm just sending it out as I had a prototype around from a recent debug
session on my machine.
proxmox-rrd/src/rrd.rs | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/proxmox-rrd/src/rrd.rs b/proxmox-rrd/src/rrd.rs
index 4bf4f01b..73a0ebd4 100644
--- a/proxmox-rrd/src/rrd.rs
+++ b/proxmox-rrd/src/rrd.rs
@@ -378,6 +378,11 @@ impl Database {
if rrd.source.last_update < 0.0 {
bail!("rrd file has negative last_update time");
+ } else if rrd.source.last_update > proxmox_time::epoch_f64() {
+ let mut rrd = rrd;
+ log::error!("rrd file has last_update time from the future, clamping to now!");
+ rrd.source.last_update = proxmox_time::epoch_f64();
+ return Ok(rrd);
}
Ok(rrd)
--
2.39.5
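
The clamping rule from this patch can be sketched in isolation as follows. This is a minimal illustration, not the code in `rrd.rs`: `sanitize_last_update` and `sanitize_last_update_now` are hypothetical names, the local `epoch_f64` is a std-only stand-in for `proxmox_time::epoch_f64()`, and errors are plain strings. Taking `now` as a parameter keeps the rule testable, and the sketch reads the clock once where the patch reads it twice.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Seconds since the Unix epoch as f64 (stand-in for `proxmox_time::epoch_f64()`).
fn epoch_f64() -> f64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs_f64())
        .unwrap_or(0.0)
}

/// Validate a `last_update` value read from disk against `now`.
///
/// Negative values are rejected, values from the future are clamped to `now`,
/// and everything else is returned unchanged.
pub fn sanitize_last_update(last_update: f64, now: f64) -> Result<f64, String> {
    if last_update < 0.0 {
        return Err("rrd file has negative last_update time".to_string());
    }
    if last_update > now {
        // Clock was ahead when the file was written: clamp to the current time.
        return Ok(now);
    }
    Ok(last_update)
}

/// Convenience wrapper that uses the current system time.
pub fn sanitize_last_update_now(last_update: f64) -> Result<f64, String> {
    sanitize_last_update(last_update, epoch_f64())
}
```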
* Re: [pbs-devel] [PATCH 1/2] rrd: relay error to update database to caller
2024-11-14 13:29 [pbs-devel] [PATCH 1/2] rrd: relay error to update database to caller Thomas Lamprecht
2024-11-14 13:29 ` [pbs-devel] [PATCH 2/2] rrd: clamp future last_update time on load Thomas Lamprecht
@ 2025-02-10 11:39 ` Wolfgang Bumiller
1 sibling, 0 replies; 3+ messages in thread
From: Wolfgang Bumiller @ 2025-02-10 11:39 UTC (permalink / raw)
To: Thomas Lamprecht; +Cc: pbs-devel
On Thu, Nov 14, 2024 at 02:29:49PM +0100, Thomas Lamprecht wrote:
> It does not make much sense to just log here, especially as the update
> fn has no context about what RRD series it's operating on.
>
> I.e., logged message previously:
> > rrd update failed: time in past (...)
>
> vs logged message now:
> > rrd::update_value 'host/cpu' failed - time in past (...)
>
> The callers of the Database::update fn in the RRD Cache map can
> already handle errors, albeit it won't save the freshly loaded RRD in
> the map anymore if the update fails, any load will still do that
> though.
One of the call chains comes from `apply_journal` (through
`apply_and_commit_journal_thread`), which now fails if the journal is
corrupted (negative time, NaNs) or the time is otherwise "messed up"
(time in the past).
Curiously, the `journal_applied` result from that method reflects the
*previous* time the thread ran; the thread sets it to true if it
succeeded.
So technically this could return "yes the journal is applied" while
logging "failed to apply the journal"... 🤔
This is a bit weird.
Maybe we should pass the name down the `apply_journal()` call chain so
it can log and skip like before?
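
A rough sketch of what "log and skip" with the name available could look like. It is not taken from the proxmox-rrd sources: `JournalEntry`, `apply_entries` and the closure-based `update` callback are hypothetical, and `eprintln!` stands in for the logging macro.

```rust
/// Hypothetical journal entry carrying the series name along with the sample.
pub struct JournalEntry {
    pub rel_path: String,
    pub time: f64,
    pub value: f64,
}

/// Apply all entries; log and skip the ones whose update fails.
///
/// Returns the number of skipped entries so the caller can decide whether
/// the journal counts as fully applied.
pub fn apply_entries<F>(entries: &[JournalEntry], mut update: F) -> usize
where
    F: FnMut(&str, f64, f64) -> Result<(), String>,
{
    let mut skipped = 0;
    for entry in entries {
        if let Err(err) = update(&entry.rel_path, entry.time, entry.value) {
            // The name is known here, so the log message can include it.
            eprintln!("skipping journal entry for '{}': {}", entry.rel_path, err);
            skipped += 1;
        }
    }
    skipped
}
```

Returning the skip count, rather than a plain success flag, is one way to keep "applied" and "applied with errors" distinguishable.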
>
> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> ---
> It might be slightly nicer to factor out the common call to update to
> happen after getting/creating the RRD, but it's not trivial to do so as
> efficiently due to ownership handover when inserting the RRD in the map.
>
> proxmox-rrd/src/cache/rrd_map.rs | 4 ++--
> proxmox-rrd/src/rrd.rs | 14 ++++++--------
> 2 files changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/proxmox-rrd/src/cache/rrd_map.rs b/proxmox-rrd/src/cache/rrd_map.rs
> index 0ef61cfa..4bcedade 100644
> --- a/proxmox-rrd/src/cache/rrd_map.rs
> +++ b/proxmox-rrd/src/cache/rrd_map.rs
> @@ -42,7 +42,7 @@ impl RRDMap {
> ) -> Result<(), Error> {
> if let Some(rrd) = self.map.get_mut(rel_path) {
> if !new_only || time > rrd.last_update() {
> - rrd.update(time, value);
> + rrd.update(time, value)?;
> }
> } else {
> let mut path = self.config.basedir.clone();
> @@ -61,7 +61,7 @@ impl RRDMap {
> };
>
> if !new_only || time > rrd.last_update() {
> - rrd.update(time, value);
> + rrd.update(time, value)?;
> }
> self.map.insert(rel_path.to_string(), rrd);
> }
> diff --git a/proxmox-rrd/src/rrd.rs b/proxmox-rrd/src/rrd.rs
> index 440abe06..4bf4f01b 100644
> --- a/proxmox-rrd/src/rrd.rs
> +++ b/proxmox-rrd/src/rrd.rs
> @@ -469,14 +469,10 @@ impl Database {
> /// Update the value (in memory)
> ///
> /// Note: This does not call [Self::save].
> - pub fn update(&mut self, time: f64, value: f64) {
> - let value = match self.source.compute_new_value(time, value) {
> - Ok(value) => value,
> - Err(err) => {
> - log::error!("rrd update failed: {}", err);
> - return;
> - }
> - };
> + pub fn update(&mut self, time: f64, value: f64) -> Result<(), Error> {
> + let value = self
> + .source
> + .compute_new_value(time, value)?;
>
> let last_update = self.source.last_update;
> self.source.last_update = time;
> @@ -485,6 +481,8 @@ impl Database {
> rra.delete_old_slots(time, last_update);
> rra.compute_new_value(time, last_update, value);
> }
> +
> + Ok(())
> }
>
> /// Extract data from the archive
> --
> 2.39.5