public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts
@ 2020-12-21 13:56 Stefan Reiter
  2020-12-21 13:56 ` [pbs-devel] [RFC proxmox 1/2] add tools::future with TimeoutFutureExt Stefan Reiter
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Stefan Reiter @ 2020-12-21 13:56 UTC (permalink / raw)
  To: pbs-devel

Not a comprehensive overhaul, but should fix the most common hangs to at least
finish *sometime*. Tested on an intentionally slow PBS with VM backups - QEMU
still hangs, but resumes to work after 20 seconds as intended.

20 seconds was chosen by fair dice roll, seems to be a good limit for opening an
HTTP connection (so not counting actual data transfer)?

Disclaimer: The 'proxmox' patches were a *bit* over my head, so I hope the
generics and stuff make sense, appreciate any feedback there - it's certainly
interesting to play with higher-level Rust like this, would love to learn :)


proxmox: Stefan Reiter (1):
  add tools::future with TimeoutFutureExt

 proxmox/src/tools/future.rs | 48 +++++++++++++++++++++++++++++++++++++
 proxmox/src/tools/mod.rs    |  1 +
 2 files changed, 49 insertions(+)
 create mode 100644 proxmox/src/tools/future.rs

proxmox-backup: Stefan Reiter (1):
  http_client: add timeouts for critical connects

 src/client/http_client.rs | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

-- 
2.20.1




^ permalink raw reply	[flat|nested] 10+ messages in thread

* [pbs-devel] [RFC proxmox 1/2] add tools::future with TimeoutFutureExt
  2020-12-21 13:56 [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts Stefan Reiter
@ 2020-12-21 13:56 ` Stefan Reiter
  2020-12-21 13:56 ` [pbs-devel] [RFC proxmox-backup 2/2] http_client: add timeouts for critical connects Stefan Reiter
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Stefan Reiter @ 2020-12-21 13:56 UTC (permalink / raw)
  To: pbs-devel

Implements shorthands to automatically cancel a long-running future
after a timeout is reached.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 proxmox/src/tools/future.rs | 48 +++++++++++++++++++++++++++++++++++++
 proxmox/src/tools/mod.rs    |  1 +
 2 files changed, 49 insertions(+)
 create mode 100644 proxmox/src/tools/future.rs

diff --git a/proxmox/src/tools/future.rs b/proxmox/src/tools/future.rs
new file mode 100644
index 0000000..476fd11
--- /dev/null
+++ b/proxmox/src/tools/future.rs
@@ -0,0 +1,48 @@
+//! Common extensions for Futures
+use anyhow::Error;
+use futures::future::{select, Either, FutureExt};
+use std::future::Future;
+use std::time::Duration;
+use tokio::time::delay_for;
+
+impl<T> TimeoutFutureExt for T where T: Future {}
+
+/// Implements a timeout for futures, automatically aborting them if the timeout runs out before
+/// the base future completes.
+pub trait TimeoutFutureExt: Future {
+    /// Returned Future returns 'None' in case the timeout was reached, otherwise the original
+    /// return value.
+    fn or_timeout<'a>(
+        self,
+        timeout: Duration,
+    ) -> Box<dyn Future<Output = Option<Self::Output>> + Unpin + Send + 'a>
+    where
+        Self: Sized + Unpin + Send + 'a,
+    {
+        let timeout_fut = delay_for(timeout);
+        Box::new(select(self, timeout_fut).map(|res| match res {
+            Either::Left((result, _)) => Some(result),
+            Either::Right(((), _)) => None,
+        }))
+    }
+
+    /// Returned Future returns either the original result, or `Err<err>` in case the timeout is
+    /// reached. Basically a shorthand to flatten a future that returns a `Result<_, Error>` with a
+    /// timeout. The base Future can return any kind of Error that can be made into an
+    /// `anyhow::Error`.
+    fn or_timeout_err<'a, O, E>(
+        self,
+        timeout: Duration,
+        err: Error,
+    ) -> Box<dyn Future<Output = Result<O, Error>> + Unpin + Send + 'a>
+    where
+        Self: Sized + Unpin + Send + 'a,
+        Self::Output: Into<Result<O, E>>,
+        E: Into<Error> + std::error::Error + Send + Sync + 'static,
+    {
+        Box::new(self.or_timeout(timeout).map(|res| match res {
+            Some(res) => res.into().map_err(Error::from),
+            None => Err(err),
+        }))
+    }
+}
diff --git a/proxmox/src/tools/mod.rs b/proxmox/src/tools/mod.rs
index ff3a720..5d1f46e 100644
--- a/proxmox/src/tools/mod.rs
+++ b/proxmox/src/tools/mod.rs
@@ -20,6 +20,7 @@ pub mod serde;
 pub mod time;
 pub mod uuid;
 pub mod vec;
+pub mod future;
 
 #[cfg(feature = "websocket")]
 pub mod websocket;
-- 
2.20.1





^ permalink raw reply	[flat|nested] 10+ messages in thread

* [pbs-devel] [RFC proxmox-backup 2/2] http_client: add timeouts for critical connects
  2020-12-21 13:56 [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts Stefan Reiter
  2020-12-21 13:56 ` [pbs-devel] [RFC proxmox 1/2] add tools::future with TimeoutFutureExt Stefan Reiter
@ 2020-12-21 13:56 ` Stefan Reiter
  2020-12-21 15:36 ` [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts Dietmar Maurer
  2020-12-22 12:32 ` [pbs-devel] applied: " Dietmar Maurer
  3 siblings, 0 replies; 10+ messages in thread
From: Stefan Reiter @ 2020-12-21 13:56 UTC (permalink / raw)
  To: pbs-devel

Use timeout futures for sections that might hang in certain error
conditions. This is mostly intended to be used as a safeguard, not a
first line of defense - i.e. best-effort avoidance of total hangs.

Not every future used for the HttpClient/H2Client is changed, only those
where a quick response is to be expected. For example, the response
reading futures are left alone, so data transfer is never capped with
timeout, only the initial server connect.

It is also used for upgrading to H2 connections, as that can take a long
time on overloaded servers.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/client/http_client.rs | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/client/http_client.rs b/src/client/http_client.rs
index a9b9c06c..92df9572 100644
--- a/src/client/http_client.rs
+++ b/src/client/http_client.rs
@@ -18,6 +18,7 @@ use proxmox::{
     api::error::HttpError,
     sys::linux::tty,
     tools::fs::{file_get_json, replace_file, CreateOptions},
+    tools::future::TimeoutFutureExt,
 };
 
 use super::pipe_to_stream::PipeToSendStream;
@@ -29,6 +30,10 @@ use crate::tools::{
     http::HttpsConnector,
 };
 
+/// Timeout used for several HTTP operations that are expected to finish quickly but may block in
+/// certain error conditions.
+const HTTP_TIMEOUT: Duration = Duration::from_secs(20);
+
 #[derive(Clone)]
 pub struct AuthInfo {
     pub auth_id: Authid,
@@ -557,7 +562,10 @@ impl HttpClient {
         let enc_ticket = format!("PBSAuthCookie={}", percent_encode(auth.ticket.as_bytes(), DEFAULT_ENCODE_SET));
         req.headers_mut().insert("Cookie", HeaderValue::from_str(&enc_ticket).unwrap());
 
-        let resp = client.request(req).await?;
+        let resp = client
+            .request(req)
+            .or_timeout_err(HTTP_TIMEOUT, format_err!("http download request timed out"))
+            .await?;
         let status = resp.status();
         if !status.is_success() {
             HttpClient::api_response(resp)
@@ -624,7 +632,10 @@ impl HttpClient {
 
         req.headers_mut().insert("UPGRADE", HeaderValue::from_str(&protocol_name).unwrap());
 
-        let resp = client.request(req).await?;
+        let resp = client
+            .request(req)
+            .or_timeout_err(HTTP_TIMEOUT, format_err!("http upgrade request timed out"))
+            .await?;
         let status = resp.status();
 
         if status != http::StatusCode::SWITCHING_PROTOCOLS {
@@ -705,7 +716,7 @@ impl HttpClient {
     ) -> Result<Value, Error> {
 
         client.request(req)
-            .map_err(Error::from)
+            .or_timeout_err(HTTP_TIMEOUT, format_err!("http request timed out"))
             .and_then(Self::api_response)
             .await
     }
-- 
2.20.1





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts
  2020-12-21 13:56 [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts Stefan Reiter
  2020-12-21 13:56 ` [pbs-devel] [RFC proxmox 1/2] add tools::future with TimeoutFutureExt Stefan Reiter
  2020-12-21 13:56 ` [pbs-devel] [RFC proxmox-backup 2/2] http_client: add timeouts for critical connects Stefan Reiter
@ 2020-12-21 15:36 ` Dietmar Maurer
  2020-12-21 15:45   ` Dietmar Maurer
  2020-12-21 15:49   ` Stefan Reiter
  2020-12-22 12:32 ` [pbs-devel] applied: " Dietmar Maurer
  3 siblings, 2 replies; 10+ messages in thread
From: Dietmar Maurer @ 2020-12-21 15:36 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

> Not a comprehensive overhaul, but should fix the most common hangs to at least
> finish *sometime*. 

We already have TCP timeouts. So why exactly do we need those sort timeouts?




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts
  2020-12-21 15:36 ` [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts Dietmar Maurer
@ 2020-12-21 15:45   ` Dietmar Maurer
  2020-12-21 15:49   ` Stefan Reiter
  1 sibling, 0 replies; 10+ messages in thread
From: Dietmar Maurer @ 2020-12-21 15:45 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter


> On 12/21/2020 4:36 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
> 
>  
> > Not a comprehensive overhaul, but should fix the most common hangs to at least
> > finish *sometime*. 
> 
> We already have TCP timeouts. So why exactly do we need those sort timeouts?

s/sort/short/




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts
  2020-12-21 15:36 ` [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts Dietmar Maurer
  2020-12-21 15:45   ` Dietmar Maurer
@ 2020-12-21 15:49   ` Stefan Reiter
  2020-12-21 16:40     ` Dietmar Maurer
  1 sibling, 1 reply; 10+ messages in thread
From: Stefan Reiter @ 2020-12-21 15:49 UTC (permalink / raw)
  To: Dietmar Maurer, Proxmox Backup Server development discussion

On 12/21/20 4:36 PM, Dietmar Maurer wrote:
>> Not a comprehensive overhaul, but should fix the most common hangs to at least
>> finish *sometime*.
> 
> We already have TCP timeouts. So why exactly do we need those short timeouts?
> 

It's not a TCP timeout if the server hangs. This prevents the case where 
the client has a connection established but the server fails to send 
data within the given time.

Came up during discussion of this report:
https://forum.proxmox.com/threads/qmp-command-backup-failed-got-timeout.77749/#post-357700

where high load on the server (from too many verification tasks, which 
is a different problem) causes VM clients to hang for unreasonable 
amounts of time.

(Note that with QEMU 5.2 we can easily do the 'connect' async in the 
background as well, preventing the full VM to hang, but we still need 
some way to timeout the connection attempt, lest it stays active in the 
background)




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts
  2020-12-21 15:49   ` Stefan Reiter
@ 2020-12-21 16:40     ` Dietmar Maurer
  2020-12-22  8:47       ` Stefan Reiter
  0 siblings, 1 reply; 10+ messages in thread
From: Dietmar Maurer @ 2020-12-21 16:40 UTC (permalink / raw)
  To: Stefan Reiter, Proxmox Backup Server development discussion

> Came up during discussion of this report:
> https://forum.proxmox.com/threads/qmp-command-backup-failed-got-timeout.77749/#post-357700
> 
> where high load on the server (from too many verification tasks, which 
> is a different problem) causes VM clients to hang for unreasonable 
> amounts of time.

But with a 20s timeout, the backup would fail quit fast - this is probably also not what the user wants?




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts
  2020-12-21 16:40     ` Dietmar Maurer
@ 2020-12-22  8:47       ` Stefan Reiter
  2020-12-22 10:40         ` Dietmar Maurer
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Reiter @ 2020-12-22  8:47 UTC (permalink / raw)
  To: Dietmar Maurer, Proxmox Backup Server development discussion

On 12/21/20 5:40 PM, Dietmar Maurer wrote:
>> Came up during discussion of this report:
>> https://forum.proxmox.com/threads/qmp-command-backup-failed-got-timeout.77749/#post-357700
>>
>> where high load on the server (from too many verification tasks, which
>> is a different problem) causes VM clients to hang for unreasonable
>> amounts of time.
> 
> But with a 20s timeout, the backup would fail quit fast - this is probably also not what the user wants?
> 

This is only on the initial UPGRADE request, once that is established 
nothing is changed. If that takes longer than 20s (or 30, 40, as stated 
the exact number was more of a guess on my part) the server is probably 
under enough load that another backup client isn't going to help.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts
  2020-12-22  8:47       ` Stefan Reiter
@ 2020-12-22 10:40         ` Dietmar Maurer
  0 siblings, 0 replies; 10+ messages in thread
From: Dietmar Maurer @ 2020-12-22 10:40 UTC (permalink / raw)
  To: Stefan Reiter, Proxmox Backup Server development discussion


> On 12/22/2020 9:47 AM Stefan Reiter <s.reiter@proxmox.com> wrote:
> 
>  
> On 12/21/20 5:40 PM, Dietmar Maurer wrote:
> >> Came up during discussion of this report:
> >> https://forum.proxmox.com/threads/qmp-command-backup-failed-got-timeout.77749/#post-357700
> >>
> >> where high load on the server (from too many verification tasks, which
> >> is a different problem) causes VM clients to hang for unreasonable
> >> amounts of time.
> > 
> > But with a 20s timeout, the backup would fail quit fast - this is probably also not what the user wants?
> > 
> 
> This is only on the initial UPGRADE request, 

I also do not want that backup fails in initial UPGRADE (maybe I do not understand your arguments).




^ permalink raw reply	[flat|nested] 10+ messages in thread

* [pbs-devel] applied: [RFC 0/2] backup client: implement some HTTP timeouts
  2020-12-21 13:56 [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts Stefan Reiter
                   ` (2 preceding siblings ...)
  2020-12-21 15:36 ` [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts Dietmar Maurer
@ 2020-12-22 12:32 ` Dietmar Maurer
  3 siblings, 0 replies; 10+ messages in thread
From: Dietmar Maurer @ 2020-12-22 12:32 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

applied

> On 12/21/2020 2:56 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
> 
>  
> Not a comprehensive overhaul, but should fix the most common hangs to at least
> finish *sometime*. Tested on an intentionally slow PBS with VM backups - QEMU
> still hangs, but resumes to work after 20 seconds as intended.
> 
> 20 seconds was chosen by fair dice roll, seems to be a good limit for opening an
> HTTP connection (so not counting actual data transfer)?
> 
> Disclaimer: The 'proxmox' patches were a *bit* over my head, so I hope the
> generics and stuff make sense, appreciate any feedback there - it's certainly
> interesting to play with higher-level Rust like this, would love to learn :)
> 
> 
> proxmox: Stefan Reiter (1):
>   add tools::future with TimeoutFutureExt
> 
>  proxmox/src/tools/future.rs | 48 +++++++++++++++++++++++++++++++++++++
>  proxmox/src/tools/mod.rs    |  1 +
>  2 files changed, 49 insertions(+)
>  create mode 100644 proxmox/src/tools/future.rs
> 
> proxmox-backup: Stefan Reiter (1):
>   http_client: add timeouts for critical connects
> 
>  src/client/http_client.rs | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> -- 
> 2.20.1
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-12-22 12:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-21 13:56 [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts Stefan Reiter
2020-12-21 13:56 ` [pbs-devel] [RFC proxmox 1/2] add tools::future with TimeoutFutureExt Stefan Reiter
2020-12-21 13:56 ` [pbs-devel] [RFC proxmox-backup 2/2] http_client: add timeouts for critical connects Stefan Reiter
2020-12-21 15:36 ` [pbs-devel] [RFC 0/2] backup client: implement some HTTP timeouts Dietmar Maurer
2020-12-21 15:45   ` Dietmar Maurer
2020-12-21 15:49   ` Stefan Reiter
2020-12-21 16:40     ` Dietmar Maurer
2020-12-22  8:47       ` Stefan Reiter
2020-12-22 10:40         ` Dietmar Maurer
2020-12-22 12:32 ` [pbs-devel] applied: " Dietmar Maurer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal