all lists on lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH proxmox master v1] fix #6738: rest-server: improve TLS handshake error logging
@ 2025-09-10 15:40 Max R. Carrara
  2025-11-07 13:03 ` Max R. Carrara
  2025-11-07 14:24 ` Hannes Laimer
  0 siblings, 2 replies; 3+ messages in thread
From: Max R. Carrara @ 2025-09-10 15:40 UTC (permalink / raw)
  To: pbs-devel

Right now, when a client connects to the server, we continuously peek
into the TCP stream's buffer until we can perform the TLS handshake
check. If the client closes the connection after sending < 5 bytes, we
exit the peek loop. If the client instead keeps the connection open,
the future times out if the handshake check isn't made.

Instead of exiting with `ErrorKind::ConnectionAborted`, exit with
`ErrorKind::Other` instead, as that variant isn't ever returned by
stdlib functions. That way we can be sure we don't accidentally ignore
an error.

Reduce the amount of log noise by ignoring our `ErrorKind::Other`
errors, as those are usually only returned when some monitoring
solution opens and closes a TCP connection for its healthchecks [0].

Log timeouts without their error chain (like before), as
there isn't any additional context for them.

Because any other kinds of errors should be sufficiently rare, log
them with their error chain. The error chain is formatted to stay in
one line, as the default formatting looks quite awkward otherwise.

Make the context added to the `async_io()` coroutine call more
general and informative, as not only the stream buffer peeking can
fail. Shorten the general log message as well.

[0]: https://forum.proxmox.com/threads/subject-intermittent-backup-errors-and-repeated-tls-handshake-failures.170089/post-795100

Fixes: #6738
Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
---

Tested this interactively in my Python console by opening a raw TCP
stream using the `socket` module.

 proxmox-rest-server/src/connection.rs | 29 ++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/proxmox-rest-server/src/connection.rs b/proxmox-rest-server/src/connection.rs
index 9511b7cb..81c0e05e 100644
--- a/proxmox-rest-server/src/connection.rs
+++ b/proxmox-rest-server/src/connection.rs
@@ -464,7 +464,28 @@ impl AcceptBuilder {
                 }
             }
             Err(err) => {
-                log::error!("[{peer}] failed to check for TLS handshake: {err}");
+                // Ignore io::Errors of kind "Other", as that means the client didn't send any
+                // additional data between stream buffer peeks in our case
+                if let Some(io::ErrorKind::Other) =
+                    err.downcast_ref::<io::Error>().map(io::Error::kind)
+                {
+                    return;
+                }
+
+                // Timeouts are printed without their cause
+                if err.downcast_ref::<tokio::time::error::Elapsed>().is_some() {
+                    log::error!("[{peer}] TLS handshake error: {err}");
+                    return;
+                }
+
+                // As other kinds of errors are relatively rare, print them with their error chain
+                let mut err_chain = err.chain().map(ToString::to_string);
+                let err_text = err_chain.by_ref().take(1).collect::<Vec<_>>().join("");
+                let cause = err_chain.collect::<Vec<_>>().join(" - ");
+
+                log::error!(
+                    "[{peer}] TLS handshake error: {err_text} - Caused by: {cause}"
+                );
             }
         }
     }
@@ -505,7 +526,7 @@ impl AcceptBuilder {
                             // if we detect the same peek len again but still got a readable stream,
                             // the connection was probably closed, so abort here
                             if peek_len == previous_peek_len {
-                                Err(io::ErrorKind::ConnectionAborted.into())
+                                Err(io::ErrorKind::Other.into())
                             } else {
                                 previous_peek_len = peek_len;
                                 Err(io::ErrorKind::WouldBlock.into())
@@ -516,7 +537,9 @@ impl AcceptBuilder {
                     }
                 })
                 .await
-                .context("couldn't peek into incoming TCP stream")
+                .context(
+                    "failed to determine whether the peer is initiating a TLS handshake or not",
+                )
         };
 
         tokio::time::timeout(timeout, future)
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [pbs-devel] [PATCH proxmox master v1] fix #6738: rest-server: improve TLS handshake error logging
  2025-09-10 15:40 [pbs-devel] [PATCH proxmox master v1] fix #6738: rest-server: improve TLS handshake error logging Max R. Carrara
@ 2025-11-07 13:03 ` Max R. Carrara
  2025-11-07 14:24 ` Hannes Laimer
  1 sibling, 0 replies; 3+ messages in thread
From: Max R. Carrara @ 2025-11-07 13:03 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On Wed Sep 10, 2025 at 5:40 PM CEST, Max R. Carrara wrote:
> Right now, when a client connects to the server, we continuously peek
> into the TCP stream's buffer until we can perform the TLS handshake
> check. If the client closes the connection after sending < 5 bytes, we
> exit the peek loop. If the client instead keeps the connection open,
> the future times out if the handshake check isn't made.
>
> Instead of exiting with `ErrorKind::ConnectionAborted`, exit with
> `ErrorKind::Other` instead, as that variant isn't ever returned by
> stdlib functions. That way we can be sure we don't accidentally ignore
> an error.
>
> Reduce the amount of log noise by ignoring our `ErrorKind::Other`
> errors, as those are usually only returned when some monitoring
> solution opens and closes a TCP connection for its healthchecks [0].
>
> Log timeouts without their error chain (like before), as
> there isn't any additional context for them.
>
> Because any other kinds of errors should be sufficiently rare, log
> them with their error chain. The error chain is formatted to stay in
> one line, as the default formatting looks quite awkward otherwise.
>
> Make the context added to the `async_io()` coroutine call more
> general and informative, as not only the stream buffer peeking can
> fail. Shorten the general log message as well.
>
> [0]: https://forum.proxmox.com/threads/subject-intermittent-backup-errors-and-repeated-tls-handshake-failures.170089/post-795100
>
> Fixes: #6738
> Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
> ---
>
> Tested this interactively in my Python console by opening a raw TCP
> stream using the `socket` module.
>
>  proxmox-rest-server/src/connection.rs | 29 ++++++++++++++++++++++++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
>

bump


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [pbs-devel] [PATCH proxmox master v1] fix #6738: rest-server: improve TLS handshake error logging
  2025-09-10 15:40 [pbs-devel] [PATCH proxmox master v1] fix #6738: rest-server: improve TLS handshake error logging Max R. Carrara
  2025-11-07 13:03 ` Max R. Carrara
@ 2025-11-07 14:24 ` Hannes Laimer
  1 sibling, 0 replies; 3+ messages in thread
From: Hannes Laimer @ 2025-11-07 14:24 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Max R. Carrara

comment + small nit inline

On 9/10/25 17:40, Max R. Carrara wrote:
> Right now, when a client connects to the server, we continuously peek
> into the TCP stream's buffer until we can perform the TLS handshake
> check. If the client closes the connection after sending < 5 bytes, we
> exit the peek loop. If the client instead keeps the connection open,
> the future times out if the handshake check isn't made.
> 
> Instead of exiting with `ErrorKind::ConnectionAborted`, exit with
> `ErrorKind::Other` instead, as that variant isn't ever returned by
> stdlib functions. That way we can be sure we don't accidentally ignore
> an error.
> 

I don't think this is accurate, stdlib maps things it doesn't have an
ErrorKind for to ::Other [1]

Could we maybe have something like
```
#[derive(Debug, Error)]
#[error("client stopped sending data while waiting for TLS handshake 
bytes")]
struct HandshakeNoProgress;

...
if peek_len == previous_peek_len {
     Err(HandshakeNoProgress.into())
} else {
...
if err.downcast_ref::<HandshakeNoProgress>().is_some() {
     return;
}
log::error!("[{peer}] TLS handshake error: {err:?}");
```
note: did not test this. Like that we could avoid unnecessary logging,
but don't lose out on potentially actual errors.


[1] 
https://github.com/rust-lang/rust/blob/96064126a086a8428d66e07fb3b91421bb86a512/library/std/src/sys/pal/unix/mod.rs#L235

> Reduce the amount of log noise by ignoring our `ErrorKind::Other`
> errors, as those are usually only returned when some monitoring
> solution opens and closes a TCP connection for its healthchecks [0].
> 
> Log timeouts without their error chain (like before), as
> there isn't any additional context for them.
> 
> Because any other kinds of errors should be sufficiently rare, log
> them with their error chain. The error chain is formatted to stay in
> one line, as the default formatting looks quite awkward otherwise.
> 
> Make the context added to the `async_io()` coroutine call more
> general and informative, as not only the stream buffer peeking can
> fail. Shorten the general log message as well.
> 
> [0]: https://forum.proxmox.com/threads/subject-intermittent-backup-errors-and-repeated-tls-handshake-failures.170089/post-795100
> 
> Fixes: #6738
> Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
> ---
> 
> Tested this interactively in my Python console by opening a raw TCP
> stream using the `socket` module.
> 
>   proxmox-rest-server/src/connection.rs | 29 ++++++++++++++++++++++++---
>   1 file changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/proxmox-rest-server/src/connection.rs b/proxmox-rest-server/src/connection.rs
> index 9511b7cb..81c0e05e 100644
> --- a/proxmox-rest-server/src/connection.rs
> +++ b/proxmox-rest-server/src/connection.rs
> @@ -464,7 +464,28 @@ impl AcceptBuilder {
>                   }
>               }
>               Err(err) => {
> -                log::error!("[{peer}] failed to check for TLS handshake: {err}");
> +                // Ignore io::Errors of kind "Other", as that means the client didn't send any
> +                // additional data between stream buffer peeks in our case
> +                if let Some(io::ErrorKind::Other) =
> +                    err.downcast_ref::<io::Error>().map(io::Error::kind)
> +                {
> +                    return;
> +                }
> +
> +                // Timeouts are printed without their cause
> +                if err.downcast_ref::<tokio::time::error::Elapsed>().is_some() {
> +                    log::error!("[{peer}] TLS handshake error: {err}");
> +                    return;
> +                }
> +
> +                // As other kinds of errors are relatively rare, print them with their error chain
> +                let mut err_chain = err.chain().map(ToString::to_string);
> +                let err_text = err_chain.by_ref().take(1).collect::<Vec<_>>().join("");
> +                let cause = err_chain.collect::<Vec<_>>().join(" - ");
> +
> +                log::error!(
> +                    "[{peer}] TLS handshake error: {err_text} - Caused by: {cause}"

`cause` could be empty, no? we should probably skip the ' - Caused by: '
in that case

> +                );
>               }
>           }
>       }
> @@ -505,7 +526,7 @@ impl AcceptBuilder {
>                               // if we detect the same peek len again but still got a readable stream,
>                               // the connection was probably closed, so abort here
>                               if peek_len == previous_peek_len {
> -                                Err(io::ErrorKind::ConnectionAborted.into())
> +                                Err(io::ErrorKind::Other.into())
>                               } else {
>                                   previous_peek_len = peek_len;
>                                   Err(io::ErrorKind::WouldBlock.into())
> @@ -516,7 +537,9 @@ impl AcceptBuilder {
>                       }
>                   })
>                   .await
> -                .context("couldn't peek into incoming TCP stream")
> +                .context(
> +                    "failed to determine whether the peer is initiating a TLS handshake or not",
> +                )
>           };
>   
>           tokio::time::timeout(timeout, future)



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-11-07 14:24 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-10 15:40 [pbs-devel] [PATCH proxmox master v1] fix #6738: rest-server: improve TLS handshake error logging Max R. Carrara
2025-11-07 13:03 ` Max R. Carrara
2025-11-07 14:24 ` Hannes Laimer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal