From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 740221FF16B for ; Fri, 7 Nov 2025 15:24:05 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 1AE261272B; Fri, 7 Nov 2025 15:24:47 +0100 (CET) Message-ID: Date: Fri, 7 Nov 2025 15:24:42 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird To: Proxmox Backup Server development discussion , "Max R. Carrara" References: <20250910154041.1032251-1-m.carrara@proxmox.com> Content-Language: en-US From: Hannes Laimer In-Reply-To: <20250910154041.1032251-1-m.carrara@proxmox.com> X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1762525463106 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.043 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pbs-devel] [PATCH proxmox master v1] fix #6738: rest-server: improve TLS handshake error logging X-BeenThere: pbs-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Backup Server development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox Backup Server development discussion Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Errors-To: pbs-devel-bounces@lists.proxmox.com Sender: "pbs-devel" comment + small nit inline On 9/10/25 17:40, Max R. Carrara wrote: > Right now, when a client connects to the server, we continuously peek > into the TCP stream's buffer until we can perform the TLS handshake > check. If the client closes the connection after sending < 5 bytes, we > exit the peek loop. If the client instead keeps the connection open, > the future times out if the handshake check isn't made. > > Instead of exiting with `ErrorKind::ConnectionAborted`, exit with > `ErrorKind::Other` instead, as that variant isn't ever returned by > stdlib functions. That way we can be sure we don't accidentally ignore > an error. > I don't think this is accurate, stdlib maps things it doesn't have an ErrorKind for to ::Other [1] Could we maybe have something like ``` #[derive(Debug, Error)] #[error("client stopped sending data while waiting for TLS handshake bytes")] struct HandshakeNoProgress; ... if peek_len == previous_peek_len { Err(HandshakeNoProgress.into()) } else { ... if err.downcast_ref::().is_some() { return; } log::error!("[{peer}] TLS handshake error: {err:?}"); ``` note: did not test this. Like that we could avoid unnecessary logging, but don't lose out on potentially actual errors. [1] https://github.com/rust-lang/rust/blob/96064126a086a8428d66e07fb3b91421bb86a512/library/std/src/sys/pal/unix/mod.rs#L235 > Reduce the amount of log noise by ignoring our `ErrorKind::Other` > errors, as those are usually only returned when some monitoring > solution opens and closes a TCP connection for its healthchecks [0]. > > Log timeouts without their error chain (like before), as > there isn't any additional context for them. > > Because any other kinds of errors should be sufficiently rare, log > them with their error chain. The error chain is formatted to stay in > one line, as the default formatting looks quite awkward otherwise. > > Make the context added to the `async_io()` coroutine call more > general and informative, as not only the stream buffer peeking can > fail. Shorten the general log message as well. > > [0]: https://forum.proxmox.com/threads/subject-intermittent-backup-errors-and-repeated-tls-handshake-failures.170089/post-795100 > > Fixes: #6738 > Signed-off-by: Max R. Carrara > --- > > Tested this interactively in my Python console by opening a raw TCP > stream using the `socket` module. > > proxmox-rest-server/src/connection.rs | 29 ++++++++++++++++++++++++--- > 1 file changed, 26 insertions(+), 3 deletions(-) > > diff --git a/proxmox-rest-server/src/connection.rs b/proxmox-rest-server/src/connection.rs > index 9511b7cb..81c0e05e 100644 > --- a/proxmox-rest-server/src/connection.rs > +++ b/proxmox-rest-server/src/connection.rs > @@ -464,7 +464,28 @@ impl AcceptBuilder { > } > } > Err(err) => { > - log::error!("[{peer}] failed to check for TLS handshake: {err}"); > + // Ignore io::Errors of kind "Other", as that means the client didn't send any > + // additional data between stream buffer peeks in our case > + if let Some(io::ErrorKind::Other) = > + err.downcast_ref::().map(io::Error::kind) > + { > + return; > + } > + > + // Timeouts are printed without their cause > + if err.downcast_ref::().is_some() { > + log::error!("[{peer}] TLS handshake error: {err}"); > + return; > + } > + > + // As other kinds of errors are relatively rare, print them with their error chain > + let mut err_chain = err.chain().map(ToString::to_string); > + let err_text = err_chain.by_ref().take(1).collect::>().join(""); > + let cause = err_chain.collect::>().join(" - "); > + > + log::error!( > + "[{peer}] TLS handshake error: {err_text} - Caused by: {cause}" `cause` could be empty, no? we should probably skip the ' - Caused by: ' in that case > + ); > } > } > } > @@ -505,7 +526,7 @@ impl AcceptBuilder { > // if we detect the same peek len again but still got a readable stream, > // the connection was probably closed, so abort here > if peek_len == previous_peek_len { > - Err(io::ErrorKind::ConnectionAborted.into()) > + Err(io::ErrorKind::Other.into()) > } else { > previous_peek_len = peek_len; > Err(io::ErrorKind::WouldBlock.into()) > @@ -516,7 +537,9 @@ impl AcceptBuilder { > } > }) > .await > - .context("couldn't peek into incoming TCP stream") > + .context( > + "failed to determine whether the peer is initiating a TLS handshake or not", > + ) > }; > > tokio::time::timeout(timeout, future) _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel