public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH pve-http-server v2 0/1] fix unexpected EOF for client when closing TLS session
@ 2025-03-05 21:44 Rob Rozestraten via pve-devel
  0 siblings, 0 replies; only message in thread
From: Rob Rozestraten via pve-devel @ 2025-03-05 21:44 UTC (permalink / raw)
  To: pve-devel; +Cc: Rob Rozestraten

[-- Attachment #1: Type: message/rfc822, Size: 12580 bytes --]

From: Rob Rozestraten <admin@truthsolo.net>
To: pve-devel@lists.proxmox.com
Subject: [PATCH pve-http-server v2 0/1] fix unexpected EOF for client when closing TLS session
Date: Wed, 05 Mar 2025 21:44:55 +0000
Message-ID: <20250305214447.128975-1-admin@truthsolo.net>

> A little bit of analysis of why this is needed and okay would be great to have here in the commit message ;)

Thanks for the feedback, Fabian! First time with git-email..

I'm adding details with logs here, and a detailed description of the
issue & solution in the commit.

> nit: the if is not needed, as `stoptls` only does anything anyway if $hdl has a FH and is a TLS-enabled handle ;)

Thanks! :D

I added some debugging to AnyEvent.pm. Here is a curl request that returns 200 OK:

1741209019.890976 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): ACCEPT FH10 CONN1
1741209019.891069 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): Setting TLS to autostart
::ffff:10.36.88.28 - root@pam!katello_test [05/03/2025:14:10:19 -0700] "GET /api2/json/cluster HTTP/1.1" 200 360
1741209019.916084 worker[1636452]: PVE::APIServer::AnyEvent +423: finish_response: KEEPALIVE 99
1741209019.916693 worker[1636452]: AnyEvent::Handle +1350: __ANON__: EOF accept_connections()
1741209019.916716 worker[1636452]: PVE::APIServer::AnyEvent +1951: (eval): >>> accept_connections().on_eof
1741209019.916756 worker[1636452]: PVE::APIServer::AnyEvent +1954: client_do_disconnect: START client_do_disconnect()
1741209019.916827 worker[1636452]: PVE::APIServer::AnyEvent +1954: client_do_disconnect: close connection AnyEvent::Handle=HASH(0x595788db52a8)                                                                                               
1741209019.916845 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> shutdown($hdl->{fh}, 1)
1741209019.916889 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> ok
1741209019.916920 worker[1636452]: PVE::APIServer::AnyEvent +1954: client_do_disconnect: CLOSE FH10 CONN0

A curl request that returns 400: 

1741209023.423134 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): ACCEPT FH10 CONN1
1741209023.423207 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): Setting TLS to autostart
::ffff:10.36.88.28 - root@pam!katello_test [05/03/2025:14:10:23 -0700] "GET /api2/json/cluster/nextid?vmid=106 HTTP/1.1" 400 55                                                                                                               
1741209023.439439 worker[1636452]: PVE::APIServer::AnyEvent +423: finish_response: DRAIN finish_response()
1741209023.439486 worker[1636452]: PVE::APIServer::AnyEvent +219: (eval): >>> finish_response().on_drain
1741209023.439547 worker[1636452]: PVE::APIServer::AnyEvent +222: client_do_disconnect: START client_do_disconnect()
1741209023.439591 worker[1636452]: PVE::APIServer::AnyEvent +222: client_do_disconnect: close connection AnyEvent::Handle=HASH(0x595788de2318)                                                                                                
1741209023.439610 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> shutdown($hdl->{fh}, 1)
1741209023.439648 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> ok
1741209023.439682 worker[1636452]: PVE::APIServer::AnyEvent +222: client_do_disconnect: CLOSE FH10 CONN0

Requests returning 400 send "Connection: close" and tear down the socket immediately, resulting in an unexpected EOF at the client. Requests returning 200 send "Connection: keep-alive" and have a 5 second timeout.

If we make a request but let the connection sit idle, the 5 second timeout is reached, and we receive unexpected EOF again:

1741209026.694101 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): ACCEPT FH10 CONN1
1741209026.694173 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): Setting TLS to autostart
::ffff:10.36.88.28 - root@pam!katello_test [05/03/2025:14:10:26 -0700] "GET /api2/json/cluster HTTP/1.1" 200 360
1741209026.698288 worker[1636452]: PVE::APIServer::AnyEvent +423: finish_response: KEEPALIVE 99
1741209031.699939 worker[1636452]: AnyEvent::Handle +655: __ANON__: ERROR accept_connections()
1741209031.700154 worker[1636452]: PVE::APIServer::AnyEvent +1962: (eval): >>> accept_connections().on_error
1741209031.700186 worker[1636452]: PVE::APIServer::AnyEvent +1969: client_do_disconnect: START client_do_disconnect()
1741209031.700212 worker[1636452]: PVE::APIServer::AnyEvent +1969: client_do_disconnect: close connection AnyEvent::Handle=HASH(0x595788de1e68)
1741209031.700240 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> shutdown($hdl->{fh}, 1)
1741209031.700362 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> ok
1741209031.700408 worker[1636452]: PVE::APIServer::AnyEvent +1969: client_do_disconnect: CLOSE FH10 CONN0

To fix this, we can send a TLS close notify before shutting down the socket.

If we do stoptls() before clearing the handlers, however, we enter client_do_disconnect() again:

1741209459.761847 worker[1639023]: PVE::APIServer::AnyEvent +1910: (eval): ACCEPT FH10 CONN1
1741209459.761948 worker[1639023]: PVE::APIServer::AnyEvent +1910: (eval): Setting TLS to autostart
::ffff:10.36.88.28 - root@pam!katello_test [05/03/2025:14:17:39 -0700] "GET /api2/json/cluster/nextid?vmid=106 HTTP/1.1" 400 55                                                                                                               
1741209459.787932 worker[1639023]: PVE::APIServer::AnyEvent +423: finish_response: DRAIN finish_response()
1741209459.788003 worker[1639023]: PVE::APIServer::AnyEvent +219: (eval): >>> finish_response().on_drain
1741209459.788030 worker[1639023]: PVE::APIServer::AnyEvent +222: client_do_disconnect: START client_do_disconnect()
1741209459.788076 worker[1639023]: PVE::APIServer::AnyEvent +222: client_do_disconnect: close connection AnyEvent::Handle=HASH(0x603027214400)                                                                                                
1741209459.788130 worker[1639023]: PVE::APIServer::AnyEvent +191: __ANON__: >>> $hdl->stoptls()
1741209459.788221 worker[1639023]: PVE::APIServer::AnyEvent +219: (eval): >>> finish_response().on_drain
1741209459.788248 worker[1639023]: PVE::APIServer::AnyEvent +222: client_do_disconnect: START client_do_disconnect()
1741209459.788493 worker[1639023]: PVE::APIServer::AnyEvent +222: client_do_disconnect: ERROR: detected empty handle
1741209459.788553 worker[1639023]: PVE::APIServer::AnyEvent +191: __ANON__: >>> shutdown($hdl->{fh}, 1)
1741209459.788593 worker[1639023]: PVE::APIServer::AnyEvent +191: __ANON__: >>> ok
1741209459.788649 worker[1639023]: PVE::APIServer::AnyEvent +222: client_do_disconnect: CLOSE FH10 CONN0

So we move these after the clearing of handlers. This was fixed by commit f737984 for some cases, perhaps for all cases by clearing handlers prior to shutdown?

> If I read the docs right, this could block (would that be an issue here?) and could potentially destroy the handle (so that might need to be rechecked afterwards to prevent spurious warnings?)
> 
> what happens if we initiate the teardown, and the client never acks it?

I don't see this blocking in my observations. The server continues with socket teardown immediately after sending TLS close notify. If no response is received from the client, it remains in FIN_WAIT1 until that timeout is reached. This was tested by making a request with keepalive then immediately blocking all outbound on the client prior to reaching the 5 second timeout:

1741209811.245870 worker[1640562]: PVE::APIServer::AnyEvent +1910: (eval): ACCEPT FH10 CONN1
1741209811.245963 worker[1640562]: PVE::APIServer::AnyEvent +1910: (eval): Setting TLS to autostart
::ffff:10.36.88.28 - root@pam!katello_test [05/03/2025:14:23:31 -0700] "GET /api2/json/cluster HTTP/1.1" 200 360
1741209811.260657 worker[1640562]: PVE::APIServer::AnyEvent +423: finish_response: KEEPALIVE 99
1741209816.262383 worker[1640562]: AnyEvent::Handle +655: __ANON__: ERROR accept_connections()
1741209816.262532 worker[1640562]: PVE::APIServer::AnyEvent +1962: (eval): >>> accept_connections().on_error
1741209816.262631 worker[1640562]: PVE::APIServer::AnyEvent +1969: client_do_disconnect: START client_do_disconnect()
1741209816.262706 worker[1640562]: PVE::APIServer::AnyEvent +1969: client_do_disconnect: close connection AnyEvent::Handle=HASH(0x5eede90cc068)                                                                                               
1741209816.262750 worker[1640562]: PVE::APIServer::AnyEvent +191: __ANON__: >>> $hdl->stoptls()
1741209816.262924 worker[1640562]: PVE::APIServer::AnyEvent +191: __ANON__: >>> shutdown($hdl->{fh}, 1)
1741209816.262975 worker[1640562]: PVE::APIServer::AnyEvent +191: __ANON__: >>> ok
1741209816.262997 worker[1640562]: PVE::APIServer::AnyEvent +1969: client_do_disconnect: CLOSE FH10 CONN0

There is almost no time between stoptls() and shutdown(). We can see the client never responded by inspecting the connection on the server:

FIN-WAIT-1 0      25     [::ffff:10.36.88.27]:8006  [::ffff:10.36.88.28]:48382

As for the handle being destroyed, it gets deleted in client_do_disconnect() and I do not see it being used beyond there. I observed no warnings during testing, but it's certainly possible I missed something. If there are any additional tests I can do here, please let me know!

cheers,

Rob Rozestraten (1):
  fix unexpected EOF for client when closing TLS session

 src/PVE/APIServer/AnyEvent.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.48.1




[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2025-03-05 21:46 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-05 21:44 [pve-devel] [PATCH pve-http-server v2 0/1] fix unexpected EOF for client when closing TLS session Rob Rozestraten via pve-devel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal