Hi everyone,

For a couple of foundations, I run and manage 4 Proxmox VE nodes with Ceph as the main storage for VMs. After changing the hardware of one of the nodes, I had to reinstall it, since the new motherboard doesn't support legacy (BIOS) boot and I had to switch to UEFI.

I was able to do that as cleanly as possible:

* Successful restoration of SSH keys in /root and node configuration in /var/lib/pve-cluster
* Successful cleanup of the reinstalled node's former system keys in the older nodes' configuration
* Successful restoration of Ceph (not simple at all, though; IMHO the documentation lacks a clean procedure, unless I just couldn't find it)

=> As far as I know, everything works as expected, except online migration (with or without HA) to the reinstalled machine. Online migration from the reinstalled node to the older nodes works.

-------

The failure happens with the following output:

task started by HA resource agent
2022-06-18 13:13:20 starting migration of VM 119 to node 'taal' (192.168.1.251)
2022-06-18 13:13:20 starting VM 119 on remote node 'taal'
2022-06-18 13:13:21 start remote tunnel
2022-06-18 13:13:22 ssh tunnel ver 1
2022-06-18 13:13:22 starting online/live migration on unix:/run/qemu-server/119.migrate
2022-06-18 13:13:22 set migration capabilities
2022-06-18 13:13:22 migration downtime limit: 100 ms
2022-06-18 13:13:22 migration cachesize: 64.0 MiB
2022-06-18 13:13:22 set migration parameters
2022-06-18 13:13:22 start migrate command to unix:/run/qemu-server/119.migrate
channel 2: open failed: connect failed: open failed
2022-06-18 13:13:23 migration status error: failed
2022-06-18 13:13:23 ERROR: online migrate failure - aborting
2022-06-18 13:13:23 aborting phase 2 - cleanup resources
2022-06-18 13:13:23 migrate_cancel
2022-06-18 13:13:25 ERROR: migration finished with problems (duration 00:00:06)
TASK ERROR: migration problems

------

On the syslog server I see this:

Jun 18 13:13:20 taal qm[4036852]: starting task UPID:taal:003D98F5:00FA02E3:62ADB350:qmstart:119:root@pam:
Jun 18 13:13:20 taal qm[4036853]: start VM 119: UPID:taal:003D98F5:00FA02E3:62ADB350:qmstart:119:root@pam:
Jun 18 13:13:20 pinatubo pmxcfs[1551671]: [status] notice: received log
Jun 18 13:13:20 mayon pmxcfs[1039713]: [status] notice: received log
Jun 18 13:13:20 ragang pmxcfs[1659329]: [status] notice: received log
Jun 18 13:13:20 taal systemd[1]: Started 119.scope.
Jun 18 13:13:21 taal systemd-udevd[4036885]: Using default interface naming scheme 'v247'.
Jun 18 13:13:21 taal systemd-udevd[4036885]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 18 13:13:21 taal kernel: [163848.311597] device tap119i0 entered promiscuous mode
Jun 18 13:13:21 taal kernel: [163848.318750] vmbr0: port 6(tap119i0) entered blocking state
Jun 18 13:13:21 taal kernel: [163848.319250] vmbr0: port 6(tap119i0) entered disabled state
Jun 18 13:13:21 taal kernel: [163848.319797] vmbr0: port 6(tap119i0) entered blocking state
Jun 18 13:13:21 taal kernel: [163848.320320] vmbr0: port 6(tap119i0) entered forwarding state
Jun 18 13:13:21 taal systemd-udevd[4036884]: Using default interface naming scheme 'v247'.
Jun 18 13:13:21 taal systemd-udevd[4036884]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 18 13:13:21 taal kernel: [163848.711643] device tap119i1 entered promiscuous mode
Jun 18 13:13:21 taal kernel: [163848.718476] vmbr0: port 7(tap119i1) entered blocking state
Jun 18 13:13:21 taal kernel: [163848.718951] vmbr0: port 7(tap119i1) entered disabled state
Jun 18 13:13:21 taal kernel: [163848.719477] vmbr0: port 7(tap119i1) entered blocking state
Jun 18 13:13:21 taal kernel: [163848.719982] vmbr0: port 7(tap119i1) entered forwarding state
Jun 18 13:13:21 taal qm[4036852]: end task UPID:taal:003D98F5:00FA02E3:62ADB350:qmstart:119:root@pam: OK
Jun 18 13:13:21 ragang pmxcfs[1659329]: [status] notice: received log
Jun 18 13:13:21 pinatubo pmxcfs[1551671]: [status] notice: received log
Jun 18 13:13:21 mayon pmxcfs[1039713]: [status] notice: received log
Jun 18 13:13:21 taal systemd[1]: session-343.scope: Succeeded.
Jun 18 13:13:22 taal systemd[1]: Started Session 344 of user root.
Jun 18 13:13:22 ragang QEMU[4364]: kvm: Unable to write to socket: Broken pipe
Jun 18 13:13:23 taal systemd[1]: Started Session 345 of user root.
Jun 18 13:13:24 taal qm[4036985]: starting task UPID:taal:003D997A:00FA042C:62ADB354:qmstop:119:root@pam:
Jun 18 13:13:24 taal qm[4036986]: stop VM 119: UPID:taal:003D997A:00FA042C:62ADB354:qmstop:119:root@pam:
Jun 18 13:13:24 ragang pmxcfs[1659329]: [status] notice: received log
Jun 18 13:13:24 pinatubo pmxcfs[1551671]: [status] notice: received log
Jun 18 13:13:24 mayon pmxcfs[1039713]: [status] notice: received log
Jun 18 13:13:24 taal QEMU[4036862]: kvm: terminating on signal 15 from pid 4036986 (task UPID:taal:003D997A:00FA042C:62ADB354:qmstop:119:root@pam:)
Jun 18 13:13:24 taal qm[4036985]: end task UPID:taal:003D997A:00FA042C:62ADB354:qmstop:119:root@pam: OK
Jun 18 13:13:24 pinatubo pmxcfs[1551671]: [status] notice: received log
Jun 18 13:13:24 mayon pmxcfs[1039713]: [status] notice: received log
Jun 18 13:13:24 ragang pmxcfs[1659329]: [status] notice: received log
Jun 18 13:13:24 taal systemd[1]: session-345.scope: Succeeded.
Jun 18 13:13:24 taal systemd[1]: session-344.scope: Succeeded.
Jun 18 13:13:24 taal kernel: [163851.231170] vmbr0: port 6(tap119i0) entered disabled state
Jun 18 13:13:24 taal kernel: [163851.459489] vmbr0: port 7(tap119i1) entered disabled state
Jun 18 13:13:24 taal qmeventd[1993]: read: Connection reset by peer
Jun 18 13:13:24 taal systemd[1]: 119.scope: Succeeded.
Jun 18 13:13:24 taal systemd[1]: 119.scope: Consumed 1.099s CPU time.
Jun 18 13:13:25 ragang pve-ha-lrm[2866146]: Task 'UPID:ragang:002BBBE4:2519FE90:62ADB34F:qmigrate:119:root@pam:' still active, waiting
Jun 18 13:13:25 taal systemd[1]: Started Session 346 of user root.
Jun 18 13:13:25 taal systemd[1]: session-346.scope: Succeeded.
Jun 18 13:13:25 ragang pve-ha-lrm[2866148]: migration problems
Jun 18 13:13:25 ragang pve-ha-lrm[2866146]: end task UPID:ragang:002BBBE4:2519FE90:62ADB34F:qmigrate:119:root@pam: migration problems
Jun 18 13:13:25 ragang pve-ha-lrm[2866146]: service vm:119 not moved (migration error)

------

mayon, pinatubo and ragang are the old nodes that didn't change. The reinstalled node is named taal. In these logs, ragang is the origin node initiating the migration.

I suspect the LRM is in trouble, since its status is shown as active on all nodes except taal, where it is idle. Restarting the pve-ha-lrm service does not fix the issue.

Thank you for any help.

--
Geoffray Levasseur
Systems Engineer, E-3S
http://www.geoffray-levasseur.org
GNU/PG public key : 2B3E 4116 769C 609F 0D17 07FD 5BA9 4CC9 E9D5 AC1B
Tu patere legem quam ipse fecisti.
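
P.S. In case it helps anyone sift through a similar multi-node syslog excerpt: below is a minimal Python sketch that groups the suspicious lines by host, so it is easier to see which node reported what. It is only an illustration and not a Proxmox tool. The function name `suspicious_by_host`, the keyword list and the regular expression are my own assumptions about the line format shown in this message, and the sample lines are copied from the syslog above.

```python
import re
from collections import defaultdict

# Assumed line shape, as in the excerpt above:
#   "Mon DD HH:MM:SS host process[pid]: message"
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s"
    r"(?P<proc>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<msg>.*)$"
)

# Hypothetical keyword list; adjust it to whatever you are hunting for.
KEYWORDS = ("error", "failed", "broken pipe", "unable", "reset by peer")


def suspicious_by_host(lines):
    """Return {host: [(timestamp, process, message), ...]} for matching lines."""
    hits = defaultdict(list)
    for line in lines:
        match = SYSLOG_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a syslog-style line
        message = match.group("msg")
        if any(keyword in message.lower() for keyword in KEYWORDS):
            hits[match.group("host")].append(
                (match.group("ts"), match.group("proc"), message)
            )
    return dict(hits)


# A few lines taken from the syslog excerpt in this message.
SAMPLE = [
    "Jun 18 13:13:20 taal systemd[1]: Started 119.scope.",
    "Jun 18 13:13:22 ragang QEMU[4364]: kvm: Unable to write to socket: Broken pipe",
    "Jun 18 13:13:24 taal qmeventd[1993]: read: Connection reset by peer",
    "Jun 18 13:13:25 ragang pve-ha-lrm[2866146]: service vm:119 not moved (migration error)",
]

if __name__ == "__main__":
    for host, entries in suspicious_by_host(SAMPLE).items():
        for timestamp, process, message in entries:
            print(f"{host:10} {timestamp} {process}: {message}")
```

To run it against a real file, pass an open file object instead of `SAMPLE`, for example `suspicious_by_host(open("syslog"))`, where the file name is just a placeholder.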