From: Geoffray Levasseur <fatalerrors@geoffray-levasseur.org>
To: pve-user@lists.proxmox.com
Subject: [PVE-User] Live migration fail from old nodes to freshly installed one
Date: Sat, 18 Jun 2022 13:29:05 +0200
Message-ID: <3555362.dWV9SEqChM@isarog>

Hi everyone,

For a couple of foundations, I run and manage 4 Proxmox VE nodes with Ceph as 
the main storage for VMs. After replacing the hardware of one of the nodes, I 
had to reinstall it, since the new motherboard doesn't support legacy BIOS 
boot and I had to switch to UEFI. I was able to do that as cleanly as possible:

* Successful restoration of the SSH keys in /root and of the node 
configuration in /var/lib/pve-cluster
* Successful cleanup of the reinstalled node's former system keys in the 
configuration of the older nodes
* Successful restoration of Ceph (though not simple at all; IMHO the 
documentation lacks a clean procedure, unless I just couldn't find it)

=> As far as I can tell, everything works as expected except online migration 
(with or without HA) to the reinstalled machine. Online migration from the 
reinstalled node to the older nodes works.

------- The failure happens with the following output:
task started by HA resource agent
2022-06-18 13:13:20 starting migration of VM 119 to node 'taal' (192.168.1.251)
2022-06-18 13:13:20 starting VM 119 on remote node 'taal'
2022-06-18 13:13:21 start remote tunnel
2022-06-18 13:13:22 ssh tunnel ver 1
2022-06-18 13:13:22 starting online/live migration on unix:/run/qemu-server/119.migrate
2022-06-18 13:13:22 set migration capabilities
2022-06-18 13:13:22 migration downtime limit: 100 ms
2022-06-18 13:13:22 migration cachesize: 64.0 MiB
2022-06-18 13:13:22 set migration parameters
2022-06-18 13:13:22 start migrate command to unix:/run/qemu-server/119.migrate
channel 2: open failed: connect failed: open failed

2022-06-18 13:13:23 migration status error: failed
2022-06-18 13:13:23 ERROR: online migrate failure - aborting
2022-06-18 13:13:23 aborting phase 2 - cleanup resources
2022-06-18 13:13:23 migrate_cancel
2022-06-18 13:13:25 ERROR: migration finished with problems (duration 00:00:06)
TASK ERROR: migration problems
------
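
For anyone skimming a task log like the one above, the relevant lines can be 
pulled out mechanically. The following is only a minimal sketch, assuming the 
task log has been saved as plain text; the function name failure_lines and 
the matching rules are illustrative choices, not part of any Proxmox tool. It 
keeps ERROR entries plus any untimestamped line containing "failed", which is 
how the ssh channel error shows up here.

```python
import re

# Timestamp prefix used by the task log, e.g. "2022-06-18 13:13:23 ".
TS = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")


def failure_lines(task_log: str) -> list[str]:
    """Return the lines of a task log that report a failure (hypothetical helper)."""
    hits = []
    for line in task_log.splitlines():
        text = line.strip()
        if not text:
            continue
        if "ERROR" in text or "status error" in text:
            # Entries written by the migration task itself.
            hits.append(text)
        elif not TS.match(text) and "failed" in text:
            # Untimestamped output, such as the ssh "channel ... open failed" line.
            hits.append(text)
    return hits


sample = """\
2022-06-18 13:13:22 start migrate command to unix:/run/qemu-server/119.migrate
channel 2: open failed: connect failed: open failed

2022-06-18 13:13:23 migration status error: failed
2022-06-18 13:13:23 ERROR: online migrate failure - aborting
"""

for hit in failure_lines(sample):
    print(hit)
```

On this sample, the first line returned is the ssh channel error, which 
precedes the "migration status error" reported by the task.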

------ On the syslog server I see this:
Jun 18 13:13:20 taal qm[4036852]: <root@pam> starting task UPID:taal:003D98F5:00FA02E3:62ADB350:qmstart:119:root@pam:
Jun 18 13:13:20 taal qm[4036853]: start VM 119: UPID:taal:003D98F5:00FA02E3:62ADB350:qmstart:119:root@pam:
Jun 18 13:13:20 pinatubo pmxcfs[1551671]: [status] notice: received log
Jun 18 13:13:20 mayon pmxcfs[1039713]: [status] notice: received log
Jun 18 13:13:20 ragang pmxcfs[1659329]: [status] notice: received log
Jun 18 13:13:20 taal systemd[1]: Started 119.scope.
Jun 18 13:13:21 taal systemd-udevd[4036885]: Using default interface naming scheme 'v247'.
Jun 18 13:13:21 taal systemd-udevd[4036885]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 18 13:13:21 taal kernel: [163848.311597] device tap119i0 entered promiscuous mode
Jun 18 13:13:21 taal kernel: [163848.318750] vmbr0: port 6(tap119i0) entered blocking state
Jun 18 13:13:21 taal kernel: [163848.319250] vmbr0: port 6(tap119i0) entered disabled state
Jun 18 13:13:21 taal kernel: [163848.319797] vmbr0: port 6(tap119i0) entered blocking state
Jun 18 13:13:21 taal kernel: [163848.320320] vmbr0: port 6(tap119i0) entered forwarding state
Jun 18 13:13:21 taal systemd-udevd[4036884]: Using default interface naming scheme 'v247'.
Jun 18 13:13:21 taal systemd-udevd[4036884]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 18 13:13:21 taal kernel: [163848.711643] device tap119i1 entered promiscuous mode
Jun 18 13:13:21 taal kernel: [163848.718476] vmbr0: port 7(tap119i1) entered blocking state
Jun 18 13:13:21 taal kernel: [163848.718951] vmbr0: port 7(tap119i1) entered disabled state
Jun 18 13:13:21 taal kernel: [163848.719477] vmbr0: port 7(tap119i1) entered blocking state
Jun 18 13:13:21 taal kernel: [163848.719982] vmbr0: port 7(tap119i1) entered forwarding state
Jun 18 13:13:21 taal qm[4036852]: <root@pam> end task UPID:taal:003D98F5:00FA02E3:62ADB350:qmstart:119:root@pam: OK
Jun 18 13:13:21 ragang pmxcfs[1659329]: [status] notice: received log
Jun 18 13:13:21 pinatubo pmxcfs[1551671]: [status] notice: received log
Jun 18 13:13:21 mayon pmxcfs[1039713]: [status] notice: received log
Jun 18 13:13:21 taal systemd[1]: session-343.scope: Succeeded.
Jun 18 13:13:22 taal systemd[1]: Started Session 344 of user root.
Jun 18 13:13:22 ragang QEMU[4364]: kvm: Unable to write to socket: Broken pipe
Jun 18 13:13:23 taal systemd[1]: Started Session 345 of user root.
Jun 18 13:13:24 taal qm[4036985]: <root@pam> starting task UPID:taal:003D997A:00FA042C:62ADB354:qmstop:119:root@pam:
Jun 18 13:13:24 taal qm[4036986]: stop VM 119: UPID:taal:003D997A:00FA042C:62ADB354:qmstop:119:root@pam:
Jun 18 13:13:24 ragang pmxcfs[1659329]: [status] notice: received log
Jun 18 13:13:24 pinatubo pmxcfs[1551671]: [status] notice: received log
Jun 18 13:13:24 mayon pmxcfs[1039713]: [status] notice: received log
Jun 18 13:13:24 taal QEMU[4036862]: kvm: terminating on signal 15 from pid 4036986 (task UPID:taal:003D997A:00FA042C:62ADB354:qmstop:119:root@pam:)
Jun 18 13:13:24 taal qm[4036985]: <root@pam> end task UPID:taal:003D997A:00FA042C:62ADB354:qmstop:119:root@pam: OK
Jun 18 13:13:24 pinatubo pmxcfs[1551671]: [status] notice: received log
Jun 18 13:13:24 mayon pmxcfs[1039713]: [status] notice: received log
Jun 18 13:13:24 ragang pmxcfs[1659329]: [status] notice: received log
Jun 18 13:13:24 taal systemd[1]: session-345.scope: Succeeded.
Jun 18 13:13:24 taal systemd[1]: session-344.scope: Succeeded.
Jun 18 13:13:24 taal kernel: [163851.231170] vmbr0: port 6(tap119i0) entered disabled state
Jun 18 13:13:24 taal kernel: [163851.459489] vmbr0: port 7(tap119i1) entered disabled state
Jun 18 13:13:24 taal qmeventd[1993]: read: Connection reset by peer
Jun 18 13:13:24 taal systemd[1]: 119.scope: Succeeded.
Jun 18 13:13:24 taal systemd[1]: 119.scope: Consumed 1.099s CPU time.
Jun 18 13:13:25 ragang pve-ha-lrm[2866146]: Task 'UPID:ragang:002BBBE4:2519FE90:62ADB34F:qmigrate:119:root@pam:' still active, waiting
Jun 18 13:13:25 taal systemd[1]: Started Session 346 of user root.
Jun 18 13:13:25 taal systemd[1]: session-346.scope: Succeeded.
Jun 18 13:13:25 ragang pve-ha-lrm[2866148]: migration problems
Jun 18 13:13:25 ragang pve-ha-lrm[2866146]: <root@pam> end task UPID:ragang:002BBBE4:2519FE90:62ADB34F:qmigrate:119:root@pam: migration problems
Jun 18 13:13:25 ragang pve-ha-lrm[2866146]: service vm:119 not moved (migration error)
------
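
Syslog excerpts that pass through a mail client can end up hard-wrapped, 
which makes them awkward to grep. Below is a minimal sketch for rejoining 
such records before searching them. It assumes every record starts with a 
"Mon DD HH:MM:SS host " prefix and that the wrapping only inserted line 
breaks, without adding or dropping characters; the names RECORD_START and 
unwrap are illustrative.

```python
import re

# Start of a syslog record, e.g. "Jun 18 13:13:20 taal ".
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ ")


def unwrap(lines):
    """Glue continuation lines back onto the record they belong to."""
    records = []
    for line in lines:
        if RECORD_START.match(line) or not records:
            # A new record (or the very first line of the input).
            records.append(line)
        else:
            # Continuation: append verbatim, since only a line break was inserted.
            records[-1] += line
    return records


wrapped = [
    "Jun 18 13:13:25 ragang pve-ha-lrm[2866146]: Task 'UPID:ragang:",
    "002BBBE4:2519FE90:62ADB34F:qmigrate:119:root@pam:' sti",
    "ll active, waiting",
    "Jun 18 13:13:25 taal systemd[1]: Started Session 346 of user root.",
]

for record in unwrap(wrapped):
    print(record)
```

Once the records are whole again, filtering by host name (for example only 
the lines from ragang, the source node) becomes a simple substring test.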

mayon, pinatubo and ragang are the old nodes that didn't change. The 
reinstalled node is named taal. In these logs, ragang is the source node 
initiating the migration.

I suspect the LRM is in trouble, since its status is shown as active on all 
nodes except taal, where it is idle. Restarting the pve-ha-lrm service does 
not fix the issue.

Thank you for any help.
-- 
Geoffray Levasseur
Ingénieur système E-3S
        <geoffray.levasseur@e-3s.com>
        <fatalerrors@geoffray-levasseur.org>
        http://www.geoffray-levasseur.org
GNU/PG public key : 2B3E 4116 769C 609F 0D17  07FD 5BA9 4CC9 E9D5 AC1B
Tu patere legem quam ipse fecisti.
