From: Frank Thommen <f.thommen@dkfz-heidelberg.de>
To: pve-user@lists.proxmox.com
Subject: Re: [PVE-User] After update Ceph monitor shows wrong version in UI and is down and out of quorum
Date: Tue, 5 Jan 2021 20:44:36 +0100
Message-ID: <cca8ce9d-2de2-3fb5-0522-3b7bb0f4132c@dkfz-heidelberg.de>
In-Reply-To: <ccbcca68-59fc-944e-d90c-c26ae20b17e5@gmail.com>



On 05.01.21 20:29, Uwe Sauter wrote:
> Frank,
> 
> On 05.01.21 at 20:24, Frank Thommen wrote:
>> Hi Uwe,
>>
>>> did you look into the log of MON and OSD?
>>
>> I can't see any specific MON and OSD logs. However, the log available 
>> in the UI (Ceph -> Log) has lots of messages regarding scrubbing but 
>> no messages regarding issues with starting the monitor.
>>
> 
> On each host the logs should be in /var/log/ceph. These should be 
> rotated (see /etc/logrotate.d/ceph-common for details).

ok.  I see lots of

-----------------------
2021-01-05 20:38:05.900 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:07.208 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:08.688 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:08.744 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:09.092 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:12.268 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:12.468 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:12.964 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:15.752 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:17.440 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:19.388 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:19.468 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:22.712 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
2021-01-05 20:38:22.828 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 handle_auth_request failed to assign global_id
-----------------------

in the mon log on the problematic host.
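
A repetitive excerpt like this is easier to read once it is collapsed into counts. The following is a minimal sketch, not part of the original exchange: it assumes each entry sits on a single line in the log file under /var/log/ceph, and the field layout in the regular expression (timestamp, thread id, level, `mon.<name>@<rank>(<state>)`, epoch, message) is inferred from the excerpt above rather than taken from Ceph documentation. The names `LINE_RE` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout of a monitor log line, inferred from the excerpt:
# <date> <time> <thread-id> <level> mon.<name>@<rank>(<state>) e<epoch> <message>
LINE_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+) "
    r"mon\.(?P<name>[^@\s]+)@(?P<rank>-?\d+)\((?P<state>[^)]*)\) "
    r"e(?P<epoch>\d+) (?P<message>.*)$"
)


def summarize(lines):
    """Count log entries grouped by (monitor name, rank, state, message)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            key = (match["name"], int(match["rank"]), match["state"], match["message"])
            counts[key] += 1
    return counts


# Two entries copied from the excerpt, joined onto one line each.
sample = [
    "2021-01-05 20:38:05.900 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 "
    "handle_auth_request failed to assign global_id",
    "2021-01-05 20:38:07.208 7f979e753700  1 mon.odcf-pve02@-1(probing) e4 "
    "handle_auth_request failed to assign global_id",
]

for (name, rank, state, message), n in summarize(sample).most_common():
    print(f"{n:5d}  mon.{name} rank={rank} state={state}  {message}")
```

To run it against a real file, pass an open file object to `summarize` instead of `sample`. The rank and state columns (`-1` and `probing` here) are the fields worth comparing with the same summary taken on a node whose monitor is in quorum.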

When (unsuccessfully) starting the monitor through the UI, the following 
entries appear in ceph.audit.log:

-----------------------
2021-01-05 20:40:07.635369 mon.odcf-pve03 (mon.1) 288082 : audit [DBG] from='client.? 192.168.255.2:0/2418486168' entity='client.admin' cmd=[{"format":"json","prefix":"mgr metadata"}]: dispatch
2021-01-05 20:40:07.636592 mon.odcf-pve03 (mon.1) 288083 : audit [DBG] from='client.? 192.168.255.2:0/2418486168' entity='client.admin' cmd=[{"format":"json","prefix":"mgr dump"}]: dispatch
2021-01-05 20:40:08.296793 mon.odcf-pve03 (mon.1) 288084 : audit [DBG] from='client.? 192.168.255.2:0/778781756' entity='client.admin' cmd=[{"format":"json","prefix":"mon metadata"}]: dispatch
2021-01-05 20:40:08.297767 mon.odcf-pve03 (mon.1) 288085 : audit [DBG] from='client.? 192.168.255.2:0/778781756' entity='client.admin' cmd=[{"prefix":"quorum_status","format":"json"}]: dispatch
2021-01-05 20:40:08.436982 mon.odcf-pve01 (mon.0) 389632 : audit [DBG] from='client.? 192.168.255.2:0/784579843' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
-----------------------

192.168.255.2 is the IP address of the problematic host in the Ceph mesh 
network. odcf-pve01 and odcf-pve03 are the "good" nodes.
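
To see at a glance which monitor handled which command from which client, the audit entries can be split into fields. This is a rough sketch under the same caveats as any log scraping: the regular expression is derived only from the five entries shown above, it assumes one entry per line, and the helper name `parse_audit` is made up for illustration. The address handling assumes IPv4, as in this mesh network.

```python
import json
import re

# Assumed layout, derived from the audit entries shown above:
# <date> <time> mon.<name> (mon.<rank>) <seq> : audit [<level>]
#   from='<source>' entity='<entity>' cmd=<json list>: <status>
AUDIT_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) mon\.(?P<mon>\S+) \(mon\.(?P<rank>\d+)\) "
    r"(?P<seq>\d+) : audit \[(?P<level>\w+)\] from='(?P<source>[^']*)' "
    r"entity='(?P<entity>[^']*)' cmd=(?P<cmd>\[.*\]): (?P<status>\w+)$"
)


def parse_audit(line):
    """Return the interesting fields of one audit entry, or None if it does not match."""
    match = AUDIT_RE.match(line.strip())
    if match is None:
        return None
    commands = json.loads(match["cmd"])  # the cmd field is a JSON list of objects
    address = match["source"].split()[-1]  # e.g. '192.168.255.2:0/2418486168'
    return {
        "mon": match["mon"],
        "client_ip": address.split(":")[0],  # IPv4 only
        "entity": match["entity"],
        "prefix": commands[0]["prefix"],
        "status": match["status"],
    }


# First entry from the excerpt, joined onto one line.
sample = (
    "2021-01-05 20:40:07.635369 mon.odcf-pve03 (mon.1) 288082 : audit [DBG] "
    "from='client.? 192.168.255.2:0/2418486168' entity='client.admin' "
    'cmd=[{"format":"json","prefix":"mgr metadata"}]: dispatch'
)

print(parse_audit(sample))
```

Applied to the excerpt, every entry has 192.168.255.2 as the client address and a read-only query (`mgr metadata`, `mgr dump`, `mon metadata`, `quorum_status`, `df`) as the command prefix, all answered by odcf-pve01 or odcf-pve03.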

However, I am not sure what kind of information I should look for in the 
logs.

Frank

> 
> Regards,
> 
>      Uwe
> 
> 
> 
>>
>>> Can you provide the list of installed packages of the affected host 
>>> and the rest of the cluster?
>>
>> let me compile the lists and post them somewhere.  They are quite long.
>>
>>>
>>> Is the output of "ceph status" the same for all hosts?
>>
>> yes
>>
>> Frank
>>
>>>
>>>
>>> Regards,
>>>
>>>      Uwe
>>>
>>> On 05.01.21 at 20:01, Frank Thommen wrote:
>>>>
>>>> On 04.01.21 12:44, Frank Thommen wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> one of our three PVE hypervisors in the cluster crashed (it was 
>>>>> fenced successfully) and rebooted automatically.  I took the chance 
>>>>> to do a complete dist-upgrade and rebooted again.
>>>>>
>>>>> The PVE Ceph dashboard now reports that
>>>>>
>>>>>    * the monitor on the host is down (out of quorum), and
>>>>>    * "A newer version was installed but old version still running, 
>>>>> please restart"
>>>>>
>>>>> The Ceph UI reports monitor version 14.2.11 while in fact 14.2.16 
>>>>> is installed. The hypervisor has been rebooted twice since the 
>>>>> upgrade, so it should be basically impossible that the old version 
>>>>> is still running.
>>>>>
>>>>> `systemctl restart ceph.target` and restarting the monitor through 
>>>>> the PVE Ceph UI didn't help. The hypervisor is running PVE 6.3-3 
>>>>> (the other two are running 6.3-2 with monitor 14.2.15).
>>>>>
>>>>> What to do in this situation?
>>>>>
>>>>> I am happy with either UI or command-line instructions, but I have 
>>>>> no Ceph experience besides setting it up following the PVE 
>>>>> instructions.
>>>>>
>>>>> Any help or hint is appreciated.
>>>>> Cheers, Frank
>>>>
>>>> In an attempt to fix the issue I destroyed the monitor through the 
>>>> UI and recreated it.  Unfortunately it still cannot be started.  A 
>>>> popup tells me that the monitor has been started, but the overview 
>>>> still shows "stopped" and there is no version number any more.
>>>>
>>>> Then I stopped and started Ceph on the node (`pveceph stop; pveceph 
>>>> start`) which resulted in a degraded cluster (1 host down, 7 of 21 
>>>> OSDs down). OSDs cannot be started through the UI either.
>>>>
>>>> I feel extremely uncomfortable with this situation and would 
>>>> appreciate any hint as to how I should proceed with the problem.
>>>>
>>>> Cheers, Frank
>>>>
>>>> _______________________________________________
>>>> pve-user mailing list
>>>> pve-user@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
