From: Uwe Sauter
Reply-To: uwe.sauter.de@gmail.com
To: pve-user@lists.proxmox.com
Date: Tue, 5 Jan 2021 20:29:59 +0100
Subject: Re: [PVE-User] After update Ceph monitor shows wrong version in UI and is down and out of quorum

Frank,

On 05.01.21 at 20:24, Frank Thommen wrote:
> Hi Uwe,
>
>> did you look into the log of MON and OSD?
>
> I can't see any specific MON and OSD logs. However, the log available in the UI (Ceph -> Log) has lots of messages
> regarding scrubbing but no messages regarding issues with starting the monitor.
>

On each host the MON and OSD logs should be in /var/log/ceph. They should be rotated by logrotate (see
/etc/logrotate.d/ceph-common for details).

Regards,

     Uwe

>
>> Can you provide the list of installed packages of the affected host and the rest of the cluster?
>
> Let me compile the lists and post them somewhere. They are quite long.
>
>>
>> Is the output of "ceph status" the same for all hosts?
>
> yes
>
> Frank
>
>>
>>
>> Regards,
>>
>>      Uwe
>>
>> On 05.01.21 at 20:01, Frank Thommen wrote:
>>>
>>> On 04.01.21 12:44, Frank Thommen wrote:
>>>>
>>>> Dear all,
>>>>
>>>> one of our three PVE hypervisors in the cluster crashed (it was fenced successfully) and rebooted automatically. I
>>>> took the chance to do a complete dist-upgrade and rebooted again.
>>>>
>>>> The PVE Ceph dashboard now reports that
>>>>
>>>>    * the monitor on the host is down (out of quorum), and
>>>>    * "A newer version was installed but old version still running, please restart"
>>>>
>>>> The Ceph UI reports monitor version 14.2.11 while in fact 14.2.16 is installed. The hypervisor has been rebooted
>>>> twice since the upgrade, so it should be basically impossible that the old version is still running.
>>>>
>>>> `systemctl restart ceph.target` and restarting the monitor through the PVE Ceph UI didn't help. The hypervisor is
>>>> running PVE 6.3-3 (the other two are running 6.3-2 with monitor 14.2.15).
>>>>
>>>> What to do in this situation?
>>>>
>>>> I am happy with either UI or command-line instructions, but I have no Ceph experience besides setting it up
>>>> following the PVE instructions.
>>>>
>>>> Any help or hint is appreciated.
>>>> Cheers, Frank
>>>
>>> In an attempt to fix the issue I destroyed the monitor through the UI and recreated it. Unfortunately it still
>>> cannot be started. A popup tells me that the monitor has been started, but the overview still shows "stopped" and
>>> there is no version number any more.
>>>
>>> Then I stopped and started Ceph on the node (`pveceph stop; pveceph start`), which resulted in a degraded cluster (1
>>> host down, 7 of 21 OSDs down). OSDs cannot be started through the UI either.
>>>
>>> I feel extremely uncomfortable with this situation and would appreciate any hint as to how I should proceed with the
>>> problem.
>>>
>>> Cheers, Frank
>>>
>>> _______________________________________________
>>> pve-user mailing list
>>> pve-user@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
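Uwe's pointer to /var/log/ceph can be turned into a quick check on the affected node. The following is a minimal sketch, not a Proxmox or Ceph tool: `show_ceph_logs` is a hypothetical helper, and it assumes Ceph's default log naming under /var/log/ceph (`ceph-mon.<id>.log`, `ceph-osd.<id>.log`). It prints the last lines of each current MON and OSD log and skips rotated copies.

```shell
# Hypothetical helper (not part of pveceph or Ceph): show the tail of the
# current MON and OSD logs on this node.
# Assumes the default naming /var/log/ceph/ceph-<type>.<id>.log.
#
# Usage: show_ceph_logs [LOG_DIR] [LINES]
show_ceph_logs() {
    local log_dir="${1:-/var/log/ceph}"
    local lines="${2:-50}"
    local f found=0

    if [ ! -d "$log_dir" ]; then
        echo "no log directory: $log_dir" >&2
        return 1
    fi

    # Rotated copies (see /etc/logrotate.d/ceph-common) carry an extra
    # suffix after ".log" and are therefore not matched by these patterns.
    for f in "$log_dir"/ceph-mon.*.log "$log_dir"/ceph-osd.*.log; do
        [ -f "$f" ] || continue   # pattern matched nothing; skip the literal
        found=1
        echo "==> $f <=="
        tail -n "$lines" "$f"
    done

    if [ "$found" -eq 0 ]; then
        echo "no MON/OSD logs found in $log_dir" >&2
        return 1
    fi
}
```

Source the function and run, for example, `show_ceph_logs /var/log/ceph 100` as root on the node whose monitor will not start. If the MON got far enough to log anything, the reason for the failed start would likely be near the end of its log.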