From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <015106bc-726b-da07-c3cf-80b63197b2c7@gmail.com>
Date: Wed, 29 Dec 2021 12:16:59 +0100
From: Uwe Sauter
Reply-To: uwe.sauter.de@gmail.com
To: Proxmox VE user list, Сергей Цаболов
In-Reply-To: <0dd27e4e-391d-6262-bbf5-db84229accad@t8.ru>
Subject: Re: [PVE-User] [ceph-users] Re: Ceph Usage web and terminal.

Just a feeling, but I'd say the imbalance in OSDs (one host having many more disks than the rest) is your problem. Assuming your configuration keeps 3 copies of each VM image, the imbalance probably means that 2 of these 3 copies reside on pve-3111; if that host is unavailable, all VM images with 2 copies on it become unresponsive, too.

Check your failure domain for Ceph and possibly change it from OSD to host (see the example commands at the end of this mail). This should prevent one host from holding multiple copies of a VM image.

Regards,

	Uwe


On 29.12.21 at 09:36, Сергей Цаболов wrote:
> Hello to all.
> 
> In my case I have a 7-node Proxmox cluster and a working Ceph (ceph version 15.2.15 octopus (stable)).
> 
> Ceph HEALTH_OK
> 
> ceph -s
>   cluster:
>     id:     9662e3fa-4ce6-41df-8d74-5deaa41a8dde
>     health: HEALTH_OK
> 
>   services:
>     mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
>     mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101, pve-3111, pve-3108
>     mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
>     osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
> 
>   task status:
> 
>   data:
>     pools:   4 pools, 1089 pgs
>     objects: 1.09M objects, 4.1 TiB
>     usage:   7.7 TiB used, 99 TiB / 106 TiB avail
>     pgs:     1089 active+clean
> 
> ---------------------------------------------------------------------------------------------------------------------
> 
> ceph osd tree
> 
> ID   CLASS  WEIGHT     TYPE NAME            STATUS  REWEIGHT  PRI-AFF
>  -1         106.43005  root default
> -13          14.55478      host pve-3101
>  10    hdd    7.27739          osd.10           up   1.00000   1.00000
>  11    hdd    7.27739          osd.11           up   1.00000   1.00000
> -11          14.55478      host pve-3103
>   8    hdd    7.27739          osd.8            up   1.00000   1.00000
>   9    hdd    7.27739          osd.9            up   1.00000   1.00000
>  -3          14.55478      host pve-3105
>   0    hdd    7.27739          osd.0            up   1.00000   1.00000
>   1    hdd    7.27739          osd.1            up   1.00000   1.00000
>  -5          14.55478      host pve-3107
>   2    hdd    7.27739          osd.2            up   1.00000   1.00000
>   3    hdd    7.27739          osd.3            up   1.00000   1.00000
>  -9          14.55478      host pve-3108
>   6    hdd    7.27739          osd.6            up   1.00000   1.00000
>   7    hdd    7.27739          osd.7            up   1.00000   1.00000
>  -7          14.55478      host pve-3109
>   4    hdd    7.27739          osd.4            up   1.00000   1.00000
>   5    hdd    7.27739          osd.5            up   1.00000   1.00000
> -15          19.10138      host pve-3111
>  12    hdd   10.91409          osd.12           up   1.00000   1.00000
>  13    hdd    0.90970          osd.13           up   1.00000   1.00000
>  14    hdd    0.90970          osd.14           up   1.00000   1.00000
>  15    hdd    0.90970          osd.15           up   1.00000   1.00000
>  16    hdd    0.90970          osd.16           up   1.00000   1.00000
>  17    hdd    0.90970          osd.17           up   1.00000   1.00000
>  18    hdd    0.90970          osd.18           up   1.00000   1.00000
>  19    hdd    0.90970          osd.19           up   1.00000   1.00000
>  20    hdd    0.90970          osd.20           up   1.00000   1.00000
>  21    hdd    0.90970          osd.21           up   1.00000   1.00000
> 
> ---------------------------------------------------------------------------------------------------------------
> 
> POOL     ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
> vm.pool   2  1024  3.0 TiB  863.31k  6.0 TiB   6.38     44 TiB   (this pool holds all the VM disks)
> 
> ---------------------------------------------------------------------------------------------------------------
> 
> ceph osd map vm.pool vm.pool.object
> osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2) acting ([2,4], p2)
> 
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> pveversion -v
> proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
> pve-kernel-helper: 6.4-8
> pve-kernel-5.4: 6.4-7
> pve-kernel-5.4.143-1-pve: 5.4.143-1
> pve-kernel-5.4.106-1-pve: 5.4.106-1
> ceph: 15.2.15-pve1~bpo10
> ceph-fuse: 15.2.15-pve1~bpo10
> corosync: 3.1.2-pve1
> criu: 3.11-3
> glusterfs-client: 5.5-3
> ifupdown: residual config
> ifupdown2: 3.0.0-1+pve4~bpo10
> ksm-control-daemon: 1.3-1
> libjs-extjs: 6.0.1-10
> libknet1: 1.22-pve1~bpo10+1
> libproxmox-acme-perl: 1.1.0
> libproxmox-backup-qemu0: 1.1.0-1
> libpve-access-control: 6.4-3
> libpve-apiclient-perl: 3.1-3
> libpve-common-perl: 6.4-4
> libpve-guest-common-perl: 3.1-5
> libpve-http-server-perl: 3.2-3
> libpve-storage-perl: 6.4-1
> libqb0: 1.0.5-1
> libspice-server1: 0.14.2-4~pve6+1
> lvm2: 2.03.02-pve4
> lxc-pve: 4.0.6-2
> lxcfs: 4.0.6-pve1
> novnc-pve: 1.1.0-1
> proxmox-backup-client: 1.1.13-2
> proxmox-mini-journalreader: 1.1-1
> proxmox-widget-toolkit: 2.6-1
> pve-cluster: 6.4-1
> pve-container: 3.3-6
> pve-docs: 6.4-2
> pve-edk2-firmware: 2.20200531-1
> pve-firewall: 4.1-4
> pve-firmware: 3.3-2
> pve-ha-manager: 3.1-1
> pve-i18n: 2.3-1
> pve-qemu-kvm: 5.2.0-6
> pve-xtermjs: 4.7.0-3
> qemu-server: 6.4-2
> smartmontools: 7.2-pve2
> spiceterm: 3.1-1
> vncterm: 1.6-2
> zfsutils-linux: 2.0.6-pve1~bpo10+1
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> And now my problem:
> 
> All VMs have their disks in this one pool.
> 
> When node/host pve-3111 is shut down, the VMs on many of the other nodes/hosts (pve-3107, pve-3105) do not shut down, but they are no longer reachable over the network.
> 
> After the node/host comes back up, Ceph returns to HEALTH_OK and all VMs become reachable again over the network (without a reboot).
> 
> Can someone suggest what I should check in Ceph?
> 
> Thanks.
>
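
To verify whether the 3-copies assumption above actually holds, a few read-only commands could be run against the pool (the pool name vm.pool is taken from your output; the values on your cluster may differ):

    ceph osd pool get vm.pool size          # number of replicas kept per object
    ceph osd pool get vm.pool min_size      # minimum replicas needed for I/O to continue
    ceph osd pool get vm.pool crush_rule    # CRUSH rule the pool currently uses
    ceph pg ls-by-pool vm.pool              # PGs of the pool with their up/acting OSD sets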
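And a rough sketch of how the failure domain could be switched from OSD to host for a replicated pool (the rule name replicated_host is only an example; pointing the pool at a new rule will trigger rebalancing):

    ceph osd crush rule dump                                             # inspect the existing CRUSH rules
    ceph osd crush rule create-replicated replicated_host default host   # new rule: root=default, failure domain=host
    ceph osd pool set vm.pool crush_rule replicated_host                 # let the pool use the new rule

Expect some backfill traffic afterwards while the copies are redistributed to satisfy the new rule.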