From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 29 Dec 2021 15:13:23 +0100
From: Uwe Sauter
Reply-To: uwe.sauter.de@gmail.com
To: Сергей Цаболов, Proxmox VE user list
Message-ID: <37683acd-0874-2955-4ac0-77ad7d03c67f@gmail.com>
In-Reply-To: <131ea5ec-89df-4c90-5808-451c33abbb05@t8.ru>
References: <6f23d719-1931-cc81-899d-3202047c4a56@binovo.es> <101971ad-519a-9af2-249e-433df28b1f1a@t8.ru> <0dd27e4e-391d-6262-bbf5-db84229accad@t8.ru> <015106bc-726b-da07-c3cf-80b63197b2c7@gmail.com> <216fd781-c35a-6e99-2662-6fe6378adc23@t8.ru> <550c21eb-5371-6f3e-f1f4-bccbc6b5384b@gmail.com> <131ea5ec-89df-4c90-5808-451c33abbb05@t8.ru>
Subject: Re: [PVE-User] [ceph-users] Re: Ceph Usage web and terminal.
List-Id: Proxmox VE user list <pve-user@lists.proxmox.com>

On 29.12.21 at 15:06, Сергей Цаболов wrote:
> Ok, I understand the case.
>
> On 29.12.2021 at 16:13, Uwe Sauter wrote:
>> On 29.12.21 at 13:51, Сергей Цаболов wrote:
>>> Hi, Uwe
>>>
>>> On 29.12.2021 at 14:16, Uwe Sauter wrote:
>>>> Just a feeling, but I'd say that the imbalance in OSDs (one host having many more disks than the
>>>> rest) is your problem.
>>> Yes, the last node in the cluster has more disks than the rest, but
>>>
>>> one disk is 12 TB and the other 9 HDDs are 1 TB each.
>>>
>>>> Assuming that your configuration keeps 3 copies of each VM image, the imbalance probably means
>>>> that 2 of those 3 copies reside on pve-3111, and if that host is unavailable, all VM images with 2
>>>> copies on that host become unresponsive, too.
>>> In the Proxmox web UI I set the Ceph pool to Size: 2, Min. Size: 2.
>>>
>> So this means that you want to have 2 copies in the regular case (size) and also require 2 copies in
>> the failure case (min size) for the VMs to stay available.
> Yes, that is what I assumed as well, but it did not work that way.
>>
>> So you might solve your problem by decreasing min size to 1 (dangerous!!) or by increasing size to
>> 3, which means that in the regular case you will have 3 copies, but if only 2 are available, it will
>> still work and re-sync the 3rd copy once it comes online again.
>
> I understand that decreasing min_size to 1 is very dangerous.
>
> If I increase size to 3, min_size stays at 2 by default.
>
> But I'm afraid that if I set 3/2 (a good choice), MAX AVAIL for the pool will shrink by a factor of
> two or more; or am I wrong?

Hoping I understood you correctly: with size=2, min_size=2 you get 50% of your raw storage space as
usable storage space (because you keep 2 copies of all data). Increasing size to 3 will naturally
decrease the usable storage space to 33% of the raw storage space, because you will then keep 3 copies
of all data. This might be the price you need to pay to keep your cluster running.

There are other options to keep more of the storage space usable (like erasure coding your data,
comparable to what RAID 5 or 6 does), but those options have other implications for availability and
performance, and I don't know enough about Ceph configuration to be of any help with them.
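To make the size/min_size change concrete, here is a minimal sketch. It assumes the pool name `vm.pool` from this thread and a node with the Ceph admin keyring; the `ceph` calls are guarded so the capacity estimate at the bottom also runs on a machine without a cluster:

```shell
# Sketch only: raise replication to 3 copies while still serving I/O
# as long as 2 copies are available (pool name "vm.pool" from this thread).
if command -v ceph >/dev/null 2>&1; then
    ceph osd pool set vm.pool size 3
    ceph osd pool set vm.pool min_size 2
    ceph osd pool get vm.pool size        # verify the new value
fi

# Rough usable-capacity estimate: raw capacity divided by the replica
# count (ignores OSD imbalance, full ratios and metadata overhead).
raw_tib=106
for size in 2 3; do
    echo "size=$size -> ~$(( raw_tib / size )) TiB usable"
done
```

The naive estimate gives roughly 53 TiB for size=2 versus 35 TiB for size=3; the MAX AVAIL that Ceph actually reports (44 TiB below) is lower than the naive figure because Ceph also accounts for per-OSD imbalance and the full ratio.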
Regards,

	Uwe

> For now I have, with all disks:
>
> CLASS  SIZE     AVAIL   USED     RAW USED  %RAW USED
> hdd    106 TiB  99 TiB  7.7 TiB  7.7 TiB   7.26
> TOTAL  106 TiB  99 TiB  7.7 TiB  7.7 TiB   7.26
>
> --- POOLS ---
> POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
> device_health_metrics  1   1     8.3 MiB  22       17 MiB   0      44 TiB
> vm.pool                2   1024  3.0 TiB  864.55k  6.0 TiB  6.39   44 TiB  (terminal: 44 TiB = 48.37 TB; in the web UI I see 51.50 TB)
> cephfs_data            3   32    874 GiB  223.76k  1.7 TiB  1.91   44 TiB
> cephfs_metadata        4   32    25 MiB   27       51 MiB   0      44 TiB
>
> Am I right in my reasoning?
>
> Thank you!
>
>>
>>> With `ceph osd map vm.pool <object-name>` (VM ID) I see that for some VM objects one copy is on
>>> osd.12, for example:
>>>
>>> osdmap e14321 pool 'vm.pool' (2) object '114' -> pg 2.10486407 (2.7) -> up ([12,8], p12) acting ([12,8], p12)
>>>
>>> But in this example:
>>>
>>> osdmap e14321 pool 'vm.pool' (2) object '113' -> pg 2.8bd09f6d (2.36d) -> up ([10,7], p10) acting ([10,7], p10)
>>>
>>> it is osd.10 and osd.7.
>>>
>>>> Check your failure domain for Ceph and possibly change it from OSD to host. This should prevent
>>>> one host from holding multiple copies of a VM image.
>>> I didn't quite understand what I should check.
>>>
>>> Can you explain it to me with an example?
>>>
>> I don't have an example, but you can read about the concept at:
>>
>> https://docs.ceph.com/en/latest/rados/operations/crush-map/#crush-maps
>>
>>
>> Regards,
>>
>>     Uwe
>>
>>
>>
>>>>
>>>> Regards,
>>>>
>>>>      Uwe
>>>>
>>>> On 29.12.21 at 09:36, Сергей Цаболов wrote:
>>>>> Hello to all.
>>>>>
>>>>> In my case I have a 7-node Proxmox cluster with working Ceph (ceph version 15.2.15, octopus (stable)).
>>>>>
>>>>> Ceph is HEALTH_OK:
>>>>>
>>>>> ceph -s
>>>>>   cluster:
>>>>>     id:     9662e3fa-4ce6-41df-8d74-5deaa41a8dde
>>>>>     health: HEALTH_OK
>>>>>
>>>>>   services:
>>>>>     mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
>>>>>     mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101, pve-3111, pve-3108
>>>>>     mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
>>>>>     osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
>>>>>
>>>>>   task status:
>>>>>
>>>>>   data:
>>>>>     pools:   4 pools, 1089 pgs
>>>>>     objects: 1.09M objects, 4.1 TiB
>>>>>     usage:   7.7 TiB used, 99 TiB / 106 TiB avail
>>>>>     pgs:     1089 active+clean
>>>>>
>>>>> ---------------------------------------------------------------
>>>>>
>>>>> ceph osd tree
>>>>>
>>>>> ID   CLASS  WEIGHT     TYPE NAME          STATUS  REWEIGHT  PRI-AFF
>>>>>  -1         106.43005  root default
>>>>> -13          14.55478      host pve-3101
>>>>>  10    hdd    7.27739          osd.10         up   1.00000  1.00000
>>>>>  11    hdd    7.27739          osd.11         up   1.00000  1.00000
>>>>> -11          14.55478      host pve-3103
>>>>>   8    hdd    7.27739          osd.8          up   1.00000  1.00000
>>>>>   9    hdd    7.27739          osd.9          up   1.00000  1.00000
>>>>>  -3          14.55478      host pve-3105
>>>>>   0    hdd    7.27739          osd.0          up   1.00000  1.00000
>>>>>   1    hdd    7.27739          osd.1          up   1.00000  1.00000
>>>>>  -5          14.55478      host pve-3107
>>>>>   2    hdd    7.27739          osd.2          up   1.00000  1.00000
>>>>>   3    hdd    7.27739          osd.3          up   1.00000  1.00000
>>>>>  -9          14.55478      host pve-3108
>>>>>   6    hdd    7.27739          osd.6          up   1.00000  1.00000
>>>>>   7    hdd    7.27739          osd.7          up   1.00000  1.00000
>>>>>  -7          14.55478      host pve-3109
>>>>>   4    hdd    7.27739          osd.4          up   1.00000  1.00000
>>>>>   5    hdd    7.27739          osd.5          up   1.00000  1.00000
>>>>> -15          19.10138      host pve-3111
>>>>>  12    hdd   10.91409          osd.12         up   1.00000  1.00000
>>>>>  13    hdd    0.90970          osd.13         up   1.00000  1.00000
>>>>>  14    hdd    0.90970          osd.14         up   1.00000  1.00000
>>>>>  15    hdd    0.90970          osd.15         up   1.00000  1.00000
>>>>>  16    hdd    0.90970          osd.16         up   1.00000  1.00000
>>>>>  17    hdd    0.90970          osd.17         up   1.00000  1.00000
>>>>>  18    hdd    0.90970          osd.18         up   1.00000  1.00000
>>>>>  19    hdd    0.90970          osd.19         up   1.00000  1.00000
>>>>>  20    hdd    0.90970          osd.20         up   1.00000  1.00000
>>>>>  21    hdd    0.90970          osd.21         up   1.00000  1.00000
>>>>>
>>>>> ---------------------------------------------------------------
>>>>>
>>>>> POOL     ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
>>>>> vm.pool  2   1024  3.0 TiB  863.31k  6.0 TiB  6.38   44 TiB  (this pool holds all VM disks)
>>>>>
>>>>> ---------------------------------------------------------------
>>>>>
>>>>> ceph osd map vm.pool vm.pool.object
>>>>> osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2) acting ([2,4], p2)
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------
>>>>>
>>>>> pveversion -v
>>>>> proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
>>>>> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
>>>>> pve-kernel-helper: 6.4-8
>>>>> pve-kernel-5.4: 6.4-7
>>>>> pve-kernel-5.4.143-1-pve: 5.4.143-1
>>>>> pve-kernel-5.4.106-1-pve: 5.4.106-1
>>>>> ceph: 15.2.15-pve1~bpo10
>>>>> ceph-fuse: 15.2.15-pve1~bpo10
>>>>> corosync: 3.1.2-pve1
>>>>> criu: 3.11-3
>>>>> glusterfs-client: 5.5-3
>>>>> ifupdown: residual config
>>>>> ifupdown2: 3.0.0-1+pve4~bpo10
>>>>> ksm-control-daemon: 1.3-1
>>>>> libjs-extjs: 6.0.1-10
>>>>> libknet1: 1.22-pve1~bpo10+1
>>>>> libproxmox-acme-perl: 1.1.0
>>>>> libproxmox-backup-qemu0: 1.1.0-1
>>>>> libpve-access-control: 6.4-3
>>>>> libpve-apiclient-perl: 3.1-3
>>>>> libpve-common-perl: 6.4-4
>>>>> libpve-guest-common-perl: 3.1-5
>>>>> libpve-http-server-perl: 3.2-3
>>>>> libpve-storage-perl: 6.4-1
>>>>> libqb0: 1.0.5-1
>>>>> libspice-server1: 0.14.2-4~pve6+1
>>>>> lvm2: 2.03.02-pve4
>>>>> lxc-pve: 4.0.6-2
>>>>> lxcfs: 4.0.6-pve1
>>>>> novnc-pve: 1.1.0-1
>>>>> proxmox-backup-client: 1.1.13-2
>>>>> proxmox-mini-journalreader: 1.1-1
>>>>> proxmox-widget-toolkit: 2.6-1
>>>>> pve-cluster: 6.4-1
>>>>> pve-container: 3.3-6
>>>>> pve-docs: 6.4-2
>>>>> pve-edk2-firmware: 2.20200531-1
>>>>> pve-firewall: 4.1-4
>>>>> pve-firmware: 3.3-2
>>>>> pve-ha-manager: 3.1-1
>>>>> pve-i18n: 2.3-1
>>>>> pve-qemu-kvm: 5.2.0-6
>>>>> pve-xtermjs: 4.7.0-3
>>>>> qemu-server: 6.4-2
>>>>> smartmontools: 7.2-pve2
>>>>> spiceterm: 3.1-1
>>>>> vncterm: 1.6-2
>>>>> zfsutils-linux: 2.0.6-pve1~bpo10+1
>>>>>
>>>>> ---------------------------------------------------------------
>>>>>
>>>>> And now my problem:
>>>>>
>>>>> For all VMs I have one pool for the VM disks.
>>>>>
>>>>> When node/host pve-3111 is shut down, on many of the other nodes/hosts (pve-3107, pve-3105) the
>>>>> VMs do not shut down, but they become unreachable on the network.
>>>>>
>>>>> After the node/host is back up, Ceph returns to HEALTH_OK and all VMs become reachable on the
>>>>> network again (without a reboot).
>>>>>
>>>>> Can someone suggest what I should check in Ceph?
>>>>>
>>>>> Thanks.
>>>>>
>>
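
The failure-domain suggestion above can be sketched in concrete commands. This is only a sketch, assuming the default CRUSH root `default` and the pool name `vm.pool` from this thread; the rule name `replicated_host` is an arbitrary choice of mine, and the commands follow the CRUSH documentation linked above. The calls are guarded so the snippet is a no-op on a machine without a cluster:

```shell
# Sketch: move the pool's failure domain from OSD to host, so no two
# copies of a PG land on the same host (rule name "replicated_host"
# is arbitrary; "default" is the usual CRUSH root).
if command -v ceph >/dev/null 2>&1; then
    # Inspect which rule the pool currently uses and how it is defined:
    ceph osd pool get vm.pool crush_rule
    ceph osd crush rule dump

    # Create a replicated rule with failure domain "host", then switch
    # the pool over to it:
    ceph osd crush rule create-replicated replicated_host default host
    ceph osd pool set vm.pool crush_rule replicated_host
fi
```

Note that switching a pool's CRUSH rule causes data movement, so expect rebalancing traffic until the PGs are active+clean again.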