* Re: [PVE-User] [ceph-users] Re: Ceph Usage web and terminal.
[not found] ` <101971ad-519a-9af2-249e-433df28b1f1a@t8.ru>
@ 2021-12-29 8:36 ` Сергей Цаболов
2021-12-29 11:16 ` Uwe Sauter
0 siblings, 1 reply; 6+ messages in thread
From: Сергей Цаболов @ 2021-12-29 8:36 UTC (permalink / raw)
To: pve-user
Hello to all.
In my case I have the 7 node cluster Proxmox and working Ceph (ceph
version 15.2.15 octopus (stable)": 7)
Ceph HEALTH_OK
ceph -s
cluster:
id: 9662e3fa-4ce6-41df-8d74-5deaa41a8dde
health: HEALTH_OK
services:
mon: 7 daemons, quorum
pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103,
pve-3105, pve-3101, pve-3111, pve-3108
mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
task status:
data:
pools: 4 pools, 1089 pgs
objects: 1.09M objects, 4.1 TiB
usage: 7.7 TiB used, 99 TiB / 106 TiB avail
pgs: 1089 active+clean
---------------------------------------------------------------------------------------------------------------------
ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 106.43005 root default
-13 14.55478 host pve-3101
10 hdd 7.27739 osd.10 up 1.00000 1.00000
11 hdd 7.27739 osd.11 up 1.00000 1.00000
-11 14.55478 host pve-3103
8 hdd 7.27739 osd.8 up 1.00000 1.00000
9 hdd 7.27739 osd.9 up 1.00000 1.00000
-3 14.55478 host pve-3105
0 hdd 7.27739 osd.0 up 1.00000 1.00000
1 hdd 7.27739 osd.1 up 1.00000 1.00000
-5 14.55478 host pve-3107
2 hdd 7.27739 osd.2 up 1.00000 1.00000
3 hdd 7.27739 osd.3 up 1.00000 1.00000
-9 14.55478 host pve-3108
6 hdd 7.27739 osd.6 up 1.00000 1.00000
7 hdd 7.27739 osd.7 up 1.00000 1.00000
-7 14.55478 host pve-3109
4 hdd 7.27739 osd.4 up 1.00000 1.00000
5 hdd 7.27739 osd.5 up 1.00000 1.00000
-15 19.10138 host pve-3111
12 hdd 10.91409 osd.12 up 1.00000 1.00000
13 hdd 0.90970 osd.13 up 1.00000 1.00000
14 hdd 0.90970 osd.14 up 1.00000 1.00000
15 hdd 0.90970 osd.15 up 1.00000 1.00000
16 hdd 0.90970 osd.16 up 1.00000 1.00000
17 hdd 0.90970 osd.17 up 1.00000 1.00000
18 hdd 0.90970 osd.18 up 1.00000 1.00000
19 hdd 0.90970 osd.19 up 1.00000 1.00000
20 hdd 0.90970 osd.20 up 1.00000 1.00000
21 hdd 0.90970 osd.21 up 1.00000 1.00000
---------------------------------------------------------------------------------------------------------------
POOL ID PGS STORED OBJECTS USED
%USED MAX AVAIL
vm.pool 2 1024 3.0 TiB 863.31k 6.0 TiB
6.38 44 TiB (this pool have the all VM disk)
---------------------------------------------------------------------------------------------------------------
ceph osd map vm.pool vm.pool.object
osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg
2.196f68d5 (2.d5) -> up ([2,4], p2) acting ([2,4], p2)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-helper: 6.4-8
pve-kernel-5.4: 6.4-7
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph: 15.2.15-pve1~bpo10
ceph-fuse: 15.2.15-pve1~bpo10
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve1~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
And now my problem:
For all VM I have one pool for VM disks
When node/host pve-3111 is shutdown in many of other nodes/hosts
pve-3107, pve-3105 VM not shutdown but not available in network.
After the node/host is up Ceph back to HEALTH_OK and the all VM back to
access in Network (without reboot).
Can some one to suggest me what I can to check in Ceph ?
Thanks.
--
-------------------------
Best Regards
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PVE-User] [ceph-users] Re: Ceph Usage web and terminal.
2021-12-29 8:36 ` [PVE-User] [ceph-users] Re: Ceph Usage web and terminal Сергей Цаболов
@ 2021-12-29 11:16 ` Uwe Sauter
2021-12-29 12:51 ` Сергей Цаболов
0 siblings, 1 reply; 6+ messages in thread
From: Uwe Sauter @ 2021-12-29 11:16 UTC (permalink / raw)
To: Proxmox VE user list,
Сергей
Цаболов
Just a feeling but I'd say that the imbalance in OSDs (one host having many more disks than the
rest) is your problem.
Assuming that your configuration keeps 3 copies of each VM image then the imbalance probably means
that 2 of these 3 copies reside on pve-3111 and if this host is unavailable, all VM images with 2
copies on that host become unresponsive, too.
Check your failure domain for Ceph and possibly change it from OSD to host. This should prevent that
one host holds multiple copies of a VM image.
Regards,
Uwe
Am 29.12.21 um 09:36 schrieb Сергей Цаболов:
> Hello to all.
>
> In my case I have the 7 node cluster Proxmox and working Ceph (ceph version 15.2.15 octopus
> (stable)": 7)
>
> Ceph HEALTH_OK
>
> ceph -s
> cluster:
> id: 9662e3fa-4ce6-41df-8d74-5deaa41a8dde
> health: HEALTH_OK
>
> services:
> mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
> mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101, pve-3111,
> pve-3108
> mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
> osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
>
> task status:
>
> data:
> pools: 4 pools, 1089 pgs
> objects: 1.09M objects, 4.1 TiB
> usage: 7.7 TiB used, 99 TiB / 106 TiB avail
> pgs: 1089 active+clean
>
> ---------------------------------------------------------------------------------------------------------------------
>
>
> ceph osd tree
>
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -1 106.43005 root default
> -13 14.55478 host pve-3101
> 10 hdd 7.27739 osd.10 up 1.00000 1.00000
> 11 hdd 7.27739 osd.11 up 1.00000 1.00000
> -11 14.55478 host pve-3103
> 8 hdd 7.27739 osd.8 up 1.00000 1.00000
> 9 hdd 7.27739 osd.9 up 1.00000 1.00000
> -3 14.55478 host pve-3105
> 0 hdd 7.27739 osd.0 up 1.00000 1.00000
> 1 hdd 7.27739 osd.1 up 1.00000 1.00000
> -5 14.55478 host pve-3107
> 2 hdd 7.27739 osd.2 up 1.00000 1.00000
> 3 hdd 7.27739 osd.3 up 1.00000 1.00000
> -9 14.55478 host pve-3108
> 6 hdd 7.27739 osd.6 up 1.00000 1.00000
> 7 hdd 7.27739 osd.7 up 1.00000 1.00000
> -7 14.55478 host pve-3109
> 4 hdd 7.27739 osd.4 up 1.00000 1.00000
> 5 hdd 7.27739 osd.5 up 1.00000 1.00000
> -15 19.10138 host pve-3111
> 12 hdd 10.91409 osd.12 up 1.00000 1.00000
> 13 hdd 0.90970 osd.13 up 1.00000 1.00000
> 14 hdd 0.90970 osd.14 up 1.00000 1.00000
> 15 hdd 0.90970 osd.15 up 1.00000 1.00000
> 16 hdd 0.90970 osd.16 up 1.00000 1.00000
> 17 hdd 0.90970 osd.17 up 1.00000 1.00000
> 18 hdd 0.90970 osd.18 up 1.00000 1.00000
> 19 hdd 0.90970 osd.19 up 1.00000 1.00000
> 20 hdd 0.90970 osd.20 up 1.00000 1.00000
> 21 hdd 0.90970 osd.21 up 1.00000 1.00000
>
> ---------------------------------------------------------------------------------------------------------------
>
>
> POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
> vm.pool 2 1024 3.0 TiB 863.31k 6.0 TiB 6.38 44 TiB (this pool
> have the all VM disk)
>
> ---------------------------------------------------------------------------------------------------------------
>
>
> ceph osd map vm.pool vm.pool.object
> osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2)
> acting ([2,4], p2)
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> pveversion -v
> proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
> pve-kernel-helper: 6.4-8
> pve-kernel-5.4: 6.4-7
> pve-kernel-5.4.143-1-pve: 5.4.143-1
> pve-kernel-5.4.106-1-pve: 5.4.106-1
> ceph: 15.2.15-pve1~bpo10
> ceph-fuse: 15.2.15-pve1~bpo10
> corosync: 3.1.2-pve1
> criu: 3.11-3
> glusterfs-client: 5.5-3
> ifupdown: residual config
> ifupdown2: 3.0.0-1+pve4~bpo10
> ksm-control-daemon: 1.3-1
> libjs-extjs: 6.0.1-10
> libknet1: 1.22-pve1~bpo10+1
> libproxmox-acme-perl: 1.1.0
> libproxmox-backup-qemu0: 1.1.0-1
> libpve-access-control: 6.4-3
> libpve-apiclient-perl: 3.1-3
> libpve-common-perl: 6.4-4
> libpve-guest-common-perl: 3.1-5
> libpve-http-server-perl: 3.2-3
> libpve-storage-perl: 6.4-1
> libqb0: 1.0.5-1
> libspice-server1: 0.14.2-4~pve6+1
> lvm2: 2.03.02-pve4
> lxc-pve: 4.0.6-2
> lxcfs: 4.0.6-pve1
> novnc-pve: 1.1.0-1
> proxmox-backup-client: 1.1.13-2
> proxmox-mini-journalreader: 1.1-1
> proxmox-widget-toolkit: 2.6-1
> pve-cluster: 6.4-1
> pve-container: 3.3-6
> pve-docs: 6.4-2
> pve-edk2-firmware: 2.20200531-1
> pve-firewall: 4.1-4
> pve-firmware: 3.3-2
> pve-ha-manager: 3.1-1
> pve-i18n: 2.3-1
> pve-qemu-kvm: 5.2.0-6
> pve-xtermjs: 4.7.0-3
> qemu-server: 6.4-2
> smartmontools: 7.2-pve2
> spiceterm: 3.1-1
> vncterm: 1.6-2
> zfsutils-linux: 2.0.6-pve1~bpo10+1
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> And now my problem:
>
> For all VM I have one pool for VM disks
>
> When node/host pve-3111 is shutdown in many of other nodes/hosts pve-3107, pve-3105 VM not
> shutdown but not available in network.
>
> After the node/host is up Ceph back to HEALTH_OK and the all VM back to access in Network (without
> reboot).
>
> Can some one to suggest me what I can to check in Ceph ?
>
> Thanks.
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PVE-User] [ceph-users] Re: Ceph Usage web and terminal.
2021-12-29 11:16 ` Uwe Sauter
@ 2021-12-29 12:51 ` Сергей Цаболов
2021-12-29 13:13 ` Uwe Sauter
0 siblings, 1 reply; 6+ messages in thread
From: Сергей Цаболов @ 2021-12-29 12:51 UTC (permalink / raw)
To: uwe.sauter.de, Proxmox VE user list
Hi, Uwe
29.12.2021 14:16, Uwe Sauter пишет:
> Just a feeling but I'd say that the imbalance in OSDs (one host having many more disks than the
> rest) is your problem.
Yes, last node in cluster have more disk then the rest, but
one disk is 12TB and all others 9 HD is 1TB
>
> Assuming that your configuration keeps 3 copies of each VM image then the imbalance probably means
> that 2 of these 3 copies reside on pve-3111 and if this host is unavailable, all VM images with 2
> copies on that host become unresponsive, too.
In Proxmox web ceph pool I set the Size: 2 , Min.Size: 2
With : ceph osd map vm.pool object-name (vm ID) I see some of vm object
one copy is on osd.12, example :
osdmap e14321 pool 'vm.pool' (2) object '114' -> pg 2.10486407 (2.7) ->
up ([12,8], p12) acting ([12,8], p12)
But this example :
osdmap e14321 pool 'vm.pool' (2) object '113' -> pg 2.8bd09f6d (2.36d)
-> up ([10,7], p10) acting ([10,7], p10)
osd.10 and osd.7
>
> Check your failure domain for Ceph and possibly change it from OSD to host. This should prevent that
> one host holds multiple copies of a VM image.
I didn 't understand a little what to check ?
Can you explain me with example?
>
>
> Regards,
>
> Uwe
>
> Am 29.12.21 um 09:36 schrieb Сергей Цаболов:
>> Hello to all.
>>
>> In my case I have the 7 node cluster Proxmox and working Ceph (ceph version 15.2.15 octopus
>> (stable)": 7)
>>
>> Ceph HEALTH_OK
>>
>> ceph -s
>> cluster:
>> id: 9662e3fa-4ce6-41df-8d74-5deaa41a8dde
>> health: HEALTH_OK
>>
>> services:
>> mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
>> mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101, pve-3111,
>> pve-3108
>> mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
>> osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
>>
>> task status:
>>
>> data:
>> pools: 4 pools, 1089 pgs
>> objects: 1.09M objects, 4.1 TiB
>> usage: 7.7 TiB used, 99 TiB / 106 TiB avail
>> pgs: 1089 active+clean
>>
>> ---------------------------------------------------------------------------------------------------------------------
>>
>>
>> ceph osd tree
>>
>> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>> -1 106.43005 root default
>> -13 14.55478 host pve-3101
>> 10 hdd 7.27739 osd.10 up 1.00000 1.00000
>> 11 hdd 7.27739 osd.11 up 1.00000 1.00000
>> -11 14.55478 host pve-3103
>> 8 hdd 7.27739 osd.8 up 1.00000 1.00000
>> 9 hdd 7.27739 osd.9 up 1.00000 1.00000
>> -3 14.55478 host pve-3105
>> 0 hdd 7.27739 osd.0 up 1.00000 1.00000
>> 1 hdd 7.27739 osd.1 up 1.00000 1.00000
>> -5 14.55478 host pve-3107
>> 2 hdd 7.27739 osd.2 up 1.00000 1.00000
>> 3 hdd 7.27739 osd.3 up 1.00000 1.00000
>> -9 14.55478 host pve-3108
>> 6 hdd 7.27739 osd.6 up 1.00000 1.00000
>> 7 hdd 7.27739 osd.7 up 1.00000 1.00000
>> -7 14.55478 host pve-3109
>> 4 hdd 7.27739 osd.4 up 1.00000 1.00000
>> 5 hdd 7.27739 osd.5 up 1.00000 1.00000
>> -15 19.10138 host pve-3111
>> 12 hdd 10.91409 osd.12 up 1.00000 1.00000
>> 13 hdd 0.90970 osd.13 up 1.00000 1.00000
>> 14 hdd 0.90970 osd.14 up 1.00000 1.00000
>> 15 hdd 0.90970 osd.15 up 1.00000 1.00000
>> 16 hdd 0.90970 osd.16 up 1.00000 1.00000
>> 17 hdd 0.90970 osd.17 up 1.00000 1.00000
>> 18 hdd 0.90970 osd.18 up 1.00000 1.00000
>> 19 hdd 0.90970 osd.19 up 1.00000 1.00000
>> 20 hdd 0.90970 osd.20 up 1.00000 1.00000
>> 21 hdd 0.90970 osd.21 up 1.00000 1.00000
>>
>> ---------------------------------------------------------------------------------------------------------------
>>
>>
>> POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
>> vm.pool 2 1024 3.0 TiB 863.31k 6.0 TiB 6.38 44 TiB (this pool
>> have the all VM disk)
>>
>> ---------------------------------------------------------------------------------------------------------------
>>
>>
>> ceph osd map vm.pool vm.pool.object
>> osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2)
>> acting ([2,4], p2)
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> pveversion -v
>> proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
>> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
>> pve-kernel-helper: 6.4-8
>> pve-kernel-5.4: 6.4-7
>> pve-kernel-5.4.143-1-pve: 5.4.143-1
>> pve-kernel-5.4.106-1-pve: 5.4.106-1
>> ceph: 15.2.15-pve1~bpo10
>> ceph-fuse: 15.2.15-pve1~bpo10
>> corosync: 3.1.2-pve1
>> criu: 3.11-3
>> glusterfs-client: 5.5-3
>> ifupdown: residual config
>> ifupdown2: 3.0.0-1+pve4~bpo10
>> ksm-control-daemon: 1.3-1
>> libjs-extjs: 6.0.1-10
>> libknet1: 1.22-pve1~bpo10+1
>> libproxmox-acme-perl: 1.1.0
>> libproxmox-backup-qemu0: 1.1.0-1
>> libpve-access-control: 6.4-3
>> libpve-apiclient-perl: 3.1-3
>> libpve-common-perl: 6.4-4
>> libpve-guest-common-perl: 3.1-5
>> libpve-http-server-perl: 3.2-3
>> libpve-storage-perl: 6.4-1
>> libqb0: 1.0.5-1
>> libspice-server1: 0.14.2-4~pve6+1
>> lvm2: 2.03.02-pve4
>> lxc-pve: 4.0.6-2
>> lxcfs: 4.0.6-pve1
>> novnc-pve: 1.1.0-1
>> proxmox-backup-client: 1.1.13-2
>> proxmox-mini-journalreader: 1.1-1
>> proxmox-widget-toolkit: 2.6-1
>> pve-cluster: 6.4-1
>> pve-container: 3.3-6
>> pve-docs: 6.4-2
>> pve-edk2-firmware: 2.20200531-1
>> pve-firewall: 4.1-4
>> pve-firmware: 3.3-2
>> pve-ha-manager: 3.1-1
>> pve-i18n: 2.3-1
>> pve-qemu-kvm: 5.2.0-6
>> pve-xtermjs: 4.7.0-3
>> qemu-server: 6.4-2
>> smartmontools: 7.2-pve2
>> spiceterm: 3.1-1
>> vncterm: 1.6-2
>> zfsutils-linux: 2.0.6-pve1~bpo10+1
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>
>> And now my problem:
>>
>> For all VM I have one pool for VM disks
>>
>> When node/host pve-3111 is shutdown in many of other nodes/hosts pve-3107, pve-3105 VM not
>> shutdown but not available in network.
>>
>> After the node/host is up Ceph back to HEALTH_OK and the all VM back to access in Network (without
>> reboot).
>>
>> Can some one to suggest me what I can to check in Ceph ?
>>
>> Thanks.
>>
>
--
-------------------------
С уважением
Сергей Цаболов,
Системный администратор
ООО "Т8"
Тел.: +74992716161,
Моб: +79850334875
tsabolov@t8.ru
ООО «Т8», 107076, г. Москва, Краснобогатырская ул., д. 44, стр.1
www.t8.ru
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PVE-User] [ceph-users] Re: Ceph Usage web and terminal.
2021-12-29 12:51 ` Сергей Цаболов
@ 2021-12-29 13:13 ` Uwe Sauter
2021-12-29 14:06 ` Сергей Цаболов
0 siblings, 1 reply; 6+ messages in thread
From: Uwe Sauter @ 2021-12-29 13:13 UTC (permalink / raw)
To: Сергей
Цаболов,
Proxmox VE user list
Am 29.12.21 um 13:51 schrieb Сергей Цаболов:
> Hi, Uwe
>
> 29.12.2021 14:16, Uwe Sauter пишет:
>> Just a feeling but I'd say that the imbalance in OSDs (one host having many more disks than the
>> rest) is your problem.
>
> Yes, last node in cluster have more disk then the rest, but
>
> one disk is 12TB and all others 9 HD is 1TB
>
>>
>> Assuming that your configuration keeps 3 copies of each VM image then the imbalance probably means
>> that 2 of these 3 copies reside on pve-3111 and if this host is unavailable, all VM images with 2
>> copies on that host become unresponsive, too.
>
> In Proxmox web ceph pool I set the Size: 2 , Min.Size: 2
>
So this means that you want to have 2 copies in the regular case (size) and also 2 copies in the
failure case (min size) so that the VMs stay available.
So you might solve your problem by decreasing min size to 1 (dangerous!!) or by increasing size to
3, which means that in the regular case you will have 3 copies but if only 2 are available, it will
still work and re-sync the 3rd copy once it comes online again.
> With : ceph osd map vm.pool object-name (vm ID) I see some of vm object one copy is on osd.12,
> example :
>
> osdmap e14321 pool 'vm.pool' (2) object '114' -> pg 2.10486407 (2.7) -> up ([12,8], p12) acting
> ([12,8], p12)
>
> But this example :
>
> osdmap e14321 pool 'vm.pool' (2) object '113' -> pg 2.8bd09f6d (2.36d) -> up ([10,7], p10) acting
> ([10,7], p10)
>
> osd.10 and osd.7
>
>>
>> Check your failure domain for Ceph and possibly change it from OSD to host. This should prevent that
>> one host holds multiple copies of a VM image.
>
> I didn 't understand a little what to check ?
>
> Can you explain me with example?
>
I don't have an example but you can read about the concept at:
https://docs.ceph.com/en/latest/rados/operations/crush-map/#crush-maps
Regards,
Uwe
>
>>
>>
>> Regards,
>>
>> Uwe
>>
>> Am 29.12.21 um 09:36 schrieb Сергей Цаболов:
>>> Hello to all.
>>>
>>> In my case I have the 7 node cluster Proxmox and working Ceph (ceph version 15.2.15 octopus
>>> (stable)": 7)
>>>
>>> Ceph HEALTH_OK
>>>
>>> ceph -s
>>> cluster:
>>> id: 9662e3fa-4ce6-41df-8d74-5deaa41a8dde
>>> health: HEALTH_OK
>>>
>>> services:
>>> mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
>>> mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101, pve-3111,
>>> pve-3108
>>> mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
>>> osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
>>>
>>> task status:
>>>
>>> data:
>>> pools: 4 pools, 1089 pgs
>>> objects: 1.09M objects, 4.1 TiB
>>> usage: 7.7 TiB used, 99 TiB / 106 TiB avail
>>> pgs: 1089 active+clean
>>>
>>> ---------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>>
>>> ceph osd tree
>>>
>>> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>>> -1 106.43005 root default
>>> -13 14.55478 host pve-3101
>>> 10 hdd 7.27739 osd.10 up 1.00000 1.00000
>>> 11 hdd 7.27739 osd.11 up 1.00000 1.00000
>>> -11 14.55478 host pve-3103
>>> 8 hdd 7.27739 osd.8 up 1.00000 1.00000
>>> 9 hdd 7.27739 osd.9 up 1.00000 1.00000
>>> -3 14.55478 host pve-3105
>>> 0 hdd 7.27739 osd.0 up 1.00000 1.00000
>>> 1 hdd 7.27739 osd.1 up 1.00000 1.00000
>>> -5 14.55478 host pve-3107
>>> 2 hdd 7.27739 osd.2 up 1.00000 1.00000
>>> 3 hdd 7.27739 osd.3 up 1.00000 1.00000
>>> -9 14.55478 host pve-3108
>>> 6 hdd 7.27739 osd.6 up 1.00000 1.00000
>>> 7 hdd 7.27739 osd.7 up 1.00000 1.00000
>>> -7 14.55478 host pve-3109
>>> 4 hdd 7.27739 osd.4 up 1.00000 1.00000
>>> 5 hdd 7.27739 osd.5 up 1.00000 1.00000
>>> -15 19.10138 host pve-3111
>>> 12 hdd 10.91409 osd.12 up 1.00000 1.00000
>>> 13 hdd 0.90970 osd.13 up 1.00000 1.00000
>>> 14 hdd 0.90970 osd.14 up 1.00000 1.00000
>>> 15 hdd 0.90970 osd.15 up 1.00000 1.00000
>>> 16 hdd 0.90970 osd.16 up 1.00000 1.00000
>>> 17 hdd 0.90970 osd.17 up 1.00000 1.00000
>>> 18 hdd 0.90970 osd.18 up 1.00000 1.00000
>>> 19 hdd 0.90970 osd.19 up 1.00000 1.00000
>>> 20 hdd 0.90970 osd.20 up 1.00000 1.00000
>>> 21 hdd 0.90970 osd.21 up 1.00000 1.00000
>>>
>>> ---------------------------------------------------------------------------------------------------------------
>>>
>>>
>>>
>>> POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
>>> vm.pool 2 1024 3.0 TiB 863.31k 6.0 TiB 6.38 44 TiB (this pool
>>> have the all VM disk)
>>>
>>> ---------------------------------------------------------------------------------------------------------------
>>>
>>>
>>>
>>> ceph osd map vm.pool vm.pool.object
>>> osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2)
>>> acting ([2,4], p2)
>>>
>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>> pveversion -v
>>> proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
>>> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
>>> pve-kernel-helper: 6.4-8
>>> pve-kernel-5.4: 6.4-7
>>> pve-kernel-5.4.143-1-pve: 5.4.143-1
>>> pve-kernel-5.4.106-1-pve: 5.4.106-1
>>> ceph: 15.2.15-pve1~bpo10
>>> ceph-fuse: 15.2.15-pve1~bpo10
>>> corosync: 3.1.2-pve1
>>> criu: 3.11-3
>>> glusterfs-client: 5.5-3
>>> ifupdown: residual config
>>> ifupdown2: 3.0.0-1+pve4~bpo10
>>> ksm-control-daemon: 1.3-1
>>> libjs-extjs: 6.0.1-10
>>> libknet1: 1.22-pve1~bpo10+1
>>> libproxmox-acme-perl: 1.1.0
>>> libproxmox-backup-qemu0: 1.1.0-1
>>> libpve-access-control: 6.4-3
>>> libpve-apiclient-perl: 3.1-3
>>> libpve-common-perl: 6.4-4
>>> libpve-guest-common-perl: 3.1-5
>>> libpve-http-server-perl: 3.2-3
>>> libpve-storage-perl: 6.4-1
>>> libqb0: 1.0.5-1
>>> libspice-server1: 0.14.2-4~pve6+1
>>> lvm2: 2.03.02-pve4
>>> lxc-pve: 4.0.6-2
>>> lxcfs: 4.0.6-pve1
>>> novnc-pve: 1.1.0-1
>>> proxmox-backup-client: 1.1.13-2
>>> proxmox-mini-journalreader: 1.1-1
>>> proxmox-widget-toolkit: 2.6-1
>>> pve-cluster: 6.4-1
>>> pve-container: 3.3-6
>>> pve-docs: 6.4-2
>>> pve-edk2-firmware: 2.20200531-1
>>> pve-firewall: 4.1-4
>>> pve-firmware: 3.3-2
>>> pve-ha-manager: 3.1-1
>>> pve-i18n: 2.3-1
>>> pve-qemu-kvm: 5.2.0-6
>>> pve-xtermjs: 4.7.0-3
>>> qemu-server: 6.4-2
>>> smartmontools: 7.2-pve2
>>> spiceterm: 3.1-1
>>> vncterm: 1.6-2
>>> zfsutils-linux: 2.0.6-pve1~bpo10+1
>>>
>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>>
>>> And now my problem:
>>>
>>> For all VM I have one pool for VM disks
>>>
>>> When node/host pve-3111 is shutdown in many of other nodes/hosts pve-3107, pve-3105 VM not
>>> shutdown but not available in network.
>>>
>>> After the node/host is up Ceph back to HEALTH_OK and the all VM back to access in Network (without
>>> reboot).
>>>
>>> Can some one to suggest me what I can to check in Ceph ?
>>>
>>> Thanks.
>>>
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PVE-User] [ceph-users] Re: Ceph Usage web and terminal.
2021-12-29 13:13 ` Uwe Sauter
@ 2021-12-29 14:06 ` Сергей Цаболов
2021-12-29 14:13 ` Uwe Sauter
0 siblings, 1 reply; 6+ messages in thread
From: Сергей Цаболов @ 2021-12-29 14:06 UTC (permalink / raw)
To: uwe.sauter.de, Proxmox VE user list
Ok, I understand the case.
29.12.2021 16:13, Uwe Sauter пишет:
> Am 29.12.21 um 13:51 schrieb Сергей Цаболов:
>> Hi, Uwe
>>
>> 29.12.2021 14:16, Uwe Sauter пишет:
>>> Just a feeling but I'd say that the imbalance in OSDs (one host having many more disks than the
>>> rest) is your problem.
>> Yes, last node in cluster have more disk then the rest, but
>>
>> one disk is 12TB and all others 9 HD is 1TB
>>
>>> Assuming that your configuration keeps 3 copies of each VM image then the imbalance probably means
>>> that 2 of these 3 copies reside on pve-3111 and if this host is unavailable, all VM images with 2
>>> copies on that host become unresponsive, too.
>> In Proxmox web ceph pool I set the Size: 2 , Min.Size: 2
>>
> So this means that you want to have 2 copies in the regular case (size) and also 2 copies in the
> failure case (min size) so that the VMs stay available.
Yes I think before like you answer, but is not so worked.
>
> So you might solve your problem by decreasing min size to 1 (dangerous!!) or by increasing size to
> 3, which means that in the regular case you will have 3 copies but if only 2 are available, it will
> still work and re-sync the 3rd copy once it comes online again.
I understand if decreasing min.size to 1 is very (dangerous!!!)
If I increasing to 3 min.size keep 2 is default .
But I'm afraid if set the 3/2 (good choice) MAX AVAIL in pool is will
decrease in two or more space, or am I wrong?
For now I have with all disk :
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd `106 TiB 99 TiB 7.7 TiB 7.7 TiB 7.26
TOTAL 106 TiB 99 TiB 7.7 TiB 7.7 TiB 7.26
--- POOLS ---
POOL ID PGS STORED OBJECTS
USED %USED MAX AVAIL
device_health_metrics 1 1 8.3 MiB 22 17 MiB
0 44 TiB
vm.pool 2 1024 3.0 TiB 864.55k
6.0 TiB 6.39 44 TiB ( terminal 44 TiB = 48.37 ) in web I
see 51.50 TB
cephfs_data 3 32 874 GiB 223.76k 1.7
TiB 1.91 44 TiB
cephfs_metadata 4 32 25 MiB 27 51 MiB
0 44 TiB
Am I right in my reasoning ?
Thank you!
>
>> With : ceph osd map vm.pool object-name (vm ID) I see some of vm object one copy is on osd.12,
>> example :
>>
>> osdmap e14321 pool 'vm.pool' (2) object '114' -> pg 2.10486407 (2.7) -> up ([12,8], p12) acting
>> ([12,8], p12)
>>
>> But this example :
>>
>> osdmap e14321 pool 'vm.pool' (2) object '113' -> pg 2.8bd09f6d (2.36d) -> up ([10,7], p10) acting
>> ([10,7], p10)
>>
>> osd.10 and osd.7
>>
>>> Check your failure domain for Ceph and possibly change it from OSD to host. This should prevent that
>>> one host holds multiple copies of a VM image.
>> I didn 't understand a little what to check ?
>>
>> Can you explain me with example?
>>
> I don't have an example but you can read about the concept at:
>
> https://docs.ceph.com/en/latest/rados/operations/crush-map/#crush-maps
>
>
> Regards,
>
> Uwe
>
>
>
>>>
>>> Regards,
>>>
>>> Uwe
>>>
>>> Am 29.12.21 um 09:36 schrieb Сергей Цаболов:
>>>> Hello to all.
>>>>
>>>> In my case I have the 7 node cluster Proxmox and working Ceph (ceph version 15.2.15 octopus
>>>> (stable)": 7)
>>>>
>>>> Ceph HEALTH_OK
>>>>
>>>> ceph -s
>>>> cluster:
>>>> id: 9662e3fa-4ce6-41df-8d74-5deaa41a8dde
>>>> health: HEALTH_OK
>>>>
>>>> services:
>>>> mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
>>>> mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101, pve-3111,
>>>> pve-3108
>>>> mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
>>>> osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
>>>>
>>>> task status:
>>>>
>>>> data:
>>>> pools: 4 pools, 1089 pgs
>>>> objects: 1.09M objects, 4.1 TiB
>>>> usage: 7.7 TiB used, 99 TiB / 106 TiB avail
>>>> pgs: 1089 active+clean
>>>>
>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>> ceph osd tree
>>>>
>>>> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>>>> -1 106.43005 root default
>>>> -13 14.55478 host pve-3101
>>>> 10 hdd 7.27739 osd.10 up 1.00000 1.00000
>>>> 11 hdd 7.27739 osd.11 up 1.00000 1.00000
>>>> -11 14.55478 host pve-3103
>>>> 8 hdd 7.27739 osd.8 up 1.00000 1.00000
>>>> 9 hdd 7.27739 osd.9 up 1.00000 1.00000
>>>> -3 14.55478 host pve-3105
>>>> 0 hdd 7.27739 osd.0 up 1.00000 1.00000
>>>> 1 hdd 7.27739 osd.1 up 1.00000 1.00000
>>>> -5 14.55478 host pve-3107
>>>> 2 hdd 7.27739 osd.2 up 1.00000 1.00000
>>>> 3 hdd 7.27739 osd.3 up 1.00000 1.00000
>>>> -9 14.55478 host pve-3108
>>>> 6 hdd 7.27739 osd.6 up 1.00000 1.00000
>>>> 7 hdd 7.27739 osd.7 up 1.00000 1.00000
>>>> -7 14.55478 host pve-3109
>>>> 4 hdd 7.27739 osd.4 up 1.00000 1.00000
>>>> 5 hdd 7.27739 osd.5 up 1.00000 1.00000
>>>> -15 19.10138 host pve-3111
>>>> 12 hdd 10.91409 osd.12 up 1.00000 1.00000
>>>> 13 hdd 0.90970 osd.13 up 1.00000 1.00000
>>>> 14 hdd 0.90970 osd.14 up 1.00000 1.00000
>>>> 15 hdd 0.90970 osd.15 up 1.00000 1.00000
>>>> 16 hdd 0.90970 osd.16 up 1.00000 1.00000
>>>> 17 hdd 0.90970 osd.17 up 1.00000 1.00000
>>>> 18 hdd 0.90970 osd.18 up 1.00000 1.00000
>>>> 19 hdd 0.90970 osd.19 up 1.00000 1.00000
>>>> 20 hdd 0.90970 osd.20 up 1.00000 1.00000
>>>> 21 hdd 0.90970 osd.21 up 1.00000 1.00000
>>>>
>>>> ---------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>> POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
>>>> vm.pool 2 1024 3.0 TiB 863.31k 6.0 TiB 6.38 44 TiB (this pool
>>>> have the all VM disk)
>>>>
>>>> ---------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>> ceph osd map vm.pool vm.pool.object
>>>> osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2)
>>>> acting ([2,4], p2)
>>>>
>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> pveversion -v
>>>> proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
>>>> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
>>>> pve-kernel-helper: 6.4-8
>>>> pve-kernel-5.4: 6.4-7
>>>> pve-kernel-5.4.143-1-pve: 5.4.143-1
>>>> pve-kernel-5.4.106-1-pve: 5.4.106-1
>>>> ceph: 15.2.15-pve1~bpo10
>>>> ceph-fuse: 15.2.15-pve1~bpo10
>>>> corosync: 3.1.2-pve1
>>>> criu: 3.11-3
>>>> glusterfs-client: 5.5-3
>>>> ifupdown: residual config
>>>> ifupdown2: 3.0.0-1+pve4~bpo10
>>>> ksm-control-daemon: 1.3-1
>>>> libjs-extjs: 6.0.1-10
>>>> libknet1: 1.22-pve1~bpo10+1
>>>> libproxmox-acme-perl: 1.1.0
>>>> libproxmox-backup-qemu0: 1.1.0-1
>>>> libpve-access-control: 6.4-3
>>>> libpve-apiclient-perl: 3.1-3
>>>> libpve-common-perl: 6.4-4
>>>> libpve-guest-common-perl: 3.1-5
>>>> libpve-http-server-perl: 3.2-3
>>>> libpve-storage-perl: 6.4-1
>>>> libqb0: 1.0.5-1
>>>> libspice-server1: 0.14.2-4~pve6+1
>>>> lvm2: 2.03.02-pve4
>>>> lxc-pve: 4.0.6-2
>>>> lxcfs: 4.0.6-pve1
>>>> novnc-pve: 1.1.0-1
>>>> proxmox-backup-client: 1.1.13-2
>>>> proxmox-mini-journalreader: 1.1-1
>>>> proxmox-widget-toolkit: 2.6-1
>>>> pve-cluster: 6.4-1
>>>> pve-container: 3.3-6
>>>> pve-docs: 6.4-2
>>>> pve-edk2-firmware: 2.20200531-1
>>>> pve-firewall: 4.1-4
>>>> pve-firmware: 3.3-2
>>>> pve-ha-manager: 3.1-1
>>>> pve-i18n: 2.3-1
>>>> pve-qemu-kvm: 5.2.0-6
>>>> pve-xtermjs: 4.7.0-3
>>>> qemu-server: 6.4-2
>>>> smartmontools: 7.2-pve2
>>>> spiceterm: 3.1-1
>>>> vncterm: 1.6-2
>>>> zfsutils-linux: 2.0.6-pve1~bpo10+1
>>>>
>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>> And now my problem:
>>>>
>>>> For all VM I have one pool for VM disks
>>>>
>>>> When node/host pve-3111 is shutdown in many of other nodes/hosts pve-3107, pve-3105 VM not
>>>> shutdown but not available in network.
>>>>
>>>> After the node/host is up Ceph back to HEALTH_OK and the all VM back to access in Network (without
>>>> reboot).
>>>>
>>>> Can some one to suggest me what I can to check in Ceph ?
>>>>
>>>> Thanks.
>>>>
>
--
-------------------------
С уважением
Сергей Цаболов,
Системный администратор
ООО "Т8"
Тел.: +74992716161,
Моб: +79850334875
tsabolov@t8.ru
ООО «Т8», 107076, г. Москва, Краснобогатырская ул., д. 44, стр.1
www.t8.ru
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PVE-User] [ceph-users] Re: Ceph Usage web and terminal.
2021-12-29 14:06 ` Сергей Цаболов
@ 2021-12-29 14:13 ` Uwe Sauter
0 siblings, 0 replies; 6+ messages in thread
From: Uwe Sauter @ 2021-12-29 14:13 UTC (permalink / raw)
To: Сергей
Цаболов,
Proxmox VE user list
Am 29.12.21 um 15:06 schrieb Сергей Цаболов:
> Ok, I understand the case.
>
> 29.12.2021 16:13, Uwe Sauter пишет:
>> Am 29.12.21 um 13:51 schrieb Сергей Цаболов:
>>> Hi, Uwe
>>>
>>> 29.12.2021 14:16, Uwe Sauter пишет:
>>>> Just a feeling but I'd say that the imbalance in OSDs (one host having many more disks than the
>>>> rest) is your problem.
>>> Yes, last node in cluster have more disk then the rest, but
>>>
>>> one disk is 12TB and all others 9 HD is 1TB
>>>
>>>> Assuming that your configuration keeps 3 copies of each VM image then the imbalance probably means
>>>> that 2 of these 3 copies reside on pve-3111 and if this host is unavailable, all VM images with 2
>>>> copies on that host become unresponsive, too.
>>> In Proxmox web ceph pool I set the Size: 2 , Min.Size: 2
>>>
>> So this means that you want to have 2 copies in the regular case (size) and also 2 copies in the
>> failure case (min size) so that the VMs stay available.
> Yes I think before like you answer, but is not so worked.
>>
>> So you might solve your problem by decreasing min size to 1 (dangerous!!) or by increasing size to
>> 3, which means that in the regular case you will have 3 copies but if only 2 are available, it will
>> still work and re-sync the 3rd copy once it comes online again.
>
> I understand if decreasing min.size to 1 is very (dangerous!!!)
>
> If I increasing to 3 min.size keep 2 is default .
>
> But I'm afraid if set the 3/2 (good choice) MAX AVAIL in pool is will decrease in two or more space,
> or am I wrong?
Hoping I understood you correctly:
With size=2, min.size=2 you get 50% of your raw storage space as usable storage space (because you
keep 2 copies of usable data).
Increasing size to 3 will naturally decrease the usable storage space to 33% of raw storage space
because you will keep 3 copies of usable data. This might be the price you need to pay to keep your
cluster running.
There are other options to keep more of the storage space as usable (like erasure coding your data,
comparable to what RAID 5 or 6 does) but those options have other implications on availability and
performance. And I don't know enough abouth Ceph configuration to be of any help with these options.
Regards,
Uwe
> For now I have with all disk :
>
> CLASS SIZE AVAIL USED RAW USED %RAW USED
> hdd `106 TiB 99 TiB 7.7 TiB 7.7 TiB 7.26
> TOTAL 106 TiB 99 TiB 7.7 TiB 7.7 TiB 7.26
>
> --- POOLS ---
> POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
> device_health_metrics 1 1 8.3 MiB 22 17 MiB 0 44 TiB
> vm.pool 2 1024 3.0 TiB 864.55k 6.0 TiB 6.39 44
> TiB ( terminal 44 TiB = 48.37 ) in web I see 51.50 TB
> cephfs_data 3 32 874 GiB 223.76k 1.7 TiB 1.91 44 TiB
> cephfs_metadata 4 32 25 MiB 27 51 MiB 0 44 TiB
>
>
> Am I right in my reasoning ?
>
> Thank you!
>
>
>
>>
>>> With : ceph osd map vm.pool object-name (vm ID) I see some of vm object one copy is on osd.12,
>>> example :
>>>
>>> osdmap e14321 pool 'vm.pool' (2) object '114' -> pg 2.10486407 (2.7) -> up ([12,8], p12) acting
>>> ([12,8], p12)
>>>
>>> But this example :
>>>
>>> osdmap e14321 pool 'vm.pool' (2) object '113' -> pg 2.8bd09f6d (2.36d) -> up ([10,7], p10) acting
>>> ([10,7], p10)
>>>
>>> osd.10 and osd.7
>>>
>>>> Check your failure domain for Ceph and possibly change it from OSD to host. This should prevent
>>>> that
>>>> one host holds multiple copies of a VM image.
>>> I didn 't understand a little what to check ?
>>>
>>> Can you explain me with example?
>>>
>> I don't have an example but you can read about the concept at:
>>
>> https://docs.ceph.com/en/latest/rados/operations/crush-map/#crush-maps
>>
>>
>> Regards,
>>
>> Uwe
>>
>>
>>
>>>>
>>>> Regards,
>>>>
>>>> Uwe
>>>>
>>>> Am 29.12.21 um 09:36 schrieb Сергей Цаболов:
>>>>> Hello to all.
>>>>>
>>>>> In my case I have the 7 node cluster Proxmox and working Ceph (ceph version 15.2.15 octopus
>>>>> (stable)": 7)
>>>>>
>>>>> Ceph HEALTH_OK
>>>>>
>>>>> ceph -s
>>>>> cluster:
>>>>> id: 9662e3fa-4ce6-41df-8d74-5deaa41a8dde
>>>>> health: HEALTH_OK
>>>>>
>>>>> services:
>>>>> mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109
>>>>> (age 17h)
>>>>> mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101,
>>>>> pve-3111,
>>>>> pve-3108
>>>>> mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
>>>>> osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
>>>>>
>>>>> task status:
>>>>>
>>>>> data:
>>>>> pools: 4 pools, 1089 pgs
>>>>> objects: 1.09M objects, 4.1 TiB
>>>>> usage: 7.7 TiB used, 99 TiB / 106 TiB avail
>>>>> pgs: 1089 active+clean
>>>>>
>>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ceph osd tree
>>>>>
>>>>> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>>>>> -1 106.43005 root default
>>>>> -13 14.55478 host pve-3101
>>>>> 10 hdd 7.27739 osd.10 up 1.00000 1.00000
>>>>> 11 hdd 7.27739 osd.11 up 1.00000 1.00000
>>>>> -11 14.55478 host pve-3103
>>>>> 8 hdd 7.27739 osd.8 up 1.00000 1.00000
>>>>> 9 hdd 7.27739 osd.9 up 1.00000 1.00000
>>>>> -3 14.55478 host pve-3105
>>>>> 0 hdd 7.27739 osd.0 up 1.00000 1.00000
>>>>> 1 hdd 7.27739 osd.1 up 1.00000 1.00000
>>>>> -5 14.55478 host pve-3107
>>>>> 2 hdd 7.27739 osd.2 up 1.00000 1.00000
>>>>> 3 hdd 7.27739 osd.3 up 1.00000 1.00000
>>>>> -9 14.55478 host pve-3108
>>>>> 6 hdd 7.27739 osd.6 up 1.00000 1.00000
>>>>> 7 hdd 7.27739 osd.7 up 1.00000 1.00000
>>>>> -7 14.55478 host pve-3109
>>>>> 4 hdd 7.27739 osd.4 up 1.00000 1.00000
>>>>> 5 hdd 7.27739 osd.5 up 1.00000 1.00000
>>>>> -15 19.10138 host pve-3111
>>>>> 12 hdd 10.91409 osd.12 up 1.00000 1.00000
>>>>> 13 hdd 0.90970 osd.13 up 1.00000 1.00000
>>>>> 14 hdd 0.90970 osd.14 up 1.00000 1.00000
>>>>> 15 hdd 0.90970 osd.15 up 1.00000 1.00000
>>>>> 16 hdd 0.90970 osd.16 up 1.00000 1.00000
>>>>> 17 hdd 0.90970 osd.17 up 1.00000 1.00000
>>>>> 18 hdd 0.90970 osd.18 up 1.00000 1.00000
>>>>> 19 hdd 0.90970 osd.19 up 1.00000 1.00000
>>>>> 20 hdd 0.90970 osd.20 up 1.00000 1.00000
>>>>> 21 hdd 0.90970 osd.21 up 1.00000 1.00000
>>>>>
>>>>> ---------------------------------------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
>>>>> vm.pool 2 1024 3.0 TiB 863.31k 6.0 TiB 6.38 44 TiB (this
>>>>> pool
>>>>> have the all VM disk)
>>>>>
>>>>> ---------------------------------------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ceph osd map vm.pool vm.pool.object
>>>>> osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2)
>>>>> acting ([2,4], p2)
>>>>>
>>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>> pveversion -v
>>>>> proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
>>>>> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
>>>>> pve-kernel-helper: 6.4-8
>>>>> pve-kernel-5.4: 6.4-7
>>>>> pve-kernel-5.4.143-1-pve: 5.4.143-1
>>>>> pve-kernel-5.4.106-1-pve: 5.4.106-1
>>>>> ceph: 15.2.15-pve1~bpo10
>>>>> ceph-fuse: 15.2.15-pve1~bpo10
>>>>> corosync: 3.1.2-pve1
>>>>> criu: 3.11-3
>>>>> glusterfs-client: 5.5-3
>>>>> ifupdown: residual config
>>>>> ifupdown2: 3.0.0-1+pve4~bpo10
>>>>> ksm-control-daemon: 1.3-1
>>>>> libjs-extjs: 6.0.1-10
>>>>> libknet1: 1.22-pve1~bpo10+1
>>>>> libproxmox-acme-perl: 1.1.0
>>>>> libproxmox-backup-qemu0: 1.1.0-1
>>>>> libpve-access-control: 6.4-3
>>>>> libpve-apiclient-perl: 3.1-3
>>>>> libpve-common-perl: 6.4-4
>>>>> libpve-guest-common-perl: 3.1-5
>>>>> libpve-http-server-perl: 3.2-3
>>>>> libpve-storage-perl: 6.4-1
>>>>> libqb0: 1.0.5-1
>>>>> libspice-server1: 0.14.2-4~pve6+1
>>>>> lvm2: 2.03.02-pve4
>>>>> lxc-pve: 4.0.6-2
>>>>> lxcfs: 4.0.6-pve1
>>>>> novnc-pve: 1.1.0-1
>>>>> proxmox-backup-client: 1.1.13-2
>>>>> proxmox-mini-journalreader: 1.1-1
>>>>> proxmox-widget-toolkit: 2.6-1
>>>>> pve-cluster: 6.4-1
>>>>> pve-container: 3.3-6
>>>>> pve-docs: 6.4-2
>>>>> pve-edk2-firmware: 2.20200531-1
>>>>> pve-firewall: 4.1-4
>>>>> pve-firmware: 3.3-2
>>>>> pve-ha-manager: 3.1-1
>>>>> pve-i18n: 2.3-1
>>>>> pve-qemu-kvm: 5.2.0-6
>>>>> pve-xtermjs: 4.7.0-3
>>>>> qemu-server: 6.4-2
>>>>> smartmontools: 7.2-pve2
>>>>> spiceterm: 3.1-1
>>>>> vncterm: 1.6-2
>>>>> zfsutils-linux: 2.0.6-pve1~bpo10+1
>>>>>
>>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> And now my problem:
>>>>>
>>>>> For all VM I have one pool for VM disks
>>>>>
>>>>> When node/host pve-3111 is shutdown in many of other nodes/hosts pve-3107, pve-3105 VM not
>>>>> shutdown but not available in network.
>>>>>
>>>>> After the node/host is up Ceph back to HEALTH_OK and the all VM back to access in Network (without
>>>>> reboot).
>>>>>
>>>>> Can some one to suggest me what I can to check in Ceph ?
>>>>>
>>>>> Thanks.
>>>>>
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-12-29 14:14 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <e6b7d3f3-d6ce-ef41-cfa8-36b011243ebc@t8.ru>
[not found] ` <6f23d719-1931-cc81-899d-3202047c4a56@binovo.es>
[not found] ` <101971ad-519a-9af2-249e-433df28b1f1a@t8.ru>
2021-12-29 8:36 ` [PVE-User] [ceph-users] Re: Ceph Usage web and terminal Сергей Цаболов
2021-12-29 11:16 ` Uwe Sauter
2021-12-29 12:51 ` Сергей Цаболов
2021-12-29 13:13 ` Uwe Sauter
2021-12-29 14:06 ` Сергей Цаболов
2021-12-29 14:13 ` Uwe Sauter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox