From: Uwe Sauter <uwe.sauter.de@gmail.com>
To: "Proxmox VE user list" <pve-user@lists.proxmox.com>,
"Сергей Цаболов" <tsabolov@t8.ru>
Subject: Re: [PVE-User] [ceph-users] Re: Ceph Usage web and terminal.
Date: Wed, 29 Dec 2021 12:16:59 +0100
Message-ID: <015106bc-726b-da07-c3cf-80b63197b2c7@gmail.com>
In-Reply-To: <0dd27e4e-391d-6262-bbf5-db84229accad@t8.ru>
Just a feeling, but I'd say that the imbalance in OSDs (one host having many more disks than the
rest) is your problem.
Assuming that your configuration keeps 3 copies of each VM image, the imbalance probably means
that 2 of these 3 copies reside on pve-3111. If this host is unavailable, all VM images with 2
copies on that host become unresponsive, too.
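To put rough numbers on the imbalance, the CRUSH weights from the quoted `ceph osd tree` output
can be tallied per host. This is only a minimal sketch: the dictionary transcribes the weights
from that output, and the variable names are made up for illustration.

```python
# CRUSH weights per OSD, transcribed from the quoted `ceph osd tree` output.
osd_weights = {
    "pve-3101": [7.27739, 7.27739],
    "pve-3103": [7.27739, 7.27739],
    "pve-3105": [7.27739, 7.27739],
    "pve-3107": [7.27739, 7.27739],
    "pve-3108": [7.27739, 7.27739],
    "pve-3109": [7.27739, 7.27739],
    "pve-3111": [10.91409] + [0.90970] * 9,
}

total_weight = sum(sum(w) for w in osd_weights.values())
total_osds = sum(len(w) for w in osd_weights.values())

# Print each host's share of OSDs and of the total CRUSH weight.
for host, weights in osd_weights.items():
    weight_share = sum(weights) / total_weight
    osd_share = len(weights) / total_osds
    print(f"{host}: {len(weights):2d} OSDs ({osd_share:5.1%}), "
          f"weight {sum(weights):8.5f} ({weight_share:5.1%})")
```

pve-3111 carries 10 of the 22 OSDs and roughly 18% of the total weight, against about 14% for
each of the other hosts. If the failure domain is OSD, nothing stops CRUSH from picking two of
those ten OSDs for the same placement group.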
Check your failure domain for Ceph and, if it is set to OSD, change it to host. This should
prevent one host from holding multiple copies of a VM image.
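The failure domain is the bucket type named in the choose/chooseleaf step of the pool's CRUSH
rule. `ceph osd pool get vm.pool crush_rule` names the rule, and `ceph osd crush rule dump
<rule-name>` prints it as JSON. As a rough sketch, the helper below reads such a dump. The sample
JSON is illustrative and not taken from this cluster, `failure_domain` is a hypothetical helper
name, and the logic assumes a simple replicated rule with a single choose step.

```python
import json

# Illustrative output of `ceph osd crush rule dump <rule-name>`.
# NOT taken from the cluster in this thread; the rule name is an assumption.
SAMPLE_RULE_DUMP = """
{
  "rule_id": 0,
  "rule_name": "replicated_rule",
  "steps": [
    {"op": "take", "item": -1, "item_name": "default"},
    {"op": "choose_firstn", "num": 0, "type": "osd"},
    {"op": "emit"}
  ]
}
"""


def failure_domain(rule_dump_json: str) -> str:
    """Return the bucket type of the first choose/chooseleaf step (hypothetical helper)."""
    rule = json.loads(rule_dump_json)
    for step in rule["steps"]:
        if step["op"].startswith("choose"):
            return step["type"]
    raise ValueError("no choose/chooseleaf step found in rule")


domain = failure_domain(SAMPLE_RULE_DUMP)
if domain == "osd":
    print("failure domain is 'osd': several copies may share one host")
else:
    print(f"failure domain is '{domain}'")
```

If the result is `osd`, a rule with `host` as the failure domain can be created with
`ceph osd crush rule create-replicated` and assigned with `ceph osd pool set vm.pool crush_rule
<new-rule>`. Expect data movement while Ceph rebalances.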
Regards,
Uwe
On 29.12.21 at 09:36, Сергей Цаболов wrote:
> Hello all.
>
> I have a 7-node Proxmox cluster with a working Ceph (ceph version 15.2.15 octopus (stable)).
>
> Ceph HEALTH_OK
>
> ceph -s
>   cluster:
>     id:     9662e3fa-4ce6-41df-8d74-5deaa41a8dde
>     health: HEALTH_OK
>
>   services:
>     mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
>     mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101, pve-3111, pve-3108
>     mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
>     osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
>
>   task status:
>
>   data:
>     pools:   4 pools, 1089 pgs
>     objects: 1.09M objects, 4.1 TiB
>     usage:   7.7 TiB used, 99 TiB / 106 TiB avail
>     pgs:     1089 active+clean
>
> ---------------------------------------------------------------------------------------------------------------------
>
>
> ceph osd tree
>
> ID   CLASS  WEIGHT     TYPE NAME           STATUS  REWEIGHT  PRI-AFF
>  -1         106.43005  root default
> -13          14.55478      host pve-3101
>  10    hdd    7.27739          osd.10          up   1.00000  1.00000
>  11    hdd    7.27739          osd.11          up   1.00000  1.00000
> -11          14.55478      host pve-3103
>   8    hdd    7.27739          osd.8           up   1.00000  1.00000
>   9    hdd    7.27739          osd.9           up   1.00000  1.00000
>  -3          14.55478      host pve-3105
>   0    hdd    7.27739          osd.0           up   1.00000  1.00000
>   1    hdd    7.27739          osd.1           up   1.00000  1.00000
>  -5          14.55478      host pve-3107
>   2    hdd    7.27739          osd.2           up   1.00000  1.00000
>   3    hdd    7.27739          osd.3           up   1.00000  1.00000
>  -9          14.55478      host pve-3108
>   6    hdd    7.27739          osd.6           up   1.00000  1.00000
>   7    hdd    7.27739          osd.7           up   1.00000  1.00000
>  -7          14.55478      host pve-3109
>   4    hdd    7.27739          osd.4           up   1.00000  1.00000
>   5    hdd    7.27739          osd.5           up   1.00000  1.00000
> -15          19.10138      host pve-3111
>  12    hdd   10.91409          osd.12          up   1.00000  1.00000
>  13    hdd    0.90970          osd.13          up   1.00000  1.00000
>  14    hdd    0.90970          osd.14          up   1.00000  1.00000
>  15    hdd    0.90970          osd.15          up   1.00000  1.00000
>  16    hdd    0.90970          osd.16          up   1.00000  1.00000
>  17    hdd    0.90970          osd.17          up   1.00000  1.00000
>  18    hdd    0.90970          osd.18          up   1.00000  1.00000
>  19    hdd    0.90970          osd.19          up   1.00000  1.00000
>  20    hdd    0.90970          osd.20          up   1.00000  1.00000
>  21    hdd    0.90970          osd.21          up   1.00000  1.00000
>
> ---------------------------------------------------------------------------------------------------------------
>
>
> POOL     ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
> vm.pool   2  1024  3.0 TiB  863.31k  6.0 TiB   6.38     44 TiB
>
> (this pool holds all the VM disks)
>
> ---------------------------------------------------------------------------------------------------------------
>
>
> ceph osd map vm.pool vm.pool.object
> osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2) acting ([2,4], p2)
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> pveversion -v
> proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
> pve-kernel-helper: 6.4-8
> pve-kernel-5.4: 6.4-7
> pve-kernel-5.4.143-1-pve: 5.4.143-1
> pve-kernel-5.4.106-1-pve: 5.4.106-1
> ceph: 15.2.15-pve1~bpo10
> ceph-fuse: 15.2.15-pve1~bpo10
> corosync: 3.1.2-pve1
> criu: 3.11-3
> glusterfs-client: 5.5-3
> ifupdown: residual config
> ifupdown2: 3.0.0-1+pve4~bpo10
> ksm-control-daemon: 1.3-1
> libjs-extjs: 6.0.1-10
> libknet1: 1.22-pve1~bpo10+1
> libproxmox-acme-perl: 1.1.0
> libproxmox-backup-qemu0: 1.1.0-1
> libpve-access-control: 6.4-3
> libpve-apiclient-perl: 3.1-3
> libpve-common-perl: 6.4-4
> libpve-guest-common-perl: 3.1-5
> libpve-http-server-perl: 3.2-3
> libpve-storage-perl: 6.4-1
> libqb0: 1.0.5-1
> libspice-server1: 0.14.2-4~pve6+1
> lvm2: 2.03.02-pve4
> lxc-pve: 4.0.6-2
> lxcfs: 4.0.6-pve1
> novnc-pve: 1.1.0-1
> proxmox-backup-client: 1.1.13-2
> proxmox-mini-journalreader: 1.1-1
> proxmox-widget-toolkit: 2.6-1
> pve-cluster: 6.4-1
> pve-container: 3.3-6
> pve-docs: 6.4-2
> pve-edk2-firmware: 2.20200531-1
> pve-firewall: 4.1-4
> pve-firmware: 3.3-2
> pve-ha-manager: 3.1-1
> pve-i18n: 2.3-1
> pve-qemu-kvm: 5.2.0-6
> pve-xtermjs: 4.7.0-3
> qemu-server: 6.4-2
> smartmontools: 7.2-pve2
> spiceterm: 3.1-1
> vncterm: 1.6-2
> zfsutils-linux: 2.0.6-pve1~bpo10+1
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> And now my problem:
>
> All VMs keep their disks in this one pool.
>
> When node/host pve-3111 is shut down, VMs on many of the other nodes/hosts (pve-3107, pve-3105)
> do not shut down, but they become unreachable over the network.
>
> After the node/host is back up, Ceph returns to HEALTH_OK and all VMs become reachable on the
> network again (without a reboot).
>
> Can someone suggest what I should check in Ceph?
>
> Thanks.
>