From mboxrd@z Thu Jan 1 00:00:00 1970
From: Roland <devzero@web.de>
To: Christian Schoepplein
Cc: pve-user@lists.proxmox.com
Date: Wed, 31 May 2023 16:55:21 +0200
Message-ID: <70805eb7-a98e-8169-dfe0-93bfe2f67de5@web.de>
Subject: Re: [PVE-User] Proxmox and glusterfs: VMs get corupted

> What else could I try? Would it perhaps make sense to switch the
> image format from qcow2 to raw?

yes, i would switch to raw for a test, and turn discard off (a rough sketch
of the commands is below).

> we chose qcow2 mainly because of
> the snapshots and the space savings

same here. works quite well for us, although we have qcow2 on top of zfs
datasets...

> We chose glusterfs because it seemed the least complicated to us and
> because we have some respect for e.g. Ceph.

that is exactly how i feel about it. it is why i have steered clear of ceph
so far and have been eyeing glusterfs for a while, but my impression is that
in the proxmox context (or anywhere else, for that matter) it is rather
exotic (otherwise someone might have answered here:
https://forum.proxmox.com/threads/glusterfs-sharding-afr-question.118554/ ).
for that reason we have done without shared storage so far.

when it comes to SAN and shared storage i am a bit of a burnt child: in the
past i even had SANs including SAN virtualization with IBM SVC on my plate,
and i have never had so much trouble with IT, nor slept so badly.

we currently have local storage only, rely on cold-standby spare hardware
and replicate the local storages with sanoid to a standby system.

i would still prefer a simple, easy-to-maintain online/redundancy solution.
i cannot warm to ceph, and frankly, when i read about the troubles and
challenges people have with it, i am glad NOT to have it on my plate.
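coming back to the raw question: in case it helps, a rough sketch of how the
switch could look from the command line (untested and written from memory;
VM ID, storage name, paths and size are taken from your mails, so adjust as
needed and keep a backup of the image first):

  # shut down the VM, then convert the image on the gluster mount
  qm shutdown 200
  qemu-img convert -p -f qcow2 -O raw \
      /mnt/pve/gfs_vms/images/200/vm-200-disk-0.qcow2 \
      /mnt/pve/gfs_vms/images/200/vm-200-disk-0.raw

  # attach the raw image instead of the qcow2 one, this time without discard=on
  qm set 200 --scsi0 gfs_vms:200/vm-200-disk-0.raw,aio=threads,size=10444M
  qm start 200

if i remember correctly, "Move disk" in the GUI (or qm move-disk with
--format raw) can also do such a conversion for you.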
regards
roland

On 31.05.23 at 16:23, Christian Schoepplein wrote:
> Hello Roland,
>
> thanks for your reply and the tips.
>
> I have now written several hundred large and very large files to
> /mnt/pve/gfs_vms and compared the md5 sums, no problem at all. Not when
> reading them back either.
>
> When I set aio to threads, it unfortunately feels like it gets even worse
> with the broken VMs. I have the following in the VM config:
>
> scsi0: gfs_vms:200/vm-200-disk-0.qcow2,discard=on,aio=threads,size=10444M
>
> Is that correct? According to the process list it seems to be right:
>
> root 1708993 4.3 1.7 3370764 1174016 ? Sl 15:32 1:40
> /usr/bin/kvm -id 200 -name testvm,debug-threads=on -no-shutdown
> -chardev socket,id=qmp,path=/var/run/qemu-server/200.qmp,server=on,wait=off
> -mon chardev=qmp,mode=control
> -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5
> -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/200.pid
> -daemonize -smbios type=1,uuid=0da99a1f-a9ac-4999-a6c4-203cd39ff72e
> -smp 1,sockets=1,cores=1,maxcpus=1 -nodefaults
> -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg
> -vnc unix:/var/run/qemu-server/200.vnc,password=on
> -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 2048
> -object memory-backend-ram,id=ram-node0,size=2048M
> -numa node,nodeid=0,cpus=0,memdev=ram-node0
> -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg
> -device vmgenid,guid=dc4109a1-7b6f-4735-9685-ca50a38744e2
> -device usb-tablet,id=tablet,bus=ehci.0,port=1
> -chardev socket,id=serial0,path=/var/run/qemu-server/200.serial0,server=on,wait=off
> -device isa-serial,chardev=serial0 -device VGA,id=vga,bus=pcie.0,addr=0x1
> -chardev socket,path=/var/run/qemu-server/200.qga,server=on,wait=off,id=qga0
> -device virtio-serial,id=qga0,bus=pci.0,addr=0x8
> -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on
> -iscsi initiator-name=iqn.1993-08.org.debian:01:cbb6926f959d
> -drive file=gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-cloudinit.qcow2,if=none,id=drive-ide2,media=cdrom,aio=io_uring
> -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2
> -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5
> -drive file=gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-disk-0.qcow2,if=none,id=drive-scsi0,aio=threads,discard=on,format=qcow2,cache=none,detect-zeroes=unmap
> -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=101
> -netdev type=tap,id=net0,ifname=tap200i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on
> -device virtio-net-pci,mac=5E:1F:9A:04:D6:6C,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024
> -machine type=q35+pve0
>
> I will now try the whole thing again with a local storage backend, but I
> expect it to work with that.
>
> Unfortunately a colleague set up the gluster stuff, so if that turns out
> to be the cause, I will probably have to look into it more closely...
>
> We chose glusterfs because it seemed the least complicated to us and
> because we have some respect for e.g. Ceph.
>
> What else could I try? Would it perhaps make sense to switch the image
> format from qcow2 to raw? We chose qcow2 mainly because of the snapshots
> and the space savings; if that does not work properly with glusterfs, we
> may have to reconsider that as well.
>
> So far I have only ever run virtual machines with libvirt, without central
> storage. So a lot of new and rather complex topics are coming together at
> the moment :-(. I would therefore be glad about any tip for a sensible
> setup :-).
>
> Ciao and thanks,
>
> Christian
>
>
> On Tue, May 30, 2023 at 06:46:51PM +0200, Roland wrote:
>> if /mnt/pve/gfs_vms is a writeable path from inside the pve host, did you
>> check whether there is also corruption when reading/writing large files
>> there, comparing md5sums after the copy?
>>
>> furthermore, i remember there was a gluster/qcow2 issue with aio=native
>> some years ago, could you retry with aio=threads for the virtual disks?
>>
>> regards
>> roland
>>
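one thing i only noticed when re-reading this: your md5 test goes through the
FUSE mount under /mnt/pve/gfs_vms, while qemu accesses the images via
libgfapi (the gluster:// URLs in the kvm command line above), so the two
paths are not quite the same. a rough, untested way to exercise the libgfapi
path as well (hostname, volume and directory taken from your output, the test
image name is made up) could be:

  qemu-img create -f qcow2 gluster://gluster1.linova.de/gfs_vms/images/200/testimage.qcow2 5G
  qemu-img check gluster://gluster1.linova.de/gfs_vms/images/200/testimage.qcow2

if images created that way also check out clean, the libgfapi path is
probably fine too.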
>> On 30.05.23 at 18:32, Christian Schoepplein wrote:
>>> Hi,
>>>
>>> we are testing the current proxmox version with a glusterfs storage
>>> backend and have a strange issue with files getting corrupted inside the
>>> virtual machines. For whatever reason, from one moment to the next
>>> binaries can no longer be executed, scripts are damaged and so on. In
>>> the logs I get errors like this:
>>>
>>> May 30 11:22:36 ns1 dockerd[1234]: time="2023-05-30T11:22:36.874765091+02:00" level=warning msg="Running modprobe bridge br_netfilter failed with message: modprobe: ERROR: could not insert 'bridge': Exec format error\nmodprobe: ERROR: could not insert 'br_netfilter': Exec format error\ninsmod /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko \ninsmod /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko \n, error: exit status 1"
>>>
>>> On such a broken system, file reports the following:
>>>
>>> root@ns1:~# file /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko
>>> /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko: data
>>> root@ns1:~#
>>>
>>> On a normal system it looks like this:
>>>
>>> root@gluster1:~# file /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko
>>> /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko: ELF 64-bit LSB
>>> relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=1084f7cfcffbd4c607724fba287c0ea7fc5775
>>> root@gluster1:~#
>>>
>>> Not only kernel modules are affected. I have seen the same behaviour
>>> with scripts, icinga check modules, the sendmail binary and so on; I
>>> think it is totally random :-(.
>>>
>>> We have the problem with newly installed VMs, with VMs cloned from a
>>> template created on our proxmox host, and with VMs which we used before
>>> with libvirtd and migrated to our new proxmox machine. So IMHO it cannot
>>> be related to the way we create new virtual machines...
>>>
>>> We are using the following software:
>>>
>>> root@proxmox1:~# pveversion -v
>>> proxmox-ve: 7.4-1 (running kernel: 5.15.104-1-pve)
>>> pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
>>> pve-kernel-5.15: 7.4-1
>>> pve-kernel-5.15.104-1-pve: 5.15.104-2
>>> pve-kernel-5.15.102-1-pve: 5.15.102-1
>>> ceph-fuse: 15.2.17-pve1
>>> corosync: 3.1.7-pve1
>>> criu: 3.15-1+pve-1
>>> glusterfs-client: 9.2-1
>>> ifupdown2: 3.1.0-1+pmx3
>>> ksm-control-daemon: 1.4-1
>>> libjs-extjs: 7.0.0-1
>>> libknet1: 1.24-pve2
>>> libproxmox-acme-perl: 1.4.4
>>> libproxmox-backup-qemu0: 1.3.1-1
>>> libproxmox-rs-perl: 0.2.1
>>> libpve-access-control: 7.4-2
>>> libpve-apiclient-perl: 3.2-1
>>> libpve-common-perl: 7.3-4
>>> libpve-guest-common-perl: 4.2-4
>>> libpve-http-server-perl: 4.2-3
>>> libpve-rs-perl: 0.7.5
>>> libpve-storage-perl: 7.4-2
>>> libspice-server1: 0.14.3-2.1
>>> lvm2: 2.03.11-2.1
>>> lxc-pve: 5.0.2-2
>>> lxcfs: 5.0.3-pve1
>>> novnc-pve: 1.4.0-1
>>> proxmox-backup-client: 2.4.1-1
>>> proxmox-backup-file-restore: 2.4.1-1
>>> proxmox-kernel-helper: 7.4-1
>>> proxmox-mail-forward: 0.1.1-1
>>> proxmox-mini-journalreader: 1.3-1
>>> proxmox-widget-toolkit: 3.6.5
>>> pve-cluster: 7.3-3
>>> pve-container: 4.4-3
>>> pve-docs: 7.4-2
>>> pve-edk2-firmware: 3.20230228-2
>>> pve-firewall: 4.3-1
>>> pve-firmware: 3.6-4
>>> pve-ha-manager: 3.6.0
>>> pve-i18n: 2.12-1
>>> pve-qemu-kvm: 7.2.0-8
>>> pve-xtermjs: 4.16.0-1
>>> qemu-server: 7.4-3
>>> smartmontools: 7.2-pve3
>>> spiceterm: 3.2-2
>>> swtpm: 0.8.0~bpo11+3
>>> vncterm: 1.7-1
>>> zfsutils-linux: 2.1.9-pve1
>>> root@proxmox1:~#
>>>
>>> root@proxmox1:~# cat /etc/pve/storage.cfg
>>> dir: local
>>>         path /var/lib/vz
>>>         content rootdir,iso,images,vztmpl,backup,snippets
>>>
>>> zfspool: local-zfs
>>>         pool rpool/data
>>>         content images,rootdir
>>>         sparse 1
>>>
>>> glusterfs: gfs_vms
>>>         path /mnt/pve/gfs_vms
>>>         volume gfs_vms
>>>         content images
>>>         prune-backups keep-all=1
>>>         server gluster1.linova.de
>>>         server2 gluster2.linova.de
>>>
>>> root@proxmox1:~#
>>>
>>> The config of a typical VM looks like this:
>>>
>>> root@proxmox1:~# cat /etc/pve/qemu-server/101.conf
>>> #ns1
>>> agent: enabled=1,fstrim_cloned_disks=1
>>> boot: c
>>> bootdisk: scsi0
>>> cicustom: user=local:snippets/user-data
>>> cores: 1
>>> hotplug: disk,network,usb
>>> ide2: gfs_vms:101/vm-101-cloudinit.qcow2,media=cdrom,size=4M
>>> ipconfig0: ip=10.200.32.9/22,gw=10.200.32.1
>>> kvm: 1
>>> machine: q35
>>> memory: 2048
>>> meta: creation-qemu=7.2.0,ctime=1683718002
>>> name: ns1
>>> nameserver: 10.200.0.5
>>> net0: virtio=1A:61:75:25:C6:30,bridge=vmbr0
>>> numa: 1
>>> ostype: l26
>>> scsi0: gfs_vms:101/vm-101-disk-0.qcow2,discard=on,size=10444M
>>> scsihw: virtio-scsi-pci
>>> searchdomain: linova.de
>>> serial0: socket
>>> smbios1: uuid=e2f503fe-4a66-4085-86c0-bb692add6b7a
>>> sockets: 1
>>> vmgenid: 3be6ec9d-7cfd-47c0-9f86-23c2e3ce5103
>>>
>>> root@proxmox1:~#
>>>
>>> Our glusterfs storage backend consists of three servers, all running
>>> Ubuntu 22.04 and glusterfs version 10.1. There are no errors in the logs
>>> on the glusterfs hosts when a VM crashes, and because sometimes icinga
>>> plugins also get corrupted I get a fairly exact time range to search the
>>> logs for errors and warnings.
>>>
>>> However, I think it has something to do with our glusterfs setup.
>>> If I clone a VM from a template I get the following:
>>>
>>> root@proxmox1:~# qm clone 9000 200 --full --name testvm --description
>>> "testvm" --storage gfs_vms
>>> create full clone of drive ide2 (gfs_vms:9000/vm-9000-cloudinit.qcow2)
>>> Formatting 'gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
>>> [2023-05-30 16:18:17.753152 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
>>> [2023-05-30 16:18:17.876879 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:17.877606 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:17.878275 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:27.761247 +0000] I [io-stats.c:4038:fini] 0-gfs_vms: io-stats translator unloaded
>>> [2023-05-30 16:18:28.766999 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
>>> [2023-05-30 16:18:28.936449 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:28.937547 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:28.938115 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:38.774387 +0000] I [io-stats.c:4038:fini] 0-gfs_vms: io-stats translator unloaded
>>> create full clone of drive scsi0 (gfs_vms:9000/base-9000-disk-0.qcow2)
>>> Formatting 'gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-disk-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=10951327744 lazy_refcounts=off refcount_bits=16
>>> [2023-05-30 16:18:39.962238 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
>>> [2023-05-30 16:18:40.084300 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:40.084996 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:40.085505 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:49.970199 +0000] I [io-stats.c:4038:fini] 0-gfs_vms: io-stats translator unloaded
>>> [2023-05-30 16:18:50.975729 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
>>> [2023-05-30 16:18:51.768619 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:51.769330 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:18:51.769822 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:19:00.984578 +0000] I [io-stats.c:4038:fini] 0-gfs_vms: io-stats translator unloaded
>>> transferred 0.0 B of 10.2 GiB (0.00%)
>>> [2023-05-30 16:19:02.030902 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
>>> transferred 112.8 MiB of 10.2 GiB (1.08%)
>>> transferred 230.8 MiB of 10.2 GiB (2.21%)
>>> transferred 340.5 MiB of 10.2 GiB (3.26%)
>>> ...
>>> transferred 10.1 GiB of 10.2 GiB (99.15%)
>>> transferred 10.2 GiB of 10.2 GiB (100.00%)
>>> transferred 10.2 GiB of 10.2 GiB (100.00%)
>>> [2023-05-30 16:19:29.804006 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:19:29.804807 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:19:29.805486 +0000] E [MSGID: 108006] [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
>>> [2023-05-30 16:19:32.044693 +0000] I [io-stats.c:4038:fini] 0-gfs_vms: io-stats translator unloaded
>>> root@proxmox1:~#
>>>
>>> Is this message about the subvolumes being down normal, or might this be
>>> the reason for our strange problems?
>>>
>>> I have no idea how to further debug the problem, so any helpful idea or
>>> hint would be great. Please let me also know if I can provide more info
>>> regarding our setup.
>>>
>>> Ciao and thanks a lot,
>>>
>>> Schoepp
>>>
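PS, regarding the "All subvolumes are down" messages in the quoted log: i
cannot say from here whether they are harmless in your setup, but before
digging further i would at least check the volume and heal state directly on
the gluster nodes, roughly along these lines (standard gluster CLI; the exact
output depends on your gluster version):

  gluster volume info gfs_vms
  gluster volume status gfs_vms
  gluster volume heal gfs_vms info
  gluster volume heal gfs_vms info split-brain

if a brick shows up as offline, or files show up as needing heal while a
clone or a VM is running, that would be a good hint that the problem is on
the gluster side.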