From: Marco Gaiarin
Date: Wed, 26 Apr 2023 12:48:48 +0200
To: pve-user@lists.proxmox.com
Subject: [PVE-User] Peak load at 7.30AM...

Situation: a Debian stretch VM, mostly a Samba server for 150+ clients, running on a pair of physical servers; the VM is replicated between the two servers every 30 minutes.

Very frequently, at 7.30 the VM hits a high load peak and becomes mostly unresponsive; after fiddling a bit, I added a watchdog load limit, and now the VM reboots.

Looking at the logs, it seems to be caused by the 7.30 replica:

 Apr 26 07:30:12 vdmsv1 qemu-ga: info: guest-ping called
 Apr 26 07:30:13 vdmsv1 qemu-ga: info: guest-fsfreeze called

and after some time (6 to 8 minutes) the watchdog resets it:

 Apr 26 07:36:11 vdmsv1 watchdog[2525]: loadavg 57 33 14 is higher than the given threshold 80 32 16!
 Apr 26 07:36:11 vdmsv1 watchdog[2525]: shutting down the system because of error 253 = 'load average too high'

Some notes:

1) the replica runs every 30 minutes; none of the other 47 replicas of the day seems sufficient to trigger a reboot.

2) the physical server seems totally unaffected during the peak (no high iodelay, no noticeable load/CPU...).
3) at 7.30 there are no users (they arrive around 8.15).

I have a few hypotheses; for example, Debian by default rotates logs at 6.30, but looking at file dates the logs are completely rotated before 7.00, so if rotation were the cause I would expect the 7.00 replica to trigger the reboot instead...

Does anyone have a hint on how I can debug this?

Thanks.

-- 
  And so you watch the others play
  it's a strange game, you have to learn    (E. Bennato)
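
A small sketch for reading the two log excerpts above, not a diagnosis: it measures the delay between the guest-fsfreeze call and the watchdog shutdown, and reports which load-average window crossed its limit. It assumes that the three loadavg numbers are the 1, 5 and 15 minute averages, that each is compared against the threshold in the same position, and that the year is 2023 (syslog timestamps carry none; the year is taken from the mail date). The function names are made up for the example.

```python
import re
from datetime import datetime

# The two relevant log lines, copied from the message above.
LOG_LINES = [
    "Apr 26 07:30:13 vdmsv1 qemu-ga: info: guest-fsfreeze called",
    "Apr 26 07:36:11 vdmsv1 watchdog[2525]: loadavg 57 33 14 is higher "
    "than the given threshold 80 32 16!",
]

WATCHDOG_RE = re.compile(
    r"loadavg (\d+) (\d+) (\d+) is higher than the given threshold "
    r"(\d+) (\d+) (\d+)"
)


def parse_time(line, year=2023):
    """Parse the leading syslog timestamp; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def exceeded_windows(line):
    """Return (window_minutes, load, limit) for every average above its limit."""
    match = WATCHDOG_RE.search(line)
    if match is None:
        return []
    values = [int(v) for v in match.groups()]
    loads, limits = values[:3], values[3:]
    return [
        (window, load, limit)
        for window, load, limit in zip((1, 5, 15), loads, limits)
        if load > limit
    ]


freeze_time = parse_time(LOG_LINES[0])
reset_time = parse_time(LOG_LINES[1])
delay_seconds = int((reset_time - freeze_time).total_seconds())
tripped = exceeded_windows(LOG_LINES[1])

print(f"fsfreeze -> watchdog shutdown: {delay_seconds} s")
for window, load, limit in tripped:
    print(f"{window}-minute load average {load} is above the limit {limit}")
```

Under these assumptions only the 5 minute average (33 against a limit of 32) is over its threshold, while the 1 minute average (57) is still below 80; the shutdown comes a little under six minutes after the freeze. Running the same comparison over the other replicas of the day would show how close they come to the limit.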