From: Marco Gaiarin
Date: Sun, 13 Nov 2022 18:24:28 +0100
To: pve-user@lists.proxmox.com
Subject: [PVE-User] Replica stuck after a network outage...
List-Id: Proxmox VE user list

Situation: two servers connected by a direct 10 Gbit/s link. After creating a new VM on one side and filling it with data, I enabled replication.

The network link went down. I'm still investigating why, but it is working again now.

On the sending side I get:

  2022-11-13 18:16:01 123-0: start replication job
  2022-11-13 18:16:01 123-0: guest => VM 123, running => 6101
  2022-11-13 18:16:01 123-0: volumes => local-zfs:vm-123-disk-0,rpool-data:vm-123-disk-0
  2022-11-13 18:16:02 123-0: create snapshot '__replicate_123-0_1668359761__' on local-zfs:vm-123-disk-0
  2022-11-13 18:16:02 123-0: create snapshot '__replicate_123-0_1668359761__' on rpool-data:vm-123-disk-0
  2022-11-13 18:16:02 123-0: using insecure transmission, rate limit: 50 MByte/s
  2022-11-13 18:16:02 123-0: full sync 'local-zfs:vm-123-disk-0' (__replicate_123-0_1668359761__)
  2022-11-13 18:16:02 123-0: using a bandwidth limit of 50000000 bps for transferring 'local-zfs:vm-123-disk-0'
  2022-11-13 18:16:04 123-0: full send of rpool/data/vm-123-disk-0@__replicate_123-0_1668359761__ estimated size is 17.9G
  2022-11-13 18:16:04 123-0: total estimated size is 17.9G
  2022-11-13 18:16:04 123-0: 1164 B 1.1 KB 0.44 s 2616 B/s 2.55 KB/s
  2022-11-13 18:16:04 123-0: write: Broken pipe
  2022-11-13 18:16:04 123-0: warning: cannot send 'rpool/data/vm-123-disk-0@__replicate_123-0_1668359761__': signal received
  2022-11-13 18:16:04 123-0: cannot send 'rpool/data/vm-123-disk-0': I/O error
  2022-11-13 18:16:04 123-0: command 'zfs send -Rpv -- rpool/data/vm-123-disk-0@__replicate_123-0_1668359761__' failed: exit code 1
  2022-11-13 18:16:04 123-0: [svpve1] volume 'rpool/data/vm-123-disk-0' already exists
  2022-11-13 18:16:04 123-0: delete previous replication snapshot '__replicate_123-0_1668359761__' on local-zfs:vm-123-disk-0
  2022-11-13 18:16:04 123-0: delete previous replication snapshot '__replicate_123-0_1668359761__' on rpool-data:vm-123-disk-0
  2022-11-13 18:16:04 123-0: end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-123-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_123-0_1668359761__ | /usr/bin/cstream -t 50000000' failed: exit code 2

On the receiving side I see two stuck processes:

  root@svpve1:~# ps aux | grep [z]fs
  root 23000 0.0 0.1 301468 81720 ? Ss 12:10 0:01 /usr/bin/perl /usr/sbin/pvesm import rpool-data:vm-123-disk-0 zfs tcp://10.5.251.0/24 -with-snapshots 1 -allow-rename 0
  root 23003 0.0 0.0 8956 3404 ? S 12:10 0:11 zfs recv -F -- rpool-data/vm-123-disk-0

The start time of these processes matches the time of the crash. Can I safely kill them?

Thanks.

-- 
It would appear that the two thieves crucified beside the Lord were socialists: after all, they were thieves and they held two seats out of three. (Anonymous)
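
For anyone reading the sender log above, here is a minimal Python sketch that pulls out the lines that look like failures and estimates how long the 17.9G full sync would take at the configured limit. It is only an illustration: the names `find_errors`, `full_sync_seconds` and `ERROR_MARKERS` are hypothetical, the keyword list is taken solely from the messages in this log, and it assumes that "17.9G" means GiB and that the limit of 50000000 passed to `cstream -t` is in bytes per second.

```python
import re

# Matches lines such as "2022-11-13 18:16:04 123-0: write: Broken pipe".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\d+-\d+): (.*)$")

# Hypothetical keyword list, taken only from the messages in the log above.
ERROR_MARKERS = ("failed", "cannot send", "Broken pipe", "already exists", "with error")


def find_errors(log_text):
    """Return (timestamp, job_id, message) for each line that looks like an error."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and any(marker in m.group(3) for marker in ERROR_MARKERS):
            hits.append(m.groups())
    return hits


def full_sync_seconds(size_gib, limit_bytes_per_s):
    """Lower bound on the full-sync duration, assuming the size is in GiB."""
    return size_gib * 1024**3 / limit_bytes_per_s


# A few lines copied from the sender log.
SAMPLE = """\
2022-11-13 18:16:04 123-0: total estimated size is 17.9G
2022-11-13 18:16:04 123-0: write: Broken pipe
2022-11-13 18:16:04 123-0: cannot send 'rpool/data/vm-123-disk-0': I/O error
2022-11-13 18:16:04 123-0: [svpve1] volume 'rpool/data/vm-123-disk-0' already exists
"""

if __name__ == "__main__":
    for ts, job, msg in find_errors(SAMPLE):
        print(ts, job, msg)
    secs = full_sync_seconds(17.9, 50_000_000)
    print(f"full sync needs at least {secs / 60:.1f} minutes at the configured limit")
```

Under those assumptions the full sync would need a bit more than six minutes, while the log shows the send aborting about two seconds after it started.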