From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 14 Dec 2020 13:52:52 +0100
From: Wolfgang Bumiller
To: Thomas Lamprecht
Cc: Proxmox Backup Server development discussion, Dominik Csapak
Message-ID: <20201214125252.py4shrxos24zrpqs@wobu-vie.proxmox.com>
References: <20201211120859.17323-1-d.csapak@proxmox.com> <20201211120859.17323-2-d.csapak@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox 1/2] add tools/zero: add fast zero
 comparison code
List-Id: Proxmox Backup Server development discussion

Some testing & internal talk led to the decision to exclude this patch.
Apart from being incomplete (some alignment issues aren't handled), rustc
itself is very capable of producing fast SSE code for this, if you know
*how*:

Assuming an `fn is_zero(buf: &[u8]) -> bool`:

a) `buf.contains(&0)`
   compiles to a naive loop, slow

b) `buf.iter().fold(0, |a, b| a | b) == 0`
   produces fast SSE code loading 128 bytes at a time (sort of) into xmm
   registers (pretty much the code from this commit, but better);
   however, this doesn't stop at the first non-zero byte

c) ```
   buf
       .chunks(128)
       .map(|aa| aa.iter().fold(0, |a, b| a|b) != 0)
       .any(|a| a)
   ```
   A compromise suggested by Fabian G. Much like case (b), the inner loop
   loads 128 bytes directly via SSE instructions, but we also have the
   outer chunks to stop early.

On Mon, Dec 14, 2020 at 09:38:49AM +0100, Thomas Lamprecht wrote:
> On 11.12.20 13:08, Dominik Csapak wrote:
> > that can make use of sse/avx instructions where available
> >
>
> maybe some performance numbers can help to argue why we should add
> that, maybe directly as small benchmark binary so different CPUs
> could be compared?
>
> > this is mostly a direct translation of qemu's util/bufferiszero.c
> >
> > this is originally from Wolfgang Bumiller
>
> FYI, you could use the
>
> Originally-by: Wolfgang Bumiller
>
> git trailer for that, I saw it a few times used in other projects (e.g.,
> kernel)
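
For reference, variants (a) to (c) from the reply above could be written out
as complete functions roughly like this. It is only a sketch: the function
names are made up for illustration, and two details are assumptions rather
than what was posted. `buf.contains(&0)` as written looks for a zero byte, so
the naive variant here uses `all` to express "every byte is zero". The
expression in (c) is true when a non-zero byte is found, so it is negated to
fit `is_zero`. Nothing here says anything about the generated machine code;
that depends on compiler version and flags and is best checked locally, e.g.
with `rustc -O --emit asm`.

```rust
/// Variant (a), adapted: plain scan that stops at the first non-zero byte.
/// Assumption: `all` is used instead of the mail's `contains(&0)`.
fn is_zero_naive(buf: &[u8]) -> bool {
    buf.iter().all(|&b| b == 0)
}

/// Variant (b): OR all bytes together and compare once at the end.
/// Always walks the whole buffer, no early exit.
fn is_zero_fold(buf: &[u8]) -> bool {
    buf.iter().fold(0u8, |acc, &b| acc | b) == 0
}

/// Variant (c): OR-fold inside each 128-byte chunk, and let `any` stop
/// at the first chunk that contains a non-zero byte.
fn is_zero_chunked(buf: &[u8]) -> bool {
    !buf.chunks(128)
        .map(|chunk| chunk.iter().fold(0u8, |acc, &b| acc | b) != 0)
        .any(|nonzero| nonzero)
}

fn main() {
    let variants: [fn(&[u8]) -> bool; 3] =
        [is_zero_naive, is_zero_fold, is_zero_chunked];

    let zeros = vec![0u8; 4096];
    let mut dirty = zeros.clone();
    dirty[4095] = 1; // single non-zero byte at the very end

    for f in variants.iter() {
        assert!(f(zeros.as_slice()));
        assert!(!f(dirty.as_slice()));
    }
    println!("all variants agree");
}
```

All three agree on the result; they differ only in how much of the buffer
they touch once a non-zero byte has been seen.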