From: Dominik Csapak
Date: Thu, 19 May 2022 15:49:50 +0200
To: Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>, Fabian Grünbichler
Cc: Wolfgang Bumiller, Thomas Lamprecht, Dietmar Maurer
In-Reply-To: <1652950355.dp00h7vxx4.astroid@nora.none>
Subject: Re: [pbs-devel] [RFC PATCH proxmox-backup] datastore: implement consitency tuning for datastores

On 5/19/22 15:35, Fabian Grünbichler wrote:
> On May 18, 2022 1:24 pm, Dominik Csapak wrote:
>> currently, we don't (f)sync on chunk insertion (or at any point after
>> that), which can lead to broken chunks in case of e.g. an unexpected
>> powerloss.
>> To fix that, offer a tuning option for datastores that controls the
>> level of syncs it does:
>>
>> * None (old default): same as current state, no (f)syncs done at any point
>> * Filesystem (new default): at the end of a backup, the datastore issues
>>   a syncfs(2) to the filesystem of the datastore
>> * File: issues an fsync on each chunk as they get inserted
>>   (using our 'replace_file' helper)
>>
>> a small benchmark showed the following (times in mm:ss):
>> setup: virtual pbs, 4 cores, 8GiB memory, ext4 on spinner
>>
>> size                 none    filesystem  file
>> 2GiB (fits in ram)   00:13   00:41       01:00
>> 33GiB                05:21   05:31       13:45
>>
>> so if the backup fits in memory, there is a large difference between all
>> of the modes (expected), but as soon as it exceeds the memory size,
>> the difference between not syncing and syncing the fs at the end becomes
>> much smaller.
>>
>> i also tested on an nvme, but there the syncs basically made no difference
>>
>> Signed-off-by: Dominik Csapak
>> ---
>> it would be nice if anybody else tries to recreate the benchmarks on
>> different setups, to verify (or disprove) my findings
>
> FWIW:
>
> randfile on tmpfs as source, backed up as fidx
> randfile regenerated for every run, PBS restarted for every run
>
> PBS in VM (8GB ram, disks on zvols on spinner + special + log), datastore on ext4:
>
> SIZE: 4096   MODE: none        Duration: 22.51s
> SIZE: 4096   MODE: filesystem  Duration: 28.11s
> SIZE: 4096   MODE: file        Duration: 54.47s
>
> SIZE: 16384  MODE: none        Duration: 202.42s
> SIZE: 16384  MODE: filesystem  Duration: 275.36s
> SIZE: 16384  MODE: file        Duration: 311.97s
>
> same VM, datastore on single-disk ZFS pool:
>
> SIZE: 1024   MODE: none        Duration: 5.03s
> SIZE: 1024   MODE: file        Duration: 22.91s
> SIZE: 1024   MODE: filesystem  Duration: 15.57s
>
> SIZE: 4096   MODE: none        Duration: 41.02s
> SIZE: 4096   MODE: file        Duration: 135.94s
> SIZE: 4096   MODE: filesystem  Duration: 146.88s
>
> SIZE: 16384  MODE: none        Duration: 336.10s
> rest ended in tears cause of restricted resources in the VM
>
> PBS baremetal same as source, datastore on ZFS on spinner+special+log:
>
> SIZE: 1024   MODE: none        Duration: 4.90s
> SIZE: 1024   MODE: file        Duration: 4.92s
> SIZE: 1024   MODE: filesystem  Duration: 4.94s
>
> SIZE: 4096   MODE: none        Duration: 19.56s
> SIZE: 4096   MODE: file        Duration: 31.67s
> SIZE: 4096   MODE: filesystem  Duration: 38.54s
>
> SIZE: 16384  MODE: none        Duration: 189.77s
> SIZE: 16384  MODE: file        Duration: 178.81s
> SIZE: 16384  MODE: filesystem  Duration: 159.26s
>
> ^^ this is rather unexpected, I suspect something messed with the 'none'
> case here, so I re-ran it:
>
> SIZE: 1024   MODE: none        Duration: 4.90s
> SIZE: 1024   MODE: file        Duration: 4.92s
> SIZE: 1024   MODE: filesystem  Duration: 4.98s
> SIZE: 4096   MODE: none        Duration: 19.77s
> SIZE: 4096   MODE: file        Duration: 19.68s
> SIZE: 4096   MODE: filesystem  Duration: 19.61s
> SIZE: 16384  MODE: none        Duration: 133.93s
> SIZE: 16384  MODE: file        Duration: 146.88s
> SIZE: 16384  MODE: filesystem  Duration: 152.94s
>
> and once more with ~30GB (ARC is just 16G):
>
> SIZE: 30000  MODE: none        Duration: 368.58s
> SIZE: 30000  MODE: file        Duration: 292.05s (!!!)
> SIZE: 30000  MODE: filesystem  Duration: 431.73s
>
> repeated once more:
>
> SIZE: 30000  MODE: none        Duration: 419.75s
> SIZE: 30000  MODE: file        Duration: 302.73s
> SIZE: 30000  MODE: filesystem  Duration: 409.07s
>
> so.. rather weird?
> possible noisy measurements though, as this is on my workstation ;)
>
> PBS baremetal same as source, datastore on ZFS on NVME (no surprises
> there):
>
> SIZE: 1024   MODE: file        Duration: 4.92s
> SIZE: 1024   MODE: filesystem  Duration: 4.95s
> SIZE: 1024   MODE: none        Duration: 4.96s
>
> SIZE: 4096   MODE: file        Duration: 19.69s
> SIZE: 4096   MODE: filesystem  Duration: 19.78s
> SIZE: 4096   MODE: none        Duration: 19.67s
>
> SIZE: 16384  MODE: file        Duration: 81.39s
> SIZE: 16384  MODE: filesystem  Duration: 78.86s
> SIZE: 16384  MODE: none        Duration: 78.38s
>
> SIZE: 30000  MODE: none        Duration: 142.65s
> SIZE: 30000  MODE: file        Duration: 143.43s
> SIZE: 30000  MODE: filesystem  Duration: 143.15s

What I gather from these benchmarks: on ext4, many fsyncs are clearly more
expensive than a single syncfs at the end, but on ZFS the two approaches end
up very close, if anything leaning toward many fsyncs being faster (aside
from the odd runs where fsync was even faster than not syncing at all).

In any case, doing some kind of syncing *will* slow down backups one way or
another (leaving the weird ZFS results aside for the moment). So the question
is whether we make one of the new modes the default or not. I'd add the
tuning options in any case, even if we keep the current default, so the admin
can decide how much crash consistency their PBS datastore should have.

Any other input? @wolfgang, @thomas, @dietmar?
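
To make the trade-off a bit more concrete, here is a rough sketch of what the
three levels boil down to syscall-wise. This is not the actual patch code;
the names (SyncLevel, insert_chunk, finish_backup) and the use of the libc
crate for syncfs(2) are just illustrative assumptions:

// illustrative sketch only, not the real proxmox-backup implementation
use std::fs::{self, File};
use std::io::Write;
use std::os::unix::io::AsRawFd;
use std::path::Path;

enum SyncLevel {
    None,       // old default: never fsync/syncfs anything
    Filesystem, // proposed new default: one syncfs(2) at the end of a backup
    File,       // fsync(2) every chunk right after it is written
}

// write a chunk atomically: temp file, optional fsync, then rename into
// place (roughly what a 'replace_file'-style helper does)
fn insert_chunk(path: &Path, data: &[u8], level: &SyncLevel) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    if matches!(level, SyncLevel::File) {
        file.sync_all()?; // fsync(2): chunk data + metadata hit stable storage
    }
    drop(file);
    fs::rename(&tmp, path)
}

// at the end of a backup, flush the whole datastore filesystem once
fn finish_backup(datastore_root: &Path, level: &SyncLevel) -> std::io::Result<()> {
    if matches!(level, SyncLevel::Filesystem) {
        let dir = File::open(datastore_root)?;
        // syncfs(2) flushes all dirty data of the filesystem containing 'dir'
        if unsafe { libc::syncfs(dir.as_raw_fd()) } != 0 {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok(())
}

The per-chunk fsync plus rename means a chunk is either completely on disk or
not there at all after a crash, while the syncfs variant only gives that
guarantee once the whole backup has finished; that is exactly the trade-off
the benchmarks above are measuring.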