From: Wolfgang Bumiller
To: pve-devel@lists.proxmox.com
Date: Wed, 9 Jun 2021 15:18:44 +0200
Message-Id: <20210609131852.167416-1-w.bumiller@proxmox.com>
Subject: [pve-devel] [PATCH multiple] btrfs, file system for the brave

This is another take at btrfs storage support.
I wouldn't exactly call it great, but I guess it works (although I did
manage to break a few... Then again I also managed to do that with ZFS
(it just took a few years longer there...)).

This one's spread over quite a few repositories, so let's go through
them in apply-order:

* pve-common:

  One nice improvement since the last go around is that by now btrfs
  supports renameat2's `RENAME_EXCHANGE` flag.

  * PATCH 1/1: Syscalls/Tools: add renameat2

    The idea here is to have a more robust "rollback" implementation,
    since "snapshots" in btrfs are really just loosely connected
    subvolumes, and there is no direct rollback functionality. Instead,
    we simply clone the snapshot we want to roll back to (by making a
    writable snapshot), and then rotate the clone into place before
    cleaning up the now-old version.
    Without `RENAME_EXCHANGE` this rotation required 2 syscalls,
    creating a small window where, if the process is stopped/killed,
    the volume we're working on would not live in its designated place,
    making it somewhat nasty to deal with. Now, the worst that happens
    is an extra left-over snapshot lying around.

* pve-storage:

  * PATCH 1/4: fix find_free_disk_name invocations

    Just a non-issue I ran into (the parameter isn't actually used by
    our implementors currently, but it confused me ;-) ).

  * PATCH 2/4: add BTRFS storage plugin

    The main implementation, with btrfs send/recv saved up for patch 4.
    (There's a note about `mkdir` vs `populate` etc.; I intend to clean
    this up later, we had some off-list discussion about this
    already...)

    Currently, container subvolumes are only allowed to be unsized
    (size zero, like with our plain directory storage subvols). We
    *could* enable quota support with little effort, but quota
    information is lost in send/recv operations, so we would need to
    cover this in our import/export format separately, if we want to.
    (Although I have a feeling it wouldn't be nice for performance
    anyway...)

  * PATCH 3/4: update import/export storage API

    _Technically_ I *could* do without, but it would be quite
    inconvenient, and the information it adds to the methods is usually
    readily available, so I think this makes sense.

  * PATCH 4/4: btrfs: add 'btrfs' import/export format

    This requires a bit more elbow grease than ZFS, though, so I split
    this out into a separate patch.

* pve-container:

  * PATCH 1/2: migration: fix snapshots boolean accounting

    (The `with_snapshots` parameter is otherwise not set correctly
    since we handle the base volume last.)

  * PATCH 2/2: enable btrfs support via subvolumes

    Some of this stuff should probably become a storage property...
    For container volumes which aren't _unsized_ this still allocates
    an ext4 formatted raw image. For size=0 volumes we'll have an
    actual btrfs subvolume.

* qemu-server:

  * PATCH 1/1: allow migrating raw btrfs volumes

    Like in pve-container, some of this stuff should probably become a
    storage property...

--
Big Terrifying Risky File System
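
For illustration only, the rollback rotation described under pve-common
PATCH 1/1 could look roughly like the sketch below. This is Python, not
the Perl code from the patches. The function names, the
`.rollback-tmp` suffix and the injectable `clone`/`exchange`/`delete`
callbacks are all assumptions made for the example; only `renameat2`,
`RENAME_EXCHANGE` and the `btrfs subvolume snapshot`/`delete` commands
are real. It also assumes Linux with a libc that exports `renameat2`
(glibc 2.28 or later).

```python
import ctypes
import ctypes.util
import os
import subprocess

AT_FDCWD = -100            # Linux constant: resolve relative paths against the cwd
RENAME_EXCHANGE = 1 << 1   # Linux constant: atomically swap both paths


def rename_exchange(a, b):
    """Atomically swap paths a and b via renameat2(2)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.renameat2.argtypes = [ctypes.c_int, ctypes.c_char_p,
                               ctypes.c_int, ctypes.c_char_p, ctypes.c_uint]
    if libc.renameat2(AT_FDCWD, os.fsencode(a),
                      AT_FDCWD, os.fsencode(b), RENAME_EXCHANGE) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def btrfs_clone(snapshot, dest):
    # A snapshot created without -r is writable.
    subprocess.run(["btrfs", "subvolume", "snapshot", snapshot, dest], check=True)


def btrfs_delete(subvol):
    subprocess.run(["btrfs", "subvolume", "delete", subvol], check=True)


def rollback(volume, snapshot, clone=btrfs_clone,
             exchange=rename_exchange, delete=btrfs_delete):
    """Roll `volume` back to `snapshot` (hypothetical helper, not the plugin's API)."""
    tmp = volume + ".rollback-tmp"   # assumed temporary name
    clone(snapshot, tmp)             # 1. writable clone of the snapshot
    exchange(tmp, volume)            # 2. single syscall: volume now holds the clone
    delete(tmp)                      # 3. tmp now holds the old contents; drop them
```

If the process dies between steps 2 and 3, `volume` is already in its
designated place and only the temporary subvolume is left over, which
matches the "extra left-over snapshot" case mentioned above.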