From: Wolfgang Bumiller
To: pve-devel@lists.proxmox.com
Date: Tue, 22 Jun 2021 14:18:19 +0200
Message-Id: <20210622121828.84178-1-w.bumiller@proxmox.com>
Subject: [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave

Changes to
v1:

* Storage API gets a hard bump (ver=9, age=0) due to the import method
  signature changes.

* Added `nocow` file storage option as a performance knob.
  This causes raw files to be marked as `NOCOW` (`chattr +C`), which does
  two things:
    a) Disables checksumming.
    b) Allows the use of `O_DIRECT` without causing scrubs to spam
       checksum errors...
  This increases performance at the cost of data integrity. Note that to my
  knowledge this is not really worse than using any other non-checksumming
  file system (xfs, ext4), and if you use a single-disk setup with no
  redundancy, chances are that's all you need ;-)

* Added `quotas` btrfs storage option.
  This requires quotas to be enabled on the file system
  (`btrfs quota enable /path/to/mountpoint`), and will allow creating
  "format=subvol" container disks with a non-zero size, instead of using an
  ext4 formatted raw file.
  For *now* this also disables send/recv (I'll work on a patch for that
  later).
  Other than that, this, uh, changes performance... (For small setups likely
  for the better, for bigger ones *potentially* for the worse.)

* pve-container: use subvols on btrfs storages with the `quotas` option
  enabled

NOTE: I kept the "storage lists" on the qemu & container side for now. We can
still change this to become storage features later, but it seems this part of
the code is actually in need of some more maintenance given the accumulation
of features we have there. For instance, whether a volume is
offline-migratable (the main check touched by this series) would ideally also
take the *target* storage into account. E.g. instead of a "feature" check, we
could use `volume_transfer_formats()` (or a specialized method in
`PVE::Storage` to check whether a volume which has snapshots can be migrated
this way, i.e. ask whether `storage_migrate()` with the given volume &
storage parameters is supposed to succeed).

Therefore the qemu & container parts (apart from the container change listed
on top) are just rebased and otherwise unchanged.
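As a rough illustration of what the `quotas` option changes on the container side, here is a small Python sketch of the decision between a btrfs subvolume and an ext4 formatted raw image. This is not the actual Perl code from pve-container: the function name `container_volume_format` and its parameters are made up for this example, and it only restates the rules described in this cover letter.

```python
def container_volume_format(size_bytes: int, quotas_enabled: bool) -> str:
    """Pick the on-disk format for a container volume on a btrfs storage.

    Hypothetical helper for illustration only; it mirrors the behaviour
    described in the cover letter, not the real pve-container code.
    """
    if size_bytes < 0:
        raise ValueError("size must not be negative")

    # Unsized volumes (size zero) are always plain btrfs subvolumes.
    if size_bytes == 0:
        return "subvol"

    # Sized volumes need a quota to enforce the limit on a subvolume.
    # Without the `quotas` storage option we fall back to a raw image
    # (which gets formatted with ext4).
    return "subvol" if quotas_enabled else "raw"


if __name__ == "__main__":
    eight_gib = 8 * 1024**3
    for size, quotas in [(0, False), (eight_gib, False), (eight_gib, True)]:
        print(size, quotas, container_volume_format(size, quotas))
```

The point of the sketch is only that the `quotas` option moves sized volumes from the raw-image path onto the subvolume path; unsized volumes behave the same either way.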
-- Original cover letter:

This is another take on btrfs storage support.
I wouldn't exactly call it great, but I guess it works (although I did manage
to break a few... Then again I also managed to do that with ZFS (it just took
a few years longer there...)).

This one's spread over quite a few repositories, so let's go through them in
apply-order:

* pve-common:

* pve-storage:
  * PATCH 1/4: fix find_free_disk_name invocations

    Just a non-issue I ran into (the parameter isn't actually used by our
    implementors currently, but it confused me ;-) ).

  * PATCH 2/4: add BTRFS storage plugin

    The main implementation, with btrfs send/recv saved up for patch 4.
    (There's a note about `mkdir` vs `populate` etc.; I intend to clean this
    up later, we had some off-list discussion about this already...)

    Currently, container subvolumes are only allowed to be unsized (size
    zero, like with our plain directory storage subvols). We *could* enable
    quota support with little effort, but quota information is lost in
    send/recv operations, so we would need to cover this in our import/export
    format separately, if we want to. (Although I have a feeling it wouldn't
    be nice for performance anyway...)

  * PATCH 3/4: update import/export storage API

    _Technically_ I *could* do without, but it would be quite inconvenient,
    and the information it adds to the methods is usually readily available,
    so I think this makes sense.

  * PATCH 4/4: btrfs: add 'btrfs' import/export format

    This requires a bit more elbow grease than ZFS, though, so I split this
    out into a separate patch.

* pve-container:
  * PATCH 1/2: migration: fix snapshots boolean accounting

    (The `with_snapshots` parameter is otherwise not set correctly since we
    handle the base volume last.)

  * PATCH 2/2: enable btrfs support via subvolumes

    Some of this stuff should probably become a storage property...
    For container volumes which aren't _unsized_ this still allocates an
    ext4 formatted raw image. For size=0 volumes we'll have an actual btrfs
    subvolume.
* qemu-server:
  * PATCH 1/1: allow migrating raw btrfs volumes

    Like in pve-container, some of this stuff should probably become a
    storage property...

--
Big Terrifying Risky File System