* [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave
@ 2021-06-22 12:18 Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 1/5] add BTRFS storage plugin Wolfgang Bumiller
` (9 more replies)
0 siblings, 10 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-22 12:18 UTC (permalink / raw)
To: pve-devel
Changes to v1:
* Storage API gets a hard bump: (ver=9, age=0), due to the
import method signature changes.
* Added `nocow` file storage option as a performance knob.
This causes raw files to be marked as `NOCOW`
(`chattr +C`), which does 2 things:
a) Disables checksumming:
b) Allows the use of `O_DIRECT` without causing scrubs
to spam checksum errors...
Increases performance at the cost of data integrity.
Note that to my knowledge this is not really worse than
using any other non-checksumming file system (xfs,
ext4), and if you use a single-disk setup with no
redundancy, chances are that's all you need ;-)
* Added `quotas` btrfs storage option.
This requires quotas to be anbled on the file system
(`btrfs quota enable /path/to/mountpoint`), and will
allow creating "format=subvol" container disks with a
non-zero size, instead of using an ext4 formatted raw
file.
For *now* this also disables send/recv (I'll work on a
patch for that later).
Other than that, this, uh, changes performance... (For
small setups likely for the better, for bigger ones
*potentially* for the worse.)
* pve-container: use subvols on btrfs storages with the
`quotas` option enabled
NOTE:
I kept the "storage lists" on the qemu & container side for
now. We can still change this to become storage features
later, but it seems this part of the code is actually in
need of some more maintenance given the accumulation of
features we have there.
For instance, whether a volume is offline-migratable (the
main checks touched by this series), would ideally also take
the *target* storage into account. Eg. instead of a
"feature" check, we could use `volume_transfer_formats()`
(or a specialized method in `PVE::Storage` to check whether
a volume which has snapshots can be migrated this way, iow.
ask whether `storage_migrate()` with the given volume &
storage parameters is supposed to succeed)
Therefore the qemu & container parts (apart from the
container change listed on top) are just rebased and
otherwise unchanged.
--
Original cover letter:
This is another take at btrfs storage support.
I wouldn't exactly call it great, but I guess it works (although I did
manage to break a few... Then again I also manged to do that with ZFS
(it just took a few years longer there...)).
This one's spread over quite a few repositories, so let's go through
them in apply-order:
* pve-common:
<snip>
* pve-storage:
* PATCH 1/4: fix find_free_disk_name invocations
Just a non-issue I ran into (the parameter isn't actually used by
our implementors currently, but it confused me ;-) ).
* PATCH 2/4: add BTRFS storage plugin
The main implementation with btrfs send/recv saved up for patch 4.
(There's a note about `mkdir` vs `populate` etc., I intend to clean
this up later, we had some off-list discussion about this
already...)
Currently, container subvolumes are only allowed to be unsized
(size zero, like with our plain directory storage subvols), though
we *could* enable quota support with little effort, but quota
information is lost in send/recv operations, so we need to cover
this in our import/export format separately, if we want to.
(Although I have a feeling it wouldn't be nice for performance
anyway...)
* PATCH 3/4: update import/export storage API
_Technically_ I *could* do without, but it would be quite
inconvenient, and the information it adds to the methods is usually
readily available, so I think this makes sense.
* PATCH 4/4: btrfs: add 'btrfs' import/export format
This requires a bit more elbow grease than ZFS, though, so I split
this out into a separate patch.
* pve-container:
* PATCH 1/2: migration: fix snapshots boolean accounting
(The `with_snapshots` parameter is otherways not set correctly
since we handle the base volume last)
* PATCH 2/2: enable btrfs support via subvolumes
Some of this stuff should probably become a storage property...
For container volumes which aren't _unsized_ this still allocates
an ext4 formatted raw image. For size=0 volumes we'll have an
actual btrfs subvolume.
* qemu-server:
* PATCH 1/1: allow migrating raw btrfs volumes
Like in pve-container, some of this stuff should probably become a
storage property...
--
Big Terrifying Risky File System
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] [PATCH v2 storage 1/5] add BTRFS storage plugin
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
@ 2021-06-22 12:18 ` Wolfgang Bumiller
2021-06-23 12:15 ` Fabian Grünbichler
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 2/5] bump storage API: update import/export methods Wolfgang Bumiller
` (8 subsequent siblings)
9 siblings, 1 reply; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-22 12:18 UTC (permalink / raw)
To: pve-devel
This is mostly the same as a directory storage, with 2 major
differences:
* 'subvol' volumes are actual btrfs subvolumes and therefore
allow snapshots
* 'raw' files are placed *into* a subvolume and therefore
also allow snapshots, the raw file for volume
`btrstore:100/vm-100-disk-1.raw` can be found under
`$path/images/100/vm-100-disk-1/disk.raw`
* in both cases, snapshots add an '@name' suffix to the
subvolume's directory name, so snapshot 'foo' of the above
would be found under
`$path/images/100/vm-100-disk-1@foo/disk.raw`
or for format "subvol":
`$path/images/100/subvol-100-disk-1.subvol@foo`
Note that qgroups aren't included in btrfs-send streams,
therefore for now we will only be using *unsized* subvolumes
for containers and place a regular raw+ext4 file for sized
containers.
We could extend the import/export stream format to include
the information at the front (similar to how we do the
"tar+size" format, but we need to include the size of all
the contained snapshots as well, since they can technically
change). (But before enabling quotas we should do some
performance testing on bigger file systems with multiple
snapshots as there are quite a few reports of the fs slowing
down considerably in such scenarios).
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
Changes to v1:
raw files are created manually with sysopen/truncate and
ioctls for setting the NOCOW bit have been added (this
are made optional later in the series)
PVE/Storage.pm | 2 +
PVE/Storage/BTRFSPlugin.pm | 634 +++++++++++++++++++++++++++++++++++++
PVE/Storage/Makefile | 1 +
3 files changed, 637 insertions(+)
create mode 100644 PVE/Storage/BTRFSPlugin.pm
diff --git a/PVE/Storage.pm b/PVE/Storage.pm
index ea887a4..e0d6f44 100755
--- a/PVE/Storage.pm
+++ b/PVE/Storage.pm
@@ -38,6 +38,7 @@ use PVE::Storage::GlusterfsPlugin;
use PVE::Storage::ZFSPoolPlugin;
use PVE::Storage::ZFSPlugin;
use PVE::Storage::PBSPlugin;
+use PVE::Storage::BTRFSPlugin;
# Storage API version. Increment it on changes in storage API interface.
use constant APIVER => 8;
@@ -60,6 +61,7 @@ PVE::Storage::GlusterfsPlugin->register();
PVE::Storage::ZFSPoolPlugin->register();
PVE::Storage::ZFSPlugin->register();
PVE::Storage::PBSPlugin->register();
+PVE::Storage::BTRFSPlugin->register();
# load third-party plugins
if ( -d '/usr/share/perl5/PVE/Storage/Custom' ) {
diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
new file mode 100644
index 0000000..370d848
--- /dev/null
+++ b/PVE/Storage/BTRFSPlugin.pm
@@ -0,0 +1,634 @@
+package PVE::Storage::BTRFSPlugin;
+
+use strict;
+use warnings;
+
+use base qw(PVE::Storage::Plugin);
+
+use Fcntl qw(S_ISDIR O_WRONLY O_CREAT O_EXCL);
+use File::Basename qw(dirname);
+use File::Path qw(mkpath);
+use IO::Dir;
+
+use PVE::Tools qw(run_command);
+
+use PVE::Storage::DirPlugin;
+
+use constant {
+ BTRFS_FIRST_FREE_OBJECTID => 256,
+ FS_NOCOW_FL => 0x00800000,
+ FS_IOC_GETFLAGS => 0x40086602,
+ FS_IOC_SETFLAGS => 0x80086601,
+};
+
+# Configuration (similar to DirPlugin)
+
+sub type {
+ return 'btrfs';
+}
+
+sub plugindata {
+ return {
+ content => [
+ {
+ images => 1,
+ rootdir => 1,
+ vztmpl => 1,
+ iso => 1,
+ backup => 1,
+ snippets => 1,
+ none => 1,
+ },
+ { images => 1, rootdir => 1 },
+ ],
+ format => [ { raw => 1, qcow2 => 1, vmdk => 1, subvol => 1 }, 'raw', ],
+ };
+}
+
+sub options {
+ return {
+ path => { fixed => 1 },
+ nodes => { optional => 1 },
+ shared => { optional => 1 },
+ disable => { optional => 1 },
+ maxfiles => { optional => 1 },
+ content => { optional => 1 },
+ format => { optional => 1 },
+ is_mountpoint => { optional => 1 },
+ # TODO: The new variant of mkdir with `populate` vs `create`...
+ };
+}
+
+# Storage implementation
+#
+# We use the same volume names are directory plugins, but map *raw* disk image file names into a
+# subdirectory.
+#
+# `vm-VMID-disk-ID.raw`
+# -> `images/VMID/vm-VMID-disk-ID/disk.raw`
+# where the `vm-VMID-disk-ID/` subdirectory is a btrfs subvolume
+
+# Reuse `DirPlugin`'s `check_config`. This simply checks for invalid paths.
+sub check_config {
+ my ($self, $sectionId, $config, $create, $skipSchemaCheck) = @_;
+ return PVE::Storage::DirPlugin::check_config($self, $sectionId, $config, $create, $skipSchemaCheck);
+}
+
+sub activate_storage {
+ my ($class, $storeid, $scfg, $cache) = @_;
+ return PVE::Storage::DirPlugin::activate_storage($class, $storeid, $scfg, $cache);
+}
+
+sub status {
+ my ($class, $storeid, $scfg, $cache) = @_;
+ return PVE::Storage::DirPlugin::status($class, $storeid, $scfg, $cache);
+}
+
+# TODO: sub get_volume_notes {}
+
+# TODO: sub update_volume_notes {}
+
+# croak would not include the caller from within this module
+sub __error {
+ my ($msg) = @_;
+ my (undef, $f, $n) = caller(1);
+ die "$msg at $f: $n\n";
+}
+
+# Given a name (eg. `vm-VMID-disk-ID.raw`), take the part up to the format suffix as the name of
+# the subdirectory (subvolume).
+sub raw_name_to_dir($) {
+ my ($raw) = @_;
+
+ # For the subvolume directory Strip the `.<format>` suffix:
+ if ($raw =~ /^(.*)\.raw$/) {
+ return $1;
+ }
+
+ __error "internal error: bad disk name: $raw";
+}
+
+sub raw_file_to_subvol($) {
+ my ($file) = @_;
+
+ if ($file =~ m|^(.*)/disk\.raw$|) {
+ return "$1";
+ }
+
+ __error "internal error: bad raw path: $file";
+}
+
+sub filesystem_path {
+ my ($class, $scfg, $volname, $snapname) = @_;
+
+ my ($vtype, $name, $vmid, undef, undef, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ my $path = $class->get_subdir($scfg, $vtype);
+
+ $path .= "/$vmid" if $vtype eq 'images';
+
+ if ($format eq 'raw') {
+ my $dir = raw_name_to_dir($name);
+ if ($snapname) {
+ $dir .= "\@$snapname";
+ }
+ $path .= "/$dir/disk.raw";
+ } elsif ($format eq 'subvol') {
+ $path .= "/$name";
+ if ($snapname) {
+ $path .= "\@$snapname";
+ }
+ } else {
+ $path .= "/$name";
+ }
+
+ return wantarray ? ($path, $vmid, $vtype) : $path;
+}
+
+sub btrfs_cmd {
+ my ($class, $cmd, $outfunc) = @_;
+
+ my $msg = '';
+ my $func;
+ if (defined($outfunc)) {
+ $func = sub {
+ my $part = &$outfunc(@_);
+ $msg .= $part if defined($part);
+ };
+ } else {
+ $func = sub { $msg .= "$_[0]\n" };
+ }
+ run_command(['btrfs', '-q', @$cmd], errmsg => 'btrfs error', outfunc => $func);
+
+ return $msg;
+}
+
+sub btrfs_get_subvol_id {
+ my ($class, $path) = @_;
+ my $info = $class->btrfs_cmd(['subvolume', 'show', '--', $path]);
+ if ($info !~ /^\s*(?:Object|Subvolume) ID:\s*(\d+)$/m) {
+ die "failed to get btrfs subvolume ID from: $info\n";
+ }
+ return $1;
+}
+
+my sub chattr : prototype($$$) {
+ my ($fh, $mask, $xor) = @_;
+
+ my $flags = pack('L!', 0);
+ ioctl($fh, FS_IOC_GETFLAGS, $flags) or die "FS_IOC_GETFLAGS failed - $!\n";
+ $flags = pack('L!', (unpack('L!', $flags) & $mask) ^ $xor);
+ ioctl($fh, FS_IOC_SETFLAGS, $flags) or die "FS_IOC_SETFLAGS failed - $!\n";
+ return 1;
+}
+
+sub create_base {
+ my ($class, $storeid, $scfg, $volname) = @_;
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ my $newname = $name;
+ $newname =~ s/^vm-/base-/;
+
+ # If we're not working with a 'raw' file, which is the only thing that's "different" for btrfs,
+ # or a subvolume, we forward to the DirPlugin
+ if ($format ne 'raw' && $format ne 'subvol') {
+ return PVE::Storage::DirPlugin::create_base(@_);
+ }
+
+ my $path = $class->filesystem_path($scfg, $volname);
+ my $newvolname = $basename ? "$basevmid/$basename/$vmid/$newname" : "$vmid/$newname";
+ my $newpath = $class->filesystem_path($scfg, $newvolname);
+
+ my $subvol = $path;
+ my $newsubvol = $newpath;
+ if ($format eq 'raw') {
+ $subvol = raw_file_to_subvol($subvol);
+ $newsubvol = raw_file_to_subvol($newsubvol);
+ }
+
+ rename($subvol, $newsubvol)
+ || die "rename '$subvol' to '$newsubvol' failed - $!\n";
+ eval { $class->btrfs_cmd(['property', 'set', $newsubvol, 'ro', 'true']) };
+ warn $@ if $@;
+
+ return $newvolname;
+}
+
+sub clone_image {
+ my ($class, $scfg, $storeid, $volname, $vmid, $snap) = @_;
+
+ my ($vtype, $basename, $basevmid, undef, undef, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ # If we're not working with a 'raw' file, which is the only thing that's "different" for btrfs,
+ # or a subvolume, we forward to the DirPlugin
+ if ($format ne 'raw' && $format ne 'subvol') {
+ return PVE::Storage::DirPlugin::clone_image(@_);
+ }
+
+ my $imagedir = $class->get_subdir($scfg, 'images');
+ $imagedir .= "/$vmid";
+ mkpath $imagedir;
+
+ my $path = $class->filesystem_path($scfg, $volname);
+ my $newname = $class->find_free_diskname($storeid, $scfg, $vmid, $format, 1);
+
+ # For btrfs subvolumes we don't actually need the "link":
+ #my $newvolname = "$basevmid/$basename/$vmid/$newname";
+ my $newvolname = "$vmid/$newname";
+ my $newpath = $class->filesystem_path($scfg, $newvolname);
+
+ my $subvol = $path;
+ my $newsubvol = $newpath;
+ if ($format eq 'raw') {
+ $subvol = raw_file_to_subvol($subvol);
+ $newsubvol = raw_file_to_subvol($newsubvol);
+ }
+
+ $class->btrfs_cmd(['subvolume', 'snapshot', '--', $subvol, $newsubvol]);
+
+ return $newvolname;
+}
+
+sub alloc_image {
+ my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
+
+ if ($fmt ne 'raw' && $fmt ne 'subvol') {
+ return PVE::Storage::DirPlugin::alloc_image(@_);
+ }
+
+ # From Plugin.pm:
+
+ my $imagedir = $class->get_subdir($scfg, 'images') . "/$vmid";
+
+ mkpath $imagedir;
+
+ $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt, 1) if !$name;
+
+ my (undef, $tmpfmt) = PVE::Storage::Plugin::parse_name_dir($name);
+
+ die "illegal name '$name' - wrong extension for format ('$tmpfmt != '$fmt')\n"
+ if $tmpfmt ne $fmt;
+
+ # End copy from Plugin.pm
+
+ my $subvol = "$imagedir/$name";
+ # .raw is not part of the directory name
+ $subvol =~ s/\.raw$//;
+
+ die "disk image '$subvol' already exists\n" if -e $subvol;
+
+ my $path;
+ if ($fmt eq 'raw') {
+ $path = "$subvol/disk.raw";
+ }
+
+ if ($fmt eq 'subvol' && !!$size) {
+ # NOTE: `btrfs send/recv` actually drops quota information so supporting subvolumes with
+ # quotas doesn't play nice with send/recv.
+ die "btrfs quotas are currently not supported, use an unsized subvolume or a raw file\n";
+ }
+
+ $class->btrfs_cmd(['subvolume', 'create', '--', $subvol]);
+
+ eval {
+ if ($fmt eq 'subvol') {
+ # Nothing to do for now...
+
+ # This is how we *would* do it:
+ # # Use the subvol's default 0/$id qgroup
+ # eval {
+ # # This call should happen at storage creation instead and therefore governed by a
+ # # configuration option!
+ # # $class->btrfs_cmd(['quota', 'enable', $subvol]);
+ # my $id = $class->btrfs_get_subvol_id($subvol);
+ # $class->btrfs_cmd(['qgroup', 'limit', "${size}k", "0/$id", $subvol]);
+ # };
+ } elsif ($fmt eq 'raw') {
+ sysopen my $fh, $path, O_WRONLY | O_CREAT | O_EXCL
+ or die "failed to create raw file '$path' - $!\n";
+ chattr($fh, ~FS_NOCOW_FL, FS_NOCOW_FL);
+ truncate($fh, $size * 1024)
+ or die "failed to set file size for '$path' - $!\n";
+ close($fh);
+ } else {
+ die "internal format error (format = $fmt)\n";
+ }
+ };
+
+ if (my $err = $@) {
+ eval { $class->btrfs_cmd(['subvolume', 'delete', '--', $subvol]); };
+ warn $@ if $@;
+ die $err;
+ }
+
+ return "$vmid/$name";
+}
+
+# Same as btrfsprogs does:
+my sub path_is_subvolume : prototype($) {
+ my ($path) = @_;
+ my @stat = stat($path)
+ or die "stat failed on '$path' - $!\n";
+ my ($ino, $mode) = @stat[1, 2];
+ return S_ISDIR($mode) && $ino == BTRFS_FIRST_FREE_OBJECTID;
+}
+
+my $BTRFS_VOL_REGEX = qr/((?:vm|base|subvol)-\d+-disk-\d+(?:\.subvol)?)(?:\@(\S+))$/;
+
+# Calls `$code->($volume, $name, $snapshot)` for each subvol in a directory matching our volume
+# regex.
+my sub foreach_subvol : prototype($$) {
+ my ($dir, $code) = @_;
+
+ dir_glob_foreach($dir, $BTRFS_VOL_REGEX, sub {
+ my ($volume, $name, $snapshot) = ($1, $2, $3);
+ return if !path_is_subvolume("$dir/$volume");
+ $code->($volume, $name, $snapshot);
+ })
+}
+
+sub free_image {
+ my ($class, $storeid, $scfg, $volname, $isBase, $_format) = @_;
+
+ my (undef, undef, $vmid, undef, undef, undef, $format) =
+ $class->parse_volname($volname);
+
+ if ($format ne 'subvol' && $format ne 'raw') {
+ return PVE::Storage::DirPlugin::free_image(@_);
+ }
+
+ my $path = $class->filesystem_path($scfg, $volname);
+
+ my $subvol = $path;
+ if ($format eq 'raw') {
+ $subvol = raw_file_to_subvol($path);
+ }
+
+ my $dir = dirname($subvol);
+ my @snapshot_vols;
+ foreach_subvol($dir, sub {
+ my ($volume, $name, $snapshot) = @_;
+ return if !defined $snapshot;
+ push @snapshot_vols, "$dir/$volume";
+ });
+
+ $class->btrfs_cmd(['subvolume', 'delete', '--', @snapshot_vols, $subvol]);
+ # try to cleanup directory to not clutter storage with empty $vmid dirs if
+ # all images from a guest got deleted
+ rmdir($dir);
+
+ return undef;
+}
+
+# Currently not used because quotas clash with send/recv.
+# my sub btrfs_subvol_quota {
+# my ($class, $path) = @_;
+# my $id = '0/' . $class->btrfs_get_subvol_id($path);
+# my $search = qr/^\Q$id\E\s+(\d)+\s+\d+\s+(\d+)\s*$/;
+# my ($used, $size);
+# $class->btrfs_cmd(['qgroup', 'show', '--raw', '-rf', '--', $path], sub {
+# return if defined($size);
+# if ($_[0] =~ $search) {
+# ($used, $size) = ($1, $2);
+# }
+# });
+# if (!defined($size)) {
+# # syslog should include more information:
+# syslog('err', "failed to get subvolume size for: $path (id $id)");
+# # UI should only see the last path component:
+# $path =~ s|^.*/||;
+# die "failed to get subvolume size for $path\n";
+# }
+# return wantarray ? ($used, $size) : $size;
+# }
+
+sub volume_size_info {
+ my ($class, $scfg, $storeid, $volname, $timeout) = @_;
+
+ my $path = $class->filesystem_path($scfg, $volname);
+
+ my $format = ($class->parse_volname($volname))[6];
+
+ if ($format eq 'subvol') {
+ my $ctime = (stat($path))[10];
+ my ($used, $size) = (0, 0);
+ #my ($used, $size) = btrfs_subvol_quota($class, $path); # uses wantarray
+ return wantarray ? ($size, 'subvol', $used, undef, $ctime) : 1;
+ }
+
+ return PVE::Storage::Plugin::file_size_info($path, $timeout);
+}
+
+sub volume_resize {
+ my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
+
+ my $format = ($class->parse_volname($volname))[6];
+ if ($format eq 'subvol') {
+ my $path = $class->filesystem_path($scfg, $volname);
+ my $id = '0/' . $class->btrfs_get_subvol_id($path);
+ $class->btrfs_cmd(['qgroup', 'limit', '--', "${size}k", "0/$id", $path]);
+ return undef;
+ }
+
+ return PVE::Storage::Plugin::volume_resize(@_);
+}
+
+sub volume_snapshot {
+ my ($class, $scfg, $storeid, $volname, $snap) = @_;
+
+ my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
+ if ($format ne 'subvol' && $format ne 'raw') {
+ return PVE::Storage::Plugin::volume_snapshot(@_);
+ }
+
+ my $path = $class->filesystem_path($scfg, $volname);
+ my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
+
+ if ($format eq 'raw') {
+ $path = raw_file_to_subvol($path);
+ $snap_path = raw_file_to_subvol($snap_path);
+ }
+
+ my $snapshot_dir = $class->get_subdir($scfg, 'images') . "/$vmid";
+ mkpath $snapshot_dir;
+
+ $class->btrfs_cmd(['subvolume', 'snapshot', '-r', '--', $path, $snap_path]);
+ return undef;
+}
+
+sub volume_rollback_is_possible {
+ my ($class, $scfg, $storeid, $volname, $snap) = @_;
+
+ return 1;
+}
+
+sub volume_snapshot_rollback {
+ my ($class, $scfg, $storeid, $volname, $snap) = @_;
+
+ my ($name, $format) = ($class->parse_volname($volname))[1,6];
+
+ if ($format ne 'subvol' && $format ne 'raw') {
+ return PVE::Storage::Plugin::volume_snapshot_rollback(@_);
+ }
+
+ my $path = $class->filesystem_path($scfg, $volname);
+ my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
+
+ if ($format eq 'raw') {
+ $path = raw_file_to_subvol($path);
+ $snap_path = raw_file_to_subvol($snap_path);
+ }
+
+ # Simple version would be:
+ # rename old to temp
+ # create new
+ # on error rename temp back
+ # But for atomicity in case the rename after create-failure *also* fails, we create the new
+ # subvol first, then use RENAME_EXCHANGE,
+ my $tmp_path = "$path.tmp.$$";
+ $class->btrfs_cmd(['subvolume', 'snapshot', '--', $snap_path, $tmp_path]);
+ # The paths are absolute, so pass -1 as file descriptors.
+ my $ok = PVE::Tools::renameat2(-1, $tmp_path, -1, $path, &PVE::Tools::RENAME_EXCHANGE);
+
+ eval { $class->btrfs_cmd(['subvolume', 'delete', '--', $tmp_path]) };
+ warn "failed to remove '$tmp_path' subvolume: $@" if $@;
+
+ if (!$ok) {
+ die "failed to rotate '$tmp_path' into place at '$path' - $!\n";
+ }
+
+ return undef;
+}
+
+sub volume_snapshot_delete {
+ my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
+
+ my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
+
+ if ($format ne 'subvol' && $format ne 'raw') {
+ return PVE::Storage::Plugin::volume_snapshot_delete(@_);
+ }
+
+ my $path = $class->filesystem_path($scfg, $volname, $snap);
+
+ if ($format eq 'raw') {
+ $path = raw_file_to_subvol($path);
+ }
+
+ $class->btrfs_cmd(['subvolume', 'delete', '--', $path]);
+
+ return undef;
+}
+
+sub volume_has_feature {
+ my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
+
+ my $features = {
+ snapshot => {
+ current => { qcow2 => 1, raw => 1, subvol => 1 },
+ snap => { qcow2 => 1, raw => 1, subvol => 1 }
+ },
+ clone => {
+ base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
+ current => { raw => 1 },
+ snap => { raw => 1 },
+ },
+ template => { current => { qcow2 => 1, raw => 1, vmdk => 1, subvol => 1 } },
+ copy => {
+ base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
+ current => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
+ snap => { qcow2 => 1, raw => 1, subvol => 1 },
+ },
+ sparseinit => { base => {qcow2 => 1, raw => 1, vmdk => 1 },
+ current => {qcow2 => 1, raw => 1, vmdk => 1 } },
+ };
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ my $key = undef;
+ if ($snapname) {
+ $key = 'snap';
+ } else {
+ $key = $isBase ? 'base' : 'current';
+ }
+
+ return 1 if defined($features->{$feature}->{$key}->{$format});
+
+ return undef;
+}
+
+sub list_images {
+ my ($class, $storeid, $scfg, $vmid, $vollist, $cache) = @_;
+ my $imagedir = $class->get_subdir($scfg, 'images');
+
+ my $res = [];
+
+ # Copied from Plugin.pm, with file_size_info calls adapted:
+ foreach my $fn (<$imagedir/[0-9][0-9]*/*>) {
+ # different to in Plugin.pm the regex below also excludes '@' as valid file name
+ next if $fn !~ m@^(/.+/(\d+)/([^/\@.]+(?:\.(qcow2|vmdk|subvol))?))$@;
+ $fn = $1; # untaint
+
+ my $owner = $2;
+ my $name = $3;
+ my $ext = $4;
+
+ next if !$vollist && defined($vmid) && ($owner ne $vmid);
+
+ my $volid = "$storeid:$owner/$name";
+ my ($size, $format, $used, $parent, $ctime);
+
+ if (!$ext) { # raw
+ $volid .= '.raw';
+ ($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info("$fn/disk.raw");
+ } elsif ($ext eq 'subvol') {
+ ($used, $size) = (0, 0);
+ #($used, $size) = btrfs_subvol_quota($class, $fn);
+ $format = 'subvol';
+ } else {
+ ($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info($fn);
+ }
+ next if !($format && defined($size));
+
+ if ($vollist) {
+ next if ! grep { $_ eq $volid } @$vollist;
+ }
+
+ my $info = {
+ volid => $volid, format => $format,
+ size => $size, vmid => $owner, used => $used, parent => $parent,
+ };
+
+ $info->{ctime} = $ctime if $ctime;
+
+ push @$res, $info;
+ }
+
+ return $res;
+}
+
+# For now we don't implement `btrfs send/recv` as it needs some updates to our import/export API
+# first!
+
+sub volume_export_formats {
+ return PVE::Storage::DirPlugin::volume_export_formats(@_);
+}
+
+sub volume_export {
+ return PVE::Storage::DirPlugin::volume_export(@_);
+}
+
+sub volume_import_formats {
+ return PVE::Storage::DirPlugin::volume_import_formats(@_);
+}
+
+sub volume_import {
+ return PVE::Storage::DirPlugin::volume_import(@_);
+}
+
+1
diff --git a/PVE/Storage/Makefile b/PVE/Storage/Makefile
index 91b9238..857c485 100644
--- a/PVE/Storage/Makefile
+++ b/PVE/Storage/Makefile
@@ -12,6 +12,7 @@ SOURCES= \
ZFSPoolPlugin.pm \
ZFSPlugin.pm \
PBSPlugin.pm \
+ BTRFSPlugin.pm \
LvmThinPlugin.pm
.PHONY: install
--
2.30.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] [PATCH v2 storage 2/5] bump storage API: update import/export methods
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 1/5] add BTRFS storage plugin Wolfgang Bumiller
@ 2021-06-22 12:18 ` Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 3/5] btrfs: add 'btrfs' import/export format Wolfgang Bumiller
` (7 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-22 12:18 UTC (permalink / raw)
To: pve-devel; +Cc: drbd-dev
Bumps APIVER to 9 and resets APIAGE to zero.
The import methods (volume_import, volume_import_formats):
These additionally get the '$snapshot' parameter which is
already present on the export side as an informational piece
to know which of the snapshots is the *current* one.
This parameter is inserted *in the middle* of the current
parameters, so the import & export format methods now have
the same signatures.
The current "disk" state will be set to this snapshot.
This, too, is required for our btrfs implementation.
`volume_import_formats` can obviously not make much
*use* of this parameter, but it'll still be useful to know
that the information is actually available in the import
call, so its presence will be checked in the btrfs
implementation.
Currently this is intended to be used for btrfs send/recv
support, which in theory could also get additional metadata
similar to how we do the "tar+size" format, however, we
currently only really use this within this repository in
storage_migrate() which has this information readily
available anyway.
On the export side (volume_export, volume_export_formats):
The `$with_snapshots` option is now "defined" to be an
ordered array of snapshots to include, as a hint for
storages which need this. (As of the next commit this is
only btrfs, and only when also specifying a base snapshot,
which is a case we can currently not run into except on the
command line interface.)
The current providers of the `with_snapshot` option will
still treat it as a boolean (since eg. for ZFS you cannot
really "skip" snapshots AFAIK).
This is mainly intended for storages which do not have a
strong association between snapshots and the originals, or
an ordering (eg. btrfs and lvm-thin allow creating
arbitrary snapshot trees, and with btrfs you can even
create a "circular" connection between subvolumes, also we
could consider reflink based copies snapshots on xfs in
the future maybe?)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
To be explicit about this: this is meant for pve-7 only, merging this to
pve-6 would cause unnecessary churn to users of custom plugins.
Changes to v1:
for a more consistent api, the snapshot parameter of
`volume_import_formats` has been moved from the end to the same
position as in the `volume_export_formats` method, so they have
matching signatures, since this is a major API change and
As a heads up:
Cc: drbd-dev@lists.linbit.com
This will affect the LINSTOR drbd plugin, so Ccing drbd-devel.
AFAICT the plugin doesn't override the affected methods, so it should be
enough to just bump the returned version number in the `api` sub for
`$apiver >= 9`.
ApiChangeLog | 25 ++++++++++++++++++
PVE/CLI/pvesm.pm | 23 +++++++++++++++--
PVE/Storage.pm | 50 ++++++++++++++++++++++++------------
PVE/Storage/LVMPlugin.pm | 6 ++---
PVE/Storage/Plugin.pm | 4 +--
PVE/Storage/ZFSPoolPlugin.pm | 6 ++---
6 files changed, 87 insertions(+), 27 deletions(-)
create mode 100644 ApiChangeLog
diff --git a/ApiChangeLog b/ApiChangeLog
new file mode 100644
index 0000000..8c119c5
--- /dev/null
+++ b/ApiChangeLog
@@ -0,0 +1,25 @@
+# API Versioning ChangeLog
+
+Our API versioning contains an `APIVER` and an `APIAGE`.
+The `APIAGE` is the number of versions we're backward compatible with. (iow. things got added
+without breaking anything unaware of it.)
+
+Future changes should be documented in here.
+
+## Version 9: (AGE resets to 0):
+
+* volume_import_formats gets a new parameter *inserted*:
+
+ Old signature:
+ sub($plugin, $scfg, $storeid, $volname, $base_snapshot, $with_snapshots)
+ New signature:
+ sub($plugin, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots)
+
+ This is now the same as `volume_export_formats`.
+
+ The same goes for calls to `PVE::Storage::volume_import_formats`, which now
+ takes a `$snapshot` parameter in the same place.
+
+* $with_snapshots *may* now be an array reference containing an ordered list of
+ snapshots, but *may* also just be a boolean, and the contained list *may* be
+ ignored, so it can still be treated as a boolean.
diff --git a/PVE/CLI/pvesm.pm b/PVE/CLI/pvesm.pm
index d28f1ba..4491107 100755
--- a/PVE/CLI/pvesm.pm
+++ b/PVE/CLI/pvesm.pm
@@ -276,12 +276,23 @@ __PACKAGE__->register_method ({
optional => 1,
default => 0,
},
+ 'snapshot-list' => {
+ description => "Ordered list of snapshots to transfer",
+ type => 'string',
+ format => 'string-list',
+ optional => 1,
+ },
},
},
returns => { type => 'null' },
code => sub {
my ($param) = @_;
+ my $with_snapshots = $param->{'with-snapshots'};
+ if (defined(my $list = $param->{'snapshot-list'})) {
+ $with_snapshots = PVE::Tools::split_list($list);
+ }
+
my $filename = $param->{filename};
my $outfh;
@@ -295,7 +306,7 @@ __PACKAGE__->register_method ({
eval {
my $cfg = PVE::Storage::config();
PVE::Storage::volume_export($cfg, $outfh, $param->{volume}, $param->{format},
- $param->{snapshot}, $param->{base}, $param->{'with-snapshots'});
+ $param->{snapshot}, $param->{base}, $with_snapshots);
};
my $err = $@;
if ($filename ne '-') {
@@ -361,6 +372,13 @@ __PACKAGE__->register_method ({
optional => 1,
default => 0,
},
+ snapshot => {
+ description => "The current-state snapshot if the stream contains snapshots",
+ type => 'string',
+ pattern => qr/[a-z0-9_\-]{1,40}/i,
+ maxLength => 40,
+ optional => 1,
+ },
},
},
returns => { type => 'string' },
@@ -436,7 +454,8 @@ __PACKAGE__->register_method ({
my $volume = $param->{volume};
my $delete = $param->{'delete-snapshot'};
my $imported_volid = PVE::Storage::volume_import($cfg, $infh, $volume, $param->{format},
- $param->{base}, $param->{'with-snapshots'}, $param->{'allow-rename'});
+ $param->{snapshot}, $param->{base}, $param->{'with-snapshots'},
+ $param->{'allow-rename'});
PVE::Storage::volume_snapshot_delete($cfg, $imported_volid, $delete)
if defined($delete);
return $imported_volid;
diff --git a/PVE/Storage.pm b/PVE/Storage.pm
index e0d6f44..5ca052f 100755
--- a/PVE/Storage.pm
+++ b/PVE/Storage.pm
@@ -41,11 +41,11 @@ use PVE::Storage::PBSPlugin;
use PVE::Storage::BTRFSPlugin;
# Storage API version. Increment it on changes in storage API interface.
-use constant APIVER => 8;
+use constant APIVER => 9;
# Age is the number of versions we're backward compatible with.
# This is like having 'current=APIVER' and age='APIAGE' in libtool,
# see https://www.gnu.org/software/libtool/manual/html_node/Libtool-versioning.html
-use constant APIAGE => 7;
+use constant APIAGE => 0;
# load standard plugins
PVE::Storage::DirPlugin->register();
@@ -714,7 +714,8 @@ sub storage_migrate {
my $send = ['pvesm', 'export', $volid, $format, '-', '-with-snapshots', $with_snapshots];
my $recv = [@$ssh, '--', 'pvesm', 'import', $target_volid, $format, $import_fn, '-with-snapshots', $with_snapshots];
if (defined($snapshot)) {
- push @$send, '-snapshot', $snapshot
+ push @$send, '-snapshot', $snapshot;
+ push @$recv, '-snapshot', $snapshot;
}
if ($migration_snapshot) {
push @$recv, '-delete-snapshot', $snapshot;
@@ -724,7 +725,7 @@ sub storage_migrate {
if (defined($base_snapshot)) {
# Check if the snapshot exists on the remote side:
push @$send, '-base', $base_snapshot;
- push @$recv, '-base', $base_snapshot;
+ push @$recv, '-base', $base_snapshot if $target_apiver >= 9;
}
my $new_volid;
@@ -1712,7 +1713,7 @@ sub prune_mark_backup_group {
}
}
-sub volume_export {
+sub volume_export : prototype($$$$$$$) {
my ($cfg, $fh, $volid, $format, $snapshot, $base_snapshot, $with_snapshots) = @_;
my ($storeid, $volname) = parse_volume_id($volid, 1);
@@ -1723,18 +1724,27 @@ sub volume_export {
$snapshot, $base_snapshot, $with_snapshots);
}
-sub volume_import {
- my ($cfg, $fh, $volid, $format, $base_snapshot, $with_snapshots, $allow_rename) = @_;
+sub volume_import : prototype($$$$$$$$) {
+ my ($cfg, $fh, $volid, $format, $snapshot, $base_snapshot, $with_snapshots, $allow_rename) = @_;
my ($storeid, $volname) = parse_volume_id($volid, 1);
die "cannot import into volume '$volid'\n" if !$storeid;
my $scfg = storage_config($cfg, $storeid);
my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
- return $plugin->volume_import($scfg, $storeid, $fh, $volname, $format,
- $base_snapshot, $with_snapshots, $allow_rename) // $volid;
-}
-
-sub volume_export_formats {
+ return $plugin->volume_import(
+ $scfg,
+ $storeid,
+ $fh,
+ $volname,
+ $format,
+ $snapshot,
+ $base_snapshot,
+ $with_snapshots,
+ $allow_rename,
+ ) // $volid;
+}
+
+sub volume_export_formats : prototype($$$$$) {
my ($cfg, $volid, $snapshot, $base_snapshot, $with_snapshots) = @_;
my ($storeid, $volname) = parse_volume_id($volid, 1);
@@ -1746,21 +1756,27 @@ sub volume_export_formats {
$with_snapshots);
}
-sub volume_import_formats {
- my ($cfg, $volid, $base_snapshot, $with_snapshots) = @_;
+sub volume_import_formats : prototype($$$$$) {
+ my ($cfg, $volid, $snapshot, $base_snapshot, $with_snapshots) = @_;
my ($storeid, $volname) = parse_volume_id($volid, 1);
return if !$storeid;
my $scfg = storage_config($cfg, $storeid);
my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
- return $plugin->volume_import_formats($scfg, $storeid, $volname,
- $base_snapshot, $with_snapshots);
+ return $plugin->volume_import_formats(
+ $scfg,
+ $storeid,
+ $volname,
+ $snapshot,
+ $base_snapshot,
+ $with_snapshots,
+ );
}
sub volume_transfer_formats {
my ($cfg, $src_volid, $dst_volid, $snapshot, $base_snapshot, $with_snapshots) = @_;
my @export_formats = volume_export_formats($cfg, $src_volid, $snapshot, $base_snapshot, $with_snapshots);
- my @import_formats = volume_import_formats($cfg, $dst_volid, $base_snapshot, $with_snapshots);
+ my @import_formats = volume_import_formats($cfg, $dst_volid, $snapshot, $base_snapshot, $with_snapshots);
my %import_hash = map { $_ => 1 } @import_formats;
my @common = grep { $import_hash{$_} } @export_formats;
return @common;
diff --git a/PVE/Storage/LVMPlugin.pm b/PVE/Storage/LVMPlugin.pm
index b85fb89..b5fa9d6 100644
--- a/PVE/Storage/LVMPlugin.pm
+++ b/PVE/Storage/LVMPlugin.pm
@@ -603,7 +603,7 @@ sub volume_has_feature {
sub volume_export_formats {
my ($class, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots) = @_;
return () if defined($snapshot); # lvm-thin only
- return volume_import_formats($class, $scfg, $storeid, $volname, $base_snapshot, $with_snapshots);
+ return volume_import_formats($class, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots);
}
sub volume_export {
@@ -627,14 +627,14 @@ sub volume_export {
}
sub volume_import_formats {
- my ($class, $scfg, $storeid, $volname, $base_snapshot, $with_snapshots) = @_;
+ my ($class, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots) = @_;
return () if $with_snapshots; # not supported
return () if defined($base_snapshot); # not supported
return ('raw+size');
}
sub volume_import {
- my ($class, $scfg, $storeid, $fh, $volname, $format, $base_snapshot, $with_snapshots, $allow_rename) = @_;
+ my ($class, $scfg, $storeid, $fh, $volname, $format, $snapshot, $base_snapshot, $with_snapshots, $allow_rename) = @_;
die "volume import format $format not available for $class\n"
if $format ne 'raw+size';
die "cannot import volumes together with their snapshots in $class\n"
diff --git a/PVE/Storage/Plugin.pm b/PVE/Storage/Plugin.pm
index d330845..8803e65 100644
--- a/PVE/Storage/Plugin.pm
+++ b/PVE/Storage/Plugin.pm
@@ -1402,7 +1402,7 @@ sub volume_export_formats {
# Import data from a stream, creating a new or replacing or adding to an existing volume.
sub volume_import {
- my ($class, $scfg, $storeid, $fh, $volname, $format, $base_snapshot, $with_snapshots, $allow_rename) = @_;
+ my ($class, $scfg, $storeid, $fh, $volname, $format, $snapshot, $base_snapshot, $with_snapshots, $allow_rename) = @_;
die "volume import format '$format' not available for $class\n"
if $format !~ /^(raw|tar|qcow2|vmdk)\+size$/;
@@ -1462,7 +1462,7 @@ sub volume_import {
}
sub volume_import_formats {
- my ($class, $scfg, $storeid, $volname, $base_snapshot, $with_snapshots) = @_;
+ my ($class, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots) = @_;
if ($scfg->{path} && !defined($base_snapshot)) {
my $format = ($class->parse_volname($volname))[6];
if ($with_snapshots) {
diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
index 2e2abe3..c4be70f 100644
--- a/PVE/Storage/ZFSPoolPlugin.pm
+++ b/PVE/Storage/ZFSPoolPlugin.pm
@@ -747,7 +747,7 @@ sub volume_export_formats {
}
sub volume_import {
- my ($class, $scfg, $storeid, $fh, $volname, $format, $base_snapshot, $with_snapshots, $allow_rename) = @_;
+ my ($class, $scfg, $storeid, $fh, $volname, $format, $snapshot, $base_snapshot, $with_snapshots, $allow_rename) = @_;
die "unsupported import stream format for $class: $format\n"
if $format ne 'zfs';
@@ -784,9 +784,9 @@ sub volume_import {
}
sub volume_import_formats {
- my ($class, $scfg, $storeid, $volname, $base_snapshot, $with_snapshots) = @_;
+ my ($class, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots) = @_;
- return $class->volume_export_formats($scfg, $storeid, $volname, undef, $base_snapshot, $with_snapshots);
+ return $class->volume_export_formats($scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots);
}
1;
--
2.30.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] [PATCH v2 storage 3/5] btrfs: add 'btrfs' import/export format
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 1/5] add BTRFS storage plugin Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 2/5] bump storage API: update import/export methods Wolfgang Bumiller
@ 2021-06-22 12:18 ` Wolfgang Bumiller
2021-06-23 12:14 ` Fabian Grünbichler
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 4/5] btrfs: make NOCOW optional Wolfgang Bumiller
` (6 subsequent siblings)
9 siblings, 1 reply; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-22 12:18 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
Changes to v1:
* import api parameters reordered du to diff in patch 2/5
PVE/CLI/pvesm.pm | 2 +-
PVE/Storage.pm | 2 +-
PVE/Storage/BTRFSPlugin.pm | 248 +++++++++++++++++++++++++++++++++++--
3 files changed, 240 insertions(+), 12 deletions(-)
diff --git a/PVE/CLI/pvesm.pm b/PVE/CLI/pvesm.pm
index 4491107..668170a 100755
--- a/PVE/CLI/pvesm.pm
+++ b/PVE/CLI/pvesm.pm
@@ -30,7 +30,7 @@ use PVE::CLIHandler;
use base qw(PVE::CLIHandler);
-my $KNOWN_EXPORT_FORMATS = ['raw+size', 'tar+size', 'qcow2+size', 'vmdk+size', 'zfs'];
+my $KNOWN_EXPORT_FORMATS = ['raw+size', 'tar+size', 'qcow2+size', 'vmdk+size', 'zfs', 'btrfs'];
my $nodename = PVE::INotify::nodename();
diff --git a/PVE/Storage.pm b/PVE/Storage.pm
index 5ca052f..c45d339 100755
--- a/PVE/Storage.pm
+++ b/PVE/Storage.pm
@@ -690,7 +690,7 @@ sub storage_migrate {
my $migration_snapshot;
if (!defined($snapshot)) {
- if ($scfg->{type} eq 'zfspool') {
+ if ($scfg->{type} eq 'zfspool' || $scfg->{type} eq 'btrfs') {
$migration_snapshot = 1;
$snapshot = '__migration__';
}
diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
index 370d848..072dfe0 100644
--- a/PVE/Storage/BTRFSPlugin.pm
+++ b/PVE/Storage/BTRFSPlugin.pm
@@ -9,8 +9,9 @@ use Fcntl qw(S_ISDIR O_WRONLY O_CREAT O_EXCL);
use File::Basename qw(dirname);
use File::Path qw(mkpath);
use IO::Dir;
+use POSIX qw(EEXIST);
-use PVE::Tools qw(run_command);
+use PVE::Tools qw(run_command dir_glob_foreach);
use PVE::Storage::DirPlugin;
@@ -612,23 +613,250 @@ sub list_images {
return $res;
}
-# For now we don't implement `btrfs send/recv` as it needs some updates to our import/export API
-# first!
-
sub volume_export_formats {
- return PVE::Storage::DirPlugin::volume_export_formats(@_);
-}
+ my ($class, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots) = @_;
-sub volume_export {
- return PVE::Storage::DirPlugin::volume_export(@_);
+ # We can do whatever `DirPlugin` can do.
+ my @result = PVE::Storage::Plugin::volume_export_formats(@_);
+
+ # `btrfs send` only works on snapshots:
+ return @result if !defined $snapshot;
+
+ # Incremental stream with snapshots is only supported if the snapshots are listed (new api):
+ return @result if defined($base_snapshot) && $with_snapshots && ref($with_snapshots) ne 'ARRAY';
+
+ # Otherwise we do also support `with_snapshots`.
+
+ # Finally, `btrfs send` only works on formats where we actually use btrfs subvolumes:
+ my $format = ($class->parse_volname($volname))[6];
+ return @result if $format ne 'raw' && $format ne 'subvol';
+
+ return ('btrfs', @result);
}
sub volume_import_formats {
- return PVE::Storage::DirPlugin::volume_import_formats(@_);
+ my ($class, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots) = @_;
+
+ # Same as export-formats, beware the parameter order:
+ return volume_export_formats(
+ $class,
+ $scfg,
+ $storeid,
+ $volname,
+ $snapshot,
+ $base_snapshot,
+ $with_snapshots,
+ );
+}
+
+sub volume_export {
+ my (
+ $class,
+ $scfg,
+ $storeid,
+ $fh,
+ $volname,
+ $format,
+ $snapshot,
+ $base_snapshot,
+ $with_snapshots,
+ ) = @_;
+
+ if ($format ne 'btrfs') {
+ return PVE::Storage::Plugin::volume_export(@_);
+ }
+
+ die "format 'btrfs' only works on snapshots\n"
+ if !defined $snapshot;
+
+ die "'btrfs' format in incremental mode requires snapshots to be listed explicitly\n"
+ if defined($base_snapshot) && $with_snapshots && ref($with_snapshots) ne 'ARRAY';
+
+ my $volume_format = ($class->parse_volname($volname))[6];
+
+ die "btrfs-sending volumes of type $volume_format ('$volname') is not supported\n"
+ if $volume_format ne 'raw' && $volume_format ne 'subvol';
+
+ my $path = $class->path($scfg, $volname, $storeid);
+
+ if ($volume_format eq 'raw') {
+ $path = raw_file_to_subvol($path);
+ }
+
+ my $cmd = ['btrfs', '-q', 'send', '-e'];
+ if ($base_snapshot) {
+ my $base = $class->path($scfg, $volname, $storeid, $base_snapshot);
+ if ($volume_format eq 'raw') {
+ $base = raw_file_to_subvol($base);
+ }
+ push @$cmd, '-p', $base;
+ }
+ push @$cmd, '--';
+ if (ref($with_snapshots) eq 'ARRAY') {
+ push @$cmd, (map { "$path\@$_" } ($with_snapshots // [])->@*), $path;
+ } else {
+ dir_glob_foreach(dirname($path), $BTRFS_VOL_REGEX, sub {
+ push @$cmd, "$path\@$_[2]" if !(defined($snapshot) && $_[2] eq $snapshot);
+ });
+ }
+ $path .= "\@$snapshot" if defined($snapshot);
+ push @$cmd, $path;
+
+ run_command($cmd, output => '>&'.fileno($fh));
+ return;
}
sub volume_import {
- return PVE::Storage::DirPlugin::volume_import(@_);
+ my (
+ $class,
+ $scfg,
+ $storeid,
+ $fh,
+ $volname,
+ $format,
+ $snapshot,
+ $base_snapshot,
+ $with_snapshots,
+ $allow_rename,
+ ) = @_;
+
+ if ($format ne 'btrfs') {
+ return PVE::Storage::Plugin::volume_import(@_);
+ }
+
+ die "format 'btrfs' only works on snapshots\n"
+ if !defined $snapshot;
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $volume_format) =
+ $class->parse_volname($volname);
+
+ die "btrfs-receiving volumes of type $volume_format ('$volname') is not supported\n"
+ if $volume_format ne 'raw' && $volume_format ne 'subvol';
+
+ if (defined($base_snapshot)) {
+ my $path = $class->path($scfg, $volname, $storeid, $base_snapshot);
+ die "base snapshot '$base_snapshot' not found - no such directory '$path'\n"
+ if !path_is_subvolume($path);
+ }
+
+ my $destination = $class->filesystem_path($scfg, $volname);
+ if ($volume_format eq 'raw') {
+ $destination = raw_file_to_subvol($destination);
+ }
+
+ if (!defined($base_snapshot) && -e $destination) {
+ die "volume $volname already exists\n" if !$allow_rename;
+ $volname = $class->find_free_diskname($storeid, $scfg, $vmid, $volume_format, 1);
+ }
+
+ my $imagedir = $class->get_subdir($scfg, $vtype);
+ $imagedir .= "/$vmid" if $vtype eq 'images';
+
+ my $tmppath = "$imagedir/recv.$vmid.tmp";
+ mkdir($imagedir); # FIXME: if $scfg->{mkdir};
+ if (!mkdir($tmppath)) {
+ die "temp receive directory already exists at '$tmppath', incomplete concurrent import?\n"
+ if $! == EEXIST;
+ die "failed to create temporary receive directory at '$tmppath' - $!\n";
+ }
+
+ my $dh = IO::Dir->new($tmppath)
+ or die "failed to open temporary receive directory '$tmppath' - $!\n";
+ eval {
+ run_command(['btrfs', '-q', 'receive', '-e', '--', $tmppath], input => '<&'.fileno($fh));
+
+ # Analyze the received subvolumes;
+ my ($diskname, $found_snapshot, @snapshots);
+ $dh->rewind;
+ while (defined(my $entry = $dh->read)) {
+ next if $entry eq '.' || $entry eq '..';
+ next if $entry !~ /^$BTRFS_VOL_REGEX$/;
+ my ($cur_diskname, $cur_snapshot) = ($1, $2);
+
+ die "send stream included a non-snapshot subvolume\n"
+ if !defined($cur_snapshot);
+
+ if (!defined($diskname)) {
+ $diskname = $cur_diskname;
+ } else {
+ die "multiple disks contained in stream ('$diskname' vs '$cur_diskname')\n"
+ if $diskname ne $cur_diskname;
+ }
+
+ if ($cur_snapshot eq $snapshot) {
+ $found_snapshot = 1;
+ } else {
+ push @snapshots, $cur_snapshot;
+ }
+ }
+
+ die "send stream did not contain the expected current snapshot '$snapshot'\n"
+ if !$found_snapshot;
+
+ # Rotate the disk into place, first the current state:
+ # Note that read-only subvolumes cannot be moved into different directories, but for the
+ # "current" state we also want a writable copy, so start with that:
+ $class->btrfs_cmd(['property', 'set', "$tmppath/$diskname\@$snapshot", 'ro', 'false']);
+ PVE::Tools::renameat2(
+ -1,
+ "$tmppath/$diskname\@$snapshot",
+ -1,
+ $destination,
+ &PVE::Tools::RENAME_NOREPLACE,
+ ) or die "failed to move received snapshot '$tmppath/$diskname\@$snapshot'"
+ . " into place at '$destination' - $!\n";
+
+ # Now recreate the actual snapshot:
+ $class->btrfs_cmd([
+ 'subvolume',
+ 'snapshot',
+ '-r',
+ '--',
+ $destination,
+ "$destination\@$snapshot",
+ ]);
+
+ # Now go through the remaining snapshots (if any)
+ foreach my $snap (@snapshots) {
+ $class->btrfs_cmd(['property', 'set', "$tmppath/$diskname\@$snap", 'ro', 'false']);
+ PVE::Tools::renameat2(
+ -1,
+ "$tmppath/$diskname\@$snap",
+ -1,
+ "$destination\@$snap",
+ &PVE::Tools::RENAME_NOREPLACE,
+ ) or die "failed to move received snapshot '$tmppath/$diskname\@$snap'"
+ . " into place at '$destination\@$snap' - $!\n";
+ eval { $class->btrfs_cmd(['property', 'set', "$destination\@$snap", 'ro', 'true']) };
+ warn "failed to make $destination\@$snap read-only - $!\n" if $@;
+ }
+ };
+ my $err = $@;
+
+ eval {
+ # Cleanup all the received snapshots we did not move into place, so we can remove the temp
+ # directory.
+ if ($dh) {
+ $dh->rewind;
+ while (defined(my $entry = $dh->read)) {
+ next if $entry eq '.' || $entry eq '..';
+ eval { $class->btrfs_cmd(['subvolume', 'delete', '--', "$tmppath/$entry"]) };
+ warn $@ if $@;
+ }
+ $dh->close; undef $dh;
+ }
+ if (!rmdir($tmppath)) {
+ warn "failed to remove temporary directory '$tmppath' - $!\n"
+ }
+ };
+ warn $@ if $@;
+ if ($err) {
+ # clean up if the directory ended up being empty after an error
+ rmdir($tmppath);
+ die $err;
+ }
+
+ return "$storeid:$volname";
}
1
--
2.30.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] [PATCH v2 storage 4/5] btrfs: make NOCOW optional
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
` (2 preceding siblings ...)
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 3/5] btrfs: add 'btrfs' import/export format Wolfgang Bumiller
@ 2021-06-22 12:18 ` Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 5/5] btrfs: support quota-based subvols optionally Wolfgang Bumiller
` (5 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-22 12:18 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
New in this version.
It's a performance knob...
PVE/Storage/BTRFSPlugin.pm | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
index 072dfe0..1fe5db0 100644
--- a/PVE/Storage/BTRFSPlugin.pm
+++ b/PVE/Storage/BTRFSPlugin.pm
@@ -46,6 +46,19 @@ sub plugindata {
};
}
+sub properties {
+ return {
+ nocow => {
+ description => "Set the NOCOW flag on files."
+ . " Disables data checksumming and causes data errors to be unrecoverable from"
+ . " while allowing direct I/O. Only use this if data does not need to be any more"
+ . " safe than on a single ext4 formatted disk with no underlying raid system.",
+ type => 'boolean',
+ default => 0,
+ },
+ };
+}
+
sub options {
return {
path => { fixed => 1 },
@@ -56,6 +69,7 @@ sub options {
content => { optional => 1 },
format => { optional => 1 },
is_mountpoint => { optional => 1 },
+ nocow => { optional => 1 },
# TODO: The new variant of mkdir with `populate` vs `create`...
};
}
@@ -311,7 +325,7 @@ sub alloc_image {
} elsif ($fmt eq 'raw') {
sysopen my $fh, $path, O_WRONLY | O_CREAT | O_EXCL
or die "failed to create raw file '$path' - $!\n";
- chattr($fh, ~FS_NOCOW_FL, FS_NOCOW_FL);
+ chattr($fh, ~FS_NOCOW_FL, FS_NOCOW_FL) if $scfg->{nocow};
truncate($fh, $size * 1024)
or die "failed to set file size for '$path' - $!\n";
close($fh);
--
2.30.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] [PATCH v2 storage 5/5] btrfs: support quota-based subvols optionally
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
` (3 preceding siblings ...)
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 4/5] btrfs: make NOCOW optional Wolfgang Bumiller
@ 2021-06-22 12:18 ` Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 1/3] migration: fix snapshots boolean accounting Wolfgang Bumiller
` (4 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-22 12:18 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
New in this version. Disables `btrfs-send/recv` *for now*...
In my local tests, while merely enabling quotas on the file
system does make the fs measurably slower, it was still
faster than the ext4-raw image (and even faster than ZFS
subvolumes), and while I have no definitive data on this
yet, there *seems* to be a trend towards slower performance
with more space utilization, but that may just be noise as
well...
PVE/Storage/BTRFSPlugin.pm | 95 +++++++++++++++++++++++---------------
1 file changed, 59 insertions(+), 36 deletions(-)
diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
index 1fe5db0..d5357a3 100644
--- a/PVE/Storage/BTRFSPlugin.pm
+++ b/PVE/Storage/BTRFSPlugin.pm
@@ -56,6 +56,14 @@ sub properties {
type => 'boolean',
default => 0,
},
+ quotas => {
+ description => "If enabled, containers will use subvolumes directly, using quotas"
+ . " for space accounting, instead of using ext4 formatted images files."
+ . " While this removes a file system layer for containers, there is a risk of"
+ . " performance degrading over time.",
+ type => 'boolean',
+ default => 0,
+ },
};
}
@@ -70,6 +78,7 @@ sub options {
format => { optional => 1 },
is_mountpoint => { optional => 1 },
nocow => { optional => 1 },
+ quotas => { optional => 1 },
# TODO: The new variant of mkdir with `populate` vs `create`...
};
}
@@ -301,27 +310,22 @@ sub alloc_image {
$path = "$subvol/disk.raw";
}
- if ($fmt eq 'subvol' && !!$size) {
+ if ($fmt eq 'subvol' && !!$size && !$scfg->{quotas}) {
# NOTE: `btrfs send/recv` actually drops quota information so supporting subvolumes with
# quotas doesn't play nice with send/recv.
- die "btrfs quotas are currently not supported, use an unsized subvolume or a raw file\n";
+ die "btrfs quotas are currently disabled, use an unsized subvolume or a raw file\n";
}
$class->btrfs_cmd(['subvolume', 'create', '--', $subvol]);
eval {
if ($fmt eq 'subvol') {
- # Nothing to do for now...
-
- # This is how we *would* do it:
- # # Use the subvol's default 0/$id qgroup
- # eval {
- # # This call should happen at storage creation instead and therefore governed by a
- # # configuration option!
- # # $class->btrfs_cmd(['quota', 'enable', $subvol]);
- # my $id = $class->btrfs_get_subvol_id($subvol);
- # $class->btrfs_cmd(['qgroup', 'limit', "${size}k", "0/$id", $subvol]);
- # };
+ if (!!$size) {
+ eval {
+ my $id = $class->btrfs_get_subvol_id($subvol);
+ $class->btrfs_cmd(['qgroup', 'limit', "${size}k", "0/$id", $subvol]);
+ };
+ }
} elsif ($fmt eq 'raw') {
sysopen my $fh, $path, O_WRONLY | O_CREAT | O_EXCL
or die "failed to create raw file '$path' - $!\n";
@@ -399,27 +403,26 @@ sub free_image {
return undef;
}
-# Currently not used because quotas clash with send/recv.
-# my sub btrfs_subvol_quota {
-# my ($class, $path) = @_;
-# my $id = '0/' . $class->btrfs_get_subvol_id($path);
-# my $search = qr/^\Q$id\E\s+(\d)+\s+\d+\s+(\d+)\s*$/;
-# my ($used, $size);
-# $class->btrfs_cmd(['qgroup', 'show', '--raw', '-rf', '--', $path], sub {
-# return if defined($size);
-# if ($_[0] =~ $search) {
-# ($used, $size) = ($1, $2);
-# }
-# });
-# if (!defined($size)) {
-# # syslog should include more information:
-# syslog('err', "failed to get subvolume size for: $path (id $id)");
-# # UI should only see the last path component:
-# $path =~ s|^.*/||;
-# die "failed to get subvolume size for $path\n";
-# }
-# return wantarray ? ($used, $size) : $size;
-# }
+my sub btrfs_subvol_quota {
+ my ($class, $path) = @_;
+ my $id = '0/' . $class->btrfs_get_subvol_id($path);
+ my $search = qr/^\Q$id\E\s+(\d)+\s+\d+\s+(\d+)\s*$/;
+ my ($used, $size);
+ $class->btrfs_cmd(['qgroup', 'show', '--raw', '-rf', '--', $path], sub {
+ return if defined($size);
+ if ($_[0] =~ $search) {
+ ($used, $size) = ($1, $2);
+ }
+ });
+ if (!defined($size)) {
+ # syslog should include more information:
+ syslog('err', "failed to get subvolume size for: $path (id $id)");
+ # UI should only see the last path component:
+ $path =~ s|^.*/||;
+ die "failed to get subvolume size for $path\n";
+ }
+ return wantarray ? ($used, $size) : $size;
+}
sub volume_size_info {
my ($class, $scfg, $storeid, $volname, $timeout) = @_;
@@ -431,7 +434,9 @@ sub volume_size_info {
if ($format eq 'subvol') {
my $ctime = (stat($path))[10];
my ($used, $size) = (0, 0);
- #my ($used, $size) = btrfs_subvol_quota($class, $path); # uses wantarray
+ if ($scfg->{quotas}) {
+ ($used, $size) = btrfs_subvol_quota($class, $path);
+ }
return wantarray ? ($size, 'subvol', $used, undef, $ctime) : 1;
}
@@ -443,6 +448,9 @@ sub volume_resize {
my $format = ($class->parse_volname($volname))[6];
if ($format eq 'subvol') {
+ die "subvolumes can only be resized with quotas enabled\n"
+ if !$scfg->{quotas};
+
my $path = $class->filesystem_path($scfg, $volname);
my $id = '0/' . $class->btrfs_get_subvol_id($path);
$class->btrfs_cmd(['qgroup', 'limit', '--', "${size}k", "0/$id", $path]);
@@ -603,7 +611,9 @@ sub list_images {
($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info("$fn/disk.raw");
} elsif ($ext eq 'subvol') {
($used, $size) = (0, 0);
- #($used, $size) = btrfs_subvol_quota($class, $fn);
+ if ($scfg->{quotas}) {
+ ($used, $size) = btrfs_subvol_quota($class, $fn);
+ }
$format = 'subvol';
} else {
($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info($fn);
@@ -645,6 +655,9 @@ sub volume_export_formats {
my $format = ($class->parse_volname($volname))[6];
return @result if $format ne 'raw' && $format ne 'subvol';
+ # Currently we don't allow using btrfs-send/recv with quotas on subvolumes.
+ return @result if $scfg->{quotas} && $format eq 'subvol';
+
return ('btrfs', @result);
}
@@ -691,6 +704,11 @@ sub volume_export {
die "btrfs-sending volumes of type $volume_format ('$volname') is not supported\n"
if $volume_format ne 'raw' && $volume_format ne 'subvol';
+ if ($scfg->{quotas} && $format eq 'subvol') {
+ die "btrfs-sending volumes of type $volume_format ('$volname') with quotas enabled is"
+ . " currently not supported\n";
+ }
+
my $path = $class->path($scfg, $volname, $storeid);
if ($volume_format eq 'raw') {
@@ -747,6 +765,11 @@ sub volume_import {
die "btrfs-receiving volumes of type $volume_format ('$volname') is not supported\n"
if $volume_format ne 'raw' && $volume_format ne 'subvol';
+ if ($scfg->{quotas} && $format eq 'subvol') {
+ die "btrfs-receiving volumes of type $volume_format ('$volname') with quotas enabled is"
+ . " currently not supported\n";
+ }
+
if (defined($base_snapshot)) {
my $path = $class->path($scfg, $volname, $storeid, $base_snapshot);
die "base snapshot '$base_snapshot' not found - no such directory '$path'\n"
--
2.30.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] [PATCH v2 container 1/3] migration: fix snapshots boolean accounting
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
` (4 preceding siblings ...)
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 5/5] btrfs: support quota-based subvols optionally Wolfgang Bumiller
@ 2021-06-22 12:18 ` Wolfgang Bumiller
2021-06-23 10:59 ` Fabian Grünbichler
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 2/3] enable btrfs support via subvolumes Wolfgang Bumiller
` (3 subsequent siblings)
9 siblings, 1 reply; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-22 12:18 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
No changes to v1.
src/PVE/LXC/Migrate.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
index 3cd895d..ce1f7dd 100644
--- a/src/PVE/LXC/Migrate.pm
+++ b/src/PVE/LXC/Migrate.pm
@@ -146,7 +146,7 @@ sub phase1 {
}
$volhash->{$volid}->{ref} = defined($snapname) ? 'snapshot' : 'config';
- $volhash->{$volid}->{snapshots} = defined($snapname);
+ $volhash->{$volid}->{snapshots} = 1 if defined($snapname);
my ($path, $owner) = PVE::Storage::path($self->{storecfg}, $volid);
--
2.30.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] [PATCH v2 container 2/3] enable btrfs support via subvolumes
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
` (5 preceding siblings ...)
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 1/3] migration: fix snapshots boolean accounting Wolfgang Bumiller
@ 2021-06-22 12:18 ` Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 3/3] special case btrfs+quotas to use subvolumes Wolfgang Bumiller
` (2 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-22 12:18 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
Changes to v1:
Rebased & Comment fixup.
Note that I have not yet opted to switch these tests to
`volume_has_feature` based ones. There are still some
unresolved questions there.
src/PVE/LXC/Migrate.pm | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
index ce1f7dd..95562e4 100644
--- a/src/PVE/LXC/Migrate.pm
+++ b/src/PVE/LXC/Migrate.pm
@@ -155,8 +155,8 @@ sub phase1 {
if (defined($snapname)) {
# we cannot migrate shapshots on local storage
- # exceptions: 'zfspool'
- if (($scfg->{type} eq 'zfspool')) {
+ # exceptions: 'zfspool', 'btrfs'
+ if ($scfg->{type} eq 'zfspool' || $scfg->{type} eq 'btrfs') {
return;
}
die "non-migratable snapshot exists\n";
@@ -222,8 +222,9 @@ sub phase1 {
my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
my $scfg = PVE::Storage::storage_config($self->{storecfg}, $sid);
- my $migratable = ($scfg->{type} eq 'dir') || ($scfg->{type} eq 'zfspool') ||
- ($scfg->{type} eq 'lvmthin') || ($scfg->{type} eq 'lvm');
+ my $migratable = ($scfg->{type} eq 'dir') || ($scfg->{type} eq 'zfspool')
+ || ($scfg->{type} eq 'lvmthin') || ($scfg->{type} eq 'lvm')
+ || ($scfg->{type} eq 'btrfs');
die "storage type '$scfg->{type}' not supported\n"
if !$migratable;
--
2.30.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] [PATCH v2 container 3/3] special case btrfs+quotas to use subvolumes
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
` (6 preceding siblings ...)
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 2/3] enable btrfs support via subvolumes Wolfgang Bumiller
@ 2021-06-22 12:18 ` Wolfgang Bumiller
2021-06-23 13:51 ` Fabian Grünbichler
2021-06-22 12:18 ` [pve-devel] [PATCH v2 qemu-server] allow migrating raw btrfs volumes Wolfgang Bumiller
2021-06-23 20:19 ` [pve-devel] applied-partially: [PATCH v2 multiple] btrfs, file system for the brave Thomas Lamprecht
9 siblings, 1 reply; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-22 12:18 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
New in v2.
src/PVE/LXC.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index 0a8a532..fc06842 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -1893,7 +1893,7 @@ sub alloc_disk {
eval {
my $do_format = 0;
if ($scfg->{content}->{rootdir} && $scfg->{path}) {
- if ($size_kb > 0) {
+ if ($size_kb > 0 && !($scfg->{type} eq 'btrfs' && $scfg->{quotas})) {
$volid = PVE::Storage::vdisk_alloc($storecfg, $storage, $vmid, 'raw', undef, $size_kb);
$do_format = 1;
} else {
--
2.30.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] [PATCH v2 qemu-server] allow migrating raw btrfs volumes
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
` (7 preceding siblings ...)
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 3/3] special case btrfs+quotas to use subvolumes Wolfgang Bumiller
@ 2021-06-22 12:18 ` Wolfgang Bumiller
2021-06-23 10:51 ` [pve-devel] applied: " Thomas Lamprecht
2021-06-23 20:19 ` [pve-devel] applied-partially: [PATCH v2 multiple] btrfs, file system for the brave Thomas Lamprecht
9 siblings, 1 reply; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-22 12:18 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
No changes to v1.
Also: see comment on the container side.
PVE/QemuMigrate.pm | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 5f37890..a2dd11c 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -509,7 +509,10 @@ sub scan_local_volumes {
# exceptions: 'zfspool' or 'qcow2' files (on directory storage)
die "online storage migration not possible if snapshot exists\n" if $self->{running};
- if (!($scfg->{type} eq 'zfspool' || $local_volumes->{$volid}->{format} eq 'qcow2')) {
+ if (!($scfg->{type} eq 'zfspool'
+ || ($scfg->{type} eq 'btrfs' && $local_volumes->{$volid}->{format} eq 'raw')
+ || $local_volumes->{$volid}->{format} eq 'qcow2'
+ )) {
die "non-migratable snapshot exists\n";
}
}
@@ -560,7 +563,7 @@ sub scan_local_volumes {
my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
my $scfg = PVE::Storage::storage_config($storecfg, $sid);
- my $migratable = $scfg->{type} =~ /^(?:dir|zfspool|lvmthin|lvm)$/;
+ my $migratable = $scfg->{type} =~ /^(?:dir|btrfs|zfspool|lvmthin|lvm)$/;
die "can't migrate '$volid' - storage type '$scfg->{type}' not supported\n"
if !$migratable;
--
2.30.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] applied: [PATCH v2 qemu-server] allow migrating raw btrfs volumes
2021-06-22 12:18 ` [pve-devel] [PATCH v2 qemu-server] allow migrating raw btrfs volumes Wolfgang Bumiller
@ 2021-06-23 10:51 ` Thomas Lamprecht
0 siblings, 0 replies; 19+ messages in thread
From: Thomas Lamprecht @ 2021-06-23 10:51 UTC (permalink / raw)
To: Proxmox VE development discussion, Wolfgang Bumiller
On 22.06.21 14:18, Wolfgang Bumiller wrote:
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
> No changes to v1.
> Also: see comment on the container side.
>
> PVE/QemuMigrate.pm | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
>
applied this one already to avoid a re-bump if the storage plugin is going in,
and while that's not the case this does not hurts, it nowhere shows up.
thanks!
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [pve-devel] [PATCH v2 container 1/3] migration: fix snapshots boolean accounting
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 1/3] migration: fix snapshots boolean accounting Wolfgang Bumiller
@ 2021-06-23 10:59 ` Fabian Grünbichler
2021-06-23 12:31 ` Wolfgang Bumiller
0 siblings, 1 reply; 19+ messages in thread
From: Fabian Grünbichler @ 2021-06-23 10:59 UTC (permalink / raw)
To: Proxmox VE development discussion
some indication on what was broken would be nice..
AFAICT, storage_migrate fixes this to a boolean anyway, and this is only
passed to storage_migrate, so.. !?
On June 22, 2021 2:18 pm, Wolfgang Bumiller wrote:
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
> No changes to v1.
>
> src/PVE/LXC/Migrate.pm | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
> index 3cd895d..ce1f7dd 100644
> --- a/src/PVE/LXC/Migrate.pm
> +++ b/src/PVE/LXC/Migrate.pm
> @@ -146,7 +146,7 @@ sub phase1 {
> }
>
> $volhash->{$volid}->{ref} = defined($snapname) ? 'snapshot' : 'config';
> - $volhash->{$volid}->{snapshots} = defined($snapname);
> + $volhash->{$volid}->{snapshots} = 1 if defined($snapname);
>
> my ($path, $owner) = PVE::Storage::path($self->{storecfg}, $volid);
>
> --
> 2.30.2
>
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [pve-devel] [PATCH v2 storage 3/5] btrfs: add 'btrfs' import/export format
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 3/5] btrfs: add 'btrfs' import/export format Wolfgang Bumiller
@ 2021-06-23 12:14 ` Fabian Grünbichler
2021-06-23 12:30 ` Wolfgang Bumiller
0 siblings, 1 reply; 19+ messages in thread
From: Fabian Grünbichler @ 2021-06-23 12:14 UTC (permalink / raw)
To: Proxmox VE development discussion
On June 22, 2021 2:18 pm, Wolfgang Bumiller wrote:
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
> Changes to v1:
> * import api parameters reordered du to diff in patch 2/5
>
> PVE/CLI/pvesm.pm | 2 +-
> PVE/Storage.pm | 2 +-
> PVE/Storage/BTRFSPlugin.pm | 248 +++++++++++++++++++++++++++++++++++--
> 3 files changed, 240 insertions(+), 12 deletions(-)
>
> diff --git a/PVE/CLI/pvesm.pm b/PVE/CLI/pvesm.pm
> index 4491107..668170a 100755
> --- a/PVE/CLI/pvesm.pm
> +++ b/PVE/CLI/pvesm.pm
> @@ -30,7 +30,7 @@ use PVE::CLIHandler;
>
> use base qw(PVE::CLIHandler);
>
> -my $KNOWN_EXPORT_FORMATS = ['raw+size', 'tar+size', 'qcow2+size', 'vmdk+size', 'zfs'];
> +my $KNOWN_EXPORT_FORMATS = ['raw+size', 'tar+size', 'qcow2+size', 'vmdk+size', 'zfs', 'btrfs'];
>
> my $nodename = PVE::INotify::nodename();
>
> diff --git a/PVE/Storage.pm b/PVE/Storage.pm
> index 5ca052f..c45d339 100755
> --- a/PVE/Storage.pm
> +++ b/PVE/Storage.pm
> @@ -690,7 +690,7 @@ sub storage_migrate {
>
> my $migration_snapshot;
> if (!defined($snapshot)) {
> - if ($scfg->{type} eq 'zfspool') {
> + if ($scfg->{type} eq 'zfspool' || $scfg->{type} eq 'btrfs') {
> $migration_snapshot = 1;
> $snapshot = '__migration__';
> }
> diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
> index 370d848..072dfe0 100644
> --- a/PVE/Storage/BTRFSPlugin.pm
> +++ b/PVE/Storage/BTRFSPlugin.pm
> @@ -9,8 +9,9 @@ use Fcntl qw(S_ISDIR O_WRONLY O_CREAT O_EXCL);
> use File::Basename qw(dirname);
> use File::Path qw(mkpath);
> use IO::Dir;
> +use POSIX qw(EEXIST);
>
> -use PVE::Tools qw(run_command);
> +use PVE::Tools qw(run_command dir_glob_foreach);
>
> use PVE::Storage::DirPlugin;
>
> @@ -612,23 +613,250 @@ sub list_images {
> return $res;
> }
>
> -# For now we don't implement `btrfs send/recv` as it needs some updates to our import/export API
> -# first!
> -
> sub volume_export_formats {
> - return PVE::Storage::DirPlugin::volume_export_formats(@_);
> -}
> + my ($class, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots) = @_;
>
> -sub volume_export {
> - return PVE::Storage::DirPlugin::volume_export(@_);
> + # We can do whatever `DirPlugin` can do.
> + my @result = PVE::Storage::Plugin::volume_export_formats(@_);
> +
> + # `btrfs send` only works on snapshots:
> + return @result if !defined $snapshot;
> +
> + # Incremental stream with snapshots is only supported if the snapshots are listed (new api):
> + return @result if defined($base_snapshot) && $with_snapshots && ref($with_snapshots) ne 'ARRAY';
> +
> + # Otherwise we do also support `with_snapshots`.
> +
> + # Finally, `btrfs send` only works on formats where we actually use btrfs subvolumes:
> + my $format = ($class->parse_volname($volname))[6];
> + return @result if $format ne 'raw' && $format ne 'subvol';
> +
> + return ('btrfs', @result);
> }
>
> sub volume_import_formats {
> - return PVE::Storage::DirPlugin::volume_import_formats(@_);
> + my ($class, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots) = @_;
> +
> + # Same as export-formats, beware the parameter order:
> + return volume_export_formats(
> + $class,
> + $scfg,
> + $storeid,
> + $volname,
> + $snapshot,
> + $base_snapshot,
> + $with_snapshots,
> + );
> +}
> +
> +sub volume_export {
> + my (
> + $class,
> + $scfg,
> + $storeid,
> + $fh,
> + $volname,
> + $format,
> + $snapshot,
> + $base_snapshot,
> + $with_snapshots,
> + ) = @_;
> +
> + if ($format ne 'btrfs') {
> + return PVE::Storage::Plugin::volume_export(@_);
> + }
> +
> + die "format 'btrfs' only works on snapshots\n"
> + if !defined $snapshot;
> +
> + die "'btrfs' format in incremental mode requires snapshots to be listed explicitly\n"
> + if defined($base_snapshot) && $with_snapshots && ref($with_snapshots) ne 'ARRAY';
> +
> + my $volume_format = ($class->parse_volname($volname))[6];
> +
> + die "btrfs-sending volumes of type $volume_format ('$volname') is not supported\n"
> + if $volume_format ne 'raw' && $volume_format ne 'subvol';
the three checks here are the same as in volume_export, maybe they could
go into a single export-helper?
> +
> + my $path = $class->path($scfg, $volname, $storeid);
> +
> + if ($volume_format eq 'raw') {
> + $path = raw_file_to_subvol($path);
> + }
> +
> + my $cmd = ['btrfs', '-q', 'send', '-e'];
> + if ($base_snapshot) {
> + my $base = $class->path($scfg, $volname, $storeid, $base_snapshot);
> + if ($volume_format eq 'raw') {
> + $base = raw_file_to_subvol($base);
> + }
> + push @$cmd, '-p', $base;
> + }
> + push @$cmd, '--';
> + if (ref($with_snapshots) eq 'ARRAY') {
> + push @$cmd, (map { "$path\@$_" } ($with_snapshots // [])->@*), $path;
> + } else {
> + dir_glob_foreach(dirname($path), $BTRFS_VOL_REGEX, sub {
> + push @$cmd, "$path\@$_[2]" if !(defined($snapshot) && $_[2] eq $snapshot);
> + });
> + }
> + $path .= "\@$snapshot" if defined($snapshot);
> + push @$cmd, $path;
> +
> + run_command($cmd, output => '>&'.fileno($fh));
> + return;
> }
>
> sub volume_import {
> - return PVE::Storage::DirPlugin::volume_import(@_);
> + my (
> + $class,
> + $scfg,
> + $storeid,
> + $fh,
> + $volname,
> + $format,
> + $snapshot,
> + $base_snapshot,
> + $with_snapshots,
> + $allow_rename,
> + ) = @_;
> +
> + if ($format ne 'btrfs') {
> + return PVE::Storage::Plugin::volume_import(@_);
> + }
> +
> + die "format 'btrfs' only works on snapshots\n"
> + if !defined $snapshot;
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $volume_format) =
> + $class->parse_volname($volname);
> +
> + die "btrfs-receiving volumes of type $volume_format ('$volname') is not supported\n"
> + if $volume_format ne 'raw' && $volume_format ne 'subvol';
> +
> + if (defined($base_snapshot)) {
> + my $path = $class->path($scfg, $volname, $storeid, $base_snapshot);
> + die "base snapshot '$base_snapshot' not found - no such directory '$path'\n"
> + if !path_is_subvolume($path);
> + }
> +
> + my $destination = $class->filesystem_path($scfg, $volname);
> + if ($volume_format eq 'raw') {
> + $destination = raw_file_to_subvol($destination);
> + }
> +
> + if (!defined($base_snapshot) && -e $destination) {
> + die "volume $volname already exists\n" if !$allow_rename;
> + $volname = $class->find_free_diskname($storeid, $scfg, $vmid, $volume_format, 1);
> + }
> +
> + my $imagedir = $class->get_subdir($scfg, $vtype);
> + $imagedir .= "/$vmid" if $vtype eq 'images';
> +
> + my $tmppath = "$imagedir/recv.$vmid.tmp";
> + mkdir($imagedir); # FIXME: if $scfg->{mkdir};
> + if (!mkdir($tmppath)) {
> + die "temp receive directory already exists at '$tmppath', incomplete concurrent import?\n"
> + if $! == EEXIST;
> + die "failed to create temporary receive directory at '$tmppath' - $!\n";
> + }
> +
> + my $dh = IO::Dir->new($tmppath)
> + or die "failed to open temporary receive directory '$tmppath' - $!\n";
> + eval {
> + run_command(['btrfs', '-q', 'receive', '-e', '--', $tmppath], input => '<&'.fileno($fh));
> +
> + # Analyze the received subvolumes;
> + my ($diskname, $found_snapshot, @snapshots);
> + $dh->rewind;
> + while (defined(my $entry = $dh->read)) {
> + next if $entry eq '.' || $entry eq '..';
> + next if $entry !~ /^$BTRFS_VOL_REGEX$/;
> + my ($cur_diskname, $cur_snapshot) = ($1, $2);
> +
> + die "send stream included a non-snapshot subvolume\n"
> + if !defined($cur_snapshot);
> +
> + if (!defined($diskname)) {
> + $diskname = $cur_diskname;
> + } else {
> + die "multiple disks contained in stream ('$diskname' vs '$cur_diskname')\n"
> + if $diskname ne $cur_diskname;
> + }
> +
> + if ($cur_snapshot eq $snapshot) {
> + $found_snapshot = 1;
> + } else {
> + push @snapshots, $cur_snapshot;
> + }
> + }
> +
> + die "send stream did not contain the expected current snapshot '$snapshot'\n"
> + if !$found_snapshot;
> +
> + # Rotate the disk into place, first the current state:
> + # Note that read-only subvolumes cannot be moved into different directories, but for the
> + # "current" state we also want a writable copy, so start with that:
> + $class->btrfs_cmd(['property', 'set', "$tmppath/$diskname\@$snapshot", 'ro', 'false']);
> + PVE::Tools::renameat2(
> + -1,
> + "$tmppath/$diskname\@$snapshot",
> + -1,
> + $destination,
> + &PVE::Tools::RENAME_NOREPLACE,
> + ) or die "failed to move received snapshot '$tmppath/$diskname\@$snapshot'"
> + . " into place at '$destination' - $!\n";
> +
> + # Now recreate the actual snapshot:
> + $class->btrfs_cmd([
> + 'subvolume',
> + 'snapshot',
> + '-r',
> + '--',
> + $destination,
> + "$destination\@$snapshot",
> + ]);
> +
> + # Now go through the remaining snapshots (if any)
> + foreach my $snap (@snapshots) {
> + $class->btrfs_cmd(['property', 'set', "$tmppath/$diskname\@$snap", 'ro', 'false']);
> + PVE::Tools::renameat2(
> + -1,
> + "$tmppath/$diskname\@$snap",
> + -1,
> + "$destination\@$snap",
> + &PVE::Tools::RENAME_NOREPLACE,
> + ) or die "failed to move received snapshot '$tmppath/$diskname\@$snap'"
> + . " into place at '$destination\@$snap' - $!\n";
> + eval { $class->btrfs_cmd(['property', 'set', "$destination\@$snap", 'ro', 'true']) };
> + warn "failed to make $destination\@$snap read-only - $!\n" if $@;
> + }
> + };
> + my $err = $@;
> +
> + eval {
> + # Cleanup all the received snapshots we did not move into place, so we can remove the temp
> + # directory.
> + if ($dh) {
> + $dh->rewind;
> + while (defined(my $entry = $dh->read)) {
> + next if $entry eq '.' || $entry eq '..';
> + eval { $class->btrfs_cmd(['subvolume', 'delete', '--', "$tmppath/$entry"]) };
> + warn $@ if $@;
> + }
> + $dh->close; undef $dh;
> + }
> + if (!rmdir($tmppath)) {
> + warn "failed to remove temporary directory '$tmppath' - $!\n"
> + }
> + };
> + warn $@ if $@;
> + if ($err) {
> + # clean up if the directory ended up being empty after an error
> + rmdir($tmppath);
> + die $err;
> + }
I am a bit wary about this (also w.r.t. future btrfs features) - a
privileged container can do quite a lot of stuff with btrfs
- create snapshots from within container
- set default subvolume
- set properties
- ... (the above is what I found in a few minutes without much BTRFS
experience)
unprivileged containers can do the following
- create new subvols
- create new snapshots
- set ro property
(same caveat of not looking too much at what else is possible, lots of
stuff does not work)
how do we guarantee that the send stream is not affected in a dangerous
fashion by an attacker inside the container?
> +
> + return "$storeid:$volname";
> }
>
> 1
> --
> 2.30.2
>
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [pve-devel] [PATCH v2 storage 1/5] add BTRFS storage plugin
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 1/5] add BTRFS storage plugin Wolfgang Bumiller
@ 2021-06-23 12:15 ` Fabian Grünbichler
2021-06-23 12:43 ` Wolfgang Bumiller
0 siblings, 1 reply; 19+ messages in thread
From: Fabian Grünbichler @ 2021-06-23 12:15 UTC (permalink / raw)
To: Proxmox VE development discussion
On June 22, 2021 2:18 pm, Wolfgang Bumiller wrote:
> This is mostly the same as a directory storage, with 2 major
> differences:
>
> * 'subvol' volumes are actual btrfs subvolumes and therefore
> allow snapshots
> * 'raw' files are placed *into* a subvolume and therefore
> also allow snapshots, the raw file for volume
> `btrstore:100/vm-100-disk-1.raw` can be found under
> `$path/images/100/vm-100-disk-1/disk.raw`
> * in both cases, snapshots add an '@name' suffix to the
> subvolume's directory name, so snapshot 'foo' of the above
> would be found under
> `$path/images/100/vm-100-disk-1@foo/disk.raw`
> or for format "subvol":
> `$path/images/100/subvol-100-disk-1.subvol@foo`
>
> Note that qgroups aren't included in btrfs-send streams,
> therefore for now we will only be using *unsized* subvolumes
> for containers and place a regular raw+ext4 file for sized
> containers.
> We could extend the import/export stream format to include
> the information at the front (similar to how we do the
> "tar+size" format, but we need to include the size of all
> the contained snapshots as well, since they can technically
> change). (But before enabling quotas we should do some
> performance testing on bigger file systems with multiple
> snapshots as there are quite a few reports of the fs slowing
> down considerably in such scenarios).
>
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
> Changes to v1:
> raw files are created manually with sysopen/truncate and
> ioctls for setting the NOCOW bit have been added (this
> are made optional later in the series)
>
>
> PVE/Storage.pm | 2 +
> PVE/Storage/BTRFSPlugin.pm | 634 +++++++++++++++++++++++++++++++++++++
> PVE/Storage/Makefile | 1 +
> 3 files changed, 637 insertions(+)
> create mode 100644 PVE/Storage/BTRFSPlugin.pm
>
> diff --git a/PVE/Storage.pm b/PVE/Storage.pm
> index ea887a4..e0d6f44 100755
> --- a/PVE/Storage.pm
> +++ b/PVE/Storage.pm
> @@ -38,6 +38,7 @@ use PVE::Storage::GlusterfsPlugin;
> use PVE::Storage::ZFSPoolPlugin;
> use PVE::Storage::ZFSPlugin;
> use PVE::Storage::PBSPlugin;
> +use PVE::Storage::BTRFSPlugin;
>
> # Storage API version. Increment it on changes in storage API interface.
> use constant APIVER => 8;
> @@ -60,6 +61,7 @@ PVE::Storage::GlusterfsPlugin->register();
> PVE::Storage::ZFSPoolPlugin->register();
> PVE::Storage::ZFSPlugin->register();
> PVE::Storage::PBSPlugin->register();
> +PVE::Storage::BTRFSPlugin->register();
>
> # load third-party plugins
> if ( -d '/usr/share/perl5/PVE/Storage/Custom' ) {
> diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
> new file mode 100644
> index 0000000..370d848
> --- /dev/null
> +++ b/PVE/Storage/BTRFSPlugin.pm
> @@ -0,0 +1,634 @@
> +package PVE::Storage::BTRFSPlugin;
> +
> +use strict;
> +use warnings;
> +
> +use base qw(PVE::Storage::Plugin);
> +
> +use Fcntl qw(S_ISDIR O_WRONLY O_CREAT O_EXCL);
> +use File::Basename qw(dirname);
> +use File::Path qw(mkpath);
> +use IO::Dir;
> +
> +use PVE::Tools qw(run_command);
> +
> +use PVE::Storage::DirPlugin;
> +
> +use constant {
> + BTRFS_FIRST_FREE_OBJECTID => 256,
> + FS_NOCOW_FL => 0x00800000,
> + FS_IOC_GETFLAGS => 0x40086602,
> + FS_IOC_SETFLAGS => 0x80086601,
> +};
> +
> +# Configuration (similar to DirPlugin)
> +
> +sub type {
> + return 'btrfs';
> +}
> +
> +sub plugindata {
> + return {
> + content => [
> + {
> + images => 1,
> + rootdir => 1,
> + vztmpl => 1,
> + iso => 1,
> + backup => 1,
> + snippets => 1,
> + none => 1,
> + },
> + { images => 1, rootdir => 1 },
> + ],
> + format => [ { raw => 1, qcow2 => 1, vmdk => 1, subvol => 1 }, 'raw', ],
I am not really convinced we need this (or rather, that it doesn't
cause more confusion than convenience. if I want to use btrfs without
any btrfs features, I can already use any btrfs-backed dir as dir
storage (just like with ZFS, or $storage-of-my-choice-providing-a-directory).
e.g., having a qcow2 file on btrfs storage support snapshots but not use
btrfs functionality for that seems... weird?
> + };
> +}
> +
> +sub options {
> + return {
> + path => { fixed => 1 },
> + nodes => { optional => 1 },
> + shared => { optional => 1 },
> + disable => { optional => 1 },
> + maxfiles => { optional => 1 },
> + content => { optional => 1 },
> + format => { optional => 1 },
> + is_mountpoint => { optional => 1 },
> + # TODO: The new variant of mkdir with `populate` vs `create`...
> + };
> +}
> +
> +# Storage implementation
> +#
> +# We use the same volume names are directory plugins, but map *raw* disk image file names into a
> +# subdirectory.
> +#
> +# `vm-VMID-disk-ID.raw`
> +# -> `images/VMID/vm-VMID-disk-ID/disk.raw`
> +# where the `vm-VMID-disk-ID/` subdirectory is a btrfs subvolume
> +
> +# Reuse `DirPlugin`'s `check_config`. This simply checks for invalid paths.
> +sub check_config {
> + my ($self, $sectionId, $config, $create, $skipSchemaCheck) = @_;
> + return PVE::Storage::DirPlugin::check_config($self, $sectionId, $config, $create, $skipSchemaCheck);
> +}
> +
> +sub activate_storage {
> + my ($class, $storeid, $scfg, $cache) = @_;
> + return PVE::Storage::DirPlugin::activate_storage($class, $storeid, $scfg, $cache);
shouldn't this check that we can actually do btrfs stuff on $path?
> +}
> +
> +sub status {
> + my ($class, $storeid, $scfg, $cache) = @_;
> + return PVE::Storage::DirPlugin::status($class, $storeid, $scfg, $cache);
> +}
> +
> +# TODO: sub get_volume_notes {}
> +
> +# TODO: sub update_volume_notes {}
> +
> +# croak would not include the caller from within this module
> +sub __error {
> + my ($msg) = @_;
> + my (undef, $f, $n) = caller(1);
> + die "$msg at $f: $n\n";
> +}
this is kind of a new style of error handling - might be worthy of a
mention somewhere ;)
> +
> +# Given a name (eg. `vm-VMID-disk-ID.raw`), take the part up to the format suffix as the name of
> +# the subdirectory (subvolume).
> +sub raw_name_to_dir($) {
> + my ($raw) = @_;
> +
> + # For the subvolume directory Strip the `.<format>` suffix:
> + if ($raw =~ /^(.*)\.raw$/) {
> + return $1;
> + }
> +
> + __error "internal error: bad disk name: $raw";
> +}
> +
> +sub raw_file_to_subvol($) {
> + my ($file) = @_;
> +
> + if ($file =~ m|^(.*)/disk\.raw$|) {
> + return "$1";
> + }
> +
> + __error "internal error: bad raw path: $file";
> +}
> +
> +sub filesystem_path {
> + my ($class, $scfg, $volname, $snapname) = @_;
> +
> + my ($vtype, $name, $vmid, undef, undef, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + my $path = $class->get_subdir($scfg, $vtype);
> +
> + $path .= "/$vmid" if $vtype eq 'images';
> +
> + if ($format eq 'raw') {
> + my $dir = raw_name_to_dir($name);
> + if ($snapname) {
> + $dir .= "\@$snapname";
> + }
> + $path .= "/$dir/disk.raw";
> + } elsif ($format eq 'subvol') {
> + $path .= "/$name";
> + if ($snapname) {
> + $path .= "\@$snapname";
> + }
> + } else {
> + $path .= "/$name";
> + }
> +
> + return wantarray ? ($path, $vmid, $vtype) : $path;
> +}
> +
> +sub btrfs_cmd {
> + my ($class, $cmd, $outfunc) = @_;
> +
> + my $msg = '';
> + my $func;
> + if (defined($outfunc)) {
> + $func = sub {
> + my $part = &$outfunc(@_);
> + $msg .= $part if defined($part);
> + };
> + } else {
> + $func = sub { $msg .= "$_[0]\n" };
> + }
> + run_command(['btrfs', '-q', @$cmd], errmsg => 'btrfs error', outfunc => $func);
> +
> + return $msg;
> +}
> +
> +sub btrfs_get_subvol_id {
> + my ($class, $path) = @_;
> + my $info = $class->btrfs_cmd(['subvolume', 'show', '--', $path]);
> + if ($info !~ /^\s*(?:Object|Subvolume) ID:\s*(\d+)$/m) {
> + die "failed to get btrfs subvolume ID from: $info\n";
ugh at yet another parser for CLI tool output.. (not your fault I
know..)
> + }
> + return $1;
> +}
> +
> +my sub chattr : prototype($$$) {
> + my ($fh, $mask, $xor) = @_;
> +
> + my $flags = pack('L!', 0);
> + ioctl($fh, FS_IOC_GETFLAGS, $flags) or die "FS_IOC_GETFLAGS failed - $!\n";
> + $flags = pack('L!', (unpack('L!', $flags) & $mask) ^ $xor);
> + ioctl($fh, FS_IOC_SETFLAGS, $flags) or die "FS_IOC_SETFLAGS failed - $!\n";
> + return 1;
> +}
> +
> +sub create_base {
> + my ($class, $storeid, $scfg, $volname) = @_;
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + my $newname = $name;
> + $newname =~ s/^vm-/base-/;
> +
> + # If we're not working with a 'raw' file, which is the only thing that's "different" for btrfs,
> + # or a subvolume, we forward to the DirPlugin
> + if ($format ne 'raw' && $format ne 'subvol') {
> + return PVE::Storage::DirPlugin::create_base(@_);
> + }
see above..
> +
> + my $path = $class->filesystem_path($scfg, $volname);
> + my $newvolname = $basename ? "$basevmid/$basename/$vmid/$newname" : "$vmid/$newname";
> + my $newpath = $class->filesystem_path($scfg, $newvolname);
> +
> + my $subvol = $path;
> + my $newsubvol = $newpath;
> + if ($format eq 'raw') {
> + $subvol = raw_file_to_subvol($subvol);
> + $newsubvol = raw_file_to_subvol($newsubvol);
> + }
> +
> + rename($subvol, $newsubvol)
> + || die "rename '$subvol' to '$newsubvol' failed - $!\n";
> + eval { $class->btrfs_cmd(['property', 'set', $newsubvol, 'ro', 'true']) };
> + warn $@ if $@;
> +
> + return $newvolname;
> +}
> +
> +sub clone_image {
> + my ($class, $scfg, $storeid, $volname, $vmid, $snap) = @_;
> +
> + my ($vtype, $basename, $basevmid, undef, undef, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + # If we're not working with a 'raw' file, which is the only thing that's "different" for btrfs,
> + # or a subvolume, we forward to the DirPlugin
> + if ($format ne 'raw' && $format ne 'subvol') {
> + return PVE::Storage::DirPlugin::clone_image(@_);
> + }
see above..
> + my $imagedir = $class->get_subdir($scfg, 'images');
> + $imagedir .= "/$vmid";
> + mkpath $imagedir;
> +
> + my $path = $class->filesystem_path($scfg, $volname);
> + my $newname = $class->find_free_diskname($storeid, $scfg, $vmid, $format, 1);
> +
> + # For btrfs subvolumes we don't actually need the "link":
> + #my $newvolname = "$basevmid/$basename/$vmid/$newname";
> + my $newvolname = "$vmid/$newname";
> + my $newpath = $class->filesystem_path($scfg, $newvolname);
> +
> + my $subvol = $path;
> + my $newsubvol = $newpath;
> + if ($format eq 'raw') {
> + $subvol = raw_file_to_subvol($subvol);
> + $newsubvol = raw_file_to_subvol($newsubvol);
> + }
so actually these are two blocks for getting at $subvol and $newsubvol,
one for raw one for subvol. structuring it as such would make it more
readable IMHO..
> +
> + $class->btrfs_cmd(['subvolume', 'snapshot', '--', $subvol, $newsubvol]);
> +
> + return $newvolname;
> +}
> +
> +sub alloc_image {
> + my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
> +
> + if ($fmt ne 'raw' && $fmt ne 'subvol') {
> + return PVE::Storage::DirPlugin::alloc_image(@_);
> + }
> +
> + # From Plugin.pm:
> +
> + my $imagedir = $class->get_subdir($scfg, 'images') . "/$vmid";
> +
> + mkpath $imagedir;
> +
> + $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt, 1) if !$name;
> +
> + my (undef, $tmpfmt) = PVE::Storage::Plugin::parse_name_dir($name);
> +
> + die "illegal name '$name' - wrong extension for format ('$tmpfmt != '$fmt')\n"
> + if $tmpfmt ne $fmt;
> +
> + # End copy from Plugin.pm
> +
> + my $subvol = "$imagedir/$name";
> + # .raw is not part of the directory name
> + $subvol =~ s/\.raw$//;
why not use your helpers for this (raw_.._to_..)
in a similar vain, there are three "add '/disk.raw' to path" sites
that might warrant a helper..
> +
> + die "disk image '$subvol' already exists\n" if -e $subvol;
> +
> + my $path;
> + if ($fmt eq 'raw') {
> + $path = "$subvol/disk.raw";
> + }
> +
> + if ($fmt eq 'subvol' && !!$size) {
> + # NOTE: `btrfs send/recv` actually drops quota information so supporting subvolumes with
> + # quotas doesn't play nice with send/recv.
> + die "btrfs quotas are currently not supported, use an unsized subvolume or a raw file\n";
> + }
> +
> + $class->btrfs_cmd(['subvolume', 'create', '--', $subvol]);
> +
> + eval {
> + if ($fmt eq 'subvol') {
> + # Nothing to do for now...
> +
> + # This is how we *would* do it:
> + # # Use the subvol's default 0/$id qgroup
> + # eval {
> + # # This call should happen at storage creation instead and therefore governed by a
> + # # configuration option!
> + # # $class->btrfs_cmd(['quota', 'enable', $subvol]);
> + # my $id = $class->btrfs_get_subvol_id($subvol);
> + # $class->btrfs_cmd(['qgroup', 'limit', "${size}k", "0/$id", $subvol]);
> + # };
> + } elsif ($fmt eq 'raw') {
> + sysopen my $fh, $path, O_WRONLY | O_CREAT | O_EXCL
> + or die "failed to create raw file '$path' - $!\n";
> + chattr($fh, ~FS_NOCOW_FL, FS_NOCOW_FL);
> + truncate($fh, $size * 1024)
> + or die "failed to set file size for '$path' - $!\n";
> + close($fh);
> + } else {
> + die "internal format error (format = $fmt)\n";
> + }
> + };
> +
> + if (my $err = $@) {
> + eval { $class->btrfs_cmd(['subvolume', 'delete', '--', $subvol]); };
> + warn $@ if $@;
> + die $err;
> + }
> +
> + return "$vmid/$name";
> +}
> +
> +# Same as btrfsprogs does:
> +my sub path_is_subvolume : prototype($) {
> + my ($path) = @_;
> + my @stat = stat($path)
> + or die "stat failed on '$path' - $!\n";
> + my ($ino, $mode) = @stat[1, 2];
> + return S_ISDIR($mode) && $ino == BTRFS_FIRST_FREE_OBJECTID;
> +}
> +
> +my $BTRFS_VOL_REGEX = qr/((?:vm|base|subvol)-\d+-disk-\d+(?:\.subvol)?)(?:\@(\S+))$/;
> +
> +# Calls `$code->($volume, $name, $snapshot)` for each subvol in a directory matching our volume
> +# regex.
> +my sub foreach_subvol : prototype($$) {
> + my ($dir, $code) = @_;
> +
> + dir_glob_foreach($dir, $BTRFS_VOL_REGEX, sub {
> + my ($volume, $name, $snapshot) = ($1, $2, $3);
> + return if !path_is_subvolume("$dir/$volume");
> + $code->($volume, $name, $snapshot);
> + })
> +}
> +
> +sub free_image {
> + my ($class, $storeid, $scfg, $volname, $isBase, $_format) = @_;
> +
> + my (undef, undef, $vmid, undef, undef, undef, $format) =
> + $class->parse_volname($volname);
> +
> + if ($format ne 'subvol' && $format ne 'raw') {
> + return PVE::Storage::DirPlugin::free_image(@_);
> + }
> +
> + my $path = $class->filesystem_path($scfg, $volname);
> +
> + my $subvol = $path;
> + if ($format eq 'raw') {
> + $subvol = raw_file_to_subvol($path);
> + }
> +
> + my $dir = dirname($subvol);
> + my @snapshot_vols;
> + foreach_subvol($dir, sub {
> + my ($volume, $name, $snapshot) = @_;
> + return if !defined $snapshot;
> + push @snapshot_vols, "$dir/$volume";
> + });
> +
> + $class->btrfs_cmd(['subvolume', 'delete', '--', @snapshot_vols, $subvol]);
> + # try to cleanup directory to not clutter storage with empty $vmid dirs if
> + # all images from a guest got deleted
> + rmdir($dir);
> +
> + return undef;
> +}
> +
> +# Currently not used because quotas clash with send/recv.
> +# my sub btrfs_subvol_quota {
> +# my ($class, $path) = @_;
> +# my $id = '0/' . $class->btrfs_get_subvol_id($path);
> +# my $search = qr/^\Q$id\E\s+(\d)+\s+\d+\s+(\d+)\s*$/;
> +# my ($used, $size);
> +# $class->btrfs_cmd(['qgroup', 'show', '--raw', '-rf', '--', $path], sub {
> +# return if defined($size);
> +# if ($_[0] =~ $search) {
> +# ($used, $size) = ($1, $2);
> +# }
> +# });
> +# if (!defined($size)) {
> +# # syslog should include more information:
> +# syslog('err', "failed to get subvolume size for: $path (id $id)");
> +# # UI should only see the last path component:
> +# $path =~ s|^.*/||;
> +# die "failed to get subvolume size for $path\n";
> +# }
> +# return wantarray ? ($used, $size) : $size;
> +# }
> +
> +sub volume_size_info {
> + my ($class, $scfg, $storeid, $volname, $timeout) = @_;
> +
> + my $path = $class->filesystem_path($scfg, $volname);
> +
> + my $format = ($class->parse_volname($volname))[6];
> +
> + if ($format eq 'subvol') {
> + my $ctime = (stat($path))[10];
> + my ($used, $size) = (0, 0);
> + #my ($used, $size) = btrfs_subvol_quota($class, $path); # uses wantarray
> + return wantarray ? ($size, 'subvol', $used, undef, $ctime) : 1;
> + }
> +
> + return PVE::Storage::Plugin::file_size_info($path, $timeout);
> +}
> +
> +sub volume_resize {
> + my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
> +
> + my $format = ($class->parse_volname($volname))[6];
> + if ($format eq 'subvol') {
> + my $path = $class->filesystem_path($scfg, $volname);
> + my $id = '0/' . $class->btrfs_get_subvol_id($path);
> + $class->btrfs_cmd(['qgroup', 'limit', '--', "${size}k", "0/$id", $path]);
> + return undef;
> + }
> +
> + return PVE::Storage::Plugin::volume_resize(@_);
> +}
> +
> +sub volume_snapshot {
> + my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +
> + my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
> + if ($format ne 'subvol' && $format ne 'raw') {
> + return PVE::Storage::Plugin::volume_snapshot(@_);
> + }
> +
> + my $path = $class->filesystem_path($scfg, $volname);
> + my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
> +
> + if ($format eq 'raw') {
> + $path = raw_file_to_subvol($path);
> + $snap_path = raw_file_to_subvol($snap_path);
> + }
this is a repeating pattern (volname + optional snapshot name => path(s)
based on format)
> +
> + my $snapshot_dir = $class->get_subdir($scfg, 'images') . "/$vmid";
> + mkpath $snapshot_dir;
> +
> + $class->btrfs_cmd(['subvolume', 'snapshot', '-r', '--', $path, $snap_path]);
> + return undef;
> +}
> +
> +sub volume_rollback_is_possible {
> + my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +
> + return 1;
snapshot creation was delegated to Plugin.pm for non-raw and non-subvol,
so if we don't drop that part the same logic would need to apply here..
> +}
> +
> +sub volume_snapshot_rollback {
> + my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +
> + my ($name, $format) = ($class->parse_volname($volname))[1,6];
> +
> + if ($format ne 'subvol' && $format ne 'raw') {
> + return PVE::Storage::Plugin::volume_snapshot_rollback(@_);
> + }
> +
> + my $path = $class->filesystem_path($scfg, $volname);
> + my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
> +
> + if ($format eq 'raw') {
> + $path = raw_file_to_subvol($path);
> + $snap_path = raw_file_to_subvol($snap_path);
> + }
> +
> + # Simple version would be:
> + # rename old to temp
> + # create new
> + # on error rename temp back
> + # But for atomicity in case the rename after create-failure *also* fails, we create the new
> + # subvol first, then use RENAME_EXCHANGE,
> + my $tmp_path = "$path.tmp.$$";
> + $class->btrfs_cmd(['subvolume', 'snapshot', '--', $snap_path, $tmp_path]);
> + # The paths are absolute, so pass -1 as file descriptors.
> + my $ok = PVE::Tools::renameat2(-1, $tmp_path, -1, $path, &PVE::Tools::RENAME_EXCHANGE);
> +
> + eval { $class->btrfs_cmd(['subvolume', 'delete', '--', $tmp_path]) };
> + warn "failed to remove '$tmp_path' subvolume: $@" if $@;
> +
> + if (!$ok) {
> + die "failed to rotate '$tmp_path' into place at '$path' - $!\n";
> + }
> +
> + return undef;
> +}
> +
> +sub volume_snapshot_delete {
> + my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
> +
> + my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
> +
> + if ($format ne 'subvol' && $format ne 'raw') {
> + return PVE::Storage::Plugin::volume_snapshot_delete(@_);
> + }
> +
> + my $path = $class->filesystem_path($scfg, $volname, $snap);
> +
> + if ($format eq 'raw') {
> + $path = raw_file_to_subvol($path);
> + }
> +
> + $class->btrfs_cmd(['subvolume', 'delete', '--', $path]);
> +
> + return undef;
> +}
> +
> +sub volume_has_feature {
> + my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
> +
> + my $features = {
> + snapshot => {
> + current => { qcow2 => 1, raw => 1, subvol => 1 },
> + snap => { qcow2 => 1, raw => 1, subvol => 1 }
> + },
> + clone => {
> + base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> + current => { raw => 1 },
> + snap => { raw => 1 },
> + },
> + template => { current => { qcow2 => 1, raw => 1, vmdk => 1, subvol => 1 } },
> + copy => {
> + base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> + current => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> + snap => { qcow2 => 1, raw => 1, subvol => 1 },
> + },
> + sparseinit => { base => {qcow2 => 1, raw => 1, vmdk => 1 },
> + current => {qcow2 => 1, raw => 1, vmdk => 1 } },
> + };
some of that is duplicated from the regular plugin, so if we don't drop
the qcow2/.. support here we need to forward those based on $format..
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + my $key = undef;
> + if ($snapname) {
> + $key = 'snap';
> + } else {
> + $key = $isBase ? 'base' : 'current';
> + }
> +
> + return 1 if defined($features->{$feature}->{$key}->{$format});
> +
> + return undef;
> +}
> +
> +sub list_images {
> + my ($class, $storeid, $scfg, $vmid, $vollist, $cache) = @_;
> + my $imagedir = $class->get_subdir($scfg, 'images');
> +
> + my $res = [];
> +
> + # Copied from Plugin.pm, with file_size_info calls adapted:
> + foreach my $fn (<$imagedir/[0-9][0-9]*/*>) {
> + # different to in Plugin.pm the regex below also excludes '@' as valid file name
> + next if $fn !~ m@^(/.+/(\d+)/([^/\@.]+(?:\.(qcow2|vmdk|subvol))?))$@;
> + $fn = $1; # untaint
> +
> + my $owner = $2;
> + my $name = $3;
> + my $ext = $4;
> +
> + next if !$vollist && defined($vmid) && ($owner ne $vmid);
> +
> + my $volid = "$storeid:$owner/$name";
> + my ($size, $format, $used, $parent, $ctime);
> +
> + if (!$ext) { # raw
> + $volid .= '.raw';
> + ($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info("$fn/disk.raw");
> + } elsif ($ext eq 'subvol') {
> + ($used, $size) = (0, 0);
> + #($used, $size) = btrfs_subvol_quota($class, $fn);
> + $format = 'subvol';
> + } else {
> + ($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info($fn);
> + }
> + next if !($format && defined($size));
> +
> + if ($vollist) {
> + next if ! grep { $_ eq $volid } @$vollist;
> + }
> +
> + my $info = {
> + volid => $volid, format => $format,
> + size => $size, vmid => $owner, used => $used, parent => $parent,
> + };
> +
> + $info->{ctime} = $ctime if $ctime;
> +
> + push @$res, $info;
> + }
> +
> + return $res;
> +}
> +
> +# For now we don't implement `btrfs send/recv` as it needs some updates to our import/export API
> +# first!
> +
> +sub volume_export_formats {
> + return PVE::Storage::DirPlugin::volume_export_formats(@_);
> +}
> +
> +sub volume_export {
> + return PVE::Storage::DirPlugin::volume_export(@_);
> +}
> +
> +sub volume_import_formats {
> + return PVE::Storage::DirPlugin::volume_import_formats(@_);
> +}
> +
> +sub volume_import {
> + return PVE::Storage::DirPlugin::volume_import(@_);
> +}
> +
> +1
> diff --git a/PVE/Storage/Makefile b/PVE/Storage/Makefile
> index 91b9238..857c485 100644
> --- a/PVE/Storage/Makefile
> +++ b/PVE/Storage/Makefile
> @@ -12,6 +12,7 @@ SOURCES= \
> ZFSPoolPlugin.pm \
> ZFSPlugin.pm \
> PBSPlugin.pm \
> + BTRFSPlugin.pm \
> LvmThinPlugin.pm
>
> .PHONY: install
> --
> 2.30.2
>
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [pve-devel] [PATCH v2 storage 3/5] btrfs: add 'btrfs' import/export format
2021-06-23 12:14 ` Fabian Grünbichler
@ 2021-06-23 12:30 ` Wolfgang Bumiller
0 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-23 12:30 UTC (permalink / raw)
To: Fabian Grünbichler; +Cc: Proxmox VE development discussion
On Wed, Jun 23, 2021 at 02:14:48PM +0200, Fabian Grünbichler wrote:
> On June 22, 2021 2:18 pm, Wolfgang Bumiller wrote:
> > Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
(...)
> > +sub volume_export {
> > + my (
> > + $class,
> > + $scfg,
> > + $storeid,
> > + $fh,
> > + $volname,
> > + $format,
> > + $snapshot,
> > + $base_snapshot,
> > + $with_snapshots,
> > + ) = @_;
> > +
> > + if ($format ne 'btrfs') {
> > + return PVE::Storage::Plugin::volume_export(@_);
> > + }
> > +
> > + die "format 'btrfs' only works on snapshots\n"
> > + if !defined $snapshot;
> > +
> > + die "'btrfs' format in incremental mode requires snapshots to be listed explicitly\n"
> > + if defined($base_snapshot) && $with_snapshots && ref($with_snapshots) ne 'ARRAY';
> > +
> > + my $volume_format = ($class->parse_volname($volname))[6];
> > +
> > + die "btrfs-sending volumes of type $volume_format ('$volname') is not supported\n"
> > + if $volume_format ne 'raw' && $volume_format ne 'subvol';
>
> the three checks here are the same as in volume_export, maybe they could
> go into a single export-helper?
we get nicer errors this way, though (and more easily placed in the code
when a user runs into them, since we don't use Carp ;-) )
(...)
> > +
> > + # Now go through the remaining snapshots (if any)
> > + foreach my $snap (@snapshots) {
> > + $class->btrfs_cmd(['property', 'set', "$tmppath/$diskname\@$snap", 'ro', 'false']);
> > + PVE::Tools::renameat2(
> > + -1,
> > + "$tmppath/$diskname\@$snap",
> > + -1,
> > + "$destination\@$snap",
> > + &PVE::Tools::RENAME_NOREPLACE,
> > + ) or die "failed to move received snapshot '$tmppath/$diskname\@$snap'"
> > + . " into place at '$destination\@$snap' - $!\n";
> > + eval { $class->btrfs_cmd(['property', 'set', "$destination\@$snap", 'ro', 'true']) };
> > + warn "failed to make $destination\@$snap read-only - $!\n" if $@;
> > + }
> > + };
> > + my $err = $@;
> > +
> > + eval {
> > + # Cleanup all the received snapshots we did not move into place, so we can remove the temp
> > + # directory.
> > + if ($dh) {
> > + $dh->rewind;
> > + while (defined(my $entry = $dh->read)) {
> > + next if $entry eq '.' || $entry eq '..';
> > + eval { $class->btrfs_cmd(['subvolume', 'delete', '--', "$tmppath/$entry"]) };
> > + warn $@ if $@;
> > + }
> > + $dh->close; undef $dh;
> > + }
> > + if (!rmdir($tmppath)) {
> > + warn "failed to remove temporary directory '$tmppath' - $!\n"
> > + }
> > + };
> > + warn $@ if $@;
> > + if ($err) {
> > + # clean up if the directory ended up being empty after an error
> > + rmdir($tmppath);
> > + die $err;
> > + }
>
> I am a bit wary about this (also w.r.t. future btrfs features) - a
> privileged container can do quite a lot of stuff with btrfs
>
> - create snapshots from within container
> - set default subvolume
> - set properties
> - ... (the above is what I found in a few minutes without much BTRFS
> experience)
>
> unprivileged containers can do the following
> - create new subvols
> - create new snapshots
> - set ro property
>
> (same caveat of not looking too much at what else is possible, lots of
> stuff does not work)
>
> how do we guarantee that the send stream is not affected in a dangerous
> fashion by an attacker inside the container?
Shouldn't be dangerous sine it's not recursive anyway.
However, we could just by default disable the corresponding ioctls via
seccomp for unprivileged containers...
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [pve-devel] [PATCH v2 container 1/3] migration: fix snapshots boolean accounting
2021-06-23 10:59 ` Fabian Grünbichler
@ 2021-06-23 12:31 ` Wolfgang Bumiller
0 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-23 12:31 UTC (permalink / raw)
To: Fabian Grünbichler; +Cc: Proxmox VE development discussion
On Wed, Jun 23, 2021 at 12:59:58PM +0200, Fabian Grünbichler wrote:
> some indication on what was broken would be nice..
>
> AFAICT, storage_migrate fixes this to a boolean anyway, and this is only
> passed to storage_migrate, so.. !?
The type is not the problem. It's assigned - as in overwritten - in each
check, and it's called last for the current state, so this was always
set to false.
>
> On June 22, 2021 2:18 pm, Wolfgang Bumiller wrote:
> > Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> > ---
> > No changes to v1.
> >
> > src/PVE/LXC/Migrate.pm | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
> > index 3cd895d..ce1f7dd 100644
> > --- a/src/PVE/LXC/Migrate.pm
> > +++ b/src/PVE/LXC/Migrate.pm
> > @@ -146,7 +146,7 @@ sub phase1 {
> > }
> >
> > $volhash->{$volid}->{ref} = defined($snapname) ? 'snapshot' : 'config';
> > - $volhash->{$volid}->{snapshots} = defined($snapname);
> > + $volhash->{$volid}->{snapshots} = 1 if defined($snapname);
> >
> > my ($path, $owner) = PVE::Storage::path($self->{storecfg}, $volid);
> >
> > --
> > 2.30.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [pve-devel] [PATCH v2 storage 1/5] add BTRFS storage plugin
2021-06-23 12:15 ` Fabian Grünbichler
@ 2021-06-23 12:43 ` Wolfgang Bumiller
0 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2021-06-23 12:43 UTC (permalink / raw)
To: Fabian Grünbichler; +Cc: Proxmox VE development discussion
On Wed, Jun 23, 2021 at 02:15:00PM +0200, Fabian Grünbichler wrote:
> On June 22, 2021 2:18 pm, Wolfgang Bumiller wrote:
(...)
> > +sub plugindata {
> > + return {
> > + content => [
> > + {
> > + images => 1,
> > + rootdir => 1,
> > + vztmpl => 1,
> > + iso => 1,
> > + backup => 1,
> > + snippets => 1,
> > + none => 1,
> > + },
> > + { images => 1, rootdir => 1 },
> > + ],
> > + format => [ { raw => 1, qcow2 => 1, vmdk => 1, subvol => 1 }, 'raw', ],
>
> I am not really convinced we need this (or rather, that it doesn't
> cause more confusion than convenience. if I want to use btrfs without
> any btrfs features, I can already use any btrfs-backed dir as dir
> storage (just like with ZFS, or $storage-of-my-choice-providing-a-directory).
>
> e.g., having a qcow2 file on btrfs storage support snapshots but not use
> btrfs functionality for that seems... weird?
It's a directory storage after all.
But I really don't mind much if we just remove that...
>
> > + };
> > +}
> > +
> > +sub options {
> > + return {
> > + path => { fixed => 1 },
> > + nodes => { optional => 1 },
> > + shared => { optional => 1 },
> > + disable => { optional => 1 },
> > + maxfiles => { optional => 1 },
> > + content => { optional => 1 },
> > + format => { optional => 1 },
> > + is_mountpoint => { optional => 1 },
> > + # TODO: The new variant of mkdir with `populate` vs `create`...
> > + };
> > +}
> > +
> > +# Storage implementation
> > +#
> > +# We use the same volume names are directory plugins, but map *raw* disk image file names into a
> > +# subdirectory.
> > +#
> > +# `vm-VMID-disk-ID.raw`
> > +# -> `images/VMID/vm-VMID-disk-ID/disk.raw`
> > +# where the `vm-VMID-disk-ID/` subdirectory is a btrfs subvolume
> > +
> > +# Reuse `DirPlugin`'s `check_config`. This simply checks for invalid paths.
> > +sub check_config {
> > + my ($self, $sectionId, $config, $create, $skipSchemaCheck) = @_;
> > + return PVE::Storage::DirPlugin::check_config($self, $sectionId, $config, $create, $skipSchemaCheck);
> > +}
> > +
> > +sub activate_storage {
> > + my ($class, $storeid, $scfg, $cache) = @_;
> > + return PVE::Storage::DirPlugin::activate_storage($class, $storeid, $scfg, $cache);
>
> shouldn't this check that we can actually do btrfs stuff on $path?
Right. I had a method for this at some point but hadn't used it here.
Should be easily added in a follow up.
>
> > +}
> > +
> > +sub status {
> > + my ($class, $storeid, $scfg, $cache) = @_;
> > + return PVE::Storage::DirPlugin::status($class, $storeid, $scfg, $cache);
> > +}
> > +
> > +# TODO: sub get_volume_notes {}
> > +
> > +# TODO: sub update_volume_notes {}
> > +
> > +# croak would not include the caller from within this module
> > +sub __error {
> > + my ($msg) = @_;
> > + my (undef, $f, $n) = caller(1);
> > + die "$msg at $f: $n\n";
> > +}
>
> this is kind of a new style of error handling - might be worthy of a
> mention somewhere ;)
debug helper... shouldn't be possible to reach this... I should probably
prefix the 2 subs below with a `my` though...
(I mostly ran into such stuff while playing with different ways of
naming/organising snapshot directories and not seeing *where* stuff
happened was annoying...)
>
> > +
> > +# Given a name (eg. `vm-VMID-disk-ID.raw`), take the part up to the format suffix as the name of
> > +# the subdirectory (subvolume).
> > +sub raw_name_to_dir($) {
> > + my ($raw) = @_;
> > +
> > + # For the subvolume directory Strip the `.<format>` suffix:
> > + if ($raw =~ /^(.*)\.raw$/) {
> > + return $1;
> > + }
> > +
> > + __error "internal error: bad disk name: $raw";
> > +}
> > +
> > +sub raw_file_to_subvol($) {
> > + my ($file) = @_;
> > +
> > + if ($file =~ m|^(.*)/disk\.raw$|) {
> > + return "$1";
> > + }
> > +
> > + __error "internal error: bad raw path: $file";
> > +}
> > +
(...)
> > +
> > +sub btrfs_get_subvol_id {
> > + my ($class, $path) = @_;
> > + my $info = $class->btrfs_cmd(['subvolume', 'show', '--', $path]);
> > + if ($info !~ /^\s*(?:Object|Subvolume) ID:\s*(\d+)$/m) {
> > + die "failed to get btrfs subvolume ID from: $info\n";
>
> ugh at yet another parser for CLI tool output.. (not your fault I
> know..)
This one I can actually replace. I just am not sure if I want to do it
all in pure perl or add some btrfs helper to the emerging `pve-rs` lib,
and since this code already existed in the old series, I postponed that
decision.
(...)
> > + my $imagedir = $class->get_subdir($scfg, 'images');
> > + $imagedir .= "/$vmid";
> > + mkpath $imagedir;
> > +
> > + my $path = $class->filesystem_path($scfg, $volname);
> > + my $newname = $class->find_free_diskname($storeid, $scfg, $vmid, $format, 1);
> > +
> > + # For btrfs subvolumes we don't actually need the "link":
> > + #my $newvolname = "$basevmid/$basename/$vmid/$newname";
> > + my $newvolname = "$vmid/$newname";
> > + my $newpath = $class->filesystem_path($scfg, $newvolname);
> > +
> > + my $subvol = $path;
> > + my $newsubvol = $newpath;
> > + if ($format eq 'raw') {
> > + $subvol = raw_file_to_subvol($subvol);
> > + $newsubvol = raw_file_to_subvol($newsubvol);
> > + }
>
> so actually these are two blocks for getting at $subvol and $newsubvol,
> one for raw one for subvol. structuring it as such would make it more
> readable IMHO..
Yeah I suppose those lines could look nicer. Also happens another
time...
>
> > +
> > + $class->btrfs_cmd(['subvolume', 'snapshot', '--', $subvol, $newsubvol]);
> > +
> > + return $newvolname;
> > +}
> > +
> > +sub alloc_image {
> > + my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
> > +
> > + if ($fmt ne 'raw' && $fmt ne 'subvol') {
> > + return PVE::Storage::DirPlugin::alloc_image(@_);
> > + }
> > +
> > + # From Plugin.pm:
> > +
> > + my $imagedir = $class->get_subdir($scfg, 'images') . "/$vmid";
> > +
> > + mkpath $imagedir;
> > +
> > + $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt, 1) if !$name;
> > +
> > + my (undef, $tmpfmt) = PVE::Storage::Plugin::parse_name_dir($name);
> > +
> > + die "illegal name '$name' - wrong extension for format ('$tmpfmt != '$fmt')\n"
> > + if $tmpfmt ne $fmt;
> > +
> > + # End copy from Plugin.pm
> > +
> > + my $subvol = "$imagedir/$name";
> > + # .raw is not part of the directory name
> > + $subvol =~ s/\.raw$//;
>
> why not use your helpers for this (raw_.._to_..)
Here it can refer to a `subvol` as well which the helper doesn't allow,
but sure, could be improved.
> in a similar vain, there are three "add '/disk.raw' to path" sites
> that might warrant a helper..
ok
(...)
> > +sub volume_snapshot {
> > + my ($class, $scfg, $storeid, $volname, $snap) = @_;
> > +
> > + my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
> > + if ($format ne 'subvol' && $format ne 'raw') {
> > + return PVE::Storage::Plugin::volume_snapshot(@_);
> > + }
> > +
> > + my $path = $class->filesystem_path($scfg, $volname);
> > + my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
> > +
> > + if ($format eq 'raw') {
> > + $path = raw_file_to_subvol($path);
> > + $snap_path = raw_file_to_subvol($snap_path);
> > + }
>
> this is a repeating pattern (volname + optional snapshot name => path(s)
> based on format)
yup, mentioned it above
>
> > +
> > + my $snapshot_dir = $class->get_subdir($scfg, 'images') . "/$vmid";
> > + mkpath $snapshot_dir;
> > +
> > + $class->btrfs_cmd(['subvolume', 'snapshot', '-r', '--', $path, $snap_path]);
> > + return undef;
> > +}
> > +
> > +sub volume_rollback_is_possible {
> > + my ($class, $scfg, $storeid, $volname, $snap) = @_;
> > +
> > + return 1;
>
> snapshot creation was delegated to Plugin.pm for non-raw and non-subvol,
> so if we don't drop that part the same logic would need to apply here..
Ah, good catch.
(...)
> > +sub volume_has_feature {
> > + my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
> > +
> > + my $features = {
> > + snapshot => {
> > + current => { qcow2 => 1, raw => 1, subvol => 1 },
> > + snap => { qcow2 => 1, raw => 1, subvol => 1 }
> > + },
> > + clone => {
> > + base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> > + current => { raw => 1 },
> > + snap => { raw => 1 },
> > + },
> > + template => { current => { qcow2 => 1, raw => 1, vmdk => 1, subvol => 1 } },
> > + copy => {
> > + base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> > + current => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> > + snap => { qcow2 => 1, raw => 1, subvol => 1 },
> > + },
> > + sparseinit => { base => {qcow2 => 1, raw => 1, vmdk => 1 },
> > + current => {qcow2 => 1, raw => 1, vmdk => 1 } },
> > + };
>
> some of that is duplicated from the regular plugin, so if we don't drop
> the qcow2/.. support here we need to forward those based on $format..
good point
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [pve-devel] [PATCH v2 container 3/3] special case btrfs+quotas to use subvolumes
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 3/3] special case btrfs+quotas to use subvolumes Wolfgang Bumiller
@ 2021-06-23 13:51 ` Fabian Grünbichler
0 siblings, 0 replies; 19+ messages in thread
From: Fabian Grünbichler @ 2021-06-23 13:51 UTC (permalink / raw)
To: Proxmox VE development discussion; +Cc: Wolfgang Bumiller
On June 22, 2021 2:18 pm, Wolfgang Bumiller wrote:
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
> New in v2.
>
> src/PVE/LXC.pm | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
> index 0a8a532..fc06842 100644
> --- a/src/PVE/LXC.pm
> +++ b/src/PVE/LXC.pm
> @@ -1893,7 +1893,7 @@ sub alloc_disk {
> eval {
> my $do_format = 0;
> if ($scfg->{content}->{rootdir} && $scfg->{path}) {
> - if ($size_kb > 0) {
> + if ($size_kb > 0 && !($scfg->{type} eq 'btrfs' && $scfg->{quotas})) {
this is actually not enough.. btrfs with quotas now creates subvols with
size 0 (== no quota), and then everything fails because the storage
plugin expects a quota to be there but there is none. even with that
fixed up, at least resize_disk is still broken:
$ pct resize 123 mp0 +1G
btrfs error: ERROR: invalid qgroupid or subvolume path: 0/0/286
the actual qgroupid should be 0/286..
> $volid = PVE::Storage::vdisk_alloc($storecfg, $storage, $vmid, 'raw', undef, $size_kb);
> $do_format = 1;
> } else {
> --
> 2.30.2
>
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [pve-devel] applied-partially: [PATCH v2 multiple] btrfs, file system for the brave
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
` (8 preceding siblings ...)
2021-06-22 12:18 ` [pve-devel] [PATCH v2 qemu-server] allow migrating raw btrfs volumes Wolfgang Bumiller
@ 2021-06-23 20:19 ` Thomas Lamprecht
9 siblings, 0 replies; 19+ messages in thread
From: Thomas Lamprecht @ 2021-06-23 20:19 UTC (permalink / raw)
To: Proxmox VE development discussion, Wolfgang Bumiller
applied all from pve-container, and all but 5/9 from pve-storage, Fabian reported
that he run into quite some issues using subvols, most of them exposed by that patch.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2021-06-23 20:19 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-22 12:18 [pve-devel] [PATCH v2 multiple] btrfs, file system for the brave Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 1/5] add BTRFS storage plugin Wolfgang Bumiller
2021-06-23 12:15 ` Fabian Grünbichler
2021-06-23 12:43 ` Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 2/5] bump storage API: update import/export methods Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 3/5] btrfs: add 'btrfs' import/export format Wolfgang Bumiller
2021-06-23 12:14 ` Fabian Grünbichler
2021-06-23 12:30 ` Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 4/5] btrfs: make NOCOW optional Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 storage 5/5] btrfs: support quota-based subvols optionally Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 1/3] migration: fix snapshots boolean accounting Wolfgang Bumiller
2021-06-23 10:59 ` Fabian Grünbichler
2021-06-23 12:31 ` Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 2/3] enable btrfs support via subvolumes Wolfgang Bumiller
2021-06-22 12:18 ` [pve-devel] [PATCH v2 container 3/3] special case btrfs+quotas to use subvolumes Wolfgang Bumiller
2021-06-23 13:51 ` Fabian Grünbichler
2021-06-22 12:18 ` [pve-devel] [PATCH v2 qemu-server] allow migrating raw btrfs volumes Wolfgang Bumiller
2021-06-23 10:51 ` [pve-devel] applied: " Thomas Lamprecht
2021-06-23 20:19 ` [pve-devel] applied-partially: [PATCH v2 multiple] btrfs, file system for the brave Thomas Lamprecht
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox