public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH multiple] btrfs, file system for the brave
@ 2021-06-09 13:18 Wolfgang Bumiller
  2021-06-09 13:18 ` [pve-devel] [PATCH common] Syscalls/Tools: add renameat2 Wolfgang Bumiller
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Wolfgang Bumiller @ 2021-06-09 13:18 UTC (permalink / raw)
  To: pve-devel

This is another take at btrfs storage support.
I wouldn't exactly call it great, but I guess it works (although I did
manage to break a few... Then again I also manged to do that with ZFS
(it just took a few years longer there...)).

This one's spread over quite a few repositories, so let's go through
them in apply-order:

* pve-common:

  One nice improvement since the last go around is that by now btrfs
  supports renameat2's `RENAME_EXCHANGE` flag.

  * PATCH 1/1: Syscalls/Tools: add renameat2

    The idea here is to have a more robust "rollback" implementation,
    since "snapshots" in btrfs are really just losely connected
    subvolumes, and there is no direct rollback functionality.  Instead,
    we simply clone the snapshot we want to roll back to (by making a
    writable snapshot), and then rotate the clone into place before
    cleaning up the now-old version.
    Without `RENAME_EXCHANGE` this rotation required 2 syscalls
    creating a small window where, if the process is stopped/killed,
    the volume we're working on would not live in its designated place,
    making it somewhat nasty to deal with. Now, the worst that happens
    is an extra left-over snapshot lying around.

* pve-storage:

  * PATCH 1/4: fix find_free_disk_name invocations

    Just a non-issue I ran into (the parameter isn't actually used by
    our implementors currently, but it confused me ;-) ).

  * PATCH 2/4: add BTRFS storage plugin

    The main implementation with btrfs send/recv saved up for patch 4.
    (There's a note about `mkdir` vs `populate` etc., I intend to clean
    this up later, we had some off-list discussion about this
    already...)

    Currently, container subvolumes are only allowed to be unsized
    (size zero, like with our plain directory storage subvols), though
    we *could* enable quota support with little effort, but quota
    information is lost in send/recv operations, so we need to cover
    this in our import/export format separately, if we want to.
    (Although I have a feeling it wouldn't be nice for performance
    anyway...)

  * PATCH 3/4: update import/export storage API

    _Technically_ I *could* do without, but it would be quite
    inconvenient, and the information it adds to the methods is usually
    readily available, so I think this makes sense.

  * PATCH 4/4: btrfs: add 'btrfs' import/export format

    This requires a bit more elbow grease than ZFS, though, so I split
    this out into a separate patch.

* pve-container:

  * PATCH 1/2: migration: fix snapshots boolean accounting

    (The `with_snapshots` parameter is otherways not set correctly
    since we handle the base volume last)

  * PATCH 2/2: enable btrfs support via subvolumes

    Some of this stuff should probably become a storage property...
    For container volumes which aren't _unsized_ this still allocates
    an ext4 formatted raw image. For size=0 volumes we'll have an
    actual btrfs subvolume.

* qemu-server:

  * PATCH 1/1: allow migrating raw btrfs volumes

    Like in pve-container, some of this stuff should probably become a
    storage property...

--
Big Terrifying Risky File System




^ permalink raw reply	[flat|nested] 17+ messages in thread

* [pve-devel] [PATCH common] Syscalls/Tools: add renameat2
  2021-06-09 13:18 [pve-devel] [PATCH multiple] btrfs, file system for the brave Wolfgang Bumiller
@ 2021-06-09 13:18 ` Wolfgang Bumiller
  2021-06-15 12:35   ` [pve-devel] applied: " Thomas Lamprecht
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 1/4] fix find_free_disk_name invocations Wolfgang Bumiller
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Wolfgang Bumiller @ 2021-06-09 13:18 UTC (permalink / raw)
  To: pve-devel

Mostly for the ability to atomically swap files.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
 src/PVE/Syscall.pm |  1 +
 src/PVE/Tools.pm   | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/src/PVE/Syscall.pm b/src/PVE/Syscall.pm
index 2d5019f..10e185d 100644
--- a/src/PVE/Syscall.pm
+++ b/src/PVE/Syscall.pm
@@ -17,6 +17,7 @@ BEGIN {
 	setresuid => &SYS_setresuid,
 	fchownat => &SYS_fchownat,
 	mount => &SYS_mount,
+	renameat2 => &SYS_renameat2,
 
 	# These use asm-generic, so they're the same across (sane) architectures. We use numbers
 	# since they're not in perl's syscall.ph yet...
diff --git a/src/PVE/Tools.pm b/src/PVE/Tools.pm
index 16ae3d2..63bf075 100644
--- a/src/PVE/Tools.pm
+++ b/src/PVE/Tools.pm
@@ -104,6 +104,11 @@ use constant {O_PATH    => 0x00200000,
 use constant {AT_EMPTY_PATH => 0x1000,
               AT_FDCWD => -100};
 
+# from <linux/fs.h>
+use constant {RENAME_NOREPLACE => (1 << 0),
+              RENAME_EXCHANGE  => (1 << 1),
+              RENAME_WHITEOUT  => (1 << 2)};
+
 sub run_with_timeout {
     my ($timeout, $code, @param) = @_;
 
@@ -1461,6 +1466,11 @@ sub fsync($) {
     return 0 == syscall(PVE::Syscall::fsync, $fileno);
 }
 
+sub renameat2($$$$$) {
+    my ($olddirfd, $oldpath, $newdirfd, $newpath, $flags) = @_;
+    return 0 == syscall(PVE::Syscall::renameat2, $olddirfd, $oldpath, $newdirfd, $newpath, $flags);
+}
+
 sub sync_mountpoint {
     my ($path) = @_;
     sysopen my $fd, $path, O_RDONLY|O_CLOEXEC or die "failed to open $path: $!\n";
-- 
2.30.2





^ permalink raw reply	[flat|nested] 17+ messages in thread

* [pve-devel] [PATCH storage 1/4] fix find_free_disk_name invocations
  2021-06-09 13:18 [pve-devel] [PATCH multiple] btrfs, file system for the brave Wolfgang Bumiller
  2021-06-09 13:18 ` [pve-devel] [PATCH common] Syscalls/Tools: add renameat2 Wolfgang Bumiller
@ 2021-06-09 13:18 ` Wolfgang Bumiller
  2021-06-15 12:36   ` [pve-devel] applied: " Thomas Lamprecht
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 2/4] add BTRFS storage plugin Wolfgang Bumiller
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Wolfgang Bumiller @ 2021-06-09 13:18 UTC (permalink / raw)
  To: pve-devel

The interface takes the storeid now, not the image dir.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
 PVE/Storage/GlusterfsPlugin.pm | 4 ++--
 PVE/Storage/Plugin.pm          | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/PVE/Storage/GlusterfsPlugin.pm b/PVE/Storage/GlusterfsPlugin.pm
index 5ec2f42..599bca2 100644
--- a/PVE/Storage/GlusterfsPlugin.pm
+++ b/PVE/Storage/GlusterfsPlugin.pm
@@ -215,7 +215,7 @@ sub clone_image {
 
     mkpath $imagedir;
 
-    my $name = $class->find_free_diskname($imagedir, $scfg, $vmid, "qcow2", 1);
+    my $name = $class->find_free_diskname($storeid, $scfg, $vmid, "qcow2", 1);
 
     warn "clone $volname: $vtype, $name, $vmid to $name (base=../$basevmid/$basename)\n";
 
@@ -243,7 +243,7 @@ sub alloc_image {
 
     mkpath $imagedir;
 
-    $name = $class->find_free_diskname($imagedir, $scfg, $vmid, $fmt, 1) if !$name;
+    $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt, 1) if !$name;
 
     my (undef, $tmpfmt) = parse_name_dir($name);
 
diff --git a/PVE/Storage/Plugin.pm b/PVE/Storage/Plugin.pm
index 4a10a1f..318d13a 100644
--- a/PVE/Storage/Plugin.pm
+++ b/PVE/Storage/Plugin.pm
@@ -695,7 +695,7 @@ sub clone_image {
 
     mkpath $imagedir;
 
-    my $name = $class->find_free_diskname($imagedir, $scfg, $vmid, "qcow2", 1);
+    my $name = $class->find_free_diskname($storeid, $scfg, $vmid, "qcow2", 1);
 
     warn "clone $volname: $vtype, $name, $vmid to $name (base=../$basevmid/$basename)\n";
 
@@ -727,7 +727,7 @@ sub alloc_image {
 
     mkpath $imagedir;
 
-    $name = $class->find_free_diskname($imagedir, $scfg, $vmid, $fmt, 1) if !$name;
+    $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt, 1) if !$name;
 
     my (undef, $tmpfmt) = parse_name_dir($name);
 
-- 
2.30.2





^ permalink raw reply	[flat|nested] 17+ messages in thread

* [pve-devel] [PATCH storage 2/4] add BTRFS storage plugin
  2021-06-09 13:18 [pve-devel] [PATCH multiple] btrfs, file system for the brave Wolfgang Bumiller
  2021-06-09 13:18 ` [pve-devel] [PATCH common] Syscalls/Tools: add renameat2 Wolfgang Bumiller
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 1/4] fix find_free_disk_name invocations Wolfgang Bumiller
@ 2021-06-09 13:18 ` Wolfgang Bumiller
  2021-06-10 12:40   ` Fabian Grünbichler
  2021-06-11 12:11   ` Fabian Ebner
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 3/4] update import/export storage API Wolfgang Bumiller
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 17+ messages in thread
From: Wolfgang Bumiller @ 2021-06-09 13:18 UTC (permalink / raw)
  To: pve-devel

This is mostly the same as a directory storage, with 2 major
differences:

* 'subvol' volumes are actual btrfs subvolumes and therefore
  allow snapshots
* 'raw' files are placed *into* a subvolume and therefore
  also allow snapshots, the raw file for volume
  `btrstore:100/vm-100-disk-1.raw` can be found under
  `$path/images/100/vm-100-disk-1/disk.raw`
* snapshots add an '@name' suffix to the subvolume's name,
  so snapshot 'foo' of the above would be found under
  `$path/images/100/vm-100-disk-1@foo/disk.raw`

Note that qgroups aren't included in btrfs-send streams,
therefore for now we will only be using *unsized* subvolumes
for containers and place a regular raw+ext4 file for sized
containers.
We could extend the import/export stream format to include
the information at the front (similar to how we do the
"tar+size" format, but we need to include the size of all
the contained snapshots as well, since they can technically
change). (But before enabling quotas we should do some
performance testing on bigger file systems with multiple
snapshots as there are quite a few reports of the fs slowing
down considerably in such scenarios).

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
 PVE/Storage.pm             |   2 +
 PVE/Storage/BTRFSPlugin.pm | 616 +++++++++++++++++++++++++++++++++++++
 PVE/Storage/Makefile       |   1 +
 3 files changed, 619 insertions(+)
 create mode 100644 PVE/Storage/BTRFSPlugin.pm

diff --git a/PVE/Storage.pm b/PVE/Storage.pm
index aa36bad..f9f8d16 100755
--- a/PVE/Storage.pm
+++ b/PVE/Storage.pm
@@ -38,6 +38,7 @@ use PVE::Storage::GlusterfsPlugin;
 use PVE::Storage::ZFSPoolPlugin;
 use PVE::Storage::ZFSPlugin;
 use PVE::Storage::PBSPlugin;
+use PVE::Storage::BTRFSPlugin;
 
 # Storage API version. Increment it on changes in storage API interface.
 use constant APIVER => 8;
@@ -60,6 +61,7 @@ PVE::Storage::GlusterfsPlugin->register();
 PVE::Storage::ZFSPoolPlugin->register();
 PVE::Storage::ZFSPlugin->register();
 PVE::Storage::PBSPlugin->register();
+PVE::Storage::BTRFSPlugin->register();
 
 # load third-party plugins
 if ( -d '/usr/share/perl5/PVE/Storage/Custom' ) {
diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
new file mode 100644
index 0000000..733be6e
--- /dev/null
+++ b/PVE/Storage/BTRFSPlugin.pm
@@ -0,0 +1,616 @@
+package PVE::Storage::BTRFSPlugin;
+
+use strict;
+use warnings;
+
+use base qw(PVE::Storage::Plugin);
+
+use Fcntl qw(S_ISDIR);
+use File::Basename qw(dirname);
+use File::Path qw(mkpath);
+use IO::Dir;
+
+use PVE::Tools qw(run_command);
+
+use PVE::Storage::DirPlugin;
+
+use constant {
+    BTRFS_FIRST_FREE_OBJECTID => 256,
+};
+
+# Configuration (similar to DirPlugin)
+
+sub type {
+    return 'btrfs';
+}
+
+sub plugindata {
+    return {
+	content => [
+	    {
+		images => 1,
+		rootdir => 1,
+		vztmpl => 1,
+		iso => 1,
+		backup => 1,
+		snippets => 1,
+		none => 1,
+	    },
+	    { images => 1, rootdir => 1 },
+	],
+	format => [ { raw => 1, qcow2 => 1, vmdk => 1, subvol => 1 }, 'raw', ],
+    };
+}
+
+sub options {
+    return {
+	path => { fixed => 1 },
+	nodes => { optional => 1 },
+	shared => { optional => 1 },
+	disable => { optional => 1 },
+	maxfiles => { optional => 1 },
+	content => { optional => 1 },
+	format => { optional => 1 },
+	is_mountpoint => { optional => 1 },
+	# TODO: The new variant of mkdir with  `populate` vs `create`...
+    };
+}
+
+# Storage implementation
+#
+# We use the same volume names are directory plugins, but map *raw* disk image file names into a
+# subdirectory.
+#
+# `vm-VMID-disk-ID.raw`
+#   -> `images/VMID/vm-VMID-disk-ID/disk.raw`
+#   where the `vm-VMID-disk-ID/` subdirectory is a btrfs subvolume
+
+# Reuse `DirPlugin`'s `check_config`. This simply checks for invalid paths.
+sub check_config {
+    my ($self, $sectionId, $config, $create, $skipSchemaCheck) = @_;
+    return PVE::Storage::DirPlugin::check_config($self, $sectionId, $config, $create, $skipSchemaCheck);
+}
+
+sub activate_storage {
+    my ($class, $storeid, $scfg, $cache) = @_;
+    return PVE::Storage::DirPlugin::activate_storage($class, $storeid, $scfg, $cache);
+}
+
+sub status {
+    my ($class, $storeid, $scfg, $cache) = @_;
+    return PVE::Storage::DirPlugin::status($class, $storeid, $scfg, $cache);
+}
+
+# TODO: sub get_volume_notes {}
+
+# TODO: sub update_volume_notes {}
+
+# croak would not include the caller from within this module
+sub __error {
+    my ($msg) = @_;
+    my (undef, $f, $n) = caller(1);
+    die "$msg at $f: $n\n";
+}
+
+# Given a name (eg. `vm-VMID-disk-ID.raw`), take the part up to the format suffix as the name of
+# the subdirectory (subvolume).
+sub raw_name_to_dir($) {
+    my ($raw) = @_;
+
+    # For the subvolume directory Strip the `.<format>` suffix:
+    if ($raw =~ /^(.*)\.raw$/) {
+	return $1;
+    }
+
+    __error "internal error: bad disk name: $raw";
+}
+
+sub raw_file_to_subvol($) {
+    my ($file) = @_;
+
+    if ($file =~ m|^(.*)/disk\.raw$|) {
+	return "$1";
+    }
+
+    __error "internal error: bad raw path: $file";
+}
+
+sub filesystem_path {
+    my ($class, $scfg, $volname, $snapname) = @_;
+
+    my ($vtype, $name, $vmid, undef, undef, $isBase, $format) =
+	$class->parse_volname($volname);
+
+    my $path = $class->get_subdir($scfg, $vtype);
+
+    $path .= "/$vmid" if $vtype eq 'images';
+
+    if ($format eq 'raw') {
+	my $dir = raw_name_to_dir($name);
+	if ($snapname) {
+	    $dir .= "\@$snapname";
+	}
+	$path .= "/$dir/disk.raw";
+    } elsif ($format eq 'subvol') {
+	$path .= "/$name";
+	if ($snapname) {
+	    $path .= "\@$snapname";
+	}
+    } else {
+	$path .= "/$name";
+    }
+
+    return wantarray ? ($path, $vmid, $vtype) : $path;
+}
+
+sub btrfs_cmd {
+    my ($class, $cmd, $outfunc) = @_;
+
+    my $msg = '';
+    my $func;
+    if (defined($outfunc)) {
+	$func = sub {
+	    my $part = &$outfunc(@_);
+	    $msg .= $part if defined($part);
+	};
+    } else {
+	$func = sub { $msg .= "$_[0]\n" };
+    }
+    run_command(['btrfs', '-q', @$cmd], errmsg => 'btrfs error', outfunc => $func);
+
+    return $msg;
+}
+
+sub btrfs_get_subvol_id {
+    my ($class, $path) = @_;
+    my $info = $class->btrfs_cmd(['subvolume', 'show', '--', $path]);
+    if ($info !~ /^\s*(?:Object|Subvolume) ID:\s*(\d+)$/m) {
+	die "failed to get btrfs subvolume ID from: $info\n";
+    }
+    return $1;
+}
+
+sub create_base {
+    my ($class, $storeid, $scfg, $volname) = @_;
+
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+	$class->parse_volname($volname);
+
+    my $newname = $name;
+    $newname =~ s/^vm-/base-/;
+
+    # If we're not working with a 'raw' file, which is the only thing that's "different" for btrfs,
+    # or a subvolume, we forward to the DirPlugin
+    if ($format ne 'raw' && $format ne 'subvol') {
+	return PVE::Storage::DirPlugin::create_base(@_);
+    }
+
+    my $path = $class->filesystem_path($scfg, $volname);
+    my $newvolname = $basename ? "$basevmid/$basename/$vmid/$newname" : "$vmid/$newname";
+    my $newpath = $class->filesystem_path($scfg, $newvolname);
+
+    my $subvol = $path;
+    my $newsubvol = $newpath;
+    if ($format eq 'raw') {
+	$subvol = raw_file_to_subvol($subvol);
+	$newsubvol = raw_file_to_subvol($newsubvol);
+    }
+
+    rename($subvol, $newsubvol)
+	|| die "rename '$subvol' to '$newsubvol' failed - $!\n";
+    eval { $class->btrfs_cmd(['property', 'set', $newsubvol, 'ro', 'true']) };
+    warn $@ if $@;
+
+    return $newvolname;
+}
+
+sub clone_image {
+    my ($class, $scfg, $storeid, $volname, $vmid, $snap) = @_;
+
+    my ($vtype, $basename, $basevmid, undef, undef, $isBase, $format) =
+	$class->parse_volname($volname);
+
+    # If we're not working with a 'raw' file, which is the only thing that's "different" for btrfs,
+    # or a subvolume, we forward to the DirPlugin
+    if ($format ne 'raw' && $format ne 'subvol') {
+	return PVE::Storage::DirPlugin::clone_image(@_);
+    }
+
+    my $imagedir = $class->get_subdir($scfg, 'images');
+    $imagedir .= "/$vmid";
+    mkpath $imagedir;
+
+    my $path = $class->filesystem_path($scfg, $volname);
+    my $newname = $class->find_free_diskname($storeid, $scfg, $vmid, $format, 1);
+
+    # For btrfs subvolumes we don't actually need the "link":
+    #my $newvolname = "$basevmid/$basename/$vmid/$newname";
+    my $newvolname = "$vmid/$newname";
+    my $newpath = $class->filesystem_path($scfg, $newvolname);
+
+    my $subvol = $path;
+    my $newsubvol = $newpath;
+    if ($format eq 'raw') {
+	$subvol = raw_file_to_subvol($subvol);
+	$newsubvol = raw_file_to_subvol($newsubvol);
+    }
+
+    $class->btrfs_cmd(['subvolume', 'snapshot', '--', $subvol, $newsubvol]);
+
+    return $newvolname;
+}
+
+sub alloc_image {
+    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
+
+    if ($fmt ne 'raw' && $fmt ne 'subvol') {
+	return PVE::Storage::DirPlugin::alloc_image(@_);
+    }
+
+    # From Plugin.pm:
+
+    my $imagedir = $class->get_subdir($scfg, 'images') . "/$vmid";
+
+    mkpath $imagedir;
+
+    $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt, 1) if !$name;
+
+    my (undef, $tmpfmt) = PVE::Storage::Plugin::parse_name_dir($name);
+
+    die "illegal name '$name' - wrong extension for format ('$tmpfmt != '$fmt')\n"
+	if $tmpfmt ne $fmt;
+
+    # End copy from Plugin.pm
+
+    my $subvol = "$imagedir/$name";
+    # .raw is not part of the directory name
+    $subvol =~ s/\.raw$//;
+
+    die "disk image '$subvol' already exists\n" if -e $subvol;
+
+    my $path;
+    if ($fmt eq 'raw') {
+	$path = "$subvol/disk.raw";
+    }
+
+    $class->btrfs_cmd(['subvolume', 'create', '--', $subvol]);
+
+    if ($fmt eq 'subvol' && !!$size) {
+	# NOTE: `btrfs send/recv` actually drops quota information so supporting subvolumes with
+	# quotas doesn't play nice with send/recv.
+	die "btrfs quotas are currently not supported, use an unsized subvolume or a raw file\n";
+
+	# This is how we *would* do it:
+	# # Use the subvol's default 0/$id qgroup
+	# eval {
+	#     # This call should happen at storage creation instead and therefore governed by a
+	#     # configuration option!
+	#     # $class->btrfs_cmd(['quota', 'enable', $subvol]);
+	#     my $id = $class->btrfs_get_subvol_id($subvol);
+	#     $class->btrfs_cmd(['qgroup', 'limit', "${size}k", "0/$id", $subvol]);
+	# };
+    } elsif ($fmt eq 'raw') {
+	# From Plugin.pm (minus the qcow2 part):
+	my $cmd = ['/usr/bin/qemu-img', 'create'];
+
+	push @$cmd, '-f', $fmt, $path, "${size}K";
+
+	eval { run_command($cmd, errmsg => "unable to create image"); };
+    } else {
+	# be clear that after this if/else we're handling the result of an eval block:
+	$@ = undef;
+    }
+
+    if (my $err = $@) {
+	eval { $class->btrfs_cmd(['subvolume', 'delete', '--', $subvol]); };
+	warn $@ if $@;
+	die $err;
+    }
+
+    return "$vmid/$name";
+}
+
+# Same as btrfsprogs does:
+my sub path_is_subvolume : prototype($) {
+    my ($path) = @_;
+    my @stat = stat($path)
+	or die "stat failed on '$path' - $!\n";
+    my ($ino, $mode) = @stat[1, 2];
+    return S_ISDIR($mode) && $ino == BTRFS_FIRST_FREE_OBJECTID;
+}
+
+my $BTRFS_VOL_REGEX = qr/((?:vm|base|subvol)-\d+-disk-\d+(?:\.subvol)?)(?:\@(\S+))$/;
+
+# Calls `$code->($volume, $name, $snapshot)` for each subvol in a directory matching our volume
+# regex.
+my sub foreach_subvol : prototype($$) {
+    my ($dir, $code) = @_;
+
+    dir_glob_foreach($dir, $BTRFS_VOL_REGEX, sub {
+	my ($volume, $name, $snapshot) = ($1, $2, $3);
+	return if !path_is_subvolume("$dir/$volume");
+	$code->($volume, $name, $snapshot);
+    })
+}
+
+sub free_image {
+    my ($class, $storeid, $scfg, $volname, $isBase, $_format) = @_;
+
+    my (undef, undef, $vmid, undef, undef, undef, $format) =
+	$class->parse_volname($volname);
+
+    if ($format ne 'subvol' && $format ne 'raw') {
+	return PVE::Storage::DirPlugin::free_image(@_);
+    }
+
+    my $path = $class->filesystem_path($scfg, $volname);
+
+    my $subvol = $path;
+    if ($format eq 'raw') {
+	$subvol = raw_file_to_subvol($path);
+    }
+
+    my $dir = dirname($subvol);
+    my @snapshot_vols;
+    foreach_subvol($dir, sub {
+	my ($volume, $name, $snapshot) = @_;
+	return if !defined $snapshot;
+	push @snapshot_vols, "$dir/$volume";
+    });
+
+    $class->btrfs_cmd(['subvolume', 'delete', '--', @snapshot_vols, $subvol]);
+    # try to cleanup directory to not clutter storage with empty $vmid dirs if
+    # all images from a guest got deleted
+    rmdir($dir);
+
+    return undef;
+}
+
+# Currently not used because quotas clash with send/recv.
+# my sub btrfs_subvol_quota {
+#     my ($class, $path) = @_;
+#     my $id = '0/' . $class->btrfs_get_subvol_id($path);
+#     my $search = qr/^\Q$id\E\s+(\d)+\s+\d+\s+(\d+)\s*$/;
+#     my ($used, $size);
+#     $class->btrfs_cmd(['qgroup', 'show', '--raw', '-rf', '--', $path], sub {
+# 	return if defined($size);
+# 	if ($_[0] =~ $search) {
+# 	    ($used, $size) = ($1, $2);
+# 	}
+#     });
+#     if (!defined($size)) {
+# 	# syslog should include more information:
+# 	syslog('err', "failed to get subvolume size for: $path (id $id)");
+# 	# UI should only see the last path component:
+# 	$path =~ s|^.*/||;
+# 	die "failed to get subvolume size for $path\n";
+#     }
+#     return wantarray ? ($used, $size) : $size;
+# }
+
+sub volume_size_info {
+    my ($class, $scfg, $storeid, $volname, $timeout) = @_;
+
+    my $path = $class->filesystem_path($scfg, $volname);
+
+    my $format = ($class->parse_volname($volname))[6];
+
+    if ($format eq 'subvol') {
+	my $ctime = (stat($path))[10];
+	my ($used, $size) = (0, 0);
+	#my ($used, $size) = btrfs_subvol_quota($class, $path); # uses wantarray
+	return wantarray ? ($size, 'subvol', $used, undef, $ctime) : 1;
+    }
+
+    return PVE::Storage::Plugin::file_size_info($path, $timeout);
+}
+
+sub volume_resize {
+    my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
+
+    my $format = ($class->parse_volname($volname))[6];
+    if ($format eq 'subvol') {
+	my $path = $class->filesystem_path($scfg, $volname);
+	my $id = '0/' . $class->btrfs_get_subvol_id($path);
+	$class->btrfs_cmd(['qgroup', 'limit', '--', "${size}k", "0/$id", $path]);
+	return undef;
+    }
+
+    return PVE::Storage::Plugin::volume_resize(@_);
+}
+
+sub volume_snapshot {
+    my ($class, $scfg, $storeid, $volname, $snap) = @_;
+
+    my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
+    if ($format ne 'subvol' && $format ne 'raw') {
+	return PVE::Storage::Plugin::volume_snapshot(@_);
+    }
+
+    my $path = $class->filesystem_path($scfg, $volname);
+    my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
+
+    if ($format eq 'raw') {
+	$path = raw_file_to_subvol($path);
+	$snap_path = raw_file_to_subvol($snap_path);
+    }
+
+    my $snapshot_dir = $class->get_subdir($scfg, 'images') . "/$vmid";
+    mkpath $snapshot_dir;
+
+    $class->btrfs_cmd(['subvolume', 'snapshot', '-r', '--', $path, $snap_path]);
+    return undef;
+}
+
+sub volume_rollback_is_possible {
+    my ($class, $scfg, $storeid, $volname, $snap) = @_; 
+
+    return 1; 
+}
+
+sub volume_snapshot_rollback {
+    my ($class, $scfg, $storeid, $volname, $snap) = @_;
+
+    my ($name, $format) = ($class->parse_volname($volname))[1,6];
+
+    if ($format ne 'subvol' && $format ne 'raw') {
+	return PVE::Storage::Plugin::volume_snapshot_rollback(@_);
+    }
+
+    my $path = $class->filesystem_path($scfg, $volname);
+    my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
+
+    if ($format eq 'raw') {
+	$path = raw_file_to_subvol($path);
+	$snap_path = raw_file_to_subvol($snap_path);
+    }
+
+    # Simple version would be:
+    #   rename old to temp
+    #   create new
+    #   on error rename temp back
+    # But for atomicity in case the rename after create-failure *also* fails, we create the new
+    # subvol first, then use RENAME_EXCHANGE, 
+    my $tmp_path = "$path.tmp.$$";
+    $class->btrfs_cmd(['subvolume', 'snapshot', '--', $snap_path, $tmp_path]);
+    # The paths are absolute, so pass -1 as file descriptors.
+    my $ok = PVE::Tools::renameat2(-1, $tmp_path, -1, $path, &PVE::Tools::RENAME_EXCHANGE);
+
+    eval { $class->btrfs_cmd(['subvolume', 'delete', '--', $tmp_path]) };
+    warn "failed to remove '$tmp_path' subvolume: $@" if $@;
+
+    if (!$ok) {
+	die "failed to rotate '$tmp_path' into place at '$path' - $!\n";
+    }
+
+    return undef;
+}
+
+sub volume_snapshot_delete {
+    my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
+
+    my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
+
+    if ($format ne 'subvol' && $format ne 'raw') {
+	return PVE::Storage::Plugin::volume_snapshot_delete(@_);
+    }
+
+    my $path = $class->filesystem_path($scfg, $volname, $snap);
+
+    if ($format eq 'raw') {
+	$path = raw_file_to_subvol($path);
+    }
+
+    $class->btrfs_cmd(['subvolume', 'delete', '--', $path]);
+
+    return undef;
+}
+
+sub volume_has_feature {
+    my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
+
+    my $features = {
+	snapshot => {
+	    current => { qcow2 => 1, raw => 1, subvol => 1 },
+	    snap => { qcow2 => 1, raw => 1, subvol => 1 }
+	},
+	clone => {
+	    base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
+	    current => { raw => 1 },
+	    snap => { raw => 1 },
+	},
+	template => { current => { qcow2 => 1, raw => 1, vmdk => 1, subvol => 1 } },
+	clone => {
+	    base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
+	    current => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
+	    snap => { qcow2 => 1, raw => 1, subvol => 1 },
+	},
+	sparseinit => { base => {qcow2 => 1, raw => 1, vmdk => 1 },
+			current => {qcow2 => 1, raw => 1, vmdk => 1 } },
+    };
+
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+	$class->parse_volname($volname);
+
+    my $key = undef;
+    if ($snapname) {
+        $key = 'snap';
+    } else {
+        $key =  $isBase ? 'base' : 'current';
+    }
+
+    return 1 if defined($features->{$feature}->{$key}->{$format});
+
+    return undef;
+}
+
+sub list_images {
+    my ($class, $storeid, $scfg, $vmid, $vollist, $cache) = @_;
+    my $imagedir = $class->get_subdir($scfg, 'images');
+
+    my $res = [];
+
+    # Copied from Plugin.pm, with file_size_info calls adapted:
+    foreach my $fn (<$imagedir/[0-9][0-9]*/*>) {
+	# different to in Plugin.pm the regex below also excludes '@' as valid file name
+	next if $fn !~ m@^(/.+/(\d+)/([^/\@.]+(?:\.(qcow2|vmdk|subvol))?))$@;
+	$fn = $1; # untaint
+
+	my $owner = $2;
+	my $name = $3;
+	my $ext = $4;
+
+	next if !$vollist && defined($vmid) && ($owner ne $vmid);
+
+	my $volid = "$storeid:$owner/$name";
+	my ($size, $format, $used, $parent, $ctime);
+
+	if (!$ext) { # raw
+	    $volid .= '.raw';
+	    ($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info("$fn/disk.raw");
+	} elsif ($ext eq 'subvol') {
+	    ($used, $size) = (0, 0);
+	    #($used, $size) = btrfs_subvol_quota($class, $fn);
+	    $format = 'subvol';
+	} else {
+	    ($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info($fn);
+	}
+	next if !($format && defined($size));
+
+	if ($vollist) {
+	    next if ! grep { $_ eq $volid } @$vollist;
+	}
+
+	my $info = {
+	    volid => $volid, format => $format,
+	    size => $size, vmid => $owner, used => $used, parent => $parent,
+	};
+
+        $info->{ctime} = $ctime if $ctime;
+
+        push @$res, $info;
+    }
+
+    return $res;
+}
+
+# For now we don't implement `btrfs send/recv` as it needs some updates to our import/export API
+# first!
+
+sub volume_export_formats {
+    return PVE::Storage::DirPlugin::volume_export_formats(@_);
+}
+
+sub volume_export {
+    return PVE::Storage::DirPlugin::volume_export(@_);
+}
+
+sub volume_import_formats {
+    return PVE::Storage::DirPlugin::volume_import_formats(@_);
+}
+
+sub volume_import {
+    return PVE::Storage::DirPlugin::volume_import(@_);
+}
+
+1
diff --git a/PVE/Storage/Makefile b/PVE/Storage/Makefile
index 91b9238..857c485 100644
--- a/PVE/Storage/Makefile
+++ b/PVE/Storage/Makefile
@@ -12,6 +12,7 @@ SOURCES= \
 	ZFSPoolPlugin.pm \
 	ZFSPlugin.pm \
 	PBSPlugin.pm \
+	BTRFSPlugin.pm \
 	LvmThinPlugin.pm
 
 .PHONY: install
-- 
2.30.2





^ permalink raw reply	[flat|nested] 17+ messages in thread

* [pve-devel] [PATCH storage 3/4] update import/export storage API
  2021-06-09 13:18 [pve-devel] [PATCH multiple] btrfs, file system for the brave Wolfgang Bumiller
                   ` (2 preceding siblings ...)
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 2/4] add BTRFS storage plugin Wolfgang Bumiller
@ 2021-06-09 13:18 ` Wolfgang Bumiller
  2021-06-10 12:30   ` Fabian Grünbichler
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 4/4] btrfs: add 'btrfs' import/export format Wolfgang Bumiller
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Wolfgang Bumiller @ 2021-06-09 13:18 UTC (permalink / raw)
  To: pve-devel

This bumps APIVER, but also APIAGE, see below.

The import methods (volume_import, volume_import_formats):
  This additionally gets the '$snapshot' parameter which is
  already present on the export side as an informational
  piece to know which of the snapshots is the *current* one.
  The current "disk" state will be set to this snapshot.
  This, too, is required for our btrfs implementation.
    `volume_import_formats` can obviously not make much
  *use* of this parameter, but it'll still be useful to know
  that the information is actually available in the import
  call, so its presence will be checked in the btrfs
  implementation.

Currently this is intended to be used for btrfs send/recv
support, which in theory could also get additional metadata
similar to how we do the "tar+size" format, however, we
currently only really use this within this repository in
storage_migrate() which has this information readily
available anyway.

On the export side (volume_export, volume_export_formats):
  The `$with_snapshots` option can now also be an ordered
  array of snapshots to include, as a hint for storages
  which need this. (As of the next commit this is only
  btrfs, and only when also specifying a base snapshot,
  which is a case we can currently not run into except on
  the command line interface.)
    The current providers of the `with_snapshot` option will
  still treat it as a boolean (since eg. for ZFS you cannot
  really "skip" snapshots AFAIK).
    This is mainly intended for storages which do not have a
  strong association between snapshots and the originals, or
  an ordering (eg. btrfs and lvm-thin allow creating
  arbitrary snapshot trees, and with btrfs you can even
  create a "circular" connection between subvolumes, also we
  could consider reflink based copies snapshots on xfs in
  the future maybe?)

While `import_formats` and `export_formats` now take the same
parameters, I appended the `$snapshot` parameter to the end of
`import_formats` to the end for compatibility.
Therefore we also bump APIAGE, as apart from the btrfs
storage, none of our other storages need the new parameter.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
 PVE/CLI/pvesm.pm | 23 +++++++++++++++++++++--
 PVE/Storage.pm   | 48 ++++++++++++++++++++++++++++++++----------------
 2 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/PVE/CLI/pvesm.pm b/PVE/CLI/pvesm.pm
index d28f1ba..b22f759 100755
--- a/PVE/CLI/pvesm.pm
+++ b/PVE/CLI/pvesm.pm
@@ -276,12 +276,23 @@ __PACKAGE__->register_method ({
 		optional => 1,
 		default => 0,
 	    },
+	    'snapshot-list' => {
+		description => "Ordered list of snapshots to transfer",
+		type => 'string',
+		format => 'string-list',
+		optional => 1,
+	    },
 	},
     },
     returns => { type => 'null' },
     code => sub {
 	my ($param) = @_;
 
+	my $with_snapshots = $param->{'with-snapshots'};
+	if (defined(my $list = $param->{'snapshot-list'})) {
+	    $with_snapshots = PVE::Tools::split_list($list);
+	}
+
 	my $filename = $param->{filename};
 
 	my $outfh;
@@ -295,7 +306,7 @@ __PACKAGE__->register_method ({
 	eval {
 	    my $cfg = PVE::Storage::config();
 	    PVE::Storage::volume_export($cfg, $outfh, $param->{volume}, $param->{format},
-		$param->{snapshot}, $param->{base}, $param->{'with-snapshots'});
+		$param->{snapshot}, $param->{base}, $with_snapshots);
 	};
 	my $err = $@;
 	if ($filename ne '-') {
@@ -361,6 +372,13 @@ __PACKAGE__->register_method ({
 		optional => 1,
 		default => 0,
 	    },
+	    snapshot => {
+		description => "The current-state snapshot if the stream contains snapshots",
+		type => 'string',
+		pattern => qr/[a-z0-9_\-]{1,40}/i,
+		maxLength => 40,
+		optional => 1,
+	    },
 	},
     },
     returns => { type => 'string' },
@@ -436,7 +454,8 @@ __PACKAGE__->register_method ({
 	my $volume = $param->{volume};
 	my $delete = $param->{'delete-snapshot'};
 	my $imported_volid = PVE::Storage::volume_import($cfg, $infh, $volume, $param->{format},
-	    $param->{base}, $param->{'with-snapshots'}, $param->{'allow-rename'});
+	    $param->{base}, $param->{'with-snapshots'}, $param->{'allow-rename'},
+	    $param->{snapshot});
 	PVE::Storage::volume_snapshot_delete($cfg, $imported_volid, $delete)
 	    if defined($delete);
 	return $imported_volid;
diff --git a/PVE/Storage.pm b/PVE/Storage.pm
index f9f8d16..15fcedf 100755
--- a/PVE/Storage.pm
+++ b/PVE/Storage.pm
@@ -41,11 +41,11 @@ use PVE::Storage::PBSPlugin;
 use PVE::Storage::BTRFSPlugin;
 
 # Storage API version. Increment it on changes in storage API interface.
-use constant APIVER => 8;
+use constant APIVER => 9;
 # Age is the number of versions we're backward compatible with.
 # This is like having 'current=APIVER' and age='APIAGE' in libtool,
 # see https://www.gnu.org/software/libtool/manual/html_node/Libtool-versioning.html
-use constant APIAGE => 7;
+use constant APIAGE => 8;
 
 # load standard plugins
 PVE::Storage::DirPlugin->register();
@@ -716,7 +716,8 @@ sub storage_migrate {
     my $send = ['pvesm', 'export', $volid, $format, '-', '-with-snapshots', $with_snapshots];
     my $recv = [@$ssh, '--', 'pvesm', 'import', $target_volid, $format, $import_fn, '-with-snapshots', $with_snapshots];
     if (defined($snapshot)) {
-	push @$send, '-snapshot', $snapshot
+	push @$send, '-snapshot', $snapshot;
+	push @$recv, '-snapshot', $snapshot;
     }
     if ($migration_snapshot) {
 	push @$recv, '-delete-snapshot', $snapshot;
@@ -1716,7 +1717,7 @@ sub prune_mark_backup_group {
     }
 }
 
-sub volume_export {
+sub volume_export : prototype($$$$$$$) {
     my ($cfg, $fh, $volid, $format, $snapshot, $base_snapshot, $with_snapshots) = @_;
 
     my ($storeid, $volname) = parse_volume_id($volid, 1);
@@ -1727,18 +1728,27 @@ sub volume_export {
                                   $snapshot, $base_snapshot, $with_snapshots);
 }
 
-sub volume_import {
-    my ($cfg, $fh, $volid, $format, $base_snapshot, $with_snapshots, $allow_rename) = @_;
+sub volume_import : prototype($$$$$$$$) {
+    my ($cfg, $fh, $volid, $format, $base_snapshot, $with_snapshots, $allow_rename, $snapshot) = @_;
 
     my ($storeid, $volname) = parse_volume_id($volid, 1);
     die "cannot import into volume '$volid'\n" if !$storeid;
     my $scfg = storage_config($cfg, $storeid);
     my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
-    return $plugin->volume_import($scfg, $storeid, $fh, $volname, $format,
-                                  $base_snapshot, $with_snapshots, $allow_rename) // $volid;
-}
-
-sub volume_export_formats {
+    return $plugin->volume_import(
+	$scfg,
+	$storeid,
+	$fh,
+	$volname,
+	$format,
+	$base_snapshot,
+	$with_snapshots,
+	$allow_rename,
+	$snapshot,
+    ) // $volid;
+}
+
+sub volume_export_formats : prototype($$$$$) {
     my ($cfg, $volid, $snapshot, $base_snapshot, $with_snapshots) = @_;
 
     my ($storeid, $volname) = parse_volume_id($volid, 1);
@@ -1750,21 +1760,27 @@ sub volume_export_formats {
                                           $with_snapshots);
 }
 
-sub volume_import_formats {
-    my ($cfg, $volid, $base_snapshot, $with_snapshots) = @_;
+sub volume_import_formats : prototype($$$$$) {
+    my ($cfg, $volid, $base_snapshot, $with_snapshots, $snapshot) = @_;
 
     my ($storeid, $volname) = parse_volume_id($volid, 1);
     return if !$storeid;
     my $scfg = storage_config($cfg, $storeid);
     my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
-    return $plugin->volume_import_formats($scfg, $storeid, $volname,
-                                          $base_snapshot, $with_snapshots);
+    return $plugin->volume_import_formats(
+	$scfg,
+	$storeid,
+	$volname,
+	$base_snapshot,
+	$with_snapshots,
+	$snapshot,
+    );
 }
 
 sub volume_transfer_formats {
     my ($cfg, $src_volid, $dst_volid, $snapshot, $base_snapshot, $with_snapshots) = @_;
     my @export_formats = volume_export_formats($cfg, $src_volid, $snapshot, $base_snapshot, $with_snapshots);
-    my @import_formats = volume_import_formats($cfg, $dst_volid, $base_snapshot, $with_snapshots);
+    my @import_formats = volume_import_formats($cfg, $dst_volid, $base_snapshot, $with_snapshots, $snapshot);
     my %import_hash = map { $_ => 1 } @import_formats;
     my @common = grep { $import_hash{$_} } @export_formats;
     return @common;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 17+ messages in thread

* [pve-devel] [PATCH storage 4/4] btrfs: add 'btrfs' import/export format
  2021-06-09 13:18 [pve-devel] [PATCH multiple] btrfs, file system for the brave Wolfgang Bumiller
                   ` (3 preceding siblings ...)
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 3/4] update import/export storage API Wolfgang Bumiller
@ 2021-06-09 13:18 ` Wolfgang Bumiller
  2021-06-09 13:18 ` [pve-devel] [PATCH container 1/2] migration: fix snapshots boolean accounting Wolfgang Bumiller
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Wolfgang Bumiller @ 2021-06-09 13:18 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
 PVE/CLI/pvesm.pm           |   2 +-
 PVE/Storage.pm             |   2 +-
 PVE/Storage/BTRFSPlugin.pm | 248 +++++++++++++++++++++++++++++++++++--
 3 files changed, 240 insertions(+), 12 deletions(-)

diff --git a/PVE/CLI/pvesm.pm b/PVE/CLI/pvesm.pm
index b22f759..ef2f5e6 100755
--- a/PVE/CLI/pvesm.pm
+++ b/PVE/CLI/pvesm.pm
@@ -30,7 +30,7 @@ use PVE::CLIHandler;
 
 use base qw(PVE::CLIHandler);
 
-my $KNOWN_EXPORT_FORMATS = ['raw+size', 'tar+size', 'qcow2+size', 'vmdk+size', 'zfs'];
+my $KNOWN_EXPORT_FORMATS = ['raw+size', 'tar+size', 'qcow2+size', 'vmdk+size', 'zfs', 'btrfs'];
 
 my $nodename = PVE::INotify::nodename();
 
diff --git a/PVE/Storage.pm b/PVE/Storage.pm
index 15fcedf..6f5a033 100755
--- a/PVE/Storage.pm
+++ b/PVE/Storage.pm
@@ -692,7 +692,7 @@ sub storage_migrate {
 
     my $migration_snapshot;
     if (!defined($snapshot)) {
-	if ($scfg->{type} eq 'zfspool') {
+	if ($scfg->{type} eq 'zfspool' || $scfg->{type} eq 'btrfs') {
 	    $migration_snapshot = 1;
 	    $snapshot = '__migration__';
 	}
diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
index 733be6e..87e48c7 100644
--- a/PVE/Storage/BTRFSPlugin.pm
+++ b/PVE/Storage/BTRFSPlugin.pm
@@ -9,8 +9,9 @@ use Fcntl qw(S_ISDIR);
 use File::Basename qw(dirname);
 use File::Path qw(mkpath);
 use IO::Dir;
+use POSIX qw(EEXIST);
 
-use PVE::Tools qw(run_command);
+use PVE::Tools qw(run_command dir_glob_foreach);
 
 use PVE::Storage::DirPlugin;
 
@@ -594,23 +595,250 @@ sub list_images {
     return $res;
 }
 
-# For now we don't implement `btrfs send/recv` as it needs some updates to our import/export API
-# first!
-
 sub volume_export_formats {
-    return PVE::Storage::DirPlugin::volume_export_formats(@_);
-}
+    my ($class, $scfg, $storeid, $volname, $snapshot, $base_snapshot, $with_snapshots) = @_;
 
-sub volume_export {
-    return PVE::Storage::DirPlugin::volume_export(@_);
+    # We can do whatever `DirPlugin` can do.
+    my @result = PVE::Storage::Plugin::volume_export_formats(@_);
+
+    # `btrfs send` only works on snapshots:
+    return @result if !defined $snapshot;
+
+    # Incremental stream with snapshots is only supported if the snapshots are listed (new api):
+    return @result if defined($base_snapshot) && $with_snapshots && ref($with_snapshots) ne 'ARRAY';
+
+    # Otherwise we do also support `with_snapshots`.
+
+    # Finally, `btrfs send` only works on formats where we actually use btrfs subvolumes:
+    my $format = ($class->parse_volname($volname))[6];
+    return @result if $format ne 'raw' && $format ne 'subvol';
+
+    return ('btrfs', @result);
 }
 
 sub volume_import_formats {
-    return PVE::Storage::DirPlugin::volume_import_formats(@_);
+    my ($class, $scfg, $storeid, $volname, $base_snapshot, $with_snapshots, $snapshot) = @_;
+
+    # Same as export-formats, beware the parameter order:
+    return volume_export_formats(
+	$class,
+	$scfg,
+	$storeid,
+	$volname,
+	$snapshot,
+	$base_snapshot,
+	$with_snapshots,
+    );
+}
+
+sub volume_export {
+    my (
+	$class,
+	$scfg,
+	$storeid,
+	$fh,
+	$volname,
+	$format,
+	$snapshot,
+	$base_snapshot,
+	$with_snapshots,
+    ) = @_;
+
+    if ($format ne 'btrfs') {
+	return PVE::Storage::Plugin::volume_export(@_);
+    }
+
+    die "format 'btrfs' only works on snapshots\n"
+	if !defined $snapshot;
+
+    die "'btrfs' format in incremental mode requires snapshots to be listed explicitly\n"
+	if defined($base_snapshot) && $with_snapshots && ref($with_snapshots) ne 'ARRAY';
+
+    my $volume_format = ($class->parse_volname($volname))[6];
+
+    die "btrfs-sending volumes of type $volume_format ('$volname') is not supported\n"
+	if $volume_format ne 'raw' && $volume_format ne 'subvol';
+
+    my $path = $class->path($scfg, $volname, $storeid);
+
+    if ($volume_format eq 'raw') {
+	$path = raw_file_to_subvol($path);
+    }
+
+    my $cmd = ['btrfs', '-q', 'send', '-e'];
+    if ($base_snapshot) {
+	my $base = $class->path($scfg, $volname, $storeid, $base_snapshot);
+	if ($volume_format eq 'raw') {
+	    $base = raw_file_to_subvol($base);
+	}
+	push @$cmd, '-p', $base;
+    }
+    push @$cmd, '--';
+    if (ref($with_snapshots) eq 'ARRAY') {
+	push @$cmd, (map { "$path\@$_" } ($with_snapshots // [])->@*), $path;
+    } else {
+	dir_glob_foreach(dirname($path), $BTRFS_VOL_REGEX, sub {
+	    push @$cmd, "$path\@$_[2]" if !(defined($snapshot) && $_[2] eq $snapshot);
+	});
+    }
+    $path .= "\@$snapshot" if defined($snapshot);
+    push @$cmd, $path;
+
+    run_command($cmd, output => '>&'.fileno($fh));
+    return;
 }
 
 sub volume_import {
-    return PVE::Storage::DirPlugin::volume_import(@_);
+    my (
+	$class,
+	$scfg,
+	$storeid,
+	$fh,
+	$volname,
+	$format,
+	$base_snapshot,
+	$with_snapshots,
+	$allow_rename,
+	$snapshot,
+    ) = @_;
+
+    if ($format ne 'btrfs') {
+	return PVE::Storage::Plugin::volume_import(@_);
+    }
+
+    die "format 'btrfs' only works on snapshots\n"
+	if !defined $snapshot;
+
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $volume_format) =
+	$class->parse_volname($volname);
+
+    die "btrfs-receiving volumes of type $volume_format ('$volname') is not supported\n"
+	if $volume_format ne 'raw' && $volume_format ne 'subvol';
+
+    if (defined($base_snapshot)) {
+	my $path = $class->path($scfg, $volname, $storeid, $base_snapshot);
+	die "base snapshot '$base_snapshot' not found - no such directory '$path'\n"
+	    if !path_is_subvolume($path);
+    }
+
+    my $destination = $class->filesystem_path($scfg, $volname);
+    if ($volume_format eq 'raw') {
+	$destination = raw_file_to_subvol($destination);
+    }
+
+    if (!defined($base_snapshot) && -e $destination) {
+	die "volume $volname already exists\n" if !$allow_rename;
+	$volname = $class->find_free_diskname($storeid, $scfg, $vmid, $volume_format, 1);
+    }
+
+    my $imagedir = $class->get_subdir($scfg, $vtype);
+    $imagedir .= "/$vmid" if $vtype eq 'images';
+
+    my $tmppath = "$imagedir/recv.$vmid.tmp";
+    mkdir($imagedir); # FIXME: if $scfg->{mkdir};
+    if (!mkdir($tmppath)) {
+	die "temp receive directory already exists at '$tmppath', incomplete concurrent import?\n"
+	    if $! == EEXIST;
+	die "failed to create temporary receive directory at '$tmppath' - $!\n";
+    }
+
+    my $dh = IO::Dir->new($tmppath)
+	or die "failed to open temporary receive directory '$tmppath' - $!\n";
+    eval {
+	run_command(['btrfs', '-q', 'receive', '-e', '--', $tmppath], input => '<&'.fileno($fh));
+
+	# Analyze the received subvolumes;
+	my ($diskname, $found_snapshot, @snapshots);
+	$dh->rewind;
+	while (defined(my $entry = $dh->read)) {
+	    next if $entry eq '.' || $entry eq '..';
+	    next if $entry !~ /^$BTRFS_VOL_REGEX$/;
+	    my ($cur_diskname, $cur_snapshot) = ($1, $2);
+
+	    die "send stream included a non-snapshot subvolume\n"
+		if !defined($cur_snapshot);
+
+	    if (!defined($diskname)) {
+		$diskname = $cur_diskname;
+	    } else {
+		die "multiple disks contained in stream ('$diskname' vs '$cur_diskname')\n"
+		    if $diskname ne $cur_diskname;
+	    }
+
+	    if ($cur_snapshot eq $snapshot) {
+		$found_snapshot = 1;
+	    } else {
+		push @snapshots, $cur_snapshot;
+	    }
+	}
+
+	die "send stream did not contain the expected current snapshot '$snapshot'\n"
+	    if !$found_snapshot;
+
+	# Rotate the disk into place, first the current state:
+	# Note that read-only subvolumes cannot be moved into different directories, but for the
+	# "current" state we also want a writable copy, so start with that:
+	$class->btrfs_cmd(['property', 'set', "$tmppath/$diskname\@$snapshot", 'ro', 'false']);
+	PVE::Tools::renameat2(
+	    -1,
+	    "$tmppath/$diskname\@$snapshot",
+	    -1,
+	    $destination,
+	    &PVE::Tools::RENAME_NOREPLACE,
+	) or die "failed to move received snapshot '$tmppath/$diskname\@$snapshot'"
+	    . " into place at '$destination' - $!\n";
+
+	# Now recreate the actual snapshot:
+	$class->btrfs_cmd([
+	    'subvolume',
+	    'snapshot',
+	    '-r',
+	    '--',
+	    $destination,
+	    "$destination\@$snapshot",
+	]);
+
+	# Now go through the remaining snapshots (if any)
+	foreach my $snap (@snapshots) {
+	    $class->btrfs_cmd(['property', 'set', "$tmppath/$diskname\@$snap", 'ro', 'false']);
+	    PVE::Tools::renameat2(
+		-1,
+		"$tmppath/$diskname\@$snap",
+		-1,
+		"$destination\@$snap",
+		&PVE::Tools::RENAME_NOREPLACE,
+	    ) or die "failed to move received snapshot '$tmppath/$diskname\@$snap'"
+		. " into place at '$destination\@$snap' - $!\n";
+	    eval { $class->btrfs_cmd(['property', 'set', "$destination\@$snap", 'ro', 'true']) };
+	    warn "failed to make $destination\@$snap read-only - $!\n" if $@;
+	}
+    };
+    my $err = $@;
+
+    eval {
+	# Cleanup all the received snapshots we did not move into place, so we can remove the temp
+	# directory.
+	if ($dh) {
+	    $dh->rewind;
+	    while (defined(my $entry = $dh->read)) {
+		next if $entry eq '.' || $entry eq '..';
+		eval { $class->btrfs_cmd(['subvolume', 'delete', '--', "$tmppath/$entry"]) };
+		warn $@ if $@;
+	    }
+	    $dh->close; undef $dh;
+	}
+	if (!rmdir($tmppath)) {
+	    warn "failed to remove temporary directory '$tmppath' - $!\n"
+	}
+    };
+    warn $@ if $@;
+    if ($err) {
+	# clean up if the directory ended up being empty after an error
+	rmdir($tmppath);
+	die $err;
+    }
+
+    return "$storeid:$volname";
 }
 
 1
-- 
2.30.2





^ permalink raw reply	[flat|nested] 17+ messages in thread

* [pve-devel] [PATCH container 1/2] migration: fix snapshots boolean accounting
  2021-06-09 13:18 [pve-devel] [PATCH multiple] btrfs, file system for the brave Wolfgang Bumiller
                   ` (4 preceding siblings ...)
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 4/4] btrfs: add 'btrfs' import/export format Wolfgang Bumiller
@ 2021-06-09 13:18 ` Wolfgang Bumiller
  2021-06-09 13:18 ` [pve-devel] [PATCH container 2/2] enable btrfs support via subvolumes Wolfgang Bumiller
  2021-06-09 13:18 ` [pve-devel] [PATCH qemu-server] allow migrating raw btrfs volumes Wolfgang Bumiller
  7 siblings, 0 replies; 17+ messages in thread
From: Wolfgang Bumiller @ 2021-06-09 13:18 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
 src/PVE/LXC/Migrate.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
index b5917e9..4370a3d 100644
--- a/src/PVE/LXC/Migrate.pm
+++ b/src/PVE/LXC/Migrate.pm
@@ -144,7 +144,7 @@ sub phase1 {
 	}
 
 	$volhash->{$volid}->{ref} = defined($snapname) ? 'snapshot' : 'config';
-	$volhash->{$volid}->{snapshots} = defined($snapname);
+	$volhash->{$volid}->{snapshots} = 1 if defined($snapname);
 
 	my ($path, $owner) = PVE::Storage::path($self->{storecfg}, $volid);
 
-- 
2.30.2





^ permalink raw reply	[flat|nested] 17+ messages in thread

* [pve-devel] [PATCH container 2/2] enable btrfs support via subvolumes
  2021-06-09 13:18 [pve-devel] [PATCH multiple] btrfs, file system for the brave Wolfgang Bumiller
                   ` (5 preceding siblings ...)
  2021-06-09 13:18 ` [pve-devel] [PATCH container 1/2] migration: fix snapshots boolean accounting Wolfgang Bumiller
@ 2021-06-09 13:18 ` Wolfgang Bumiller
  2021-06-10 12:35   ` Fabian Grünbichler
  2021-06-09 13:18 ` [pve-devel] [PATCH qemu-server] allow migrating raw btrfs volumes Wolfgang Bumiller
  7 siblings, 1 reply; 17+ messages in thread
From: Wolfgang Bumiller @ 2021-06-09 13:18 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
 src/PVE/LXC.pm         | 4 +++-
 src/PVE/LXC/Migrate.pm | 7 ++++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index bb1cbdb..9407d24 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -1871,7 +1871,9 @@ sub alloc_disk {
 
     eval {
 	my $do_format = 0;
-	if ($scfg->{type} eq 'dir' || $scfg->{type} eq 'nfs' || $scfg->{type} eq 'cifs' ) {
+	if ($scfg->{type} eq 'dir' || $scfg->{type} eq 'nfs' || $scfg->{type} eq 'cifs'
+	    || $scfg->{type} eq 'btrfs'
+	) {
 	    if ($size_kb > 0) {
 		$volid = PVE::Storage::vdisk_alloc($storecfg, $storage, $vmid, 'raw',
 						   undef, $size_kb);
diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
index 4370a3d..15c6027 100644
--- a/src/PVE/LXC/Migrate.pm
+++ b/src/PVE/LXC/Migrate.pm
@@ -154,7 +154,7 @@ sub phase1 {
 	if (defined($snapname)) {
 	    # we cannot migrate shapshots on local storage
 	    # exceptions: 'zfspool'
-	    if (($scfg->{type} eq 'zfspool')) {
+	    if ($scfg->{type} eq 'zfspool' || $scfg->{type} eq 'btrfs') {
 		return;
 	    }
 	    die "non-migratable snapshot exists\n";
@@ -218,8 +218,9 @@ sub phase1 {
 	    my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
 	    my $scfg =  PVE::Storage::storage_config($self->{storecfg}, $sid);
 
-	    my $migratable = ($scfg->{type} eq 'dir') || ($scfg->{type} eq 'zfspool') ||
-		($scfg->{type} eq 'lvmthin') || ($scfg->{type} eq 'lvm');
+	    my $migratable = ($scfg->{type} eq 'dir') || ($scfg->{type} eq 'zfspool')
+		|| ($scfg->{type} eq 'lvmthin') || ($scfg->{type} eq 'lvm')
+		|| ($scfg->{type} eq 'btrfs');
 
 	    die "storage type '$scfg->{type}' not supported\n"
 		if !$migratable;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 17+ messages in thread

* [pve-devel] [PATCH qemu-server] allow migrating raw btrfs volumes
  2021-06-09 13:18 [pve-devel] [PATCH multiple] btrfs, file system for the brave Wolfgang Bumiller
                   ` (6 preceding siblings ...)
  2021-06-09 13:18 ` [pve-devel] [PATCH container 2/2] enable btrfs support via subvolumes Wolfgang Bumiller
@ 2021-06-09 13:18 ` Wolfgang Bumiller
  2021-06-10 12:35   ` Fabian Grünbichler
  7 siblings, 1 reply; 17+ messages in thread
From: Wolfgang Bumiller @ 2021-06-09 13:18 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
 PVE/QemuMigrate.pm | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 6375a15..5b07c20 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -502,7 +502,10 @@ sub scan_local_volumes {
 		# exceptions: 'zfspool' or 'qcow2' files (on directory storage)
 
 		die "online storage migration not possible if snapshot exists\n" if $self->{running};
-		if (!($scfg->{type} eq 'zfspool' || $local_volumes->{$volid}->{format} eq 'qcow2')) {
+		if (!($scfg->{type} eq 'zfspool'
+		    || ($scfg->{type} eq 'btrfs' && $local_volumes->{$volid}->{format} eq 'raw')
+		    || $local_volumes->{$volid}->{format} eq 'qcow2'
+		)) {
 		    die "non-migratable snapshot exists\n";
 		}
 	    }
@@ -553,7 +556,7 @@ sub scan_local_volumes {
 	    my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
 	    my $scfg =  PVE::Storage::storage_config($storecfg, $sid);
 
-	    my $migratable = $scfg->{type} =~ /^(?:dir|zfspool|lvmthin|lvm)$/;
+	    my $migratable = $scfg->{type} =~ /^(?:dir|btrfs|zfspool|lvmthin|lvm)$/;
 
 	    die "can't migrate '$volid' - storage type '$scfg->{type}' not supported\n"
 		if !$migratable;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH storage 3/4] update import/export storage API
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 3/4] update import/export storage API Wolfgang Bumiller
@ 2021-06-10 12:30   ` Fabian Grünbichler
  0 siblings, 0 replies; 17+ messages in thread
From: Fabian Grünbichler @ 2021-06-10 12:30 UTC (permalink / raw)
  To: Proxmox VE development discussion

On June 9, 2021 3:18 pm, Wolfgang Bumiller wrote:
> This bumps APIVER, but also APIAGE, see below.
> 
> The import methods (volume_import, volume_import_formats):
>   This additionally gets the '$snapshot' parameter which is
>   already present on the export side as an informational
>   piece to know which of the snapshots is the *current* one.
>   The current "disk" state will be set to this snapshot.
>   This, too, is required for our btrfs implementation.
>     `volume_import_formats` can obviously not make much
>   *use* of this parameter, but it'll still be useful to know
>   that the information is actually available in the import
>   call, so its presence will be checked in the btrfs
>   implementation.
> 
> Currently this is intended to be used for btrfs send/recv
> support, which in theory could also get additional metadata
> similar to how we do the "tar+size" format, however, we
> currently only really use this within this repository in
> storage_migrate() which has this information readily
> available anyway.
> 
> On the export side (volume_export, volume_export_formats):
>   The `$with_snapshots` option can now also be an ordered
>   array of snapshots to include, as a hint for storages
>   which need this. (As of the next commit this is only
>   btrfs, and only when also specifying a base snapshot,
>   which is a case we can currently not run into except on
>   the command line interface.)
>     The current providers of the `with_snapshot` option will
>   still treat it as a boolean (since eg. for ZFS you cannot
>   really "skip" snapshots AFAIK).
>     This is mainly intended for storages which do not have a
>   strong association between snapshots and the originals, or
>   an ordering (eg. btrfs and lvm-thin allow creating
>   arbitrary snapshot trees, and with btrfs you can even
>   create a "circular" connection between subvolumes, also we
>   could consider reflink based copies snapshots on xfs in
>   the future maybe?)
> 
> While `import_formats` and `export_formats` now take the same
> parameters, I appended the `$snapshot` parameter to the end of
> `import_formats` to the end for compatibility.

since we are currently in a window where we can do breaking changes, we 
could think about breaking for the sake of consistency. do we have any 
other cleanups planned that would cause us to reset APIAGE?

> Therefore we also bump APIAGE, as apart from the btrfs
> storage, none of our other storages need the new parameter.
> 
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
>  PVE/CLI/pvesm.pm | 23 +++++++++++++++++++++--
>  PVE/Storage.pm   | 48 ++++++++++++++++++++++++++++++++----------------
>  2 files changed, 53 insertions(+), 18 deletions(-)
> 
> diff --git a/PVE/CLI/pvesm.pm b/PVE/CLI/pvesm.pm
> index d28f1ba..b22f759 100755
> --- a/PVE/CLI/pvesm.pm
> +++ b/PVE/CLI/pvesm.pm
> @@ -276,12 +276,23 @@ __PACKAGE__->register_method ({
>  		optional => 1,
>  		default => 0,
>  	    },
> +	    'snapshot-list' => {
> +		description => "Ordered list of snapshots to transfer",
> +		type => 'string',
> +		format => 'string-list',
> +		optional => 1,
> +	    },
>  	},
>      },
>      returns => { type => 'null' },
>      code => sub {
>  	my ($param) = @_;
>  
> +	my $with_snapshots = $param->{'with-snapshots'};
> +	if (defined(my $list = $param->{'snapshot-list'})) {
> +	    $with_snapshots = PVE::Tools::split_list($list);
> +	}
> +
>  	my $filename = $param->{filename};
>  
>  	my $outfh;
> @@ -295,7 +306,7 @@ __PACKAGE__->register_method ({
>  	eval {
>  	    my $cfg = PVE::Storage::config();
>  	    PVE::Storage::volume_export($cfg, $outfh, $param->{volume}, $param->{format},
> -		$param->{snapshot}, $param->{base}, $param->{'with-snapshots'});
> +		$param->{snapshot}, $param->{base}, $with_snapshots);
>  	};
>  	my $err = $@;
>  	if ($filename ne '-') {
> @@ -361,6 +372,13 @@ __PACKAGE__->register_method ({
>  		optional => 1,
>  		default => 0,
>  	    },
> +	    snapshot => {
> +		description => "The current-state snapshot if the stream contains snapshots",
> +		type => 'string',
> +		pattern => qr/[a-z0-9_\-]{1,40}/i,
> +		maxLength => 40,
> +		optional => 1,
> +	    },
>  	},
>      },
>      returns => { type => 'string' },
> @@ -436,7 +454,8 @@ __PACKAGE__->register_method ({
>  	my $volume = $param->{volume};
>  	my $delete = $param->{'delete-snapshot'};
>  	my $imported_volid = PVE::Storage::volume_import($cfg, $infh, $volume, $param->{format},
> -	    $param->{base}, $param->{'with-snapshots'}, $param->{'allow-rename'});
> +	    $param->{base}, $param->{'with-snapshots'}, $param->{'allow-rename'},
> +	    $param->{snapshot});
>  	PVE::Storage::volume_snapshot_delete($cfg, $imported_volid, $delete)
>  	    if defined($delete);
>  	return $imported_volid;
> diff --git a/PVE/Storage.pm b/PVE/Storage.pm
> index f9f8d16..15fcedf 100755
> --- a/PVE/Storage.pm
> +++ b/PVE/Storage.pm
> @@ -41,11 +41,11 @@ use PVE::Storage::PBSPlugin;
>  use PVE::Storage::BTRFSPlugin;
>  
>  # Storage API version. Increment it on changes in storage API interface.
> -use constant APIVER => 8;
> +use constant APIVER => 9;
>  # Age is the number of versions we're backward compatible with.
>  # This is like having 'current=APIVER' and age='APIAGE' in libtool,
>  # see https://www.gnu.org/software/libtool/manual/html_node/Libtool-versioning.html
> -use constant APIAGE => 7;
> +use constant APIAGE => 8;
>  
>  # load standard plugins
>  PVE::Storage::DirPlugin->register();
> @@ -716,7 +716,8 @@ sub storage_migrate {
>      my $send = ['pvesm', 'export', $volid, $format, '-', '-with-snapshots', $with_snapshots];
>      my $recv = [@$ssh, '--', 'pvesm', 'import', $target_volid, $format, $import_fn, '-with-snapshots', $with_snapshots];
>      if (defined($snapshot)) {
> -	push @$send, '-snapshot', $snapshot
> +	push @$send, '-snapshot', $snapshot;
> +	push @$recv, '-snapshot', $snapshot;

this one breaks migration using storage_migrate and replication from 
updated to non-updated nodes.. we already introduced a mechanism to 
handle this (`pvesm apiinfo`) the last time we introduced a new 
parameter/breaking change on the receiving side (`--allow-rename`), 
which should likely be used here to stay compatible in both directions.

>      }
>      if ($migration_snapshot) {
>  	push @$recv, '-delete-snapshot', $snapshot;
> @@ -1716,7 +1717,7 @@ sub prune_mark_backup_group {
>      }
>  }
>  
> -sub volume_export {
> +sub volume_export : prototype($$$$$$$) {
>      my ($cfg, $fh, $volid, $format, $snapshot, $base_snapshot, $with_snapshots) = @_;
>  
>      my ($storeid, $volname) = parse_volume_id($volid, 1);
> @@ -1727,18 +1728,27 @@ sub volume_export {
>                                    $snapshot, $base_snapshot, $with_snapshots);
>  }
>  
> -sub volume_import {
> -    my ($cfg, $fh, $volid, $format, $base_snapshot, $with_snapshots, $allow_rename) = @_;
> +sub volume_import : prototype($$$$$$$$) {
> +    my ($cfg, $fh, $volid, $format, $base_snapshot, $with_snapshots, $allow_rename, $snapshot) = @_;
>  
>      my ($storeid, $volname) = parse_volume_id($volid, 1);
>      die "cannot import into volume '$volid'\n" if !$storeid;
>      my $scfg = storage_config($cfg, $storeid);
>      my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
> -    return $plugin->volume_import($scfg, $storeid, $fh, $volname, $format,
> -                                  $base_snapshot, $with_snapshots, $allow_rename) // $volid;
> -}
> -
> -sub volume_export_formats {
> +    return $plugin->volume_import(
> +	$scfg,
> +	$storeid,
> +	$fh,
> +	$volname,
> +	$format,
> +	$base_snapshot,
> +	$with_snapshots,
> +	$allow_rename,
> +	$snapshot,
> +    ) // $volid;
> +}
> +
> +sub volume_export_formats : prototype($$$$$) {
>      my ($cfg, $volid, $snapshot, $base_snapshot, $with_snapshots) = @_;
>  
>      my ($storeid, $volname) = parse_volume_id($volid, 1);
> @@ -1750,21 +1760,27 @@ sub volume_export_formats {
>                                            $with_snapshots);
>  }
>  
> -sub volume_import_formats {
> -    my ($cfg, $volid, $base_snapshot, $with_snapshots) = @_;
> +sub volume_import_formats : prototype($$$$$) {
> +    my ($cfg, $volid, $base_snapshot, $with_snapshots, $snapshot) = @_;
>  
>      my ($storeid, $volname) = parse_volume_id($volid, 1);
>      return if !$storeid;
>      my $scfg = storage_config($cfg, $storeid);
>      my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
> -    return $plugin->volume_import_formats($scfg, $storeid, $volname,
> -                                          $base_snapshot, $with_snapshots);
> +    return $plugin->volume_import_formats(
> +	$scfg,
> +	$storeid,
> +	$volname,
> +	$base_snapshot,
> +	$with_snapshots,
> +	$snapshot,
> +    );
>  }
>  
>  sub volume_transfer_formats {
>      my ($cfg, $src_volid, $dst_volid, $snapshot, $base_snapshot, $with_snapshots) = @_;
>      my @export_formats = volume_export_formats($cfg, $src_volid, $snapshot, $base_snapshot, $with_snapshots);
> -    my @import_formats = volume_import_formats($cfg, $dst_volid, $base_snapshot, $with_snapshots);
> +    my @import_formats = volume_import_formats($cfg, $dst_volid, $base_snapshot, $with_snapshots, $snapshot);
>      my %import_hash = map { $_ => 1 } @import_formats;
>      my @common = grep { $import_hash{$_} } @export_formats;
>      return @common;
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> 




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH qemu-server] allow migrating raw btrfs volumes
  2021-06-09 13:18 ` [pve-devel] [PATCH qemu-server] allow migrating raw btrfs volumes Wolfgang Bumiller
@ 2021-06-10 12:35   ` Fabian Grünbichler
  0 siblings, 0 replies; 17+ messages in thread
From: Fabian Grünbichler @ 2021-06-10 12:35 UTC (permalink / raw)
  To: Proxmox VE development discussion

On June 9, 2021 3:18 pm, Wolfgang Bumiller wrote:
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
>  PVE/QemuMigrate.pm | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index 6375a15..5b07c20 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -502,7 +502,10 @@ sub scan_local_volumes {
>  		# exceptions: 'zfspool' or 'qcow2' files (on directory storage)
>  
>  		die "online storage migration not possible if snapshot exists\n" if $self->{running};
> -		if (!($scfg->{type} eq 'zfspool' || $local_volumes->{$volid}->{format} eq 'qcow2')) {
> +		if (!($scfg->{type} eq 'zfspool'
> +		    || ($scfg->{type} eq 'btrfs' && $local_volumes->{$volid}->{format} eq 'raw')
> +		    || $local_volumes->{$volid}->{format} eq 'qcow2'
> +		)) {
>  		    die "non-migratable snapshot exists\n";
>  		}
>  	    }
> @@ -553,7 +556,7 @@ sub scan_local_volumes {
>  	    my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
>  	    my $scfg =  PVE::Storage::storage_config($storecfg, $sid);
>  
> -	    my $migratable = $scfg->{type} =~ /^(?:dir|zfspool|lvmthin|lvm)$/;
> +	    my $migratable = $scfg->{type} =~ /^(?:dir|btrfs|zfspool|lvmthin|lvm)$/;

also stumbled upon this when doing the remote migrate stuff - AFAICT, 
this is basically a leftover thing from when migration was more limited. 
atm this is basically every built-in storage type that is not shared 
(and for remote migration, we'd add most shared storage types as well).

I'd either drop it altogether, or move it to the storage plugin 
definition (possible two booleans - one for NBD-based qemu migration, 
one for storage migration?).

>  
>  	    die "can't migrate '$volid' - storage type '$scfg->{type}' not supported\n"
>  		if !$migratable;
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> 




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH container 2/2] enable btrfs support via subvolumes
  2021-06-09 13:18 ` [pve-devel] [PATCH container 2/2] enable btrfs support via subvolumes Wolfgang Bumiller
@ 2021-06-10 12:35   ` Fabian Grünbichler
  0 siblings, 0 replies; 17+ messages in thread
From: Fabian Grünbichler @ 2021-06-10 12:35 UTC (permalink / raw)
  To: Proxmox VE development discussion

On June 9, 2021 3:18 pm, Wolfgang Bumiller wrote:
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
>  src/PVE/LXC.pm         | 4 +++-
>  src/PVE/LXC/Migrate.pm | 7 ++++---
>  2 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
> index bb1cbdb..9407d24 100644
> --- a/src/PVE/LXC.pm
> +++ b/src/PVE/LXC.pm
> @@ -1871,7 +1871,9 @@ sub alloc_disk {
>  
>      eval {
>  	my $do_format = 0;
> -	if ($scfg->{type} eq 'dir' || $scfg->{type} eq 'nfs' || $scfg->{type} eq 'cifs' ) {
> +	if ($scfg->{type} eq 'dir' || $scfg->{type} eq 'nfs' || $scfg->{type} eq 'cifs'
> +	    || $scfg->{type} eq 'btrfs'
> +	) {
>  	    if ($size_kb > 0) {
>  		$volid = PVE::Storage::vdisk_alloc($storecfg, $storage, $vmid, 'raw',
>  						   undef, $size_kb);
> diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
> index 4370a3d..15c6027 100644
> --- a/src/PVE/LXC/Migrate.pm
> +++ b/src/PVE/LXC/Migrate.pm
> @@ -154,7 +154,7 @@ sub phase1 {
>  	if (defined($snapname)) {
>  	    # we cannot migrate shapshots on local storage
>  	    # exceptions: 'zfspool'

comment needs update ;)

> -	    if (($scfg->{type} eq 'zfspool')) {
> +	    if ($scfg->{type} eq 'zfspool' || $scfg->{type} eq 'btrfs') {
>  		return;
>  	    }
>  	    die "non-migratable snapshot exists\n";
> @@ -218,8 +218,9 @@ sub phase1 {
>  	    my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
>  	    my $scfg =  PVE::Storage::storage_config($self->{storecfg}, $sid);
>  
> -	    my $migratable = ($scfg->{type} eq 'dir') || ($scfg->{type} eq 'zfspool') ||
> -		($scfg->{type} eq 'lvmthin') || ($scfg->{type} eq 'lvm');
> +	    my $migratable = ($scfg->{type} eq 'dir') || ($scfg->{type} eq 'zfspool')
> +		|| ($scfg->{type} eq 'lvmthin') || ($scfg->{type} eq 'lvm')
> +		|| ($scfg->{type} eq 'btrfs');

same as for qemu-server, IMHO we should think about dropping this (or 
moving it to storage plugins as a flag).

>  
>  	    die "storage type '$scfg->{type}' not supported\n"
>  		if !$migratable;
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> 




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH storage 2/4] add BTRFS storage plugin
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 2/4] add BTRFS storage plugin Wolfgang Bumiller
@ 2021-06-10 12:40   ` Fabian Grünbichler
  2021-06-10 12:59     ` Wolfgang Bumiller
  2021-06-11 12:11   ` Fabian Ebner
  1 sibling, 1 reply; 17+ messages in thread
From: Fabian Grünbichler @ 2021-06-10 12:40 UTC (permalink / raw)
  To: Proxmox VE development discussion

On June 9, 2021 3:18 pm, Wolfgang Bumiller wrote:
> This is mostly the same as a directory storage, with 2 major
> differences:
> 
> * 'subvol' volumes are actual btrfs subvolumes and therefore
>   allow snapshots
> * 'raw' files are placed *into* a subvolume and therefore
>   also allow snapshots, the raw file for volume
>   `btrstore:100/vm-100-disk-1.raw` can be found under
>   `$path/images/100/vm-100-disk-1/disk.raw`
> * snapshots add an '@name' suffix to the subvolume's name,
>   so snapshot 'foo' of the above would be found under
>   `$path/images/100/vm-100-disk-1@foo/disk.raw`
> 
> Note that qgroups aren't included in btrfs-send streams,
> therefore for now we will only be using *unsized* subvolumes
> for containers and place a regular raw+ext4 file for sized
> containers.
> We could extend the import/export stream format to include
> the information at the front (similar to how we do the
> "tar+size" format, but we need to include the size of all
> the contained snapshots as well, since they can technically
> change). (But before enabling quotas we should do some
> performance testing on bigger file systems with multiple
> snapshots as there are quite a few reports of the fs slowing
> down considerably in such scenarios).

this should probably be mentioned prominently in the docs somewhere, 
else it could cause quite some confusion. also is ext4 in raw image on 
btrfs really faster than btrfs with quotas? Oo we really should do some 
benchmarks first before going down either route..

> 
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
>  PVE/Storage.pm             |   2 +
>  PVE/Storage/BTRFSPlugin.pm | 616 +++++++++++++++++++++++++++++++++++++
>  PVE/Storage/Makefile       |   1 +
>  3 files changed, 619 insertions(+)
>  create mode 100644 PVE/Storage/BTRFSPlugin.pm
> 
> diff --git a/PVE/Storage.pm b/PVE/Storage.pm
> index aa36bad..f9f8d16 100755
> --- a/PVE/Storage.pm
> +++ b/PVE/Storage.pm
> @@ -38,6 +38,7 @@ use PVE::Storage::GlusterfsPlugin;
>  use PVE::Storage::ZFSPoolPlugin;
>  use PVE::Storage::ZFSPlugin;
>  use PVE::Storage::PBSPlugin;
> +use PVE::Storage::BTRFSPlugin;
>  
>  # Storage API version. Increment it on changes in storage API interface.
>  use constant APIVER => 8;
> @@ -60,6 +61,7 @@ PVE::Storage::GlusterfsPlugin->register();
>  PVE::Storage::ZFSPoolPlugin->register();
>  PVE::Storage::ZFSPlugin->register();
>  PVE::Storage::PBSPlugin->register();
> +PVE::Storage::BTRFSPlugin->register();
>  
>  # load third-party plugins
>  if ( -d '/usr/share/perl5/PVE/Storage/Custom' ) {
> diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
> new file mode 100644
> index 0000000..733be6e
> --- /dev/null
> +++ b/PVE/Storage/BTRFSPlugin.pm
> @@ -0,0 +1,616 @@
> +package PVE::Storage::BTRFSPlugin;
> +
> +use strict;
> +use warnings;
> +
> +use base qw(PVE::Storage::Plugin);
> +
> +use Fcntl qw(S_ISDIR);
> +use File::Basename qw(dirname);
> +use File::Path qw(mkpath);
> +use IO::Dir;
> +
> +use PVE::Tools qw(run_command);
> +
> +use PVE::Storage::DirPlugin;
> +
> +use constant {
> +    BTRFS_FIRST_FREE_OBJECTID => 256,
> +};
> +
> +# Configuration (similar to DirPlugin)
> +
> +sub type {
> +    return 'btrfs';
> +}
> +
> +sub plugindata {
> +    return {
> +	content => [
> +	    {
> +		images => 1,
> +		rootdir => 1,
> +		vztmpl => 1,
> +		iso => 1,
> +		backup => 1,
> +		snippets => 1,
> +		none => 1,
> +	    },
> +	    { images => 1, rootdir => 1 },
> +	],
> +	format => [ { raw => 1, qcow2 => 1, vmdk => 1, subvol => 1 }, 'raw', ],
> +    };
> +}
> +
> +sub options {
> +    return {
> +	path => { fixed => 1 },
> +	nodes => { optional => 1 },
> +	shared => { optional => 1 },
> +	disable => { optional => 1 },
> +	maxfiles => { optional => 1 },
> +	content => { optional => 1 },
> +	format => { optional => 1 },
> +	is_mountpoint => { optional => 1 },
> +	# TODO: The new variant of mkdir with  `populate` vs `create`...
> +    };
> +}
> +
> +# Storage implementation
> +#
> +# We use the same volume names are directory plugins, but map *raw* disk image file names into a
> +# subdirectory.
> +#
> +# `vm-VMID-disk-ID.raw`
> +#   -> `images/VMID/vm-VMID-disk-ID/disk.raw`
> +#   where the `vm-VMID-disk-ID/` subdirectory is a btrfs subvolume
> +
> +# Reuse `DirPlugin`'s `check_config`. This simply checks for invalid paths.
> +sub check_config {
> +    my ($self, $sectionId, $config, $create, $skipSchemaCheck) = @_;
> +    return PVE::Storage::DirPlugin::check_config($self, $sectionId, $config, $create, $skipSchemaCheck);
> +}
> +
> +sub activate_storage {
> +    my ($class, $storeid, $scfg, $cache) = @_;
> +    return PVE::Storage::DirPlugin::activate_storage($class, $storeid, $scfg, $cache);
> +}
> +
> +sub status {
> +    my ($class, $storeid, $scfg, $cache) = @_;
> +    return PVE::Storage::DirPlugin::status($class, $storeid, $scfg, $cache);
> +}
> +
> +# TODO: sub get_volume_notes {}
> +
> +# TODO: sub update_volume_notes {}
> +
> +# croak would not include the caller from within this module
> +sub __error {
> +    my ($msg) = @_;
> +    my (undef, $f, $n) = caller(1);
> +    die "$msg at $f: $n\n";
> +}
> +
> +# Given a name (eg. `vm-VMID-disk-ID.raw`), take the part up to the format suffix as the name of
> +# the subdirectory (subvolume).
> +sub raw_name_to_dir($) {
> +    my ($raw) = @_;
> +
> +    # For the subvolume directory Strip the `.<format>` suffix:
> +    if ($raw =~ /^(.*)\.raw$/) {
> +	return $1;
> +    }
> +
> +    __error "internal error: bad disk name: $raw";
> +}
> +
> +sub raw_file_to_subvol($) {
> +    my ($file) = @_;
> +
> +    if ($file =~ m|^(.*)/disk\.raw$|) {
> +	return "$1";
> +    }
> +
> +    __error "internal error: bad raw path: $file";
> +}
> +
> +sub filesystem_path {
> +    my ($class, $scfg, $volname, $snapname) = @_;
> +
> +    my ($vtype, $name, $vmid, undef, undef, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    my $path = $class->get_subdir($scfg, $vtype);
> +
> +    $path .= "/$vmid" if $vtype eq 'images';
> +
> +    if ($format eq 'raw') {
> +	my $dir = raw_name_to_dir($name);
> +	if ($snapname) {
> +	    $dir .= "\@$snapname";
> +	}
> +	$path .= "/$dir/disk.raw";
> +    } elsif ($format eq 'subvol') {
> +	$path .= "/$name";
> +	if ($snapname) {
> +	    $path .= "\@$snapname";
> +	}
> +    } else {
> +	$path .= "/$name";
> +    }
> +
> +    return wantarray ? ($path, $vmid, $vtype) : $path;
> +}
> +
> +sub btrfs_cmd {
> +    my ($class, $cmd, $outfunc) = @_;
> +
> +    my $msg = '';
> +    my $func;
> +    if (defined($outfunc)) {
> +	$func = sub {
> +	    my $part = &$outfunc(@_);
> +	    $msg .= $part if defined($part);
> +	};
> +    } else {
> +	$func = sub { $msg .= "$_[0]\n" };
> +    }
> +    run_command(['btrfs', '-q', @$cmd], errmsg => 'btrfs error', outfunc => $func);
> +
> +    return $msg;
> +}
> +
> +sub btrfs_get_subvol_id {
> +    my ($class, $path) = @_;
> +    my $info = $class->btrfs_cmd(['subvolume', 'show', '--', $path]);
> +    if ($info !~ /^\s*(?:Object|Subvolume) ID:\s*(\d+)$/m) {
> +	die "failed to get btrfs subvolume ID from: $info\n";
> +    }
> +    return $1;
> +}
> +
> +sub create_base {
> +    my ($class, $storeid, $scfg, $volname) = @_;
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    my $newname = $name;
> +    $newname =~ s/^vm-/base-/;
> +
> +    # If we're not working with a 'raw' file, which is the only thing that's "different" for btrfs,
> +    # or a subvolume, we forward to the DirPlugin
> +    if ($format ne 'raw' && $format ne 'subvol') {
> +	return PVE::Storage::DirPlugin::create_base(@_);
> +    }
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +    my $newvolname = $basename ? "$basevmid/$basename/$vmid/$newname" : "$vmid/$newname";
> +    my $newpath = $class->filesystem_path($scfg, $newvolname);
> +
> +    my $subvol = $path;
> +    my $newsubvol = $newpath;
> +    if ($format eq 'raw') {
> +	$subvol = raw_file_to_subvol($subvol);
> +	$newsubvol = raw_file_to_subvol($newsubvol);
> +    }
> +
> +    rename($subvol, $newsubvol)
> +	|| die "rename '$subvol' to '$newsubvol' failed - $!\n";
> +    eval { $class->btrfs_cmd(['property', 'set', $newsubvol, 'ro', 'true']) };
> +    warn $@ if $@;
> +
> +    return $newvolname;
> +}
> +
> +sub clone_image {
> +    my ($class, $scfg, $storeid, $volname, $vmid, $snap) = @_;
> +
> +    my ($vtype, $basename, $basevmid, undef, undef, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    # If we're not working with a 'raw' file, which is the only thing that's "different" for btrfs,
> +    # or a subvolume, we forward to the DirPlugin
> +    if ($format ne 'raw' && $format ne 'subvol') {
> +	return PVE::Storage::DirPlugin::clone_image(@_);
> +    }
> +
> +    my $imagedir = $class->get_subdir($scfg, 'images');
> +    $imagedir .= "/$vmid";
> +    mkpath $imagedir;
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +    my $newname = $class->find_free_diskname($storeid, $scfg, $vmid, $format, 1);
> +
> +    # For btrfs subvolumes we don't actually need the "link":
> +    #my $newvolname = "$basevmid/$basename/$vmid/$newname";
> +    my $newvolname = "$vmid/$newname";
> +    my $newpath = $class->filesystem_path($scfg, $newvolname);
> +
> +    my $subvol = $path;
> +    my $newsubvol = $newpath;
> +    if ($format eq 'raw') {
> +	$subvol = raw_file_to_subvol($subvol);
> +	$newsubvol = raw_file_to_subvol($newsubvol);
> +    }
> +
> +    $class->btrfs_cmd(['subvolume', 'snapshot', '--', $subvol, $newsubvol]);
> +
> +    return $newvolname;
> +}
> +
> +sub alloc_image {
> +    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
> +
> +    if ($fmt ne 'raw' && $fmt ne 'subvol') {
> +	return PVE::Storage::DirPlugin::alloc_image(@_);
> +    }
> +
> +    # From Plugin.pm:
> +
> +    my $imagedir = $class->get_subdir($scfg, 'images') . "/$vmid";
> +
> +    mkpath $imagedir;
> +
> +    $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt, 1) if !$name;
> +
> +    my (undef, $tmpfmt) = PVE::Storage::Plugin::parse_name_dir($name);
> +
> +    die "illegal name '$name' - wrong extension for format ('$tmpfmt != '$fmt')\n"
> +	if $tmpfmt ne $fmt;
> +
> +    # End copy from Plugin.pm
> +
> +    my $subvol = "$imagedir/$name";
> +    # .raw is not part of the directory name
> +    $subvol =~ s/\.raw$//;
> +
> +    die "disk image '$subvol' already exists\n" if -e $subvol;
> +
> +    my $path;
> +    if ($fmt eq 'raw') {
> +	$path = "$subvol/disk.raw";
> +    }
> +
> +    $class->btrfs_cmd(['subvolume', 'create', '--', $subvol]);
> +
> +    if ($fmt eq 'subvol' && !!$size) {
> +	# NOTE: `btrfs send/recv` actually drops quota information so supporting subvolumes with
> +	# quotas doesn't play nice with send/recv.
> +	die "btrfs quotas are currently not supported, use an unsized subvolume or a raw file\n";
> +
> +	# This is how we *would* do it:
> +	# # Use the subvol's default 0/$id qgroup
> +	# eval {
> +	#     # This call should happen at storage creation instead and therefore governed by a
> +	#     # configuration option!
> +	#     # $class->btrfs_cmd(['quota', 'enable', $subvol]);
> +	#     my $id = $class->btrfs_get_subvol_id($subvol);
> +	#     $class->btrfs_cmd(['qgroup', 'limit', "${size}k", "0/$id", $subvol]);
> +	# };
> +    } elsif ($fmt eq 'raw') {
> +	# From Plugin.pm (minus the qcow2 part):
> +	my $cmd = ['/usr/bin/qemu-img', 'create'];
> +
> +	push @$cmd, '-f', $fmt, $path, "${size}K";
> +
> +	eval { run_command($cmd, errmsg => "unable to create image"); };
> +    } else {
> +	# be clear that after this if/else we're handling the result of an eval block:
> +	$@ = undef;
> +    }
> +
> +    if (my $err = $@) {
> +	eval { $class->btrfs_cmd(['subvolume', 'delete', '--', $subvol]); };
> +	warn $@ if $@;
> +	die $err;
> +    }
> +
> +    return "$vmid/$name";
> +}
> +
> +# Same as btrfsprogs does:
> +my sub path_is_subvolume : prototype($) {
> +    my ($path) = @_;
> +    my @stat = stat($path)
> +	or die "stat failed on '$path' - $!\n";
> +    my ($ino, $mode) = @stat[1, 2];
> +    return S_ISDIR($mode) && $ino == BTRFS_FIRST_FREE_OBJECTID;
> +}
> +
> +my $BTRFS_VOL_REGEX = qr/((?:vm|base|subvol)-\d+-disk-\d+(?:\.subvol)?)(?:\@(\S+))$/;
> +
> +# Calls `$code->($volume, $name, $snapshot)` for each subvol in a directory matching our volume
> +# regex.
> +my sub foreach_subvol : prototype($$) {
> +    my ($dir, $code) = @_;
> +
> +    dir_glob_foreach($dir, $BTRFS_VOL_REGEX, sub {
> +	my ($volume, $name, $snapshot) = ($1, $2, $3);
> +	return if !path_is_subvolume("$dir/$volume");
> +	$code->($volume, $name, $snapshot);
> +    })
> +}
> +
> +sub free_image {
> +    my ($class, $storeid, $scfg, $volname, $isBase, $_format) = @_;
> +
> +    my (undef, undef, $vmid, undef, undef, undef, $format) =
> +	$class->parse_volname($volname);
> +
> +    if ($format ne 'subvol' && $format ne 'raw') {
> +	return PVE::Storage::DirPlugin::free_image(@_);
> +    }
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +
> +    my $subvol = $path;
> +    if ($format eq 'raw') {
> +	$subvol = raw_file_to_subvol($path);
> +    }
> +
> +    my $dir = dirname($subvol);
> +    my @snapshot_vols;
> +    foreach_subvol($dir, sub {
> +	my ($volume, $name, $snapshot) = @_;
> +	return if !defined $snapshot;
> +	push @snapshot_vols, "$dir/$volume";
> +    });
> +
> +    $class->btrfs_cmd(['subvolume', 'delete', '--', @snapshot_vols, $subvol]);
> +    # try to cleanup directory to not clutter storage with empty $vmid dirs if
> +    # all images from a guest got deleted
> +    rmdir($dir);
> +
> +    return undef;
> +}
> +
> +# Currently not used because quotas clash with send/recv.
> +# my sub btrfs_subvol_quota {
> +#     my ($class, $path) = @_;
> +#     my $id = '0/' . $class->btrfs_get_subvol_id($path);
> +#     my $search = qr/^\Q$id\E\s+(\d)+\s+\d+\s+(\d+)\s*$/;
> +#     my ($used, $size);
> +#     $class->btrfs_cmd(['qgroup', 'show', '--raw', '-rf', '--', $path], sub {
> +# 	return if defined($size);
> +# 	if ($_[0] =~ $search) {
> +# 	    ($used, $size) = ($1, $2);
> +# 	}
> +#     });
> +#     if (!defined($size)) {
> +# 	# syslog should include more information:
> +# 	syslog('err', "failed to get subvolume size for: $path (id $id)");
> +# 	# UI should only see the last path component:
> +# 	$path =~ s|^.*/||;
> +# 	die "failed to get subvolume size for $path\n";
> +#     }
> +#     return wantarray ? ($used, $size) : $size;
> +# }
> +
> +sub volume_size_info {
> +    my ($class, $scfg, $storeid, $volname, $timeout) = @_;
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +
> +    my $format = ($class->parse_volname($volname))[6];
> +
> +    if ($format eq 'subvol') {
> +	my $ctime = (stat($path))[10];
> +	my ($used, $size) = (0, 0);
> +	#my ($used, $size) = btrfs_subvol_quota($class, $path); # uses wantarray
> +	return wantarray ? ($size, 'subvol', $used, undef, $ctime) : 1;
> +    }
> +
> +    return PVE::Storage::Plugin::file_size_info($path, $timeout);
> +}
> +
> +sub volume_resize {
> +    my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
> +
> +    my $format = ($class->parse_volname($volname))[6];
> +    if ($format eq 'subvol') {
> +	my $path = $class->filesystem_path($scfg, $volname);
> +	my $id = '0/' . $class->btrfs_get_subvol_id($path);
> +	$class->btrfs_cmd(['qgroup', 'limit', '--', "${size}k", "0/$id", $path]);
> +	return undef;
> +    }
> +
> +    return PVE::Storage::Plugin::volume_resize(@_);
> +}
> +
> +sub volume_snapshot {
> +    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +
> +    my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
> +    if ($format ne 'subvol' && $format ne 'raw') {
> +	return PVE::Storage::Plugin::volume_snapshot(@_);
> +    }
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +    my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
> +
> +    if ($format eq 'raw') {
> +	$path = raw_file_to_subvol($path);
> +	$snap_path = raw_file_to_subvol($snap_path);
> +    }
> +
> +    my $snapshot_dir = $class->get_subdir($scfg, 'images') . "/$vmid";
> +    mkpath $snapshot_dir;
> +
> +    $class->btrfs_cmd(['subvolume', 'snapshot', '-r', '--', $path, $snap_path]);
> +    return undef;
> +}
> +
> +sub volume_rollback_is_possible {
> +    my ($class, $scfg, $storeid, $volname, $snap) = @_; 
> +
> +    return 1; 
> +}
> +
> +sub volume_snapshot_rollback {
> +    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +
> +    my ($name, $format) = ($class->parse_volname($volname))[1,6];
> +
> +    if ($format ne 'subvol' && $format ne 'raw') {
> +	return PVE::Storage::Plugin::volume_snapshot_rollback(@_);
> +    }
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +    my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
> +
> +    if ($format eq 'raw') {
> +	$path = raw_file_to_subvol($path);
> +	$snap_path = raw_file_to_subvol($snap_path);
> +    }
> +
> +    # Simple version would be:
> +    #   rename old to temp
> +    #   create new
> +    #   on error rename temp back
> +    # But for atomicity in case the rename after create-failure *also* fails, we create the new
> +    # subvol first, then use RENAME_EXCHANGE, 
> +    my $tmp_path = "$path.tmp.$$";
> +    $class->btrfs_cmd(['subvolume', 'snapshot', '--', $snap_path, $tmp_path]);
> +    # The paths are absolute, so pass -1 as file descriptors.
> +    my $ok = PVE::Tools::renameat2(-1, $tmp_path, -1, $path, &PVE::Tools::RENAME_EXCHANGE);
> +
> +    eval { $class->btrfs_cmd(['subvolume', 'delete', '--', $tmp_path]) };
> +    warn "failed to remove '$tmp_path' subvolume: $@" if $@;
> +
> +    if (!$ok) {
> +	die "failed to rotate '$tmp_path' into place at '$path' - $!\n";
> +    }
> +
> +    return undef;
> +}
> +
> +sub volume_snapshot_delete {
> +    my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
> +
> +    my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
> +
> +    if ($format ne 'subvol' && $format ne 'raw') {
> +	return PVE::Storage::Plugin::volume_snapshot_delete(@_);
> +    }
> +
> +    my $path = $class->filesystem_path($scfg, $volname, $snap);
> +
> +    if ($format eq 'raw') {
> +	$path = raw_file_to_subvol($path);
> +    }
> +
> +    $class->btrfs_cmd(['subvolume', 'delete', '--', $path]);
> +
> +    return undef;
> +}
> +
> +sub volume_has_feature {
> +    my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
> +
> +    my $features = {
> +	snapshot => {
> +	    current => { qcow2 => 1, raw => 1, subvol => 1 },
> +	    snap => { qcow2 => 1, raw => 1, subvol => 1 }
> +	},
> +	clone => {
> +	    base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> +	    current => { raw => 1 },
> +	    snap => { raw => 1 },
> +	},
> +	template => { current => { qcow2 => 1, raw => 1, vmdk => 1, subvol => 1 } },
> +	clone => {
> +	    base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> +	    current => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> +	    snap => { qcow2 => 1, raw => 1, subvol => 1 },
> +	},
> +	sparseinit => { base => {qcow2 => 1, raw => 1, vmdk => 1 },
> +			current => {qcow2 => 1, raw => 1, vmdk => 1 } },
> +    };
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    my $key = undef;
> +    if ($snapname) {
> +        $key = 'snap';
> +    } else {
> +        $key =  $isBase ? 'base' : 'current';
> +    }
> +
> +    return 1 if defined($features->{$feature}->{$key}->{$format});
> +
> +    return undef;
> +}
> +
> +sub list_images {
> +    my ($class, $storeid, $scfg, $vmid, $vollist, $cache) = @_;
> +    my $imagedir = $class->get_subdir($scfg, 'images');
> +
> +    my $res = [];
> +
> +    # Copied from Plugin.pm, with file_size_info calls adapted:
> +    foreach my $fn (<$imagedir/[0-9][0-9]*/*>) {
> +	# different to in Plugin.pm the regex below also excludes '@' as valid file name
> +	next if $fn !~ m@^(/.+/(\d+)/([^/\@.]+(?:\.(qcow2|vmdk|subvol))?))$@;
> +	$fn = $1; # untaint
> +
> +	my $owner = $2;
> +	my $name = $3;
> +	my $ext = $4;
> +
> +	next if !$vollist && defined($vmid) && ($owner ne $vmid);
> +
> +	my $volid = "$storeid:$owner/$name";
> +	my ($size, $format, $used, $parent, $ctime);
> +
> +	if (!$ext) { # raw
> +	    $volid .= '.raw';
> +	    ($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info("$fn/disk.raw");
> +	} elsif ($ext eq 'subvol') {
> +	    ($used, $size) = (0, 0);
> +	    #($used, $size) = btrfs_subvol_quota($class, $fn);
> +	    $format = 'subvol';
> +	} else {
> +	    ($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info($fn);
> +	}
> +	next if !($format && defined($size));
> +
> +	if ($vollist) {
> +	    next if ! grep { $_ eq $volid } @$vollist;
> +	}
> +
> +	my $info = {
> +	    volid => $volid, format => $format,
> +	    size => $size, vmid => $owner, used => $used, parent => $parent,
> +	};
> +
> +        $info->{ctime} = $ctime if $ctime;
> +
> +        push @$res, $info;
> +    }
> +
> +    return $res;
> +}
> +
> +# For now we don't implement `btrfs send/recv` as it needs some updates to our import/export API
> +# first!
> +
> +sub volume_export_formats {
> +    return PVE::Storage::DirPlugin::volume_export_formats(@_);
> +}
> +
> +sub volume_export {
> +    return PVE::Storage::DirPlugin::volume_export(@_);
> +}
> +
> +sub volume_import_formats {
> +    return PVE::Storage::DirPlugin::volume_import_formats(@_);
> +}
> +
> +sub volume_import {
> +    return PVE::Storage::DirPlugin::volume_import(@_);
> +}
> +
> +1
> diff --git a/PVE/Storage/Makefile b/PVE/Storage/Makefile
> index 91b9238..857c485 100644
> --- a/PVE/Storage/Makefile
> +++ b/PVE/Storage/Makefile
> @@ -12,6 +12,7 @@ SOURCES= \
>  	ZFSPoolPlugin.pm \
>  	ZFSPlugin.pm \
>  	PBSPlugin.pm \
> +	BTRFSPlugin.pm \
>  	LvmThinPlugin.pm
>  
>  .PHONY: install
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> 




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH storage 2/4] add BTRFS storage plugin
  2021-06-10 12:40   ` Fabian Grünbichler
@ 2021-06-10 12:59     ` Wolfgang Bumiller
  0 siblings, 0 replies; 17+ messages in thread
From: Wolfgang Bumiller @ 2021-06-10 12:59 UTC (permalink / raw)
  To: Fabian Grünbichler; +Cc: Proxmox VE development discussion

On Thu, Jun 10, 2021 at 02:40:22PM +0200, Fabian Grünbichler wrote:
> On June 9, 2021 3:18 pm, Wolfgang Bumiller wrote:
> > This is mostly the same as a directory storage, with 2 major
> > differences:
> > 
> > * 'subvol' volumes are actual btrfs subvolumes and therefore
> >   allow snapshots
> > * 'raw' files are placed *into* a subvolume and therefore
> >   also allow snapshots, the raw file for volume
> >   `btrstore:100/vm-100-disk-1.raw` can be found under
> >   `$path/images/100/vm-100-disk-1/disk.raw`
> > * snapshots add an '@name' suffix to the subvolume's name,
> >   so snapshot 'foo' of the above would be found under
> >   `$path/images/100/vm-100-disk-1@foo/disk.raw`
> > 
> > Note that qgroups aren't included in btrfs-send streams,
> > therefore for now we will only be using *unsized* subvolumes
> > for containers and place a regular raw+ext4 file for sized
> > containers.
> > We could extend the import/export stream format to include
> > the information at the front (similar to how we do the
> > "tar+size" format, but we need to include the size of all
> > the contained snapshots as well, since they can technically
> > change). (But before enabling quotas we should do some
> > performance testing on bigger file systems with multiple
> > snapshots as there are quite a few reports of the fs slowing
> > down considerably in such scenarios).
> 
> this should probably be mentioned prominently in the docs somewhere, 
> else it could cause quite some confusion. also is ext4 in raw image on 
> btrfs really faster than btrfs with quotas? Oo we really should do some 
> benchmarks first before going down either route..

I do also have an idea for a simple-enough & extendable import/export
format change for also sending the quota info over the wire, so we can
go either way.
But yes, we definitely need to do some performance tests... And also
watch btrfs-scrubs with quotas in use... ;-)




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH storage 2/4] add BTRFS storage plugin
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 2/4] add BTRFS storage plugin Wolfgang Bumiller
  2021-06-10 12:40   ` Fabian Grünbichler
@ 2021-06-11 12:11   ` Fabian Ebner
  1 sibling, 0 replies; 17+ messages in thread
From: Fabian Ebner @ 2021-06-11 12:11 UTC (permalink / raw)
  To: pve-devel, Wolfgang Bumiller

Two small things I noticed while skimming over this inline:

Am 09.06.21 um 15:18 schrieb Wolfgang Bumiller:
> This is mostly the same as a directory storage, with 2 major
> differences:
> 
> * 'subvol' volumes are actual btrfs subvolumes and therefore
>    allow snapshots
> * 'raw' files are placed *into* a subvolume and therefore
>    also allow snapshots, the raw file for volume
>    `btrstore:100/vm-100-disk-1.raw` can be found under
>    `$path/images/100/vm-100-disk-1/disk.raw`
> * snapshots add an '@name' suffix to the subvolume's name,
>    so snapshot 'foo' of the above would be found under
>    `$path/images/100/vm-100-disk-1@foo/disk.raw`
> 
> Note that qgroups aren't included in btrfs-send streams,
> therefore for now we will only be using *unsized* subvolumes
> for containers and place a regular raw+ext4 file for sized
> containers.
> We could extend the import/export stream format to include
> the information at the front (similar to how we do the
> "tar+size" format, but we need to include the size of all
> the contained snapshots as well, since they can technically
> change). (But before enabling quotas we should do some
> performance testing on bigger file systems with multiple
> snapshots as there are quite a few reports of the fs slowing
> down considerably in such scenarios).
> 
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
>   PVE/Storage.pm             |   2 +
>   PVE/Storage/BTRFSPlugin.pm | 616 +++++++++++++++++++++++++++++++++++++
>   PVE/Storage/Makefile       |   1 +
>   3 files changed, 619 insertions(+)
>   create mode 100644 PVE/Storage/BTRFSPlugin.pm
> 
> diff --git a/PVE/Storage.pm b/PVE/Storage.pm
> index aa36bad..f9f8d16 100755
> --- a/PVE/Storage.pm
> +++ b/PVE/Storage.pm
> @@ -38,6 +38,7 @@ use PVE::Storage::GlusterfsPlugin;
>   use PVE::Storage::ZFSPoolPlugin;
>   use PVE::Storage::ZFSPlugin;
>   use PVE::Storage::PBSPlugin;
> +use PVE::Storage::BTRFSPlugin;
>   
>   # Storage API version. Increment it on changes in storage API interface.
>   use constant APIVER => 8;
> @@ -60,6 +61,7 @@ PVE::Storage::GlusterfsPlugin->register();
>   PVE::Storage::ZFSPoolPlugin->register();
>   PVE::Storage::ZFSPlugin->register();
>   PVE::Storage::PBSPlugin->register();
> +PVE::Storage::BTRFSPlugin->register();
>   
>   # load third-party plugins
>   if ( -d '/usr/share/perl5/PVE/Storage/Custom' ) {
> diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
> new file mode 100644
> index 0000000..733be6e
> --- /dev/null
> +++ b/PVE/Storage/BTRFSPlugin.pm
> @@ -0,0 +1,616 @@
> +package PVE::Storage::BTRFSPlugin;
> +
> +use strict;
> +use warnings;
> +
> +use base qw(PVE::Storage::Plugin);
> +
> +use Fcntl qw(S_ISDIR);
> +use File::Basename qw(dirname);
> +use File::Path qw(mkpath);
> +use IO::Dir;
> +
> +use PVE::Tools qw(run_command);
> +
> +use PVE::Storage::DirPlugin;
> +
> +use constant {
> +    BTRFS_FIRST_FREE_OBJECTID => 256,
> +};
> +
> +# Configuration (similar to DirPlugin)
> +
> +sub type {
> +    return 'btrfs';
> +}
> +
> +sub plugindata {
> +    return {
> +	content => [
> +	    {
> +		images => 1,
> +		rootdir => 1,
> +		vztmpl => 1,
> +		iso => 1,
> +		backup => 1,
> +		snippets => 1,
> +		none => 1,
> +	    },
> +	    { images => 1, rootdir => 1 },
> +	],
> +	format => [ { raw => 1, qcow2 => 1, vmdk => 1, subvol => 1 }, 'raw', ],
> +    };
> +}
> +
> +sub options {
> +    return {
> +	path => { fixed => 1 },
> +	nodes => { optional => 1 },
> +	shared => { optional => 1 },
> +	disable => { optional => 1 },
> +	maxfiles => { optional => 1 },

missing 'prune-backups'

> +	content => { optional => 1 },
> +	format => { optional => 1 },
> +	is_mountpoint => { optional => 1 },
> +	# TODO: The new variant of mkdir with  `populate` vs `create`...
> +    };
> +}
> +
> +# Storage implementation
> +#
> +# We use the same volume names are directory plugins, but map *raw* disk image file names into a
> +# subdirectory.
> +#
> +# `vm-VMID-disk-ID.raw`
> +#   -> `images/VMID/vm-VMID-disk-ID/disk.raw`
> +#   where the `vm-VMID-disk-ID/` subdirectory is a btrfs subvolume
> +
> +# Reuse `DirPlugin`'s `check_config`. This simply checks for invalid paths.
> +sub check_config {
> +    my ($self, $sectionId, $config, $create, $skipSchemaCheck) = @_;
> +    return PVE::Storage::DirPlugin::check_config($self, $sectionId, $config, $create, $skipSchemaCheck);
> +}
> +
> +sub activate_storage {
> +    my ($class, $storeid, $scfg, $cache) = @_;
> +    return PVE::Storage::DirPlugin::activate_storage($class, $storeid, $scfg, $cache);
> +}
> +
> +sub status {
> +    my ($class, $storeid, $scfg, $cache) = @_;
> +    return PVE::Storage::DirPlugin::status($class, $storeid, $scfg, $cache);
> +}
> +
> +# TODO: sub get_volume_notes {}
> +
> +# TODO: sub update_volume_notes {}
> +
> +# croak would not include the caller from within this module
> +sub __error {
> +    my ($msg) = @_;
> +    my (undef, $f, $n) = caller(1);
> +    die "$msg at $f: $n\n";
> +}
> +
> +# Given a name (eg. `vm-VMID-disk-ID.raw`), take the part up to the format suffix as the name of
> +# the subdirectory (subvolume).
> +sub raw_name_to_dir($) {
> +    my ($raw) = @_;
> +
> +    # For the subvolume directory Strip the `.<format>` suffix:
> +    if ($raw =~ /^(.*)\.raw$/) {
> +	return $1;
> +    }
> +
> +    __error "internal error: bad disk name: $raw";
> +}
> +
> +sub raw_file_to_subvol($) {
> +    my ($file) = @_;
> +
> +    if ($file =~ m|^(.*)/disk\.raw$|) {
> +	return "$1";
> +    }
> +
> +    __error "internal error: bad raw path: $file";
> +}
> +
> +sub filesystem_path {
> +    my ($class, $scfg, $volname, $snapname) = @_;
> +
> +    my ($vtype, $name, $vmid, undef, undef, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    my $path = $class->get_subdir($scfg, $vtype);
> +
> +    $path .= "/$vmid" if $vtype eq 'images';
> +
> +    if ($format eq 'raw') {
> +	my $dir = raw_name_to_dir($name);
> +	if ($snapname) {
> +	    $dir .= "\@$snapname";
> +	}
> +	$path .= "/$dir/disk.raw";
> +    } elsif ($format eq 'subvol') {
> +	$path .= "/$name";
> +	if ($snapname) {
> +	    $path .= "\@$snapname";
> +	}
> +    } else {
> +	$path .= "/$name";
> +    }
> +
> +    return wantarray ? ($path, $vmid, $vtype) : $path;
> +}
> +
> +sub btrfs_cmd {
> +    my ($class, $cmd, $outfunc) = @_;
> +
> +    my $msg = '';
> +    my $func;
> +    if (defined($outfunc)) {
> +	$func = sub {
> +	    my $part = &$outfunc(@_);
> +	    $msg .= $part if defined($part);
> +	};
> +    } else {
> +	$func = sub { $msg .= "$_[0]\n" };
> +    }
> +    run_command(['btrfs', '-q', @$cmd], errmsg => 'btrfs error', outfunc => $func);
> +
> +    return $msg;
> +}
> +
> +sub btrfs_get_subvol_id {
> +    my ($class, $path) = @_;
> +    my $info = $class->btrfs_cmd(['subvolume', 'show', '--', $path]);
> +    if ($info !~ /^\s*(?:Object|Subvolume) ID:\s*(\d+)$/m) {
> +	die "failed to get btrfs subvolume ID from: $info\n";
> +    }
> +    return $1;
> +}
> +
> +sub create_base {
> +    my ($class, $storeid, $scfg, $volname) = @_;
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    my $newname = $name;
> +    $newname =~ s/^vm-/base-/;
> +
> +    # If we're not working with a 'raw' file, which is the only thing that's "different" for btrfs,
> +    # or a subvolume, we forward to the DirPlugin
> +    if ($format ne 'raw' && $format ne 'subvol') {
> +	return PVE::Storage::DirPlugin::create_base(@_);
> +    }
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +    my $newvolname = $basename ? "$basevmid/$basename/$vmid/$newname" : "$vmid/$newname";
> +    my $newpath = $class->filesystem_path($scfg, $newvolname);
> +
> +    my $subvol = $path;
> +    my $newsubvol = $newpath;
> +    if ($format eq 'raw') {
> +	$subvol = raw_file_to_subvol($subvol);
> +	$newsubvol = raw_file_to_subvol($newsubvol);
> +    }
> +
> +    rename($subvol, $newsubvol)
> +	|| die "rename '$subvol' to '$newsubvol' failed - $!\n";
> +    eval { $class->btrfs_cmd(['property', 'set', $newsubvol, 'ro', 'true']) };
> +    warn $@ if $@;
> +
> +    return $newvolname;
> +}
> +
> +sub clone_image {
> +    my ($class, $scfg, $storeid, $volname, $vmid, $snap) = @_;
> +
> +    my ($vtype, $basename, $basevmid, undef, undef, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    # If we're not working with a 'raw' file, which is the only thing that's "different" for btrfs,
> +    # or a subvolume, we forward to the DirPlugin
> +    if ($format ne 'raw' && $format ne 'subvol') {
> +	return PVE::Storage::DirPlugin::clone_image(@_);
> +    }
> +
> +    my $imagedir = $class->get_subdir($scfg, 'images');
> +    $imagedir .= "/$vmid";
> +    mkpath $imagedir;
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +    my $newname = $class->find_free_diskname($storeid, $scfg, $vmid, $format, 1);
> +
> +    # For btrfs subvolumes we don't actually need the "link":
> +    #my $newvolname = "$basevmid/$basename/$vmid/$newname";
> +    my $newvolname = "$vmid/$newname";
> +    my $newpath = $class->filesystem_path($scfg, $newvolname);
> +
> +    my $subvol = $path;
> +    my $newsubvol = $newpath;
> +    if ($format eq 'raw') {
> +	$subvol = raw_file_to_subvol($subvol);
> +	$newsubvol = raw_file_to_subvol($newsubvol);
> +    }
> +
> +    $class->btrfs_cmd(['subvolume', 'snapshot', '--', $subvol, $newsubvol]);
> +
> +    return $newvolname;
> +}
> +
> +sub alloc_image {
> +    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
> +
> +    if ($fmt ne 'raw' && $fmt ne 'subvol') {
> +	return PVE::Storage::DirPlugin::alloc_image(@_);
> +    }
> +
> +    # From Plugin.pm:
> +
> +    my $imagedir = $class->get_subdir($scfg, 'images') . "/$vmid";
> +
> +    mkpath $imagedir;
> +
> +    $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt, 1) if !$name;
> +
> +    my (undef, $tmpfmt) = PVE::Storage::Plugin::parse_name_dir($name);
> +
> +    die "illegal name '$name' - wrong extension for format ('$tmpfmt != '$fmt')\n"
> +	if $tmpfmt ne $fmt;
> +
> +    # End copy from Plugin.pm
> +
> +    my $subvol = "$imagedir/$name";
> +    # .raw is not part of the directory name
> +    $subvol =~ s/\.raw$//;
> +
> +    die "disk image '$subvol' already exists\n" if -e $subvol;
> +
> +    my $path;
> +    if ($fmt eq 'raw') {
> +	$path = "$subvol/disk.raw";
> +    }
> +
> +    $class->btrfs_cmd(['subvolume', 'create', '--', $subvol]);
> +
> +    if ($fmt eq 'subvol' && !!$size) {
> +	# NOTE: `btrfs send/recv` actually drops quota information so supporting subvolumes with
> +	# quotas doesn't play nice with send/recv.
> +	die "btrfs quotas are currently not supported, use an unsized subvolume or a raw file\n";
> +
> +	# This is how we *would* do it:
> +	# # Use the subvol's default 0/$id qgroup
> +	# eval {
> +	#     # This call should happen at storage creation instead and therefore governed by a
> +	#     # configuration option!
> +	#     # $class->btrfs_cmd(['quota', 'enable', $subvol]);
> +	#     my $id = $class->btrfs_get_subvol_id($subvol);
> +	#     $class->btrfs_cmd(['qgroup', 'limit', "${size}k", "0/$id", $subvol]);
> +	# };
> +    } elsif ($fmt eq 'raw') {
> +	# From Plugin.pm (minus the qcow2 part):
> +	my $cmd = ['/usr/bin/qemu-img', 'create'];
> +
> +	push @$cmd, '-f', $fmt, $path, "${size}K";
> +
> +	eval { run_command($cmd, errmsg => "unable to create image"); };
> +    } else {
> +	# be clear that after this if/else we're handling the result of an eval block:
> +	$@ = undef;
> +    }
> +
> +    if (my $err = $@) {
> +	eval { $class->btrfs_cmd(['subvolume', 'delete', '--', $subvol]); };
> +	warn $@ if $@;
> +	die $err;
> +    }
> +
> +    return "$vmid/$name";
> +}
> +
> +# Same as btrfsprogs does:
> +my sub path_is_subvolume : prototype($) {
> +    my ($path) = @_;
> +    my @stat = stat($path)
> +	or die "stat failed on '$path' - $!\n";
> +    my ($ino, $mode) = @stat[1, 2];
> +    return S_ISDIR($mode) && $ino == BTRFS_FIRST_FREE_OBJECTID;
> +}
> +
> +my $BTRFS_VOL_REGEX = qr/((?:vm|base|subvol)-\d+-disk-\d+(?:\.subvol)?)(?:\@(\S+))$/;
> +
> +# Calls `$code->($volume, $name, $snapshot)` for each subvol in a directory matching our volume
> +# regex.
> +my sub foreach_subvol : prototype($$) {
> +    my ($dir, $code) = @_;
> +
> +    dir_glob_foreach($dir, $BTRFS_VOL_REGEX, sub {
> +	my ($volume, $name, $snapshot) = ($1, $2, $3);
> +	return if !path_is_subvolume("$dir/$volume");
> +	$code->($volume, $name, $snapshot);
> +    })
> +}
> +
> +sub free_image {
> +    my ($class, $storeid, $scfg, $volname, $isBase, $_format) = @_;
> +
> +    my (undef, undef, $vmid, undef, undef, undef, $format) =
> +	$class->parse_volname($volname);
> +
> +    if ($format ne 'subvol' && $format ne 'raw') {
> +	return PVE::Storage::DirPlugin::free_image(@_);
> +    }
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +
> +    my $subvol = $path;
> +    if ($format eq 'raw') {
> +	$subvol = raw_file_to_subvol($path);
> +    }
> +
> +    my $dir = dirname($subvol);
> +    my @snapshot_vols;
> +    foreach_subvol($dir, sub {
> +	my ($volume, $name, $snapshot) = @_;
> +	return if !defined $snapshot;
> +	push @snapshot_vols, "$dir/$volume";
> +    });
> +
> +    $class->btrfs_cmd(['subvolume', 'delete', '--', @snapshot_vols, $subvol]);
> +    # try to cleanup directory to not clutter storage with empty $vmid dirs if
> +    # all images from a guest got deleted
> +    rmdir($dir);
> +
> +    return undef;
> +}
> +
> +# Currently not used because quotas clash with send/recv.
> +# my sub btrfs_subvol_quota {
> +#     my ($class, $path) = @_;
> +#     my $id = '0/' . $class->btrfs_get_subvol_id($path);
> +#     my $search = qr/^\Q$id\E\s+(\d)+\s+\d+\s+(\d+)\s*$/;
> +#     my ($used, $size);
> +#     $class->btrfs_cmd(['qgroup', 'show', '--raw', '-rf', '--', $path], sub {
> +# 	return if defined($size);
> +# 	if ($_[0] =~ $search) {
> +# 	    ($used, $size) = ($1, $2);
> +# 	}
> +#     });
> +#     if (!defined($size)) {
> +# 	# syslog should include more information:
> +# 	syslog('err', "failed to get subvolume size for: $path (id $id)");
> +# 	# UI should only see the last path component:
> +# 	$path =~ s|^.*/||;
> +# 	die "failed to get subvolume size for $path\n";
> +#     }
> +#     return wantarray ? ($used, $size) : $size;
> +# }
> +
> +sub volume_size_info {
> +    my ($class, $scfg, $storeid, $volname, $timeout) = @_;
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +
> +    my $format = ($class->parse_volname($volname))[6];
> +
> +    if ($format eq 'subvol') {
> +	my $ctime = (stat($path))[10];
> +	my ($used, $size) = (0, 0);
> +	#my ($used, $size) = btrfs_subvol_quota($class, $path); # uses wantarray
> +	return wantarray ? ($size, 'subvol', $used, undef, $ctime) : 1;
> +    }
> +
> +    return PVE::Storage::Plugin::file_size_info($path, $timeout);
> +}
> +
> +sub volume_resize {
> +    my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
> +
> +    my $format = ($class->parse_volname($volname))[6];
> +    if ($format eq 'subvol') {
> +	my $path = $class->filesystem_path($scfg, $volname);
> +	my $id = '0/' . $class->btrfs_get_subvol_id($path);
> +	$class->btrfs_cmd(['qgroup', 'limit', '--', "${size}k", "0/$id", $path]);
> +	return undef;
> +    }
> +
> +    return PVE::Storage::Plugin::volume_resize(@_);
> +}
> +
> +sub volume_snapshot {
> +    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +
> +    my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
> +    if ($format ne 'subvol' && $format ne 'raw') {
> +	return PVE::Storage::Plugin::volume_snapshot(@_);
> +    }
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +    my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
> +
> +    if ($format eq 'raw') {
> +	$path = raw_file_to_subvol($path);
> +	$snap_path = raw_file_to_subvol($snap_path);
> +    }
> +
> +    my $snapshot_dir = $class->get_subdir($scfg, 'images') . "/$vmid";
> +    mkpath $snapshot_dir;
> +
> +    $class->btrfs_cmd(['subvolume', 'snapshot', '-r', '--', $path, $snap_path]);
> +    return undef;
> +}
> +
> +sub volume_rollback_is_possible {
> +    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +
> +    return 1;
> +}
> +
> +sub volume_snapshot_rollback {
> +    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +
> +    my ($name, $format) = ($class->parse_volname($volname))[1,6];
> +
> +    if ($format ne 'subvol' && $format ne 'raw') {
> +	return PVE::Storage::Plugin::volume_snapshot_rollback(@_);
> +    }
> +
> +    my $path = $class->filesystem_path($scfg, $volname);
> +    my $snap_path = $class->filesystem_path($scfg, $volname, $snap);
> +
> +    if ($format eq 'raw') {
> +	$path = raw_file_to_subvol($path);
> +	$snap_path = raw_file_to_subvol($snap_path);
> +    }
> +
> +    # Simple version would be:
> +    #   rename old to temp
> +    #   create new
> +    #   on error rename temp back
> +    # But for atomicity in case the rename after create-failure *also* fails, we create the new
> +    # subvol first, then use RENAME_EXCHANGE,
> +    my $tmp_path = "$path.tmp.$$";
> +    $class->btrfs_cmd(['subvolume', 'snapshot', '--', $snap_path, $tmp_path]);
> +    # The paths are absolute, so pass -1 as file descriptors.
> +    my $ok = PVE::Tools::renameat2(-1, $tmp_path, -1, $path, &PVE::Tools::RENAME_EXCHANGE);
> +
> +    eval { $class->btrfs_cmd(['subvolume', 'delete', '--', $tmp_path]) };
> +    warn "failed to remove '$tmp_path' subvolume: $@" if $@;
> +
> +    if (!$ok) {
> +	die "failed to rotate '$tmp_path' into place at '$path' - $!\n";
> +    }
> +
> +    return undef;
> +}
> +
> +sub volume_snapshot_delete {
> +    my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
> +
> +    my ($name, $vmid, $format) = ($class->parse_volname($volname))[1,2,6];
> +
> +    if ($format ne 'subvol' && $format ne 'raw') {
> +	return PVE::Storage::Plugin::volume_snapshot_delete(@_);
> +    }
> +
> +    my $path = $class->filesystem_path($scfg, $volname, $snap);
> +
> +    if ($format eq 'raw') {
> +	$path = raw_file_to_subvol($path);
> +    }
> +
> +    $class->btrfs_cmd(['subvolume', 'delete', '--', $path]);
> +
> +    return undef;
> +}
> +
> +sub volume_has_feature {
> +    my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
> +
> +    my $features = {
> +	snapshot => {
> +	    current => { qcow2 => 1, raw => 1, subvol => 1 },
> +	    snap => { qcow2 => 1, raw => 1, subvol => 1 }
> +	},
> +	clone => {
> +	    base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> +	    current => { raw => 1 },
> +	    snap => { raw => 1 },
> +	},
> +	template => { current => { qcow2 => 1, raw => 1, vmdk => 1, subvol => 1 } },
> +	clone => {

should be 'copy'?

> +	    base => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> +	    current => { qcow2 => 1, raw => 1, subvol => 1, vmdk => 1 },
> +	    snap => { qcow2 => 1, raw => 1, subvol => 1 },
> +	},
> +	sparseinit => { base => {qcow2 => 1, raw => 1, vmdk => 1 },
> +			current => {qcow2 => 1, raw => 1, vmdk => 1 } },
> +    };
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    my $key = undef;
> +    if ($snapname) {
> +        $key = 'snap';
> +    } else {
> +        $key =  $isBase ? 'base' : 'current';
> +    }
> +
> +    return 1 if defined($features->{$feature}->{$key}->{$format});
> +
> +    return undef;
> +}
> +
> +sub list_images {
> +    my ($class, $storeid, $scfg, $vmid, $vollist, $cache) = @_;
> +    my $imagedir = $class->get_subdir($scfg, 'images');
> +
> +    my $res = [];
> +
> +    # Copied from Plugin.pm, with file_size_info calls adapted:
> +    foreach my $fn (<$imagedir/[0-9][0-9]*/*>) {
> +	# different to in Plugin.pm the regex below also excludes '@' as valid file name
> +	next if $fn !~ m@^(/.+/(\d+)/([^/\@.]+(?:\.(qcow2|vmdk|subvol))?))$@;
> +	$fn = $1; # untaint
> +
> +	my $owner = $2;
> +	my $name = $3;
> +	my $ext = $4;
> +
> +	next if !$vollist && defined($vmid) && ($owner ne $vmid);
> +
> +	my $volid = "$storeid:$owner/$name";
> +	my ($size, $format, $used, $parent, $ctime);
> +
> +	if (!$ext) { # raw
> +	    $volid .= '.raw';
> +	    ($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info("$fn/disk.raw");
> +	} elsif ($ext eq 'subvol') {
> +	    ($used, $size) = (0, 0);
> +	    #($used, $size) = btrfs_subvol_quota($class, $fn);
> +	    $format = 'subvol';
> +	} else {
> +	    ($size, $format, $used, $parent, $ctime) = PVE::Storage::Plugin::file_size_info($fn);
> +	}
> +	next if !($format && defined($size));
> +
> +	if ($vollist) {
> +	    next if ! grep { $_ eq $volid } @$vollist;
> +	}
> +
> +	my $info = {
> +	    volid => $volid, format => $format,
> +	    size => $size, vmid => $owner, used => $used, parent => $parent,
> +	};
> +
> +        $info->{ctime} = $ctime if $ctime;
> +
> +        push @$res, $info;
> +    }
> +
> +    return $res;
> +}
> +
> +# For now we don't implement `btrfs send/recv` as it needs some updates to our import/export API
> +# first!
> +
> +sub volume_export_formats {
> +    return PVE::Storage::DirPlugin::volume_export_formats(@_);
> +}
> +
> +sub volume_export {
> +    return PVE::Storage::DirPlugin::volume_export(@_);
> +}
> +
> +sub volume_import_formats {
> +    return PVE::Storage::DirPlugin::volume_import_formats(@_);
> +}
> +
> +sub volume_import {
> +    return PVE::Storage::DirPlugin::volume_import(@_);
> +}
> +
> +1
> diff --git a/PVE/Storage/Makefile b/PVE/Storage/Makefile
> index 91b9238..857c485 100644
> --- a/PVE/Storage/Makefile
> +++ b/PVE/Storage/Makefile
> @@ -12,6 +12,7 @@ SOURCES= \
>   	ZFSPoolPlugin.pm \
>   	ZFSPlugin.pm \
>   	PBSPlugin.pm \
> +	BTRFSPlugin.pm \
>   	LvmThinPlugin.pm
>   
>   .PHONY: install
> 




^ permalink raw reply	[flat|nested] 17+ messages in thread

* [pve-devel] applied: [PATCH common] Syscalls/Tools: add renameat2
  2021-06-09 13:18 ` [pve-devel] [PATCH common] Syscalls/Tools: add renameat2 Wolfgang Bumiller
@ 2021-06-15 12:35   ` Thomas Lamprecht
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Lamprecht @ 2021-06-15 12:35 UTC (permalink / raw)
  To: Proxmox VE development discussion, Wolfgang Bumiller

On 09.06.21 15:18, Wolfgang Bumiller wrote:
> Mostly for the ability to atomically swap files.
> 
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
>  src/PVE/Syscall.pm |  1 +
>  src/PVE/Tools.pm   | 10 ++++++++++
>  2 files changed, 11 insertions(+)
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 17+ messages in thread

* [pve-devel] applied: [PATCH storage 1/4] fix find_free_disk_name invocations
  2021-06-09 13:18 ` [pve-devel] [PATCH storage 1/4] fix find_free_disk_name invocations Wolfgang Bumiller
@ 2021-06-15 12:36   ` Thomas Lamprecht
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Lamprecht @ 2021-06-15 12:36 UTC (permalink / raw)
  To: Proxmox VE development discussion, Wolfgang Bumiller

On 09.06.21 15:18, Wolfgang Bumiller wrote:
> The interface takes the storeid now, not the image dir.
> 
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
>  PVE/Storage/GlusterfsPlugin.pm | 4 ++--
>  PVE/Storage/Plugin.pm          | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-06-15 12:36 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-09 13:18 [pve-devel] [PATCH multiple] btrfs, file system for the brave Wolfgang Bumiller
2021-06-09 13:18 ` [pve-devel] [PATCH common] Syscalls/Tools: add renameat2 Wolfgang Bumiller
2021-06-15 12:35   ` [pve-devel] applied: " Thomas Lamprecht
2021-06-09 13:18 ` [pve-devel] [PATCH storage 1/4] fix find_free_disk_name invocations Wolfgang Bumiller
2021-06-15 12:36   ` [pve-devel] applied: " Thomas Lamprecht
2021-06-09 13:18 ` [pve-devel] [PATCH storage 2/4] add BTRFS storage plugin Wolfgang Bumiller
2021-06-10 12:40   ` Fabian Grünbichler
2021-06-10 12:59     ` Wolfgang Bumiller
2021-06-11 12:11   ` Fabian Ebner
2021-06-09 13:18 ` [pve-devel] [PATCH storage 3/4] update import/export storage API Wolfgang Bumiller
2021-06-10 12:30   ` Fabian Grünbichler
2021-06-09 13:18 ` [pve-devel] [PATCH storage 4/4] btrfs: add 'btrfs' import/export format Wolfgang Bumiller
2021-06-09 13:18 ` [pve-devel] [PATCH container 1/2] migration: fix snapshots boolean accounting Wolfgang Bumiller
2021-06-09 13:18 ` [pve-devel] [PATCH container 2/2] enable btrfs support via subvolumes Wolfgang Bumiller
2021-06-10 12:35   ` Fabian Grünbichler
2021-06-09 13:18 ` [pve-devel] [PATCH qemu-server] allow migrating raw btrfs volumes Wolfgang Bumiller
2021-06-10 12:35   ` Fabian Grünbichler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal