From: Stefan Reiter <s.reiter@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH v2 proxmox-backup 4/4] rustdoc: overhaul backup rustdoc and add locking table
Date: Thu, 15 Oct 2020 12:49:16 +0200
Message-ID: <20201015104916.21170-5-s.reiter@proxmox.com>
In-Reply-To: <20201015104916.21170-1-s.reiter@proxmox.com>

Rewrite most of the documentation to be more readable and correct
(according to the current implementations).

Add a table visualizing all different locks used to synchronize
concurrent operations.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

FYI: I used https://www.tablesgenerator.com/markdown_tables for the table

v2:
* Update table to reflect update_manifest changes

 src/backup.rs | 199 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 119 insertions(+), 80 deletions(-)

diff --git a/src/backup.rs b/src/backup.rs
index 1b2180bc..577fdf40 100644
--- a/src/backup.rs
+++ b/src/backup.rs
@@ -1,107 +1,146 @@
-//! This module implements the proxmox backup data storage
+//! This module implements the data storage and access layer.
 //!
-//! Proxmox backup splits large files into chunks, and stores them
-//! deduplicated using a content addressable storage format.
+//! # Data formats
 //!
-//! A chunk is simply defined as binary blob, which is stored inside a
-//! `ChunkStore`, addressed by the SHA256 digest of the binary blob.
+//! PBS splits large files into chunks, and stores them deduplicated using
+//! a content addressable storage format.
 //!
-//! Index files are used to reconstruct the original file. They
-//! basically contain a list of SHA256 checksums. The `DynamicIndex*`
-//! format is able to deal with dynamic chunk sizes, whereas the
-//! `FixedIndex*` format is an optimization to store a list of equal
-//! sized chunks.
+//! Backup snapshots are stored as folders containing a manifest file and
+//! potentially one or more index or blob files.
 //!
-//! # ChunkStore Locking
+//! The manifest contains hashes of all other files and can be signed by
+//! the client.
 //!
-//! We need to be able to restart the proxmox-backup service daemons,
-//! so that we can update the software without rebooting the host. But
-//! such restarts must not abort running backup jobs, so we need to
-//! keep the old service running until those jobs are finished. This
-//! implies that we need some kind of locking for the
-//! ChunkStore. Please note that it is perfectly valid to have
-//! multiple parallel ChunkStore writers, even when they write the
-//! same chunk (because the chunk would have the same name and the
-//! same data). The only real problem is garbage collection, because
-//! we need to avoid deleting chunks which are still referenced.
+//! Blob files contain data directly. They are used for config files and
+//! the like.
 //!
-//! * Read Index Files:
+//! Index files are used to reconstruct an original file. They contain a
+//! list of SHA256 checksums. The `DynamicIndex*` format is able to deal
+//! with dynamic chunk sizes (CT and host backups), whereas the
+//! `FixedIndex*` format is an optimization to store a list of equal sized
+//! chunks (VMs, whole block devices).
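The fixed/dynamic distinction can be sketched with a hypothetical offset lookup; the function names below are illustrative and not PBS's actual index API:

```rust
// Sketch: with equal-sized chunks (FixedIndex), the chunk covering a byte
// offset is a simple division, whereas dynamic chunk sizes (DynamicIndex)
// require searching the list of chunk end offsets. Illustrative only.

fn fixed_chunk_index(offset: u64, chunk_size: u64) -> u64 {
    offset / chunk_size
}

fn dynamic_chunk_index(offset: u64, chunk_ends: &[u64]) -> Option<usize> {
    // index of the first chunk whose end offset lies beyond `offset`
    chunk_ends.iter().position(|&end| offset < end)
}
```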
 //!
-//!   Acquire shared lock for .idx files.
-//!
-//!
-//! * Delete Index Files:
-//!
-//!   Acquire exclusive lock for .idx files. This makes sure that we do
-//!   not delete index files while they are still in use.
-//!
-//!
-//! * Create Index Files:
-//!
-//!   Acquire shared lock for ChunkStore (process wide).
-//!
-//!   Note: When creating .idx files, we create temporary a (.tmp) file,
-//!   then do an atomic rename ...
-//!
-//!
-//! * Garbage Collect:
-//!
-//!   Acquire exclusive lock for ChunkStore (process wide). If we have
-//!   already a shared lock for the ChunkStore, try to upgrade that
-//!   lock.
-//!
-//!
-//! * Server Restart
-//!
-//!   Try to abort the running garbage collection to release exclusive
-//!   ChunkStore locks ASAP. Start the new service with the existing listening
-//!   socket.
+//! A chunk is defined as a binary blob, which is stored inside a
+//! [ChunkStore](struct.ChunkStore.html) instead of the backup directory
+//! directly, and can be addressed by its SHA256 digest.
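As a minimal sketch of content addressing, a chunk's on-disk location can be derived from its hex digest; the 4-hex-character prefix directory shown here is an assumption for illustration, see `ChunkStore` for the real layout:

```rust
// Sketch: map a chunk's SHA256 hex digest to a path inside the chunk
// store. The `.chunks/<4-hex-prefix>/<digest>` layout is an assumption
// for illustration, not necessarily the exact ChunkStore layout.
use std::path::PathBuf;

fn chunk_path(store_base: &str, digest_hex: &str) -> PathBuf {
    assert_eq!(digest_hex.len(), 64, "SHA256 digest is 64 hex chars");
    let prefix = &digest_hex[..4];
    PathBuf::from(store_base)
        .join(".chunks")
        .join(prefix)
        .join(digest_hex)
}
```

Because the path is a pure function of the digest, two writers storing the same chunk produce the same file, which is why parallel writes of identical chunks are harmless.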
 //!
 //!
 //! # Garbage Collection (GC)
 //!
-//! Deleting backups is as easy as deleting the corresponding .idx
-//! files. Unfortunately, this does not free up any storage, because
-//! those files just contain references to chunks.
+//! Deleting backups is as easy as deleting the corresponding .idx files.
+//! However, this does not free up any storage, because those files just
+//! contain references to chunks.
 //!
 //! To free up some storage, we run a garbage collection process at
-//! regular intervals. The collector uses a mark and sweep
-//! approach. In the first phase, it scans all .idx files to mark used
-//! chunks. The second phase then removes all unmarked chunks from the
-//! store.
+//! regular intervals. The collector uses a mark and sweep approach. In
+//! the first phase, it scans all .idx files to mark used chunks. The
+//! second phase then removes all unmarked chunks from the store.
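The two phases can be sketched with an in-memory set; the real GC marks chunks on disk (via `atime`, see below) rather than in RAM:

```rust
// Sketch of the mark-and-sweep idea. Returns the number of swept chunks.
// Illustrative only: the real implementation works on files, not sets.
use std::collections::HashSet;

fn garbage_collect(
    index_files: &[Vec<[u8; 32]>],
    store: &mut HashSet<[u8; 32]>,
) -> usize {
    // Phase 1: mark every chunk referenced by any index file.
    let mut marked: HashSet<[u8; 32]> = HashSet::new();
    for index in index_files {
        marked.extend(index.iter().copied());
    }
    // Phase 2: sweep everything in the store that was not marked.
    let before = store.len();
    store.retain(|digest| marked.contains(digest));
    before - store.len()
}
```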
 //!
-//! The above locking mechanism makes sure that we are the only
-//! process running GC. But we still want to be able to create backups
-//! during GC, so there may be multiple backup threads/tasks
-//! running. Either started before GC started, or started while GC is
-//! running.
+//! The locking mechanisms mentioned below make sure that we are the only
+//! process running GC. We still want to be able to create backups during
+//! GC, so there may be multiple backup threads/tasks running, either
+//! started before GC, or while GC is running.
 //!
 //! ## `atime` based GC
 //!
 //! The idea here is to mark chunks by updating the `atime` (access
-//! timestamp) on the chunk file. This is quite simple and does not
-//! need additional RAM.
+//! timestamp) on the chunk file. This is quite simple and does not need
+//! additional RAM.
 //!
 //! One minor problem is that recent Linux versions use the `relatime`
-//! mount flag by default for performance reasons (yes, we want
-//! that). When enabled, `atime` data is written to the disk only if
-//! the file has been modified since the `atime` data was last updated
-//! (`mtime`), or if the file was last accessed more than a certain
-//! amount of time ago (by default 24h). So we may only delete chunks
-//! with `atime` older than 24 hours.
-//!
-//! Another problem arises from running backups. The mark phase does
-//! not find any chunks from those backups, because there is no .idx
-//! file for them (created after the backup). Chunks created or
-//! touched by those backups may have an `atime` as old as the start
-//! time of those backups. Please note that the backup start time may
-//! predate the GC start time. So we may only delete chunks older than
-//! the start time of those running backup jobs.
+//! mount flag by default for performance reasons (and we want that). When
+//! enabled, `atime` data is written to the disk only if the file has been
+//! modified since the `atime` data was last updated (`mtime`), or if the
+//! file was last accessed more than a certain amount of time ago (by
+//! default 24h). So we may only delete chunks with `atime` older than 24
+//! hours.
 //!
+//! Another problem arises from running backups. The mark phase does not
+//! find any chunks from those backups, because there is no .idx file for
+//! them (created after the backup). Chunks created or touched by those
+//! backups may have an `atime` as old as the start time of those backups.
+//! Please note that the backup start time may predate the GC start time.
+//! So we may only delete chunks older than the start time of those
+//! running backup jobs, which might be more than 24h back (this is the
+//! reason why ProcessLocker exclusive locks only have to be exclusive
+//! between processes, since within one we can determine the age of the
+//! oldest shared lock).
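The resulting sweep cutoff is the older of "GC start minus 24h" and the start time of the oldest running backup. A sketch, with illustrative names rather than PBS's actual API:

```rust
use std::time::{Duration, SystemTime};

// Sketch: a chunk may only be swept if its atime is older than both the
// relatime grace period (24h before GC start) and the start of the
// oldest running backup job. Names are illustrative.
fn sweep_cutoff(
    gc_start: SystemTime,
    oldest_backup_start: Option<SystemTime>,
) -> SystemTime {
    let relatime_cutoff = gc_start - Duration::from_secs(24 * 3600);
    match oldest_backup_start {
        Some(backup_start) if backup_start < relatime_cutoff => backup_start,
        _ => relatime_cutoff,
    }
}
```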
 //!
 //! ## Store `marks` in RAM using a HASH
 //!
-//! Not sure if this is better. TODO
+//! Might be better. Under investigation.
+//!
+//!
+//! # Locking
+//!
+//! Since PBS allows multiple potentially interfering operations at the
+//! same time (e.g. garbage collection, prune, multiple backup creations
+//! (only in separate groups), forget, ...), these need to lock against
+//! each other in certain scenarios. There is no overarching global lock,
+//! however; instead, the finest-grained lock possible is always used,
+//! because running these operations concurrently is treated as a feature
+//! in its own right.
+//!
+//! ## Inter-process Locking
+//!
+//! We need to be able to restart the proxmox-backup service daemons, so
+//! that we can update the software without rebooting the host. But such
+//! restarts must not abort running backup jobs, so we need to keep the
+//! old service running until those jobs are finished. This implies that
+//! we need some kind of locking for modifying chunks and indices in the
+//! ChunkStore.
+//!
+//! Please note that it is perfectly valid to have multiple
+//! parallel ChunkStore writers, even when they write the same chunk
+//! (because the chunk would have the same name and the same data, and
+//! writes are completed atomically via a rename). The only problem is
+//! garbage collection, because we need to avoid deleting chunks which are
+//! still referenced.
+//!
+//! To do this we use the
+//! [ProcessLocker](../tools/struct.ProcessLocker.html).
+//!
+//! ### ChunkStore-wide
+//!
+//! * Create Index Files:
+//!
+//!   Acquire shared lock for ChunkStore.
+//!
+//!   Note: When creating .idx files, we create a temporary .tmp file,
+//!   then do an atomic rename.
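The create-then-rename pattern can be sketched as follows; the function name and paths are illustrative, not the actual implementation:

```rust
// Sketch: write the full index to a .tmp file, then atomically rename it
// into place so readers never observe a partially written .idx file.
use std::fs;
use std::io::Write;
use std::path::Path;

fn write_index_atomic(path: &Path, data: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = fs::File::create(&tmp)?;
    file.write_all(data)?;
    file.sync_all()?; // make sure data hits disk before the rename
    fs::rename(&tmp, path) // atomic on POSIX filesystems
}
```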
+//!
+//! * Garbage Collect:
+//!
+//!   Acquire exclusive lock for ChunkStore. If we have
+//!   already a shared lock for the ChunkStore, try to upgrade that
+//!   lock.
+//!
+//! Exclusive locks only work _between processes_. It is valid to have an
+//! exclusive and one or more shared locks held within one process. Writing
+//! chunks within one process is synchronized using the gc_mutex.
+//!
+//! On server restart, we stop any running GC in the old process to avoid
+//! having the exclusive lock held for too long.
+//!
+//! ## Locking table
+//!
+//! The table below shows all operations that play a role in locking, and
+//! which mechanisms are used to make their concurrent usage safe.
+//!
+//! | starting ><br>v during | read index file | create index file | GC mark | GC sweep | update manifest | forget | prune | create backup | verify | reader api |
+//! |-|-|-|-|-|-|-|-|-|-|-|
+//! | **read index file** | / | / | / | / | / | mmap stays valid, oldest_shared_lock prevents GC | see forget column | / | / | / |
+//! | **create index file** | / | / | / | / | / | / | / | /, happens at the end, after all chunks are touched | /, only happens without a manifest | / |
+//! | **GC mark** | / | Datastore process-lock shared | gc_mutex, exclusive ProcessLocker | gc_mutex | /, GC only cares about index files, not manifests | tells GC about removed chunks | see forget column | /, index files don't exist yet | / | / |
+//! | **GC sweep** | / | Datastore process-lock shared | gc_mutex, exclusive ProcessLocker | gc_mutex | / | /, chunks already marked | see forget column | chunks get touched; chunk_store.mutex; oldest PL lock | / | / |
+//! | **update manifest** | / | / | / | / | update_manifest lock | update_manifest lock, remove dir under lock | see forget column | /, "write manifest" happens at the end | /, can call "write manifest", see that column | / |
+//! | **forget** | / | / | removed_during_gc mutex is held during unlink | marking done, doesn't matter if forgotten now | update_manifest lock, forget waits for lock | /, unlink is atomic | causes forget to fail, but that's OK | running backup has snapshot flock | /, potentially detects missing folder | shared snap flock |
+//! | **prune** | / | / | see forget row | see forget row | see forget row | causes warn in prune, but no error | see forget column | running and last non-running can't be pruned | see forget row | shared snap flock |
+//! | **create backup** | / | only time this happens, thus has snapshot flock | / | chunks get touched; chunk_store.mutex; oldest PL lock | no lock, but cannot exist beforehand | snapshot flock, can't be forgotten | running and last non-running can't be pruned | snapshot group flock, only one running per group | /, won't be verified since manifest missing | / |
+//! | **verify** | / | / | / | / | see "update manifest" row | /, potentially detects missing folder | see forget column | / | /, but useless ("update manifest" protects itself) | / |
+//! | **reader api** | / | / | / | /, open snap can't be forgotten, so ref must exist | / | prevented by shared snap flock | prevented by shared snap flock | / | / | /, lock is shared |
+//! * / = no interaction
+//! * shared/exclusive from POV of 'starting' process
 
 use anyhow::{bail, Error};
 
-- 
2.20.1





Thread overview: 11+ messages
2020-10-15 10:49 [pbs-devel] [PATCH v2 0/4] Locking and rustdoc improvements Stefan Reiter
2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] gc: avoid race between phase1 and forget/prune Stefan Reiter
2020-10-16  6:26   ` Dietmar Maurer
2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add manifest locking Stefan Reiter
2020-10-16  6:33   ` Dietmar Maurer
2020-10-16  7:37     ` Dietmar Maurer
2020-10-16  7:39   ` [pbs-devel] applied: " Dietmar Maurer
2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 3/4] rustdoc: add crate level doc Stefan Reiter
2020-10-16  7:47   ` [pbs-devel] applied: " Dietmar Maurer
2020-10-15 10:49 ` Stefan Reiter [this message]
2020-10-16  7:47   ` [pbs-devel] applied: [PATCH v2 proxmox-backup 4/4] rustdoc: overhaul backup rustdoc and add locking table Dietmar Maurer
