public inbox for pbs-devel@lists.proxmox.com
From: Stefan Reiter <s.reiter@proxmox.com>
To: Fabian Ebner <f.ebner@proxmox.com>, pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [PATCH proxmox-backup 4/7] prune: also check backup snapshot locks
Date: Wed, 5 Aug 2020 10:34:57 +0200	[thread overview]
Message-ID: <3c2f9662-18c7-f6c8-d27a-315a7d095e27@proxmox.com> (raw)
In-Reply-To: <ddd46801-923d-93a8-3319-72eb867e3d38@proxmox.com>

On 8/5/20 9:23 AM, Fabian Ebner wrote:
> Am 04.08.20 um 12:42 schrieb Stefan Reiter:
>> ...to avoid pruning running backups or backups used as base for others.
>>
>> Unfinished backups that have no lock are considered broken and will
>> always be pruned.
>>
>> The ignore_locks parameter is used for testing, since flocks are not
>> available in test environment.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> ---
>>
>> This one I'm a bit scared of: The prune logic is astonishingly 
>> confusing IMHO,
>> but I hope I worked out how this makes the most sense.
>>
>> I left my long-form comments in, even though they might seem a bit 
>> excessive -
>> they helped me make light of the situation, and might help in the 
>> future as
>> well. But if it's too much, feel free to shorten/remove them.
>>
>>   src/api2/admin/datastore.rs     |  2 +-
>>   src/backup/prune.rs             | 48 +++++++++++++++++++++++----------
>>   src/bin/proxmox-backup-proxy.rs |  2 +-
>>   tests/prune.rs                  |  6 +++--
>>   4 files changed, 40 insertions(+), 18 deletions(-)
>>
>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>> index 97fd9093..6194c9c7 100644
>> --- a/src/api2/admin/datastore.rs
>> +++ b/src/api2/admin/datastore.rs
>> @@ -621,7 +621,7 @@ fn prune(
>>       let list = group.list_backups(&datastore.base_path())?;
>> -    let mut prune_info = compute_prune_info(list, &prune_options)?;
>> +    let mut prune_info = compute_prune_info(list, &prune_options, false)?;
>>       prune_info.reverse(); // delete older snapshots first
>> diff --git a/src/backup/prune.rs b/src/backup/prune.rs
>> index f7a87c5a..1bc5b9e8 100644
>> --- a/src/backup/prune.rs
>> +++ b/src/backup/prune.rs
>> @@ -45,26 +45,45 @@ fn mark_selections<F: Fn(DateTime<Local>, &BackupInfo) -> String> (
>>       }
>>   }
>> -fn remove_incomplete_snapshots(
>> +fn mark_incomplete_and_in_use_snapshots(
>>       mark: &mut HashMap<PathBuf, PruneMark>,
>>       list: &Vec<BackupInfo>,
>> +    ignore_locks: bool,
>>   ) {
>> -    let mut keep_unfinished = true;
>> +    let mut first_finished = true;
>>       for info in list.iter() {
>> -        // backup is considered unfinished if there is no manifest
>> -        if info.is_finished() {
>> -            // There is a new finished backup, so there is no need
>> -            // to keep older unfinished backups.
>> -            keep_unfinished = false;
>> -        } else {
>> -            let backup_id = info.backup_dir.relative_path();
>> -            if keep_unfinished { // keep first unfinished
>> -                mark.insert(backup_id, PruneMark::KeepPartial);
>> -            } else {
>> +        let backup_id = info.backup_dir.relative_path();
>> +        let finished = info.is_finished();
>> +        if ignore_locks || info.lock().is_ok() {
>> +            if !finished {
>> +                // incomplete backup, but we can lock it, so it's not running -
>> +                // always remove to clean up broken backups
>>                   mark.insert(backup_id, PruneMark::Remove);
>>               }
>> -            keep_unfinished = false;
>> +            // if locking succeeds and the backup is complete, let the regular
>> +            // marking logic figure it out
>> +            // note that we don't need to keep any locks either - any newly
>> +            // started backup can only ever use the latest finished backup as
>> +            // base, which is kept anyway (since we always keep the latest, and
>> +            // this function already filters out any unfinished ones)
>> +        } else {
>> +            if finished {
>> +                // if the backup is finished but locking fails, we have to keep
>> +                // it, however, if it's the latest finished one, it will be kept
>> +                // anyway, so don't mark it at all
>> +                if !first_finished {
>> +                    mark.insert(backup_id, PruneMark::Keep);
>> +                }
>> +            } else {
>> +                // ...if it's incomplete, and locking fails, that probably means
>> +                // it's currently running, so keep it, but do not include it in
>> +                // mark computation
>> +                mark.insert(backup_id, PruneMark::KeepPartial);
>> +            };
>> +        }
>> +        if first_finished && finished {
>> +            first_finished = false;
>>           }
>>       }
>>   }
> 
> Is there any downside to treating all finished backups where the lock 
> can't be acquired the same way? Not treating first_finished differently
> might save a few lines.
> 

I did it exactly for this reason, so that the last backup does in fact count 
towards the kept backups, since it's guaranteed to be the newest in its 
selection (month, week, etc.), however...

> Another thing is, by marking finished backups where we cannot acquire 
> the lock as Keep, they will count towards the covered time periods in 
> the Prune selection. This might cause the following, IMHO unwanted, 
> behavior:
> Say there's two finished backups A and B for a given month (and they're 
> not the first finished one). A is about to be removed with forget. While 
> A is being removed, the prune operation starts, say with --keep-monthly, 
> and it cannot acquire the lock on A. But the prune selection considers 
> the month to be already covered, because A is marked as Keep, so it will 
> mark B with Remove. In the end there is no backup for that month remaining.
> 

...I didn't consider a lock conflict with forget instead of backup, so I 
suppose that is indeed wrong.
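
To illustrate the race, here's a hand-rolled sketch (not the actual prune 
code - the Mark enum and the keep-monthly loop below are simplified 
assumptions, just to show the counting problem):

```rust
use std::collections::HashSet;

// Hypothetical pre-marks from the locking pass (not the real PruneMark enum)
#[derive(Clone, Copy, PartialEq, Debug)]
enum Mark {
    None,   // undecided, regular selection applies
    Keep,   // finished but locked -> counts towards covered periods
    Remove,
}

// Simplified --keep-monthly selection: keep one snapshot per month,
// where pre-marked Keep snapshots already "cover" their month.
fn keep_monthly(snaps: &mut [(&str, u32, Mark)]) {
    let mut covered: HashSet<u32> = HashSet::new();
    for snap in snaps.iter_mut() {
        match snap.2 {
            Mark::Keep => { covered.insert(snap.1); }
            Mark::None => {
                snap.2 = if covered.insert(snap.1) { Mark::Keep } else { Mark::Remove };
            }
            Mark::Remove => {}
        }
    }
}

fn main() {
    // A is being forgotten concurrently (lock fails -> pre-marked Keep),
    // B is a regular finished backup in the same month (8 = August)
    let mut snaps = [("A", 8u32, Mark::Keep), ("B", 8, Mark::None)];
    keep_monthly(&mut snaps);
    // B ends up Remove because A already covers August - once forget
    // finishes deleting A, no backup for that month remains
    assert_eq!(snaps[1].2, Mark::Remove);
}
```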

> Marking these backups with KeepPartial would work around the issue, 
> because those are not counted towards covering time periods. Of course 
> the name doesn't really fit, maybe we should introduce an 
> Ignore/Protected mark?
> 

That still doesn't make the behaviour clearer: If a backup is running, 
the first *three* snapshots will now never be pruned (first two are 
locked and the third is treated as the first, so always kept). At least 
it's more correct I guess, and I don't really see an easier way to deal 
with the 'forget' race.

AFAICT this is the only place we use KeepPartial anyway, so we could 
just rename it. We should probably also introduce a RemoveBroken (or 
similar) to distinguish the reasons why a backup is removed...
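
Roughly something like this (Protected/RemoveBroken are just suggested 
names for the sake of discussion, not anything in the current code):

```rust
// Sketch of a possible extended mark type - all variant names here are
// hypothetical suggestions, not part of the existing codebase.
#[derive(Clone, Copy, PartialEq, Debug)]
enum PruneMark {
    Keep,         // kept by regular selection, covers its time period
    Protected,    // kept because locked/in use - must NOT cover a period
    Remove,       // pruned by regular selection
    RemoveBroken, // pruned because unfinished and not locked
}

impl PruneMark {
    // Only regular keeps count towards covered time periods, which
    // avoids the forget/prune race from Fabian's example
    fn counts_toward_selection(self) -> bool {
        self == PruneMark::Keep
    }

    fn keeps_snapshot(self) -> bool {
        matches!(self, PruneMark::Keep | PruneMark::Protected)
    }
}

fn main() {
    assert!(!PruneMark::Protected.counts_toward_selection());
    assert!(PruneMark::Protected.keeps_snapshot());
    assert!(!PruneMark::RemoveBroken.keeps_snapshot());
}
```

That would also let the GUI distinguish "removed because broken" from a 
regular prune removal.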

> I haven't thought too much about it yet, but what about selecting 
> normally and only removing the one's where we have the locks and warn 
> about the others?
> 

...though this only makes sense if we update the GUI to show the 
reasons, as well as any warnings like the one you mentioned, which 
probably makes sense to do in a separate series.

>> @@ -173,13 +192,14 @@ impl PruneOptions {
>>   pub fn compute_prune_info(
>>       mut list: Vec<BackupInfo>,
>>       options: &PruneOptions,
>> +    ignore_locks: bool,
>>   ) -> Result<Vec<(BackupInfo, bool)>, Error> {
>>       let mut mark = HashMap::new();
>>       BackupInfo::sort_list(&mut list, false);
>> -    remove_incomplete_snapshots(&mut mark, &list);
>> +    mark_incomplete_and_in_use_snapshots(&mut mark, &list, ignore_locks);
>>       if let Some(keep_last) = options.keep_last {
>>           mark_selections(&mut mark, &list, keep_last as usize, |_local_time, info| {
>> diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
>> index dc0d16d2..694e4b60 100644
>> --- a/src/bin/proxmox-backup-proxy.rs
>> +++ b/src/bin/proxmox-backup-proxy.rs
>> @@ -441,7 +441,7 @@ async fn schedule_datastore_prune() {
>>                   let groups = BackupGroup::list_groups(&base_path)?;
>>                   for group in groups {
>>                       let list = group.list_backups(&base_path)?;
>> -                    let mut prune_info = compute_prune_info(list, &prune_options)?;
>> +                    let mut prune_info = compute_prune_info(list, &prune_options, false)?;
>>                       prune_info.reverse(); // delete older snapshots first
>>                       worker.log(format!("Starting prune on store \"{}\" group \"{}/{}\"",
>> diff --git a/tests/prune.rs b/tests/prune.rs
>> index d9758ea7..923a94fd 100644
>> --- a/tests/prune.rs
>> +++ b/tests/prune.rs
>> @@ -9,7 +9,8 @@ fn get_prune_list(
>>       options: &PruneOptions,
>>   ) -> Vec<PathBuf> {
>> -    let mut prune_info = compute_prune_info(list, options).unwrap();
>> +    // ignore locks, would always fail in test environment
>> +    let mut prune_info = compute_prune_info(list, options, true).unwrap();
>>       prune_info.reverse();
>> @@ -38,7 +39,8 @@ fn create_info(
>>           files.push(String::from(MANIFEST_BLOB_NAME));
>>       }
>> -    BackupInfo { backup_dir, files }
>> +    // note: base_path is only used for locking, which we ignore here anyway
>> +    BackupInfo { backup_dir, files, base_path: PathBuf::from("/tmp/pbs-test") }
>>   }
>>   #[test]
>>



