public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Dominik Csapak <d.csapak@proxmox.com>
To: "Robert Obkircher" <r.obkircher@proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
	pbs-devel@lists.proxmox.com
Subject: Re: [PATCH v1 proxmox-backup 11/11] tape: media catalog: use Flock wrapper instead of deprecated function
Date: Fri, 6 Mar 2026 09:25:33 +0100	[thread overview]
Message-ID: <64e8e02b-abaa-4739-ba49-5dfcf35500b0@proxmox.com> (raw)
In-Reply-To: <99147604-bffe-4c88-a5f7-a91f4e9fc821@proxmox.com>



On 3/5/26 3:38 PM, Robert Obkircher wrote:
> 
> On 3/5/26 14:13, Dominik Csapak wrote:
>>
>>
>> On 3/5/26 12:08 PM, Fabian Grünbichler wrote:
>>> On February 26, 2026 3:40 pm, Robert Obkircher wrote:
>>>> Switch to the new wrapper type to fix the deprecation warnings. Panic
>>>> if unlock fails, because there is no easy way to return the File that
>>>> was temporarily moved out of the field.
>>>>
>>>> An alternative to moving out of the optional field would be to use
>>>> `File::try_clone` but that isn't ideal either because duplicated file
>>>> descriptors share the same lock status. That also doesn't solve the
>>>> problem that propagating the error without leaking the locked file
>>>> descriptor requires unsafe code or a change to nix.
>>>>
>>>> [1]
>>>> https://lore.proxmox.com/pbs-devel/1763634535.5sqd1r4buu.astroid@yuna.none/
>>>>
>>>> Signed-off-by: Robert Obkircher <r.obkircher@proxmox.com>
>>>> ---
>>>>    src/tape/media_catalog.rs | 49
>>>> +++++++++++++++++++++------------------
>>>>    1 file changed, 27 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/src/tape/media_catalog.rs b/src/tape/media_catalog.rs
>>>> index 63329a65..372bf59d 100644
>>>> --- a/src/tape/media_catalog.rs
>>>> +++ b/src/tape/media_catalog.rs
>>>> @@ -6,6 +6,7 @@ use std::path::{Path, PathBuf};
>>>>      use anyhow::{bail, format_err, Error};
>>>>    use endian_trait::Endian;
>>>> +use nix::fcntl::{Flock, FlockArg};
>>>>      use proxmox_sys::fs::read_subdir;
>>>>    @@ -194,7 +195,7 @@ impl MediaCatalog {
>>>>            let me = proxmox_lang::try_block!({
>>>>                Self::create_basedir(base_path)?;
>>>>    -            let mut file = std::fs::OpenOptions::new()
>>>> +            let file = std::fs::OpenOptions::new()
>>>>                    .read(true)
>>>>                    .write(write)
>>>>                    .create(create)
>>>> @@ -219,9 +220,10 @@ impl MediaCatalog {
>>>>                };
>>>>                  // Note: lock file, to get a consistent view with
>>>> load_catalog
>>>> -            nix::fcntl::flock(file.as_raw_fd(),
>>>> nix::fcntl::FlockArg::LockExclusive)?;
>>>> +            let mut file = Flock::lock(file,
>>>> FlockArg::LockExclusive)
>>>> +                .map_err(|e| format_err!("flock failed: {e:?}"))?;
>>>>                let result = me.load_catalog(&mut file,
>>>> media_id.media_set_label.as_ref());
>>>> -            nix::fcntl::flock(file.as_raw_fd(),
>>>> nix::fcntl::FlockArg::Unlock)?;
>>>> +            let file = file.unlock().expect("shouldn't fail for
>>>> valid fd");
>>>
>>> here we could just return an error instead of panicing? we only read
>>> from the file anyway, and bubbling up the error should be fine..
>>
>> full agree, we bubbled up the error before, so we should keep that
>> behavior
> Naively returning the error would likely panic when the Flock wrapper
> is dropped, and we also can't just mem::forget it, because that would
> leak the File (unless we retrieve it with unsafe ptr::read).
> 
> Should I try to send a PR to nix with a `Flock::unwrap(self) -> T`
> method that could be used in map_err? (Changing unlock to return T
> instead of Self might be an alternative, but that would be weird if
> errno is EINTR...)


well since it can return EINTR anyways, we have to provide some kind of
wrapper that retries in case of EWOULDBLOCK/EINTR anyway?

i get that bubbling up the error would not achieve anything, but
as fabian said, crashing the process (and potentially poisoning held
mutex locks is far from ideal...

> 
> It would also be useful if they could add an impl Flockable for &mut
> File. I think that should be safe.
>>
>>>
>>>>                  let (found_magic_number, _) = result?;
>>>>    @@ -367,27 +369,30 @@ impl MediaCatalog {
>>>>                return Ok(());
>>>>            }
>>>>    -        match self.file {
>>>> -            Some(ref mut file) => {
>>>> -                let pending = &self.pending;
>>>> -                // Note: lock file, to get a consistent view with
>>>> load_catalog
>>>> -                nix::fcntl::flock(file.as_raw_fd(),
>>>> nix::fcntl::FlockArg::LockExclusive)?;
>>>> -                let result: Result<(), Error> =
>>>> proxmox_lang::try_block!({
>>>> -                    file.write_all(pending)?;
>>>> -                    file.flush()?;
>>>> -                    file.sync_data()?;
>>>> -                    Ok(())
>>>> -                });
>>>> -                nix::fcntl::flock(file.as_raw_fd(),
>>>> nix::fcntl::FlockArg::Unlock)?;
>>>> -
>>>> -                result?;
>>>> -            }
>>>> -            None => bail!("media catalog not writable (opened
>>>> read only)"),
>>>> -        }
>>>> +        let Some(file) = self.file.take() else {
>>>> +            bail!("media catalog not writable (opened read only)");
>>>> +        };
>>>> +
>>>> +        let mut file = Flock::lock(file,
>>>> FlockArg::LockExclusive).map_err(|(f, e)| {
>>>> +            self.file = Some(f);
>>>> +            format_err!("flock failed: {:?} - {e:?}", self.file)
>>>> +        })?;
>>>>    -        self.pending = Vec::new();
>>>> +        let pending = &self.pending;
>>>> +        // Note: lock file, to get a consistent view with
>>>> load_catalog
>>>> +        let result: Result<(), Error> = proxmox_lang::try_block!({
>>>> +            file.write_all(pending)?;
>>>> +            file.flush()?;
>>>> +            file.sync_data()?;
>>>> +            Ok(())
>>>> +        });
>>>>    -        Ok(())
>>>> +        self.file = Some(file.unlock().expect("shouldn't fail for
>>>> valid fd"));
>>>
>>> here the situation is a bit more complicated, since we wrote to the
>>> file
>>> above.. but looking at the call graphs/error behaviour, it seems to me
>>> that we are often operating on temporary files (though we do not
>>> seem to
>>> clean them up in the error path?), or failure to commit herer would
>>> bubble up and abort writing to tape anyway..
>>>
>>> triggering a panic here doesn't help in any case - the possibly
>>> inconsistent writes/reads have already happened, at best we could
>>> maybe
>>> (try to) invalidate the catalog as part of error handling to force
>>> re-reading it from tape? or we could bubble up the error to abort
>>> whatever operation is attempting to commit, and log the error
>>> prominently - taking down the whole PBS instance doesn't buy us
>>> anything
>>> AFAICT?
> Theoretically it buys us slightly better protection against deadlocks,
> but it doesn't look like the callers are doing anything potentially
> dangerous like holding on to the file descriptor across await points.
> (Leaving self.file as None would also prevent that, but it would be a
> change of behavior as well.)
> 
>>
>> also here agreeing regarding panicking, we did simply bubble up the
>> error before, no reason to change that IMO.
>>
>> looking at the code  i think we should maybe revamp the catalog
>> handling?
>>
>> out of scope for this series, but I think we should try to write the
>> catalog atomically, (meaning we should always writing into a tmp
>> file and renaming into place, like we do with most of the other
>> files we
>> write)
>>
>> then I think we'd need no lock for reading the catalog at all,
>> but we'd have to copy the old catalog to a tmp one first when
>> we want to append data. (this might be a bad idea since
>> they can get quite large)
> Not sure if it is relevant, but copying wouldn't be enough if there
> were multiple writers.

how so?

only one should be able to write during the lock, so the file
is always consistent?

>>
>> but as i wrote, not really relevant for this series, just
>> wanted it to point out
>>
>>>
>>>> +
>>>> +        if result.is_ok() {
>>>> +            self.pending = Vec::new();
>>>> +        }
>>>> +        result
>>>>        }
>>>>          /// Conditionally commit if in pending data is large (> 1Mb)
>>>> -- 
>>>> 2.47.3
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>





  reply	other threads:[~2026-03-06  8:25 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-26 14:40 [PATCH v1 proxmox-backup 00/11] fix various warnings Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 01/11] datastore: remove allow(clippy::cast_ptr_alignment) attribute Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 02/11] api: use checked_div for compression ratio calculation Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 03/11] datastore+server: sort by key instead of using comparison functions Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 04/11] api: remove unnecessary ampersand to fix clippy warning Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 05/11] bin: debug: use pattern match instead of is_some+unwrap Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 06/11] bin: proxy: fix clippy warning about unnecesary use of find_map Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 07/11] datastore: silence warning about too many arguments Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 08/11] client: catalog shell: avoid unnecessary block_on in async code Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 09/11] client: catalog shell: combine multiple block_on calls into one Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 10/11] client: catalog shell: avoid unsafe transmute Robert Obkircher
2026-02-26 14:40 ` [PATCH v1 proxmox-backup 11/11] tape: media catalog: use Flock wrapper instead of deprecated function Robert Obkircher
2026-03-05 11:08   ` Fabian Grünbichler
2026-03-05 13:13     ` Dominik Csapak
2026-03-05 14:39       ` Robert Obkircher
2026-03-06  8:25         ` Dominik Csapak [this message]
2026-03-05 11:20 ` partially-applied: [PATCH v1 proxmox-backup 00/11] fix various warnings Fabian Grünbichler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=64e8e02b-abaa-4739-ba49-5dfcf35500b0@proxmox.com \
    --to=d.csapak@proxmox.com \
    --cc=f.gruenbichler@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    --cc=r.obkircher@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal