Message-ID: <54a5a607-215c-44e8-bbb8-4f7219412934@proxmox.com>
Date: Thu, 5 Mar 2026 14:13:47 +0100
Subject: Re: [PATCH v1 proxmox-backup 11/11] tape: media catalog: use Flock wrapper instead of deprecated function
To: Fabian Grünbichler, pbs-devel@lists.proxmox.com, Robert Obkircher
References: <20260226144033.211039-1-r.obkircher@proxmox.com> <20260226144033.211039-12-r.obkircher@proxmox.com> <1772707888.l459trzwfo.astroid@yuna.none>
From: Dominik Csapak
In-Reply-To: <1772707888.l459trzwfo.astroid@yuna.none>
List-Id: Proxmox Backup Server development discussion

On 3/5/26 12:08 PM, Fabian Grünbichler wrote:
> On February 26, 2026 3:40 pm, Robert Obkircher wrote:
>> Switch to the new wrapper type to fix the deprecation warnings. Panic
>> if unlock fails, because there is no easy way to return the File that
>> was temporarily moved out of the field.
>>
>> An alternative to moving out of the optional field would be to use
>> `File::try_clone` but that isn't ideal either because duplicated file
>> descriptors share the same lock status. That also doesn't solve the
>> problem that propagating the error without leaking the locked file
>> descriptor requires unsafe code or a change to nix.
>>
>> [1] https://lore.proxmox.com/pbs-devel/1763634535.5sqd1r4buu.astroid@yuna.none/
>>
>> Signed-off-by: Robert Obkircher
>> ---
>>  src/tape/media_catalog.rs | 49 +++++++++++++++++++++------------------
>>  1 file changed, 27 insertions(+), 22 deletions(-)
>>
>> diff --git a/src/tape/media_catalog.rs b/src/tape/media_catalog.rs
>> index 63329a65..372bf59d 100644
>> --- a/src/tape/media_catalog.rs
>> +++ b/src/tape/media_catalog.rs
>> @@ -6,6 +6,7 @@ use std::path::{Path, PathBuf};
>>
>>  use anyhow::{bail, format_err, Error};
>>  use endian_trait::Endian;
>> +use nix::fcntl::{Flock, FlockArg};
>>
>>  use proxmox_sys::fs::read_subdir;
>>
>> @@ -194,7 +195,7 @@ impl MediaCatalog {
>>          let me = proxmox_lang::try_block!({
>>              Self::create_basedir(base_path)?;
>>
>> -            let mut file = std::fs::OpenOptions::new()
>> +            let file = std::fs::OpenOptions::new()
>>                  .read(true)
>>                  .write(write)
>>                  .create(create)
>> @@ -219,9 +220,10 @@ impl MediaCatalog {
>>              };
>>
>>              // Note: lock file, to get a consistent view with load_catalog
>> -            nix::fcntl::flock(file.as_raw_fd(), nix::fcntl::FlockArg::LockExclusive)?;
>> +            let mut file = Flock::lock(file, FlockArg::LockExclusive)
>> +                .map_err(|e| format_err!("flock failed: {e:?}"))?;
>>              let result = me.load_catalog(&mut file, media_id.media_set_label.as_ref());
>> -            nix::fcntl::flock(file.as_raw_fd(), nix::fcntl::FlockArg::Unlock)?;
>> +            let file = file.unlock().expect("shouldn't fail for valid fd");
>
> here we could just return an error instead of panicking? we only read
> from the file anyway, and bubbling up the error should be fine..

fully agreed, we bubbled up the error before, so we should keep that behavior

>
>>
>>              let (found_magic_number, _) = result?;
>>
>> @@ -367,27 +369,30 @@ impl MediaCatalog {
>>              return Ok(());
>>          }
>>
>> -        match self.file {
>> -            Some(ref mut file) => {
>> -                let pending = &self.pending;
>> -                // Note: lock file, to get a consistent view with load_catalog
>> -                nix::fcntl::flock(file.as_raw_fd(), nix::fcntl::FlockArg::LockExclusive)?;
>> -                let result: Result<(), Error> = proxmox_lang::try_block!({
>> -                    file.write_all(pending)?;
>> -                    file.flush()?;
>> -                    file.sync_data()?;
>> -                    Ok(())
>> -                });
>> -                nix::fcntl::flock(file.as_raw_fd(), nix::fcntl::FlockArg::Unlock)?;
>> -
>> -                result?;
>> -            }
>> -            None => bail!("media catalog not writable (opened read only)"),
>> -        }
>> +        let Some(file) = self.file.take() else {
>> +            bail!("media catalog not writable (opened read only)");
>> +        };
>> +
>> +        let mut file = Flock::lock(file, FlockArg::LockExclusive).map_err(|(f, e)| {
>> +            self.file = Some(f);
>> +            format_err!("flock failed: {:?} - {e:?}", self.file)
>> +        })?;
>>
>> -        self.pending = Vec::new();
>> +        let pending = &self.pending;
>> +        // Note: lock file, to get a consistent view with load_catalog
>> +        let result: Result<(), Error> = proxmox_lang::try_block!({
>> +            file.write_all(pending)?;
>> +            file.flush()?;
>> +            file.sync_data()?;
>> +            Ok(())
>> +        });
>>
>> -        Ok(())
>> +        self.file = Some(file.unlock().expect("shouldn't fail for valid fd"));
>
> here the situation is a bit more complicated, since we wrote to the file
> above.. but looking at the call graphs/error behaviour, it seems to me
> that we are often operating on temporary files (though we do not seem to
> clean them up in the error path?), or failure to commit here would
> bubble up and abort writing to tape anyway..
>
> triggering a panic here doesn't help in any case - the possibly
> inconsistent writes/reads have already happened, at best we could maybe
> (try to) invalidate the catalog as part of error handling to force
> re-reading it from tape? or we could bubble up the error to abort
> whatever operation is attempting to commit, and log the error
> prominently - taking down the whole PBS instance doesn't buy us anything
> AFAICT?

agreed here as well regarding the panic: we simply bubbled up the error
before, no reason to change that IMO.

looking at the code i think we should maybe revamp the catalog handling?
out of scope for this series, but I think we should try to write the
catalog atomically (meaning we should always write into a tmp file and
rename it into place, like we do with most of the other files we write)

then I think we'd need no lock for reading the catalog at all, but we'd
have to copy the old catalog to a tmp one first when we want to append
data. (this might be a bad idea since catalogs can get quite large)

but as i wrote, not really relevant for this series, just wanted to
point it out

>
>> +
>> +        if result.is_ok() {
>> +            self.pending = Vec::new();
>> +        }
>> +        result
>>      }
>>
>>      /// Conditionally commit if in pending data is large (> 1Mb)
>> --
>> 2.47.3
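
to make the tmp-file-and-rename idea from above a bit more concrete, here
is a rough sketch using only std. `append_atomically` and the `.tmp`
naming are made up for illustration and are not existing PBS code; a real
version would presumably go through our existing file helpers, use a
unique tmp name and also fsync the parent directory:

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append `pending` to the catalog at `path` without ever exposing a
/// partially written file: build a full copy next to it, then rename.
/// (Hypothetical helper, only meant as a sketch.)
fn append_atomically(path: &Path, pending: &[u8]) -> io::Result<()> {
    // Assumption: a fixed sibling name is good enough for the sketch.
    let tmp_path = path.with_extension("tmp");

    let mut tmp = match fs::copy(path, &tmp_path) {
        // Existing catalog copied (fs::copy overwrites a stale tmp file),
        // so reopen the copy for appending.
        Ok(_) => OpenOptions::new().append(true).open(&tmp_path)?,
        // No catalog yet: start from an empty tmp file.
        Err(err) if err.kind() == io::ErrorKind::NotFound => OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .open(&tmp_path)?,
        Err(err) => return Err(err),
    };

    tmp.write_all(pending)?;
    // Make sure the data is on disk before the new file becomes visible.
    tmp.sync_all()?;
    drop(tmp);

    // Readers see either the old or the new catalog, never a mix.
    fs::rename(&tmp_path, path)
}
```

any error is simply returned to the caller, and the copy step is exactly
the cost mentioned above: every append rewrites the whole catalog, which
is why this may not be viable for large ones.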