Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=UTF-8
Date: Mon, 04 Mar 2024 12:12:09 +0100
Message-Id: <CZKX5NDO7ZOK.2O9REBO5K03M@dev>
From: "Hannes Laimer" <h.laimer@proxmox.com>
To: "Thomas Lamprecht" <t.lamprecht@proxmox.com>, "Proxmox Backup Server
 development discussion" <pbs-devel@lists.proxmox.com>
X-Mailer: aerc 0.14.0
References: <20240301150315.12253-1-h.laimer@proxmox.com>
 <07b5578e-52c4-4c41-83e2-20f5f73fce93@proxmox.com>
In-Reply-To: <07b5578e-52c4-4c41-83e2-20f5f73fce93@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2] datastore: remove
 datastore from internal cache based on maintenance mode
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>

On Mon Mar 4, 2024 at 11:42 AM CET, Thomas Lamprecht wrote:
> Am 01/03/2024 um 16:03 schrieb Hannes Laimer:
> > We keep a DataStore cache, so ChunkStores and lock files are kept by
> > the proxy process and don't have to be reopened every time. However, for
> > specific maintenance modes, e.g. 'offline', our process should not keep
> > files in that datastore open. This clears the cache entry of a datastore
> > if it is in a specific maintenance mode and the last task finished, which
> > also drops any files still open by the process.
>
> One always asks themselves if command sockets are the right approach, but
> for this it seems alright.
>
> Some code style comments inline.
>
> > Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
> > Tested-by: Gabriel Goller <g.goller@proxmox.com>
> > Reviewed-by: Gabriel Goller <g.goller@proxmox.com>
> > ---
> >
> > v2, thanks @Gabriel:
> >  - improve comments
> >  - remove not needed &'s and .clone()'s
> >
> >  pbs-api-types/src/maintenance.rs   |  6 +++++
> >  pbs-datastore/src/datastore.rs     | 41 ++++++++++++++++++++++++++++--
> >  pbs-datastore/src/task_tracking.rs | 23 ++++++++++-------
> >  src/api2/config/datastore.rs       | 18 +++++++++++++
> >  src/bin/proxmox-backup-proxy.rs    |  8 ++++++
> >  5 files changed, 85 insertions(+), 11 deletions(-)
> >
> > diff --git a/pbs-api-types/src/maintenance.rs b/pbs-api-types/src/maintenance.rs
> > index 1b03ca94..a1564031 100644
> > --- a/pbs-api-types/src/maintenance.rs
> > +++ b/pbs-api-types/src/maintenance.rs
> > @@ -77,6 +77,12 @@ pub struct MaintenanceMode {
> >  }
> >
> >  impl MaintenanceMode {
> > +    /// Used for deciding whether the datastore is cleared from the internal cache after the last
> > +    /// task finishes, so all open files are closed.
> > +    pub fn clear_from_cache(&self) -> bool {
>
> that function name makes it sound like calling it actively clears it,
> but this is only for checking if a required condition for clearing is met.
>
> So maybe use a name that better conveys that and maybe even avoid coupling
> this to an action that a user of ours executes, as this might have some use
> for other call sites too.
>
> From top of my head one could use `is_offline` as name, adding a note to
> the doc-comment that this is e.g. used to check if a datastore can be
> removed from the cache would still be fine though.
>

I agree, the name is somewhat misleading. The idea was to make it easy
to potentially add more modes here in the future, so maybe something
a little more general like `is_accessible` would make sense?
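
To make the naming question concrete, here is a minimal sketch of what
such a predicate could look like. The enum and struct below are
simplified stand-ins and not the real definitions from
pbs-api-types/src/maintenance.rs, and `is_accessible` is only the name
floated above, not a decision. Note that it is the logical inverse of
`clear_from_cache`: it returns false exactly when files must not be kept
open.

```rust
// Stand-in types for illustration only; the real ones live in
// pbs-api-types/src/maintenance.rs and carry more fields.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MaintenanceType {
    ReadOnly,
    Offline,
    Delete,
}

pub struct MaintenanceMode {
    pub ty: MaintenanceType,
}

impl MaintenanceMode {
    /// Returns `false` if the process must not keep files of the datastore
    /// open. Used, for example, to check whether a datastore can be removed
    /// from the internal cache once its last task has finished.
    ///
    /// Hypothetical name; more modes could be added to the pattern later.
    pub fn is_accessible(&self) -> bool {
        !matches!(self.ty, MaintenanceType::Offline)
    }
}
```

Call sites in the patch would then read `!m.is_accessible()` where they
currently read `m.clear_from_cache()`.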

> > +        self.ty == MaintenanceType::Offline
> > +    }
> > +
> >      pub fn check(&self, operation: Option<Operation>) -> Result<(), Error> {
> >          if self.ty == MaintenanceType::Delete {
> >              bail!("datastore is being deleted");
> > diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> > index 2f0e5279..f26dff83 100644
> > --- a/pbs-datastore/src/datastore.rs
> > +++ b/pbs-datastore/src/datastore.rs
> > @@ -104,8 +104,27 @@ impl Clone for DataStore {
> >  impl Drop for DataStore {
> >      fn drop(&mut self) {
> >          if let Some(operation) = self.operation {
> > -            if let Err(e) = update_active_operations(self.name(), operation, -1) {
> > -                log::error!("could not update active operations - {}", e);
> > +            let mut last_task = false;
> > +            match update_active_operations(self.name(), operation, -1) {
> > +                Err(e) => log::error!("could not update active operations - {}", e),
> > +                Ok(updated_operations) => {
> > +                    last_task = updated_operations.read + updated_operations.write == 0;
> > +                }
> > +            }
> > +
> > +            // remove datastore from cache iff
> > +            //  - last task finished, and
> > +            //  - datastore is in a maintenance mode that mandates it
> > +            let remove_from_cache = last_task
> > +                && pbs_config::datastore::config()
> > +                    .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
> > +                    .map_or(false, |c| {
> > +                        c.get_maintenance_mode()
> > +                            .map_or(false, |m| m.clear_from_cache())
> > +                    });
> > +
> > +            if remove_from_cache {
> > +                DATASTORE_MAP.lock().unwrap().remove(self.name());
> >              }
> >          }
> >      }
> > @@ -193,6 +212,24 @@ impl DataStore {
> >          Ok(())
> >      }
> >
> > +    /// trigger clearing cache entries based on maintenance mode. Entries will only
> > +    /// be cleared iff there is no other task running, if there is, the end of the
> > +    /// last running task will trigger the clearing of the cache entry.
> > +    pub fn update_datastore_cache() -> Result<(), Error> {
>
> why does this work on all but not a single datastore, after all we always want to
> remove a specific one?
>

Actually, I just missed that our command_socket also takes args; I will
update this in v3.
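
Roughly, the idea is that the command carries the datastore name as an
argument, so the handler acts on one store instead of looping over all
of them. The sketch below uses a tiny, hypothetical command registry
built on std only; it is not the proxmox command socket API, and the
command name "update-datastore-cache" is an assumption for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical handler type: receives the optional argument of a command.
type Handler = Box<dyn Fn(Option<&str>) -> Result<String, String>>;

/// Hypothetical, minimal command registry (stand-in for a command socket).
pub struct Commands {
    handlers: HashMap<String, Handler>,
}

impl Commands {
    pub fn new() -> Self {
        Self { handlers: HashMap::new() }
    }

    /// Registers `handler` under the command `name`.
    pub fn register<F>(&mut self, name: &str, handler: F)
    where
        F: Fn(Option<&str>) -> Result<String, String> + 'static,
    {
        self.handlers.insert(name.to_string(), Box::new(handler));
    }

    /// Looks up the command and passes the argument on to its handler.
    pub fn dispatch(&self, name: &str, arg: Option<&str>) -> Result<String, String> {
        match self.handlers.get(name) {
            Some(handler) => handler(arg),
            None => Err(format!("unknown command '{name}'")),
        }
    }
}

/// Builds a registry whose single command targets exactly one datastore.
pub fn example_commands() -> Commands {
    let mut commands = Commands::new();
    commands.register("update-datastore-cache", |arg| match arg {
        // A real handler would check the maintenance mode of `store` here.
        Some(store) => Ok(format!("would update cache entry for '{store}'")),
        None => Err("missing datastore name".to_string()),
    });
    commands
}
```

With that shape, the API handler that changes the maintenance mode would
send the name of the affected datastore along with the command.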

> > +        let (config, _digest) = pbs_config::datastore::config()?;
> > +        for (store, (_, _)) in &config.sections {
> > +            let datastore: DataStoreConfig = config.lookup("datastore", store)?;
> > +            if datastore
> > +                .get_maintenance_mode()
> > +                .map_or(false, |m| m.clear_from_cache())
> > +            {
> > +                let _ = DataStore::lookup_datastore(store, Some(Operation::Lookup));
>
> A comment that the actual removal from the cache happens through the drop handler
> would be good, as this is a bit too subtle for my taste; if one stumbles over this
> a few months down the line it might cause a bit too much easily avoidable head
> scratching...
>
> Alternatively, factor the actual check-maintenance-mode-and-remove-from-cache out
> of the drop handler and call that explicitly here, all you need of outside info is
> the name there anyway.

I think that would entail having to open the file twice in the drop
handler, once for updating it, and once for reading it. But just
reading it here and explicitly clearing it from the cache seems
reasonable, it makes it way clearer what's happening. I'll change that
in a v3.
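
As a rough sketch of the explicit variant (not the v3 code): the caller
reads the maintenance mode and the active task count once and then
removes the entry directly, instead of relying on a lookup whose drop
does the work. `Cache`, `CachedStore` and `remove_if_offline` are
hypothetical stand-ins for DATASTORE_MAP and the real types.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-in for the real maintenance type, reduced to two variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MaintenanceType {
    ReadOnly,
    Offline,
}

/// Stand-in for a cached datastore handle that keeps files open.
pub struct CachedStore {
    pub name: String,
}

/// Hypothetical cache, playing the role of DATASTORE_MAP.
pub struct Cache {
    entries: Mutex<HashMap<String, Arc<CachedStore>>>,
}

impl Cache {
    pub fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()) }
    }

    pub fn insert(&self, name: &str) {
        let store = Arc::new(CachedStore { name: name.to_string() });
        self.entries.lock().unwrap().insert(name.to_string(), store);
    }

    pub fn contains(&self, name: &str) -> bool {
        self.entries.lock().unwrap().contains_key(name)
    }

    /// Removes `name` from the cache if its maintenance mode mandates it
    /// and no task is running. Returns whether an entry was removed.
    pub fn remove_if_offline(
        &self,
        name: &str,
        mode: Option<MaintenanceType>,
        active_tasks: usize,
    ) -> bool {
        if mode != Some(MaintenanceType::Offline) || active_tasks > 0 {
            return false;
        }
        // Dropping the removed Arc releases the cache's reference, so the
        // store's files close once no other handle exists.
        self.entries.lock().unwrap().remove(name).is_some()
    }
}
```

The drop handler would keep its own check for the case where tasks are
still running when the mode changes.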

Thanks for the review!