From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Nicolas Frey <n.frey@proxmox.com>,
Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
Shannon Sterz <s.sterz@proxmox.com>,
Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: pve-devel <pve-devel-bounces@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH pve-storage] fix #6450: add file-checksum endpoint to storage API
Date: Thu, 02 Oct 2025 14:51:16 +0200
Message-ID: <1759409033.tjmikxhssu.astroid@yuna.none>
In-Reply-To: <5a01ab84-2d91-4e64-826a-29ebf6bd4545@proxmox.com>
On October 2, 2025 2:41 pm, Thomas Lamprecht wrote:
> On 02.10.25 at 14:15, Shannon Sterz wrote:
>>> warn $@ if $@;
>>> }
>>>
>>> + if (exists $param->{checksum}) {
>>> + print "calculating checksum...\n";
>>> + $entry->{checksum} = PVE::Tools::get_file_hash($param->{checksum}, $path);
>> i've tested this with some not too uncommon disk images such as a 32GB
>> volume that is essentially empty and the api endpoint here just times
>> out. which is not too surprising. i wonder if we can cache the hashes
>> here somehow and calculate them in a worker task. i also wonder how
>> this should ideally work for running vm and container images as their
>> checksum could change all the time.
>>
>> maybe we can at least calculate the hashes here for some more static
>> assets such as isos etc. ahead of time and only enable this flag for things
>> like that (so isos, container templates, images of vm and container
>> templates etc.) basically things that don't change that much?
>
>
> I could not find it, but IIRC there was such a request (or patch?) for
> checksums of storage content submitted in the past where we discussed
> this already.
>
> Anyhow, this is really not something trivial and would need some system
> to cache the hash while also having a heuristic that ensures the cached
> hash is still valid – as having a wrong hash returned might needlessly
> wreck some nerves of any admin that takes their job seriously.
>
> We could do a file that contains the hash(es) and an inode nr., file
> size and mtime value from the time those hash(es) got created as
> heuristic to detect legitimate change. Plus probably the date to
> show the user that this was not calculated on the fly.
> And yes, actual calculation needs to happen in a task worker, as
> this can run for quite a while on big files and/or slow storages.
> So probably best done in a dedicated API call I guess, but with all
> this in mind I'm questioning a bit if this is really worth that much
> effort...
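
to make the heuristic quoted above concrete, here is a rough sketch. it is written in Python only for brevity (the actual storage code is Perl), and every name in it (`file_fingerprint`, `store_cached_hash`, `load_cached_hash`, the JSON cache file layout) is made up for illustration, not an existing PVE interface:

```python
import hashlib
import json
import os
import time


def file_fingerprint(path):
    """Return the stat values used to detect a legitimate change."""
    st = os.stat(path)
    return {"inode": st.st_ino, "size": st.st_size, "mtime_ns": st.st_mtime_ns}


def compute_hash(path, algo="sha256", chunk_size=1024 * 1024):
    """Hash the file in chunks so a large image is never held in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_cached_hash(path, cache_path, algo="sha256"):
    """Compute the hash and save it together with the fingerprint and date."""
    # Take the fingerprint *before* hashing: if the file changes while it is
    # being hashed, the stored fingerprint no longer matches and the entry
    # is treated as stale on the next lookup.
    fingerprint = file_fingerprint(path)
    entry = {
        "algo": algo,
        "hash": compute_hash(path, algo),
        "fingerprint": fingerprint,
        "computed_at": int(time.time()),  # lets the UI show "not calculated on the fly"
    }
    with open(cache_path, "w") as f:
        json.dump(entry, f)
    return entry


def load_cached_hash(path, cache_path, algo="sha256"):
    """Return the cached entry if it still matches the file, else None."""
    try:
        with open(cache_path) as f:
            entry = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if entry.get("algo") != algo:
        return None
    if entry.get("fingerprint") != file_fingerprint(path):
        return None
    return entry
```

this only detects changes that touch inode, size or mtime, so it stays a heuristic. the expensive `store_cached_hash` step is the part that would have to run in a worker.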

recently discussed this with Dominik in the context of the streaming PBS
content API - we should really finally get around to implementing an async
storage content list API call - then this could easily be enabled only
for the async variant.

the rough sketch was:
- add a task worker variant that is "ephemeral"/"light-weight"/..
- such task workers return a structured result object that is saved to disk
- the API endpoint starting them returns some kind of "token" (similar
  to the UPID for regular tasks, or maybe even use the same format?)
- they are not included in the regular task list
- the result can be queried using the token; once the task has finished,
  either an error or the result is returned, and the result is removed
  from disk
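
to illustrate the flow in the list above, a minimal sketch. it is Python rather than Perl and uses a thread instead of a forked worker purely to keep the example short. the class `EphemeralTasks`, its methods and the random hex token are assumptions for illustration, not an existing PVE API and not the UPID format:

```python
import json
import os
import threading
import uuid


class EphemeralTasks:
    """Run light-weight jobs whose structured result is kept on disk until
    it has been fetched once. Nothing here is added to a task list."""

    def __init__(self, result_dir):
        self.result_dir = result_dir
        self._threads = {}

    def _result_path(self, token):
        return os.path.join(self.result_dir, f"{token}.json")

    def start(self, func, *args):
        """Start func(*args) in the background and return an opaque token."""
        token = uuid.uuid4().hex

        def run():
            try:
                payload = {"result": func(*args)}
            except Exception as err:
                payload = {"error": str(err)}
            # Write to a temporary name and rename, so a concurrent query
            # never reads a half-written result file.
            tmp = self._result_path(token) + ".tmp"
            with open(tmp, "w") as f:
                json.dump(payload, f)
            os.replace(tmp, self._result_path(token))

        thread = threading.Thread(target=run, daemon=True)
        self._threads[token] = thread
        thread.start()
        return token

    def query(self, token):
        """Return None while no result exists yet; otherwise return the
        result or error payload and remove it from disk."""
        path = self._result_path(token)
        try:
            with open(path) as f:
                payload = json.load(f)
        except FileNotFoundError:
            return None
        os.remove(path)
        self._threads.pop(token, None)
        return payload
```

in this sketch an unknown or already consumed token is indistinguishable from a still running task. a real implementation would probably want to tell those cases apart and expire results that are never fetched.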

the UI could then trigger periodic refreshes of the content view, always
display (slightly outdated) information, etc. other clients could
opt into the async variant as well, if it fits their use case.
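
on the client side, "always display slightly outdated information" could look roughly like the following sketch. `ContentView`, `start_refresh` and `query_result` are hypothetical names. the two callables stand in for whatever API calls would start an async listing and query its token, and the payload shape (`result` or `error` key) is an assumption as well:

```python
import time


class ContentView:
    """Keep showing the last known listing and swap it in once a
    background refresh has finished."""

    def __init__(self, start_refresh, query_result):
        self._start = start_refresh   # returns a token
        self._query = query_result    # returns None while still running
        self._token = None
        self.entries = []             # possibly stale, always displayable
        self.updated_at = None
        self.last_error = None

    def tick(self):
        """Call periodically; returns True when the entries were replaced."""
        if self._token is None:
            self._token = self._start()
            return False
        payload = self._query(self._token)
        if payload is None:
            return False  # still running, keep displaying the old entries
        self._token = None
        if "error" in payload:
            self.last_error = payload["error"]
            return False
        self.entries = payload["result"]
        self.updated_at = time.time()
        self.last_error = None
        return True
```

a failed refresh leaves the old entries in place and only records the error, so the view never goes blank because of a slow or broken storage.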

besides the storage content view, there are a few more places that would
benefit from this kind of mechanism (with or without a client-side cache):

https://bugzilla.proxmox.com/show_bug.cgi?id=4447
https://bugzilla.proxmox.com/show_bug.cgi?id=3045
https://bugzilla.proxmox.com/show_bug.cgi?id=4961

and probably a few more that I failed to find quickly.

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel