From: Christian Ebner <c.ebner@proxmox.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
"Proxmox Backup Server development discussion"
<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH v3 proxmox-backup 15/33] fix #3044: server: implement push support for sync operations
Date: Mon, 14 Oct 2024 12:15:45 +0200 [thread overview]
Message-ID: <4a4635b7-66f9-4d0c-b2f9-336dfa7b4cb9@proxmox.com> (raw)
In-Reply-To: <552165182.3870.1728900086933@webmail.proxmox.com>
On 10/14/24 12:01, Fabian Grünbichler wrote:
>
>> Christian Ebner <c.ebner@proxmox.com> hat am 14.10.2024 11:53 CEST geschrieben:
>>
>>
>> On 10/14/24 11:41, Fabian Grünbichler wrote:
>>>
>>>> Christian Ebner <c.ebner@proxmox.com> hat am 14.10.2024 11:32 CEST geschrieben:
>>>> On 10/10/24 16:48, Fabian Grünbichler wrote:
>>>>>> +// Read fixed or dynamic index and push to target by uploading via the backup writer instance
>>>>>> +//
>>>>>> +// For fixed indexes, the size must be provided as given by the index reader.
>>>>>> +#[allow(clippy::too_many_arguments)]
>>>>>> +async fn push_index<'a>(
>>>>>> + filename: &'a str,
>>>>>> + index: impl IndexFile + Send + 'static,
>>>>>> + chunk_reader: Arc<dyn AsyncReadChunk>,
>>>>>> + backup_writer: &BackupWriter,
>>>>>> + manifest: &mut BackupManifest,
>>>>>> + crypt_mode: CryptMode,
>>>>>> + size: Option<u64>,
>>>>>> + known_chunks: &Arc<Mutex<HashSet<[u8; 32]>>>,
>>>>>> +) -> Result<SyncStats, Error> {
>>>>>> + let (upload_channel_tx, upload_channel_rx) = mpsc::channel(20);
>>>>>> + let mut chunk_infos =
>>>>>> + stream::iter(0..index.index_count()).map(move |pos| index.chunk_info(pos).unwrap());
>>>>>
>>>>> so this iterates over all the chunks in the index..
>>>>>
>>>>>> +
>>>>>> + tokio::spawn(async move {
>>>>>> + while let Some(chunk_info) = chunk_infos.next().await {
>>>>>> + let chunk_info = chunk_reader
>>>>>> + .read_raw_chunk(&chunk_info.digest)
>>>>>
>>>>> and this reads them
>>>>>
>>>>>> + .await
>>>>>> + .map(|chunk| ChunkInfo {
>>>>>> + chunk,
>>>>>> + digest: chunk_info.digest,
>>>>>> + chunk_len: chunk_info.size(),
>>>>>> + offset: chunk_info.range.start,
>>>>>> + });
>>>>>> + let _ = upload_channel_tx.send(chunk_info).await;
>>>>>
>>>>> and sends them further along to the upload code.. which will then (in
>>>>> many cases) throw away all that data we just read because it's already
>>>>> on the target and we know that because of the previous manifest..
>>>>>
>>>>> wouldn't it be better to deduplicate here already, and instead of
>>>>> reading known chunks over and over again, just tell the server to
>>>>> re-register them? or am I missing something here? :)
>>>>
>>>> Good catch, this is indeed a possible huge performance bottleneck!
>>>>
>>>> Did fix this by moving the known chunks check here (as suggested) and
>>>> stream a `MergedChunkInfo` instead of `ChunkInfo`, which allows to only
>>>> send the chunk's digest and size over to the backup writer. By this
>>>> known chunk are never read.
>>>
>>> this would be one of the few cases btw where re-uploading a chunk that is missing on the server side (for whatever reason) would be possible - not sure how easy it would be to integrate though ;)
>>
>> Well, unfortunately not directly. Here the chunk is only send via the
>> channel to the backup writer, which itself further processes the stream
>> and transforms it into futures for the upload requests. So the actual
>> failure will be there...
>>
>> In order to upload corrupt/missing chunks I think a different,
>> completely decoupled approach might be better (at least for the sync).
>>
>> E.g. the backup writer could get access to a chunk buffer, which keeps
>> the chunks in memory until the upload was successful. This could also
>> include possible lookup mechanisms to the local/remote chunk store or
>> re-generate required chunks by reading from the block device in case of
>> dirty bitmap tracking..
>>
>> The re-upload in case of reused payload chunks for `ppxar` archives is
>> more tricky...
>
> I mostly meant in the sense of - the source chunk doesn't disappear while we are uploading (as opposed to a regular backup, where the input data might have changed in the meantime), so we can go back and retrieve it without the need to always read it and keep it in memory. it would definitely require propagating the information which chunks need to be uploaded despite being "known" back to the client in some fashion.
Yeah, I think ideally there would be a generic interface which allows
the client to re-generate a missing/corrupt chunk as reported by the
server (on known chunk upload).
The details of how to re-generate or buffer such chunks should than be
handled differently based on what the chunk source is (backup of
streams, backup of filesystems, backup of ...., buffered chunks, syncs,
...).
If not possible, the backup should simply fail...
Something like a
```
trait {
fn regenerate_known_chunk(digest: [32; u8]) -> Result<ChunkInfo,
Error>;
}
```
which can be implemented as required?
But definitely out of scope for this patch series :)
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
next prev parent reply other threads:[~2024-10-14 10:15 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-12 14:32 [pbs-devel] [PATCH v3 proxmox-backup 00/33] fix #3044: push datastore to remote target Christian Ebner
2024-09-12 14:32 ` [pbs-devel] [PATCH v3 proxmox-backup 01/33] api: datastore: add missing whitespace in description Christian Ebner
2024-09-12 14:32 ` [pbs-devel] [PATCH v3 proxmox-backup 02/33] server: sync: move sync related stats to common module Christian Ebner
2024-09-12 14:32 ` [pbs-devel] [PATCH v3 proxmox-backup 03/33] server: sync: move reader trait to common sync module Christian Ebner
2024-09-12 14:32 ` [pbs-devel] [PATCH v3 proxmox-backup 04/33] server: sync: move source " Christian Ebner
2024-09-12 14:32 ` [pbs-devel] [PATCH v3 proxmox-backup 05/33] client: backup writer: bundle upload stats counters Christian Ebner
2024-10-10 14:49 ` Fabian Grünbichler
2024-09-12 14:32 ` [pbs-devel] [PATCH v3 proxmox-backup 06/33] client: backup writer: factor out merged chunk stream upload Christian Ebner
2024-09-12 14:32 ` [pbs-devel] [PATCH v3 proxmox-backup 07/33] client: backup writer: add chunk count and duration stats Christian Ebner
2024-09-12 14:32 ` [pbs-devel] [PATCH v3 proxmox-backup 08/33] client: backup writer: allow push uploading index and chunks Christian Ebner
2024-09-12 14:32 ` [pbs-devel] [PATCH v3 proxmox-backup 09/33] server: sync: move skip info/reason to common sync module Christian Ebner
2024-09-12 14:32 ` [pbs-devel] [PATCH v3 proxmox-backup 10/33] server: sync: make skip reason message more genenric Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 11/33] server: sync: factor out namespace depth check into sync module Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 12/33] config: acl: mention optional namespace acl path component Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 13/33] config: acl: allow namespace components for remote datastores Christian Ebner
2024-10-10 14:49 ` Fabian Grünbichler
2024-10-14 8:18 ` Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 14/33] api types: define remote permissions and roles for push sync Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 15/33] fix #3044: server: implement push support for sync operations Christian Ebner
2024-10-10 14:48 ` Fabian Grünbichler
2024-10-14 9:32 ` Christian Ebner
2024-10-14 9:41 ` Fabian Grünbichler
2024-10-14 9:53 ` Christian Ebner
2024-10-14 10:01 ` Fabian Grünbichler
2024-10-14 10:15 ` Christian Ebner [this message]
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 16/33] config: jobs: add `sync-push` config type for push sync jobs Christian Ebner
2024-10-10 14:48 ` Fabian Grünbichler
2024-10-14 8:16 ` Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 17/33] api: push: implement endpoint for sync in push direction Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 18/33] api: sync: move sync job invocation to server sync module Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 19/33] api: sync jobs: expose optional `sync-direction` parameter Christian Ebner
2024-10-10 14:48 ` Fabian Grünbichler
2024-10-14 8:10 ` Christian Ebner
2024-10-14 9:25 ` Fabian Grünbichler
2024-10-14 9:36 ` Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 20/33] api: sync: add permission checks for push sync jobs Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 21/33] bin: manager: add datastore push cli command Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 22/33] ui: group filter: allow to set namespace for local datastore Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 23/33] ui: sync edit: source group filters based on sync direction Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 24/33] ui: add view with separate grids for pull and push sync jobs Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 25/33] ui: sync job: adapt edit window to be used for pull and push Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 26/33] ui: sync: pass sync-direction to allow removing push jobs Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 27/33] ui: sync view: do not use data model proxy for store Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 28/33] ui: sync view: set sync direction when invoking run task via api Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 29/33] datastore: move `BackupGroupDeleteStats` to api types Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 30/33] api types: implement api type for `BackupGroupDeleteStats` Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 31/33] datastore: increment deleted group counter when removing group Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 32/33] api: datastore/namespace: return backup groups delete stats on remove Christian Ebner
2024-10-11 9:32 ` Fabian Grünbichler
2024-10-14 10:24 ` Christian Ebner
2024-09-12 14:33 ` [pbs-devel] [PATCH v3 proxmox-backup 33/33] server: sync job: use delete stats provided by the api Christian Ebner
2024-10-11 9:32 ` Fabian Grünbichler
2024-10-15 7:30 ` Christian Ebner
2024-10-15 7:44 ` Fabian Grünbichler
2024-10-15 8:04 ` Christian Ebner
2024-10-10 14:48 ` [pbs-devel] [PATCH v3 proxmox-backup 00/33] fix #3044: push datastore to remote target Fabian Grünbichler
2024-10-11 7:12 ` Christian Ebner
2024-10-11 7:51 ` Fabian Grünbichler
2024-10-14 11:04 ` [pbs-devel] partially-applied: " Fabian Grünbichler
2024-10-17 13:31 ` [pbs-devel] " Christian Ebner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4a4635b7-66f9-4d0c-b2f9-336dfa7b4cb9@proxmox.com \
--to=c.ebner@proxmox.com \
--cc=f.gruenbichler@proxmox.com \
--cc=pbs-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox