From mboxrd@z Thu Jan  1 00:00:00 1970
From: Max Carrara <m.carrara@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Mon, 28 Aug 2023 16:42:02 +0200
Message-Id: <20230828144204.3591503-1-m.carrara@proxmox.com>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [pbs-devel] [RFC PATCH proxmox-backup 0/2] Introduce experimental
 `AsyncExtractor<T>`
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>

This RFC proposes an asynchronous implementation of
`pbs_client::pxar::extract::{Extractor, ExtractorIter}`.

This `AsyncExtractor<T>` has been remodeled from the ground up while
preserving the core extraction logic. Its purpose is to provide
fully concurrent extraction of pxar files or streams. It does so by
offloading every synchronous / blocking call to a separate worker
thread with an internal queue. Extraction tasks are executed
sequentially to allow for predictable behaviour.
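
As a rough illustration of the worker model described above, here is a
minimal sketch that uses only the standard library. It is not the code
from patch 1: `Worker`, `submit` and `run_in_order` are hypothetical
names, and the blocking `recv()` on the reply channel merely stands in
for the point where async code would `.await` the result.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// A queued unit of blocking work (hypothetical type, for illustration only).
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical worker: one thread draining one FIFO queue.
pub struct Worker {
    queue: Option<Sender<Job>>,
    handle: Option<JoinHandle<()>>,
}

impl Worker {
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        let handle = thread::spawn(move || {
            // Jobs run one at a time, in the order they were queued.
            for job in rx {
                job();
            }
        });
        Self { queue: Some(tx), handle: Some(handle) }
    }

    /// Queues a blocking closure and returns a channel that yields its result.
    pub fn submit<T, F>(&self, f: F) -> Receiver<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (reply_tx, reply_rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // The caller may have dropped the receiver; ignore that case.
            let _ = reply_tx.send(f());
        });
        self.queue
            .as_ref()
            .expect("worker already shut down")
            .send(job)
            .expect("worker thread exited");
        reply_rx
    }
}

impl Drop for Worker {
    fn drop(&mut self) {
        // Closing the queue ends the worker loop; then wait for the thread.
        drop(self.queue.take());
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

/// Queues `n` jobs that record their index and returns the recorded order.
pub fn run_in_order(n: u32) -> Vec<u32> {
    let worker = Worker::spawn();
    let log = Arc::new(Mutex::new(Vec::new()));
    let replies: Vec<Receiver<u32>> = (0..n)
        .map(|i| {
            let log = Arc::clone(&log);
            worker.submit(move || {
                log.lock().unwrap().push(i);
                i
            })
        })
        .collect();
    for reply in replies {
        reply.recv().unwrap();
    }
    let order = log.lock().unwrap().clone();
    order
}
```

Because a single thread drains a single queue, `run_in_order(5)`
returns `[0, 1, 2, 3, 4]`: the submitter never blocks while queueing,
yet the jobs themselves never overlap.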

The async extractor is intentionally put into a separate module
(located at `pbs_client::pxar::aio`), as a complete refactor of every
existing extraction-related piece of code is beyond the scope (and
intention) of this RFC.

Its public API is nowhere near final, but serves its purpose for the
time being. Other functions found within `pbs_client::pxar::extract`
are not yet implemented.

Questions this RFC intends to resolve:
  1. In which situations would the `AsyncExtractor<T>` make sense?
     In which wouldn't it?
  2. Should the sync variant be kept around, sharing a `common`
     implementation with its async variant? If yes, why?
  3. Are there any features that the `AsyncExtractor<T>` lacks?

Although of lesser priority, these questions should also be addressed:
  4. Which parts of the `AsyncExtractor<T>` are inadequate and could
     use improvement?
  5. Which traits should the `AsyncExtractor<T>` implement, if any
     (e.g. those provided by `tokio_stream`)?

Furthermore, because async code in Rust requires a runtime, the
`AsyncExtractor<T>` currently suffers from that runtime's overhead.
The resulting difference in performance can be seen when comparing
the async version of `pxar` (see patch 2) with its current sync
counterpart. In my opinion, this points towards a common
implementation that can be used by both the sync and the async
variant, but I am curious as to what others have to say.
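
To make the idea of a common implementation a bit more concrete, here
is a minimal sketch under heavy assumptions. None of these names exist
in the patches: `Entry`, `Action`, `plan`, `plan_all_sync`,
`plan_all_async` and the toy `block_on` are all hypothetical. The
I/O-free decision logic lives in one plain function, and the sync and
async front ends are thin wrappers around it.

```rust
use std::future::Future;
use std::path::{Component, Path, PathBuf};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical archive entry, for illustration only.
pub enum Entry {
    Dir(PathBuf),
    File(PathBuf, Vec<u8>),
}

/// Hypothetical description of what an extractor would do for an entry.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    CreateDir(PathBuf),
    WriteFile(PathBuf, usize),
}

/// Shared core: no I/O and no runtime, so both variants can call it.
pub fn plan(root: &Path, entry: &Entry) -> Result<Action, String> {
    let rel = match entry {
        Entry::Dir(p) | Entry::File(p, _) => p,
    };
    // Reject anything that is not a plain relative path (`..`, `/`, ...).
    if rel.components().any(|c| !matches!(c, Component::Normal(_))) {
        return Err(format!("unsafe path: {}", rel.display()));
    }
    let target = root.join(rel);
    Ok(match entry {
        Entry::Dir(_) => Action::CreateDir(target),
        Entry::File(_, data) => Action::WriteFile(target, data.len()),
    })
}

/// Sync front end: a real one would perform blocking I/O per action.
pub fn plan_all_sync(root: &Path, entries: &[Entry]) -> Result<Vec<Action>, String> {
    entries.iter().map(|e| plan(root, e)).collect()
}

/// Async front end: a real one would `.await` its I/O after each `plan`.
pub async fn plan_all_async(root: &Path, entries: &[Entry]) -> Result<Vec<Action>, String> {
    let mut actions = Vec::new();
    for entry in entries {
        actions.push(plan(root, entry)?);
    }
    Ok(actions)
}

/// Toy executor so the sketch needs no runtime crate; it simply polls in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}
```

With this split, only the async wrapper would pay for the runtime,
while the sync wrapper stays a plain function call. Whether the real
extraction logic can be separated from its I/O this cleanly is exactly
the open question.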

Let me know what you think! :-)

Max Carrara (2):
  pbs-client: pxar: Add prototype implementation of `AsyncExtractor<T>`
  pxar-bin: Use async instead of sync extractor

 Cargo.toml                                   |   1 +
 pbs-client/Cargo.toml                        |   1 +
 pbs-client/src/pxar/aio/dir_stack.rs         | 543 +++++++++++++++++++
 pbs-client/src/pxar/aio/extract/extractor.rs | 446 +++++++++++++++
 pbs-client/src/pxar/aio/extract/mod.rs       | 220 ++++++++
 pbs-client/src/pxar/aio/extract/raw.rs       | 503 +++++++++++++++++
 pbs-client/src/pxar/aio/metadata.rs          | 412 ++++++++++++++
 pbs-client/src/pxar/aio/mod.rs               |  11 +
 pbs-client/src/pxar/aio/worker.rs            | 167 ++++++
 pbs-client/src/pxar/mod.rs                   |   1 +
 pxar-bin/src/main.rs                         |  91 ++--
 11 files changed, 2352 insertions(+), 44 deletions(-)
 create mode 100644 pbs-client/src/pxar/aio/dir_stack.rs
 create mode 100644 pbs-client/src/pxar/aio/extract/extractor.rs
 create mode 100644 pbs-client/src/pxar/aio/extract/mod.rs
 create mode 100644 pbs-client/src/pxar/aio/extract/raw.rs
 create mode 100644 pbs-client/src/pxar/aio/metadata.rs
 create mode 100644 pbs-client/src/pxar/aio/mod.rs
 create mode 100644 pbs-client/src/pxar/aio/worker.rs

--
2.39.2