From: Kefu Chai
To: pve-devel@lists.proxmox.com
Cc: Kefu Chai
Date: Tue, 6 Jan 2026 22:24:39 +0800
Message-ID: <20260106142440.2368585-16-k.chai@proxmox.com>
In-Reply-To: <20260106142440.2368585-1-k.chai@proxmox.com>
Subject: [pve-devel] [PATCH pve-cluster 15/15] pmxcfs-rs: add project documentation

---
 src/pmxcfs-rs/ARCHITECTURE.txt | 350 +++++++++++++++++++++++++++++++++
 src/pmxcfs-rs/README.md        | 235 ++++++++++++++++++++++
 2 files changed, 585 insertions(+)
 create mode 100644 src/pmxcfs-rs/ARCHITECTURE.txt
 create mode
100644 src/pmxcfs-rs/README.md

diff --git a/src/pmxcfs-rs/ARCHITECTURE.txt b/src/pmxcfs-rs/ARCHITECTURE.txt
new file mode 100644
index 00000000..2854520b
--- /dev/null
+++ b/src/pmxcfs-rs/ARCHITECTURE.txt
@@ -0,0 +1,350 @@
+================================================================================
+                     pmxcfs-rs Architecture Overview
+================================================================================
+
+                        Crate Dependency Graph
+================================================================================
+
+                       +-------------------+
+                       | pmxcfs-api-types  |
+                       |  (Shared Types)   |
+                       +-------------------+
+                                 ^
+                                 |
+          +----------------------+----------------------+
+          |                      |                      |
+          |                      |                      |
++---------+---------+  +---------+---------+  +---------+---------+
+|  pmxcfs-config    |  |  pmxcfs-memdb     |  |  pmxcfs-rrd       |
+| (Configuration)   |  |  (SQLite DB)      |  |  (RRD Files)      |
++-------------------+  +-------------------+  +-------------------+
+          ^                      ^                      ^
+          |                      |                      |
+          |         +------------+------------+         |
+          |         |                         |         |
++---------+---------+               +---------+---------+
+|   pmxcfs-ipc      |               |  pmxcfs-status    |
+|  (libqb Server)   |               | (VM/Node Status)  |
++-------------------+               +-------------------+
+          ^                                   ^
+          |                                   |
+          |         +------------------------+
+          |         |
++---------+---------+
+|  pmxcfs-logger    |
+|  (Cluster Log)    |
++-------------------+
+          ^
+          |
++---------+---------+              +-------------------+
+|   pmxcfs-dfsm     |              | pmxcfs-services   |
+|  (State Machine)  |              |  (Service Mgmt)   |
++-------------------+              +-------------------+
+          ^                                  ^
+          |                                  |
+          +------------------+---------------+
+                             |
+                   +---------+---------+
+                   |      pmxcfs       |
+                   |   (Main Daemon)   |
+                   +-------------------+
+
+
+================================================================================
+                        Component Descriptions
+================================================================================
+
+pmxcfs-api-types
+  Shared types, errors, and constants used across all crates
+  - Error types (PmxcfsError)
+  - Common data structures
+  - VmType enum (Qemu, Lxc)
+
+pmxcfs-config
+  Corosync
configuration parsing and management
+  - Reads /etc/corosync/corosync.conf
+  - Extracts cluster configuration (nodes, quorum, etc.)
+  - Provides Config struct
+
+pmxcfs-memdb
+  In-memory database with SQLite persistence
+  - SQLite schema version 5 (C-compatible)
+  - FUSE plugin system (6 functional + 4 link plugins)
+  - Key-value storage
+  - Version tracking
+
+pmxcfs-rrd
+  Round-Robin Database file management
+  - RRD file creation and updates
+  - Schema definitions (CPU, memory, network, etc.)
+  - Format migration (v1/v2/v3)
+  - rrdcached integration
+
+pmxcfs-status
+  Cluster status tracking
+  - VM/CT registration and tracking
+  - Node online/offline status
+  - RRD data collection
+  - Cluster log storage
+
+pmxcfs-ipc
+  libqb-compatible IPC server
+  - Unix socket server (@pve2)
+  - Wire protocol compatibility with libqb clients
+  - QB_IPC_SOCKET implementation
+  - 13 IPC operations (version, get, set, mkdir, etc.)
+
+pmxcfs-logger
+  Cluster log with distributed synchronization
+  - Ring buffer storage (50,000 entries)
+  - Deduplication
+  - Binary message format (32-byte aligned)
+  - Multi-node synchronization
+
+pmxcfs-dfsm
+  Distributed Finite State Machine
+  - State synchronization via Corosync CPG
+  - Message ordering and queuing
+  - Leader-based updates
+  - Membership change handling
+  - Services:
+    * ClusterDatabaseService (MemDB sync)
+    * StatusSyncService (Status sync)
+
+pmxcfs-services
+  Service lifecycle management framework
+  - Automatic retry logic
+  - Service dependencies
+  - Graceful shutdown
+
+pmxcfs (main daemon)
+  Main binary that integrates all components
+  - FUSE filesystem operations
+  - Corosync/CPG integration
+  - IPC server lifecycle
+  - Plugin system
+  - Daemon process management
+
+
+================================================================================
+                     Data Flow: Write Operation
+================================================================================
+
+User/API
+  |
+  |  write to
/etc/pve/nodes/node1/qemu-server/100.conf
+  |
+  v
+FUSE Layer (pmxcfs::fuse::filesystem)
+  |
+  |  filesystem::write()
+  |
+  v
+MemDB (pmxcfs-memdb)
+  |
+  |  memdb.set(path, data)
+  |  Update SQLite database
+  |
+  v
+DFSM (pmxcfs-dfsm)
+  |
+  |  dfsm.broadcast_update(FuseMessage::Write)
+  |
+  v
+Corosync CPG
+  |
+  |  CPG multicast to all nodes
+  |
+  v
+All Cluster Nodes
+  |
+  |  Receive CPG message
+  |  Apply update to local MemDB
+  |  Update FUSE filesystem
+
+
+================================================================================
+                     Data Flow: Cluster Log Entry
+================================================================================
+
+Local Log Event
+  |
+  |  cluster log write
+  |
+  v
+Logger (pmxcfs-logger)
+  |
+  |  Add to ring buffer
+  |  Check for duplicates
+  |
+  v
+Status (pmxcfs-status)
+  |
+  |  Store in status subsystem
+  |
+  v
+DFSM (pmxcfs-dfsm)
+  |
+  |  Broadcast via StatusSyncService
+  |
+  v
+Corosync CPG
+  |
+  |  Multicast to cluster
+  |
+  v
+All Nodes
+  |
+  |  Receive and merge log entries
+
+
+================================================================================
+                     Data Flow: IPC Request
+================================================================================
+
+Perl Client (PVE::IPCC)
+  |
+  |  libqb IPC request (e.g., get("/nodes/localhost/qemu-server/100.conf"))
+  |
+  v
+IPC Server (pmxcfs-ipc)
+  |
+  |  Parse libqb wire protocol
+  |  Route to appropriate handler
+  |
+  v
+MemDB (pmxcfs-memdb)
+  |
+  |  memdb.get(path)
+  |  Query SQLite or plugin
+  |
+  v
+IPC Server
+  |
+  |  Format libqb response
+  |
+  v
+Perl Client
+  |
+  |  Receive data
+
+
+================================================================================
+                     Initialization Sequence
+================================================================================
+
+1. Parse command line arguments
+   - Debug mode, local mode, paths, etc.
+
+2. Set up logging (tracing)
+   - journald integration
+   - Environment filter
+   - .debug file toggle support
+
+3.
Initialize MemDB
+   - Open/create SQLite database
+   - Initialize schema (version 5)
+   - Register plugins
+
+4. Load Corosync configuration
+   - Parse corosync.conf
+   - Extract node info, quorum settings
+
+5. Initialize Status subsystem
+   - Set up VM/CT tracking
+   - Initialize RRD storage
+   - Set up cluster log
+
+6. Create DFSM
+   - Initialize state machine
+   - Set up CPG handler
+   - Register callbacks (MemDbCallbacks, StatusCallbacks)
+
+7. Start Services
+   - ClusterDatabaseService (MemDB sync)
+   - StatusSyncService (Status sync)
+   - QuorumService (quorum monitoring)
+   - ClusterConfigService (config sync)
+
+8. Initialize IPC Server
+   - Create Unix socket (@pve2)
+   - Set up request handlers
+   - Start listening
+
+9. Mount FUSE Filesystem
+   - Create mount point (/etc/pve)
+   - Initialize FUSE operations
+   - Start FUSE event loop
+
+10. Enter main event loop
+    - Handle DFSM messages
+    - Process IPC requests
+    - Service FUSE operations
+    - Monitor quorum
+
+
+================================================================================
+                        Key Design Patterns
+================================================================================
+
+Trait-Based Abstraction
+  - DFSM uses Callbacks trait for MemDB/Status updates
+  - Enables testing with mock implementations
+  - Clean separation of concerns
+
+Service Framework
+  - pmxcfs-services provides retry logic
+  - Services can be started/stopped independently
+  - Automatic error recovery
+
+Plugin System
+  - MemDB supports dynamic plugins
+  - Functional plugins: Generate content on-the-fly
+  - Link plugins: Symlinks to other paths
+  - Examples: .version, .members, .vmlist, etc.
+
+Wire Protocol Compatibility
+  - IPC server implements libqb wire protocol
+  - Binary compatibility with C libqb clients
+  - Enables Perl tools (PVE::IPCC) to work unchanged
+
+Async Runtime
+  - tokio for async I/O
+  - Non-blocking operations
+  - Efficient resource usage
+
+
+================================================================================
+                           Thread Model
+================================================================================
+
+Main Thread
+  - FUSE event loop (blocking)
+  - Handles filesystem operations
+
+Tokio Runtime
+  - IPC server (async)
+  - DFSM message handling (async)
+  - Service tasks (async)
+  - CPG message processing
+
+Background Threads
+  - SQLite I/O (blocking, offloaded)
+  - RRD file writes (blocking)
+
+
+================================================================================
+                              Testing
+================================================================================
+
+Unit Tests
+  - Per-crate unit tests with mock implementations
+  - Run with: cargo test --workspace
+
+Integration Tests
+  - Comprehensive test suite in integration-tests/ directory
+  - Single-node, multi-node, and mixed C/Rust cluster tests
+  - See integration-tests/README.md for full documentation
+
+
+================================================================================
diff --git a/src/pmxcfs-rs/README.md b/src/pmxcfs-rs/README.md
new file mode 100644
index 00000000..4ad846f3
--- /dev/null
+++ b/src/pmxcfs-rs/README.md
@@ -0,0 +1,235 @@
+# pmxcfs-rs
+
+## Executive Summary
+
+pmxcfs-rs is a complete rewrite of the Proxmox Cluster File System from C to Rust, achieving full functional parity while maintaining wire-format compatibility with the C implementation. The implementation has passed comprehensive single-node and multi-node integration testing.
+
+**Overall Completion**: All subsystems implemented
+- All core subsystems implemented and tested
+- Wire protocol compatibility verified
+- Comprehensive test coverage (24 integration tests + extensive unit tests)
+- Production client compatibility confirmed
+- Multi-node cluster functionality validated
+
+---
+
+## Component Status
+
+### Workspace Structure
+
+pmxcfs-rs is organized as a Rust workspace with 10 crates:
+
+| Crate | Purpose |
+|-------|---------|
+| `pmxcfs` | Main daemon binary |
+| `pmxcfs-config` | Configuration management |
+| `pmxcfs-api-types` | Shared types and errors |
+| `pmxcfs-memdb` | Database with SQLite backend |
+| `pmxcfs-dfsm` | Distributed state machine + CPG |
+| `pmxcfs-rrd` | RRD file persistence |
+| `pmxcfs-status` | Status monitoring + RRD |
+| `pmxcfs-ipc` | libqb-compatible IPC server |
+| `pmxcfs-services` | Service lifecycle framework |
+| `pmxcfs-logger` | Cluster log + ring buffer |
+
+### Compatibility Matrix
+
+| Component | Notes |
+|-----------|-------|
+| **FUSE Filesystem** | All operations implemented |
+| **Database (MemDB)** | SQLite schema compatible |
+| **Cluster Communication** | CPG/Quorum via Corosync |
+| **DFSM State Machine** | Binary message format compatible |
+| **IPC Server** | Wire protocol verified with libqb clients |
+| **Plugin System** | All 10 plugins (6 func + 4 link) with write support |
+| **RRD Integration** | Format migration implemented |
+| **Status Subsystem** | VM list, config tracking, cluster log |
+
+---
+
+## Design Decisions and Notable Differences
+
+### 1. IPC Protocol: Partial libqb Implementation
+
+**Decision**: Implement libqb-compatible wire protocol without using libqb library directly.
+
+**C Implementation**:
+- Uses libqb library directly (`libqb0`, `libqb-dev`)
+- Full libqb feature set (SHM ring buffers, POSIX message queues, etc.)
+- IPC types: `QB_IPC_SOCKET`, `QB_IPC_SHM`, `QB_IPC_POSIX_MQ`
+
+**Rust Implementation**:
+- Custom implementation of libqb wire protocol
+- Only implements `QB_IPC_SOCKET` type (Unix datagram sockets + shared memory control files)
+- Compatible handshake, request/response structures
+- Verified with both libqb C clients and production Perl clients (PVE::IPCC)
+
+**Rationale**:
+- libqb has no Rust bindings and FFI would be complex
+- pmxcfs only uses `QB_IPC_SOCKET` type in production
+- Wire protocol compatibility is what matters for clients
+- Simpler implementation, easier to maintain
+
+**Compatibility Impact**: **None** - All production clients work identically
+
+**Reference**:
+- C: `src/pmxcfs/server.c` (uses libqb API)
+- Rust: `src/pmxcfs-rs/pmxcfs-ipc/src/server.rs` (custom implementation)
+- Verification: `pmxcfs-ipc/tests/qb_wire_compat.rs` (all tests passing)
+
+---
+
+### 2. Logging System: tracing vs qb_log
+
+**Decision**: Use Rust `tracing` ecosystem instead of libqb's `qb_log`.
+
+**C Implementation**:
+- Uses `qb_log` from libqb for all logging
+- Log levels: `QB_LOG_EMERG`, `QB_LOG_ALERT`, `QB_LOG_CRIT`, `QB_LOG_ERR`, `QB_LOG_WARNING`, `QB_LOG_NOTICE`, `QB_LOG_INFO`, `QB_LOG_DEBUG`
+- Output: syslog + stderr
+- Runtime control: Write to `/etc/pve/.debug` file (0 = info, 1 = debug)
+- Format: `[domain] LEVEL: message (file.c:line:function)`
+
+**Rust Implementation**:
+- Uses `tracing` crate with `tracing-subscriber`
+- Log levels: `ERROR`, `WARN`, `INFO`, `DEBUG`, `TRACE`
+- Output: journald (via `tracing-journald`) + stdout
+- Runtime control: Same mechanism - `.debug` plugin file (0 = info, 1 = debug)
+- Format: `[timestamp] LEVEL module::path: message`
+
+**Key Differences**:
+
+| Aspect | C (qb_log) | Rust (tracing) | Impact |
+|--------|-----------|----------------|--------|
+| **Log format** | `[domain] INFO: msg (file.c:123)` | `2025-11-14T10:30:45 INFO pmxcfs::module: msg` | Log parsers need update |
+| **Severity levels** | 8 levels (syslog standard) | 5 levels (standard Rust) | Mapping works fine |
+| **Destination** | syslog | journald (systemd) | Both queryable, journald is modern |
+| **Runtime toggle** | `/etc/pve/.debug` | Same | **No change** |
+| **CLI flag** | `-d` or `--debug` | Same | **No change** |
+
+**Rationale**:
+- `tracing` is the Rust ecosystem standard
+- Better async/structured logging support
+- No FFI to libqb needed
+- Integrates with systemd/journald natively
+- Same user-facing behavior (`.debug` file toggle)
+
+**Compatibility Impact**: **Minor** - Log monitoring scripts may need format updates
+
+**Migration**:
+```bash
+# Old C logs (syslog)
+journalctl -u pve-cluster | grep pmxcfs
+
+# New Rust logs (journald, same command works)
+journalctl -u pve-cluster | grep pmxcfs
+```
+
+**Reference**:
+- C: `src/pmxcfs/pmxcfs.c` (qb_log initialization)
+- Rust: `src/pmxcfs-rs/pmxcfs/src/main.rs` (tracing-subscriber setup)
+
+---
+
+### 3.
OpenVZ Container Support: Intentionally Excluded
+
+**Decision**: No functional support for OpenVZ containers.
+
+**C Implementation**:
+- Includes OpenVZ VM type (`VMTYPE_OPENVZ = 2`)
+- Detects OpenVZ action scripts (`vps*.mount`, `*.start`, `*.stop`, etc.)
+- Sets executable permissions on OpenVZ scripts
+- Scans `nodes/*/openvz/` directories for containers
+- **All code marked**: `// FIXME: remove openvz stuff for 7.x`
+
+**Rust Implementation**:
+- VM types: `VmType::Qemu = 1`, `VmType::Lxc = 3` (no `VMTYPE_OPENVZ = 2`)
+- `/openvz` symlink exists (for backward compatibility) but no functional support
+- No OpenVZ script detection or VM scanning
+
+**Rationale**:
+- OpenVZ deprecated in Proxmox VE 4.0 (2015)
+- OpenVZ removed completely in Proxmox VE 7.0 (2021)
+- pmxcfs-rs ships with Proxmox VE 9.x (2 major versions after removal)
+- Last OpenVZ code change: October 2011 (14 years ago)
+- Mandatory LXC migration completed years ago
+
+**Compatibility Impact**: **None** - No PVE 9.x systems have OpenVZ containers
+
+**Reference**:
+- C: `src/pmxcfs/status.h:31-32`, `cfs-plug-memdb.c:46-93`, `memdb.c:455-460`
+- Rust: `pmxcfs-api-types/src/lib.rs:99-102` (VmType enum)
+
+---
+
+## Testing
+
+pmxcfs-rs has a comprehensive test suite with 100+ tests organized following modern Rust testing best practices.
+
+### Quick Start
+
+```bash
+# Run all tests
+cargo test --workspace
+
+# Run unit tests only (fast, inline tests)
+cargo test --lib
+
+# Run integration tests only
+cargo test --test '*'
+
+# Run specific package tests
+cargo test -p pmxcfs-memdb
+```
+
+### Multi-Node Integration Tests
+
+Complete integration test suite covering single-node, multi-node cluster, and C/Rust interoperability.
+
+```bash
+cd integration-tests
+./test --build      # Build and run all tests
+./test --no-build   # Quick iteration
+./test --list       # Show available tests
+```
+
+See [integration-tests/README.md](integration-tests/README.md) for detailed documentation.
+
+---
+
+## Compatibility Summary
+
+### Wire-Compatible
+- IPC protocol (verified with libqb clients)
+- DFSM message format (binary compatible)
+- Database schema (SQLite version 5)
+- RRD file formats (all versions)
+- FUSE operations (all 12 ops)
+
+### Different but Compatible
+- Logging system (tracing vs qb_log) - format differs, functionality same
+- IPC implementation (custom vs libqb) - protocol identical, implementation differs
+- Event loop (tokio vs qb_loop) - both provide event-driven concurrency
+
+### Intentionally Different
+- OpenVZ support (removed, not needed)
+- Service priority levels (all run concurrently in Rust)
+
+---
+
+## References
+
+- **C Implementation**: `src/pmxcfs/`
+- **Rust Implementation**: `src/pmxcfs-rs/`
+  - `pmxcfs` - Main daemon binary
+  - `pmxcfs-config` - Configuration management
+  - `pmxcfs-api-types` - Shared types and error definitions
+  - `pmxcfs-memdb` - In-memory database with SQLite persistence
+  - `pmxcfs-dfsm` - Distributed Finite State Machine (CPG integration)
+  - `pmxcfs-rrd` - RRD persistence
+  - `pmxcfs-status` - Status monitoring and RRD data management
+  - `pmxcfs-ipc` - libqb-compatible IPC server
+  - `pmxcfs-services` - Service framework for lifecycle management
+  - `pmxcfs-logger` - Cluster log with ring buffer and deduplication
+- **Testing Guide**: `integration-tests/README.md`
+- **Test Runner**: `integration-tests/test` (unified test interface)
-- 
2.47.3
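
Note on README section 3 (not part of the patch): the VM type numbering described there (`VmType::Qemu = 1`, `VmType::Lxc = 3`, with the former `VMTYPE_OPENVZ = 2` left unassigned) could look roughly like the sketch below. Only the two variant values come from the README. The `TryFrom<u32>` conversion and the `UnknownVmType` error are hypothetical names introduced for illustration, and the actual enum in `pmxcfs-api-types/src/lib.rs` may be shaped differently.

```rust
use std::convert::TryFrom;

/// VM types with the numeric values quoted in the README.
/// The value 2 (VMTYPE_OPENVZ in the C code) is deliberately absent.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum VmType {
    Qemu = 1,
    Lxc = 3,
}

/// Hypothetical error type for this sketch; not the crate's PmxcfsError.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownVmType(pub u32);

impl TryFrom<u32> for VmType {
    type Error = UnknownVmType;

    /// Map a raw on-wire value to a VmType, rejecting anything unassigned.
    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        match raw {
            1 => Ok(VmType::Qemu),
            3 => Ok(VmType::Lxc),
            other => Err(UnknownVmType(other)),
        }
    }
}

fn main() {
    // 2 is the old OpenVZ value and falls through to the error branch.
    for raw in [1u32, 2, 3] {
        match VmType::try_from(raw) {
            Ok(t) => println!("{raw} -> {t:?}"),
            Err(e) => println!("{raw} -> rejected ({e:?})"),
        }
    }
}
```

Keeping the gap at 2 rather than renumbering means values written by the C implementation still decode to the same VM type, while a stale OpenVZ entry is reported as an error instead of being misread as LXC.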