public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support
@ 2022-10-18  9:20 Fabian Grünbichler
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-apt 1/2] packages file: add section field Fabian Grünbichler
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Fabian Grünbichler @ 2022-10-18  9:20 UTC
  To: pve-devel

this series implements filtering based on package section (exact match)
or package name (glob), and extends mirroring support to source
packages/deb-src repositories.

technically the first patch in proxmox-apt is a breaking change, but the
only user of the changed struct is proxmox-offline-mirror, which doesn't
do any incompatible initializations.

proxmox-apt:

Fabian Grünbichler (2):
  packages file: add section field
  deb822: source index support

 src/deb822/mod.rs                             |      3 +
 src/deb822/packages_file.rs                   |      2 +
 src/deb822/release_file.rs                    |      2 +-
 src/deb822/sources_file.rs                    |    255 +
 ..._debian_dists_bullseye_main_source_Sources | 858657 +++++++++++++++
 5 files changed, 858918 insertions(+), 1 deletion(-)
 create mode 100644 src/deb822/sources_file.rs
 create mode 100644 tests/deb822/sources/deb.debian.org_debian_dists_bullseye_main_source_Sources

proxmox-offline-mirror:

Fabian Grünbichler (4):
  mirror: add exclusion of packages/sections
  mirror: implement source packages mirroring
  fix #4264: only require either Release or InRelease
  mirror: refactor fetch_binary/source_packages

 Cargo.toml                                    |   1 +
 debian/control                                |   2 +
 src/bin/proxmox-offline-mirror.rs             |   4 +-
 src/bin/proxmox_offline_mirror_cmds/config.rs |   8 +
 src/config.rs                                 |  40 +-
 src/mirror.rs                                 | 483 ++++++++++++++----
 6 files changed, 437 insertions(+), 101 deletions(-)

-- 
2.30.2






* [pve-devel] [PATCH proxmox-apt 1/2] packages file: add section field
  2022-10-18  9:20 [pve-devel] [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Fabian Grünbichler
@ 2022-10-18  9:20 ` Fabian Grünbichler
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-apt 2/2] deb822: source index support Fabian Grünbichler
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Fabian Grünbichler @ 2022-10-18  9:20 UTC
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    technically a breaking change, but the only user (pom) doesn't care.
    not bumping to an incompatible version would avoid the need to bump the dep in proxmox-perl-rs
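
for illustration, a minimal sketch of why adding a public field counts as
breaking, and which kind of initialization stays compatible. `Entry` and
`build_entry` are hypothetical stand-ins, not the actual `PackageEntry`
definition, and the sketch assumes the type implements `Default`:

```rust
// Hypothetical stand-in for a library struct with only public fields;
// this is not the real `PackageEntry` from proxmox-apt.
#[derive(Debug, Default, PartialEq)]
pub struct Entry {
    pub package: String,
    pub size: usize,
    // Newly added field. A downstream literal such as
    // `Entry { package, size }` that lists all fields exhaustively
    // stops compiling once this field exists.
    pub section: String,
}

fn build_entry(package: &str, size: usize) -> Entry {
    // Struct update syntax keeps compiling when fields are added later,
    // as long as the type implements `Default`.
    Entry {
        package: package.to_string(),
        size,
        ..Default::default()
    }
}

fn main() {
    let entry = build_entry("base-files", 1110);
    // The new field silently takes its default value.
    assert_eq!(entry.section, "");
    println!("{entry:?}");
}
```

code that only reads existing fields, or converts from the raw
representation via `TryFrom`, is not affected either way.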

 src/deb822/packages_file.rs | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/deb822/packages_file.rs b/src/deb822/packages_file.rs
index a51f71e..90b21c6 100644
--- a/src/deb822/packages_file.rs
+++ b/src/deb822/packages_file.rs
@@ -57,6 +57,7 @@ pub struct PackageEntry {
     pub size: usize,
     pub installed_size: Option<usize>,
     pub checksums: CheckSums,
+    pub section: String,
 }
 
 #[derive(Debug, Default, PartialEq, Eq)]
@@ -83,6 +84,7 @@ impl TryFrom<PackagesFileRaw> for PackageEntry {
             size: value.size.parse::<usize>()?,
             installed_size,
             checksums: CheckSums::default(),
+            section: value.section,
         };
 
         if let Some(md5) = value.md5_sum {
-- 
2.30.2






* [pve-devel] [PATCH proxmox-apt 2/2] deb822: source index support
  2022-10-18  9:20 [pve-devel] [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Fabian Grünbichler
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-apt 1/2] packages file: add section field Fabian Grünbichler
@ 2022-10-18  9:20 ` Fabian Grünbichler
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 1/4] mirror: add exclusion of packages/sections Fabian Grünbichler
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Fabian Grünbichler @ 2022-10-18  9:20 UTC
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
the test file needs to be downloaded from the referenced URL and
uncompressed (it's too big to send as patch).

its SHA256sum is e7777c1d305f5e0a31bcf2fe26e955436986edb5c211c03a362c7d557c899349 
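
for readers unfamiliar with the index format: each line of the Files and
Checksums-* fields in a Sources stanza has the form
`<hex checksum> <size> <file name>`. a rough, std-only sketch of parsing one
such line follows. `decode_hex` and `parse_reference` are illustrative names
(the patch itself uses `hex::decode` and `anyhow` errors), and this sketch
insists on the exact checksum length:

```rust
/// Decode a hex string into bytes (std-only stand-in for `hex::decode`).
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

/// Parse one `<checksum> <size> <file>` line; `csum_len` is the digest
/// length in bytes (16 for MD5, 32 for SHA-256, 64 for SHA-512).
fn parse_reference(line: &str, csum_len: usize) -> Result<(String, usize, Vec<u8>), String> {
    let mut fields = line.split_ascii_whitespace();

    let checksum = fields.next().ok_or("missing checksum field")?;
    if checksum.len() != csum_len * 2 {
        return Err(format!(
            "expected {} hex digits, got {}",
            csum_len * 2,
            checksum.len()
        ));
    }
    let checksum = decode_hex(checksum).ok_or("checksum is not valid hex")?;

    let size = fields
        .next()
        .ok_or("missing size field")?
        .parse::<usize>()
        .map_err(|err| err.to_string())?;

    let file = fields.next().ok_or("missing file name field")?.to_string();

    Ok((file, size, checksum))
}

fn main() {
    // Values taken from the base-files entry checked in the test below.
    let line = "741c34ac0151262a03de8d5a07bc4271 1110 base-files_11.1+deb11u5.dsc";
    let (file, size, md5) = parse_reference(line, 16).unwrap();
    println!("{file}: {size} bytes, md5 has {} bytes", md5.len());
}
```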

 src/deb822/mod.rs                             |      3 +
 src/deb822/release_file.rs                    |      2 +-
 src/deb822/sources_file.rs                    |    255 +
 ..._debian_dists_bullseye_main_source_Sources | 858657 +++++++++++++++
 4 files changed, 858916 insertions(+), 1 deletion(-)
 create mode 100644 src/deb822/sources_file.rs
 create mode 100644 tests/deb822/sources/deb.debian.org_debian_dists_bullseye_main_source_Sources

diff --git a/src/deb822/mod.rs b/src/deb822/mod.rs
index 7a1bb0e..59e7c21 100644
--- a/src/deb822/mod.rs
+++ b/src/deb822/mod.rs
@@ -5,6 +5,9 @@ pub use release_file::{CompressionType, FileReference, FileReferenceType, Releas
 mod packages_file;
 pub use packages_file::PackagesFile;
 
+mod sources_file;
+pub use sources_file::SourcesFile;
+
 #[derive(Copy, Clone, Debug, Default, PartialEq, Eq, PartialOrd, Ord)]
 pub struct CheckSums {
     pub md5: Option<[u8; 16]>,
diff --git a/src/deb822/release_file.rs b/src/deb822/release_file.rs
index c50c095..85d3436 100644
--- a/src/deb822/release_file.rs
+++ b/src/deb822/release_file.rs
@@ -245,7 +245,7 @@ impl FileReferenceType {
     }
 
     pub fn is_package_index(&self) -> bool {
-        matches!(self, FileReferenceType::Packages(_, _))
+        matches!(self, FileReferenceType::Packages(_, _) | FileReferenceType::Sources(_))
     }
 }
 
diff --git a/src/deb822/sources_file.rs b/src/deb822/sources_file.rs
new file mode 100644
index 0000000..a13d84f
--- /dev/null
+++ b/src/deb822/sources_file.rs
@@ -0,0 +1,255 @@
+use std::collections::HashMap;
+
+use anyhow::{bail, Error, format_err};
+use rfc822_like::de::Deserializer;
+use serde::Deserialize;
+use serde_json::Value;
+
+use super::CheckSums;
+//Uploaders
+//
+//Homepage
+//
+//Version Control System (VCS) fields
+//
+//Testsuite
+//
+//Dgit
+//
+//Standards-Version (mandatory)
+//
+//Build-Depends et al
+//
+//Package-List (recommended)
+//
+//Checksums-Sha1 and Checksums-Sha256 (mandatory)
+//
+//Files (mandatory)
+
+
+
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "PascalCase")]
+pub struct SourcesFileRaw {
+    pub format: String,
+    pub package: String,
+    pub binary: Option<Vec<String>>,
+    pub version: String,
+    pub section: Option<String>,
+    pub priority: Option<String>,
+    pub maintainer: String,
+    pub uploaders: Option<String>,
+    pub architecture: Option<String>,
+    pub directory: String,
+    pub files: String,
+    #[serde(rename = "Checksums-Sha256")]
+    pub sha256: Option<String>,
+    #[serde(rename = "Checksums-Sha512")]
+    pub sha512: Option<String>,
+    #[serde(flatten)]
+    pub extra_fields: HashMap<String, Value>,
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub struct SourcePackageEntry {
+    pub format: String,
+    pub package: String,
+    pub binary: Option<Vec<String>>,
+    pub version: String,
+    pub architecture: Option<String>,
+    pub section: Option<String>,
+    pub priority: Option<String>,
+    pub maintainer: String,
+    pub uploaders: Option<String>,
+    pub directory: String,
+    pub files: HashMap<String, SourcePackageFileReference>,
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub struct SourcePackageFileReference {
+    pub file: String,
+    pub size: usize,
+    pub checksums: CheckSums,
+}
+
+impl SourcePackageEntry {
+    pub fn size(&self) -> usize {
+        self.files.values().map(|f| f.size).sum()
+    }
+}
+
+#[derive(Debug, Default, PartialEq, Eq)]
+/// A parsed representation of a Sources file
+pub struct SourcesFile {
+    pub source_packages: Vec<SourcePackageEntry>,
+}
+
+impl TryFrom<SourcesFileRaw> for SourcePackageEntry {
+    type Error = Error;
+
+    fn try_from(value: SourcesFileRaw) -> Result<Self, Self::Error> {
+        let mut parsed = SourcePackageEntry {
+            package: value.package,
+            binary: value.binary,
+            version: value.version,
+            architecture: value.architecture,
+            files: HashMap::new(),
+            format: value.format,
+            section: value.section,
+            priority: value.priority,
+            maintainer: value.maintainer,
+            uploaders: value.uploaders,
+            directory: value.directory,
+        };
+
+        for file_reference in value.files.lines() {
+            let (file_name, size, md5) = parse_file_reference(file_reference, 16)?;
+            let entry = parsed.files.entry(file_name.clone()).or_insert_with(|| SourcePackageFileReference { file: file_name, size, checksums: CheckSums::default()});
+            entry.checksums.md5 = Some(md5.try_into().map_err(|_|format_err!("unexpected checksum length"))?);
+            if entry.size != size {
+                bail!("Size mismatch: {} != {}", entry.size, size);
+            }
+        }
+
+        if let Some(sha256) = value.sha256 {
+            for line in sha256.lines() {
+                let (file_name, size, sha256) = parse_file_reference(line, 32)?;
+                let entry = parsed.files.entry(file_name.clone()).or_insert_with(|| SourcePackageFileReference { file: file_name, size, checksums: CheckSums::default()});
+                entry.checksums.sha256 = Some(sha256.try_into().map_err(|_|format_err!("unexpected checksum length"))?);
+                if entry.size != size {
+                    bail!("Size mismatch: {} != {}", entry.size, size);
+                }
+            }
+        };
+
+        if let Some(sha512) = value.sha512 {
+            for line in sha512.lines() {
+                let (file_name, size, sha512) = parse_file_reference(line, 64)?;
+                let entry = parsed.files.entry(file_name.clone()).or_insert_with(|| SourcePackageFileReference { file: file_name, size, checksums: CheckSums::default()});
+                entry.checksums.sha512 = Some(sha512.try_into().map_err(|_|format_err!("unexpected checksum length"))?);
+                if entry.size != size {
+                    bail!("Size mismatch: {} != {}", entry.size, size);
+                }
+            }
+        };
+
+        for (file_name, reference) in &parsed.files {
+            if !reference.checksums.is_secure() {
+                bail!(
+                    "no strong checksum found for source entry '{}'",
+                    file_name
+                );
+            }
+        }
+
+        Ok(parsed)
+    }
+}
+
+impl TryFrom<String> for SourcesFile {
+    type Error = Error;
+
+    fn try_from(value: String) -> Result<Self, Self::Error> {
+        value.as_bytes().try_into()
+    }
+}
+
+impl TryFrom<&[u8]> for SourcesFile {
+    type Error = Error;
+
+    fn try_from(value: &[u8]) -> Result<Self, Self::Error> {
+        let deserialized = <Vec<SourcesFileRaw>>::deserialize(Deserializer::new(value))?;
+        deserialized.try_into()
+    }
+}
+
+impl TryFrom<Vec<SourcesFileRaw>> for SourcesFile {
+    type Error = Error;
+
+    fn try_from(value: Vec<SourcesFileRaw>) -> Result<Self, Self::Error> {
+        let mut source_packages = Vec::with_capacity(value.len());
+        for entry in value {
+            let entry: SourcePackageEntry = entry.try_into()?;
+            source_packages.push(entry);
+        }
+
+        Ok(Self { source_packages })
+    }
+}
+
+fn parse_file_reference(
+    line: &str,
+    csum_len: usize,
+) -> Result<(String, usize, Vec<u8>), Error> {
+    let mut split = line.split_ascii_whitespace();
+
+    let checksum = split
+        .next()
+        .ok_or_else(|| format_err!("Missing 'checksum' field."))?;
+    if checksum.len() != csum_len * 2 {
+        bail!(
+            "invalid checksum length: '{}', expected {} bytes",
+            checksum,
+            csum_len
+        );
+    }
+
+    let checksum = hex::decode(checksum)?;
+
+    let size = split
+        .next()
+        .ok_or_else(|| format_err!("Missing 'size' field."))?
+        .parse::<usize>()?;
+
+    let file = split
+        .next()
+        .ok_or_else(|| format_err!("Missing 'file name' field."))?
+        .to_string();
+
+    Ok((file, size, checksum))
+}
+
+#[test]
+pub fn test_deb_sources_file() {
+    let input = include_str!(concat!(
+        env!("CARGO_MANIFEST_DIR"),
+        "/tests/deb822/sources/deb.debian.org_debian_dists_bullseye_main_source_Sources"
+    ));
+
+    let deserialized =
+        <Vec<SourcesFileRaw>>::deserialize(Deserializer::new(input.as_bytes())).unwrap();
+    assert_eq!(deserialized.len(), 30953);
+
+    let parsed: SourcesFile = deserialized.try_into().unwrap();
+
+    assert_eq!(parsed.source_packages.len(), 30953);
+
+    let found = parsed.source_packages.iter().find(|source| source.package == "base-files").expect("test file contains 'base-files' entry");
+    assert_eq!(found.package, "base-files");
+    assert_eq!(found.format, "3.0 (native)");
+    assert_eq!(found.architecture.as_deref(), Some("any"));
+    assert_eq!(found.directory, "pool/main/b/base-files");
+    assert_eq!(found.section.as_deref(), Some("admin"));
+    assert_eq!(found.version, "11.1+deb11u5");
+
+    let binary_packages = found.binary.as_ref().expect("base-files source package builds base-files binary package");
+    assert_eq!(binary_packages.len(), 1);
+    assert_eq!(binary_packages[0], "base-files");
+    
+    let references = &found.files;
+    assert_eq!(references.len(), 2);
+
+    let dsc_file = "base-files_11.1+deb11u5.dsc";
+    let dsc = references.get(dsc_file).expect("base-files source package contains 'dsc' reference");
+    assert_eq!(dsc.file, dsc_file);
+    assert_eq!(dsc.size, 1110);
+    assert_eq!(dsc.checksums.md5.expect("dsc has md5 checksum"), hex::decode("741c34ac0151262a03de8d5a07bc4271").unwrap()[..]);
+    assert_eq!(dsc.checksums.sha256.expect("dsc has sha256 checksum"), hex::decode("c41a7f00d57759f27e6068240d1ea7ad80a9a752e4fb43850f7e86e967422bd3").unwrap()[..]);
+
+    let tar_file = "base-files_11.1+deb11u5.tar.xz";
+    let tar = references.get(tar_file).expect("base-files source package contains 'tar' reference");
+    assert_eq!(tar.file, tar_file);
+    assert_eq!(tar.size, 65612);
+    assert_eq!(tar.checksums.md5.expect("tar has md5 checksum"), hex::decode("995df33642118b566a4026410e1c6aac").unwrap()[..]);
+    assert_eq!(tar.checksums.sha256.expect("tar has sha256 checksum"), hex::decode("31c9e5745845a73f3d5c8a7868c379d77aaca42b81194679d7ab40cc28e3a0e9").unwrap()[..]);
+}
\ No newline at end of file
diff --git a/tests/deb822/sources/deb.debian.org_debian_dists_bullseye_main_source_Sources b/tests/deb822/sources/deb.debian.org_debian_dists_bullseye_main_source_Sources
new file mode 100644
index 0000000..2b8e387
--- /dev/null
+++ b/tests/deb822/sources/deb.debian.org_debian_dists_bullseye_main_source_Sources
@@ -0,0 +1,1 @@
+DOWNLOAD-ME-FROM: http://snapshot.debian.org/archive/debian/20221017T212657Z/dists/bullseye/main/source/Sources.xz
-- 
2.30.2






* [pve-devel] [PATCH proxmox-offline-mirror 1/4] mirror: add exclusion of packages/sections
  2022-10-18  9:20 [pve-devel] [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Fabian Grünbichler
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-apt 1/2] packages file: add section field Fabian Grünbichler
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-apt 2/2] deb822: source index support Fabian Grünbichler
@ 2022-10-18  9:20 ` Fabian Grünbichler
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 2/4] mirror: implement source packages mirroring Fabian Grünbichler
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Fabian Grünbichler @ 2022-10-18  9:20 UTC
  To: pve-devel

to keep the size of mirror snapshots down by excluding unnecessary files
(e.g., games data, browsers, debug packages, ..).

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    requires proxmox-apt with 'section' field

    we could suggest excluding sections like 'games' in the
    wizard/docs..
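
to make the intended semantics concrete, here is a small std-only sketch of
the skip decision: sections are compared for exact equality, package names
are matched against globs. `SkipRules`, `skip_reason` and `glob_match` are
made-up names for this sketch, and `glob_match` only understands `*` and `?`,
whereas the patch relies on the `globset` crate, which supports richer syntax:

```rust
/// Simplified glob matcher supporting only `*` and `?` (illustrative
/// stand-in for `globset`).
fn glob_match(pattern: &str, name: &str) -> bool {
    let (p, n) = (pattern.as_bytes(), name.as_bytes());
    let (mut pi, mut ni) = (0, 0);
    // Position to resume from after the most recent `*`:
    // (pattern index after the star, name index the star currently ends at).
    let mut backtrack: Option<(usize, usize)> = None;

    while ni < n.len() {
        if pi < p.len() && p[pi] == b'*' {
            backtrack = Some((pi + 1, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == b'?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((bp, bn)) = backtrack {
            // Let the last `*` swallow one more character and retry.
            pi = bp;
            ni = bn + 1;
            backtrack = Some((bp, bn + 1));
        } else {
            return false;
        }
    }
    // Only trailing stars may remain in the pattern.
    while pi < p.len() && p[pi] == b'*' {
        pi += 1;
    }
    pi == p.len()
}

struct SkipRules {
    sections: Vec<String>,
    package_globs: Vec<String>,
}

/// Returns why a package would be skipped, or `None` if it is kept.
fn skip_reason(rules: &SkipRules, package: &str, section: &str) -> Option<String> {
    // Sections: exact string comparison, no globbing.
    if rules.sections.iter().any(|s| s == section) {
        return Some(format!("section '{section}'"));
    }
    // Packages: report every glob that matched.
    let matched: Vec<&str> = rules
        .package_globs
        .iter()
        .filter(|glob| glob_match(glob, package))
        .map(|glob| glob.as_str())
        .collect();
    if matched.is_empty() {
        None
    } else {
        Some(format!("package glob(s): {}", matched.join(", ")))
    }
}

fn main() {
    let rules = SkipRules {
        sections: vec!["games".to_string()],
        package_globs: vec!["*-dbg".to_string()],
    };
    for (package, section) in [("0ad-data", "games"), ("foo-dbg", "debug"), ("base-files", "admin")] {
        println!("{package}: {:?}", skip_reason(&rules, package, section));
    }
}
```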

 Cargo.toml                                    |  1 +
 debian/control                                |  2 +
 src/bin/proxmox-offline-mirror.rs             |  4 +-
 src/bin/proxmox_offline_mirror_cmds/config.rs |  8 +++
 src/config.rs                                 | 40 ++++++++++++-
 src/mirror.rs                                 | 59 ++++++++++++++++++-
 6 files changed, 111 insertions(+), 3 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 76791c8..b2bb188 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -13,6 +13,7 @@ anyhow = "1.0"
 base64 = "0.13"
 bzip2 = "0.4"
 flate2 = "1.0.22"
+globset = "0.4.8"
 hex = "0.4.3"
 lazy_static = "1.4"
 nix = "0.24"
diff --git a/debian/control b/debian/control
index 0741a7b..9fe6605 100644
--- a/debian/control
+++ b/debian/control
@@ -10,6 +10,7 @@ Build-Depends: debhelper (>= 12),
  librust-base64-0.13+default-dev,
  librust-bzip2-0.4+default-dev,
  librust-flate2-1+default-dev (>= 1.0.22-~~),
+ librust-globset-0.4+default-dev (>= 0.4.8-~~),
  librust-hex-0.4+default-dev (>= 0.4.3-~~),
  librust-lazy-static-1+default-dev (>= 1.4-~~),
  librust-nix-0.24+default-dev,
@@ -57,6 +58,7 @@ Depends:
  librust-base64-0.13+default-dev,
  librust-bzip2-0.4+default-dev,
  librust-flate2-1+default-dev (>= 1.0.22-~~),
+ librust-globset-0.4+default-dev (>= 0.4.8-~~),
  librust-hex-0.4+default-dev (>= 0.4.3-~~),
  librust-lazy-static-1+default-dev (>= 1.4-~~),
  librust-nix-0.24+default-dev,
diff --git a/src/bin/proxmox-offline-mirror.rs b/src/bin/proxmox-offline-mirror.rs
index 522056b..07b6ce6 100644
--- a/src/bin/proxmox-offline-mirror.rs
+++ b/src/bin/proxmox-offline-mirror.rs
@@ -13,7 +13,7 @@ use proxmox_offline_mirror::helpers::tty::{
     read_bool_from_tty, read_selection_from_tty, read_string_from_tty,
 };
 use proxmox_offline_mirror::{
-    config::{save_config, MediaConfig, MirrorConfig},
+    config::{save_config, MediaConfig, MirrorConfig, SkipConfig},
     mirror,
     types::{ProductType, MEDIA_ID_SCHEMA, MIRROR_ID_SCHEMA},
 };
@@ -387,6 +387,7 @@ fn action_add_mirror(config: &SectionConfigData) -> Result<Vec<MirrorConfig>, Er
                 base_dir: base_dir.clone(),
                 use_subscription: None,
                 ignore_errors: false,
+                skip: SkipConfig::default(), // TODO sensible default?
             });
         }
     }
@@ -401,6 +402,7 @@ fn action_add_mirror(config: &SectionConfigData) -> Result<Vec<MirrorConfig>, Er
         base_dir,
         use_subscription,
         ignore_errors: false,
+        skip: SkipConfig::default(),
     };
 
     configs.push(main_config);
diff --git a/src/bin/proxmox_offline_mirror_cmds/config.rs b/src/bin/proxmox_offline_mirror_cmds/config.rs
index 5ebf6d5..3ebf4ad 100644
--- a/src/bin/proxmox_offline_mirror_cmds/config.rs
+++ b/src/bin/proxmox_offline_mirror_cmds/config.rs
@@ -266,6 +266,14 @@ pub fn update_mirror(
         data.ignore_errors = ignore_errors
     }
 
+    if let Some(skip_packages) = update.skip.skip_packages {
+        data.skip.skip_packages = Some(skip_packages);
+    }
+
+    if let Some(skip_sections) = update.skip.skip_sections {
+        data.skip.skip_sections = Some(skip_sections);
+    }
+
     config.set_data(&id, "mirror", &data)?;
     proxmox_offline_mirror::config::save_config(&config_file, &config)?;
 
diff --git a/src/config.rs b/src/config.rs
index be8f96b..39b1193 100644
--- a/src/config.rs
+++ b/src/config.rs
@@ -14,6 +14,38 @@ use crate::types::{
     PROXMOX_SUBSCRIPTION_KEY_SCHEMA,
 };
 
+/// Skip Configuration
+#[api(
+    properties: {
+        "skip-sections": {
+            type: Array,
+            optional: true,
+            items: {
+                type: String,
+                description: "Section name",
+            },
+        },
+        "skip-packages": {
+            type: Array,
+            optional: true,
+            items: {
+                type: String,
+                description: "Package name",
+            },
+        },
+    },
+)]
+#[derive(Default, Serialize, Deserialize, Updater, Clone, Debug)]
+#[serde(rename_all = "kebab-case")]
+pub struct SkipConfig {
+    /// Sections which should be skipped
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub skip_sections: Option<Vec<String>>,
+    /// Packages which should be skipped, supports globbing
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub skip_packages: Option<Vec<String>>,
+}
+
 #[api(
     properties: {
         id: {
@@ -46,6 +78,9 @@ use crate::types::{
             optional: true,
             default: false,
         },
+        "skip": {
+            type: SkipConfig,
+        },
     }
 )]
 #[derive(Clone, Debug, Serialize, Deserialize, Updater)]
@@ -73,6 +108,9 @@ pub struct MirrorConfig {
     /// Whether to downgrade download errors to warnings
     #[serde(default)]
     pub ignore_errors: bool,
+    /// Skip package files using these criteria
+    #[serde(default, flatten)]
+    pub skip: SkipConfig,
 }
 
 #[api(
@@ -191,7 +229,7 @@ fn init() -> SectionConfig {
     let mut config = SectionConfig::new(&MIRROR_ID_SCHEMA);
 
     let mirror_schema = match MirrorConfig::API_SCHEMA {
-        Schema::Object(ref obj_schema) => obj_schema,
+        Schema::AllOf(ref all_of_schema) => all_of_schema,
         _ => unreachable!(),
     };
     let mirror_plugin = SectionConfigPlugin::new(
diff --git a/src/mirror.rs b/src/mirror.rs
index dfb4cc9..22dc716 100644
--- a/src/mirror.rs
+++ b/src/mirror.rs
@@ -7,12 +7,13 @@ use std::{
 
 use anyhow::{bail, format_err, Error};
 use flate2::bufread::GzDecoder;
+use globset::{Glob, GlobSetBuilder};
 use nix::libc;
 use proxmox_http::{client::sync::Client, HttpClient, HttpOptions};
 use proxmox_sys::fs::file_get_contents;
 
 use crate::{
-    config::{MirrorConfig, SubscriptionKey},
+    config::{MirrorConfig, SkipConfig, SubscriptionKey},
     convert_repo_line,
     pool::Pool,
     types::{Diff, Snapshot, SNAPSHOT_REGEX},
@@ -47,6 +48,7 @@ struct ParsedMirrorConfig {
     pub auth: Option<String>,
     pub client: Client,
     pub ignore_errors: bool,
+    pub skip: SkipConfig,
 }
 
 impl TryInto<ParsedMirrorConfig> for MirrorConfig {
@@ -76,6 +78,7 @@ impl TryInto<ParsedMirrorConfig> for MirrorConfig {
             auth: None,
             client,
             ignore_errors: self.ignore_errors,
+            skip: self.skip,
         })
     }
 }
@@ -664,8 +667,22 @@ pub fn create_snapshot(
         }
     }
 
+    let skipped_package_globs = if let Some(skipped_packages) = &config.skip.skip_packages {
+        let mut globs = GlobSetBuilder::new();
+        for glob in skipped_packages {
+            let glob = Glob::new(glob)?;
+            globs.add(glob);
+        }
+        let globs = globs.build()?;
+        Some(globs)
+    } else {
+        None
+    };
+
     println!("\nFetching packages..");
     let mut dry_run_progress = Progress::new();
+    let mut total_skipped_count = 0usize;
+    let mut total_skipped_bytes = 0usize;
     for (basename, references) in packages_indices {
         let total_files = references.files.len();
         if total_files == 0 {
@@ -676,7 +693,37 @@ pub fn create_snapshot(
         }
 
         let mut fetch_progress = Progress::new();
+        let mut skipped_count = 0usize;
+        let mut skipped_bytes = 0usize;
         for package in references.files {
+            if let Some(ref sections) = &config.skip.skip_sections {
+                if sections.iter().any(|section| package.section == *section) {
+                    println!(
+                        "\tskipping {} - {}b (section '{}')",
+                        package.package, package.size, package.section
+                    );
+                    skipped_count += 1;
+                    skipped_bytes += package.size;
+                    continue;
+                }
+            }
+            if let Some(skipped_package_globs) = &skipped_package_globs {
+                let matches = skipped_package_globs.matches(&package.package);
+                if !matches.is_empty() {
+                    // safety, skipped_package_globs is set based on this
+                    let globs = config.skip.skip_packages.as_ref().unwrap();
+                    let matches: Vec<String> = matches.iter().map(|i| globs[*i].clone()).collect();
+                    println!(
+                        "\tskipping {} - {}b (package glob(s): {})",
+                        package.package,
+                        package.size,
+                        matches.join(", ")
+                    );
+                    skipped_count += 1;
+                    skipped_bytes += package.size;
+                    continue;
+                }
+            }
             let url = get_repo_url(&config.repository, &package.file);
 
             if dry_run {
@@ -728,6 +775,11 @@ pub fn create_snapshot(
         } else {
             total_progress += fetch_progress;
         }
+        if skipped_count > 0 {
+            total_skipped_count += skipped_count;
+            total_skipped_bytes += skipped_bytes;
+            println!("Skipped downloading {skipped_count} packages totalling {skipped_bytes}b");
+        }
     }
 
     if dry_run {
@@ -736,6 +788,11 @@ pub fn create_snapshot(
     } else {
         println!("\nStats: {total_progress}");
     }
+    if total_skipped_count > 0 {
+        println!(
+            "Skipped downloading {total_skipped_count} packages totalling {total_skipped_bytes}b"
+        );
+    }
 
     if !warnings.is_empty() {
         eprintln!("Warnings:");
-- 
2.30.2






* [pve-devel] [PATCH proxmox-offline-mirror 2/4] mirror: implement source packages mirroring
  2022-10-18  9:20 [pve-devel] [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Fabian Grünbichler
                   ` (2 preceding siblings ...)
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 1/4] mirror: add exclusion of packages/sections Fabian Grünbichler
@ 2022-10-18  9:20 ` Fabian Grünbichler
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 3/4] fix #4264: only require either Release or InRelease Fabian Grünbichler
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Fabian Grünbichler @ 2022-10-18  9:20 UTC
  To: pve-devel

similar to the binary package one, but with one additional layer since
each source package consists of 2-3 files, not a single .deb file.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    requires proxmox-apt with source index support
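
the "additional layer" can be pictured like this: one source package entry
maps to several files, its size is the sum over those files, and each file is
fetched from `<directory>/<file>` relative to the repository. the types below
are simplified, hypothetical stand-ins for the proxmox-apt structs, using the
base-files values from the proxmox-apt test as sample data:

```rust
use std::collections::HashMap;

// Simplified stand-ins; the real structs also carry checksums etc.
struct FileRef {
    file: String,
    size: usize,
}

struct SourcePackage {
    directory: String,
    files: HashMap<String, FileRef>,
}

impl SourcePackage {
    /// Total size of the source package: the sum over all of its files.
    fn size(&self) -> usize {
        self.files.values().map(|f| f.size).sum()
    }

    /// Repository-relative paths of all files, sorted for stable output.
    fn paths(&self) -> Vec<String> {
        let mut paths: Vec<String> = self
            .files
            .values()
            .map(|f| format!("{}/{}", self.directory, f.file))
            .collect();
        paths.sort();
        paths
    }
}

fn base_files_example() -> SourcePackage {
    let mut files = HashMap::new();
    for (name, size) in [
        ("base-files_11.1+deb11u5.dsc", 1110),
        ("base-files_11.1+deb11u5.tar.xz", 65612),
    ] {
        files.insert(
            name.to_string(),
            FileRef { file: name.to_string(), size },
        );
    }
    SourcePackage {
        directory: "pool/main/b/base-files".to_string(),
        files,
    }
}

fn main() {
    let package = base_files_example();
    println!("{} bytes in {} files", package.size(), package.files.len());
    for path in package.paths() {
        println!("\t{path}");
    }
}
```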

 src/mirror.rs | 158 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 150 insertions(+), 8 deletions(-)

diff --git a/src/mirror.rs b/src/mirror.rs
index 22dc716..37dca97 100644
--- a/src/mirror.rs
+++ b/src/mirror.rs
@@ -22,6 +22,7 @@ use crate::{
 use proxmox_apt::{
     deb822::{
         CheckSums, CompressionType, FileReference, FileReferenceType, PackagesFile, ReleaseFile,
+        SourcesFile,
     },
     repositories::{APTRepository, APTRepositoryPackageType},
 };
@@ -598,10 +599,15 @@ pub fn create_snapshot(
 
     let mut packages_size = 0_usize;
     let mut packages_indices = HashMap::new();
+
+    let mut source_packages_indices = HashMap::new();
+
     let mut failed_references = Vec::new();
     for (component, references) in per_component {
         println!("\nFetching indices for component '{component}'");
         let mut component_deb_size = 0;
+        let mut component_dsc_size = 0;
+
         let mut fetch_progress = Progress::new();
 
         for basename in references {
@@ -642,21 +648,49 @@ pub fn create_snapshot(
                 fetch_progress.update(&res);
 
                 if package_index_data.is_none() && reference.file_type.is_package_index() {
-                    package_index_data = Some(res.data());
+                    package_index_data = Some((&reference.file_type, res.data()));
                 }
             }
-            if let Some(data) = package_index_data {
-                let packages: PackagesFile = data[..].try_into()?;
-                let size: usize = packages.files.iter().map(|p| p.size).sum();
-                println!("\t{} packages totalling {size}", packages.files.len());
-                component_deb_size += size;
-
-                packages_indices.entry(basename).or_insert(packages);
+            if let Some((reference_type, data)) = package_index_data {
+                match reference_type {
+                    FileReferenceType::Packages(_, _) => {
+                        let packages: PackagesFile = data[..].try_into()?;
+                        let size: usize = packages.files.iter().map(|p| p.size).sum();
+                        println!("\t{} packages totalling {size}", packages.files.len());
+                        component_deb_size += size;
+
+                        packages_indices.entry(basename).or_insert(packages);
+                    }
+                    FileReferenceType::Sources(_) => {
+                        let source_packages: SourcesFile = data[..].try_into()?;
+                        let size: usize = source_packages
+                            .source_packages
+                            .iter()
+                            .map(|s| s.size())
+                            .sum();
+                        println!(
+                            "\t{} source packages totalling {size}",
+                            source_packages.source_packages.len()
+                        );
+                        component_dsc_size += size;
+                        source_packages_indices
+                            .entry(basename)
+                            .or_insert(source_packages);
+                    }
+                    unknown => {
+                        eprintln!("Unknown package index '{unknown:?}', skipping processing..")
+                    }
+                }
             }
             println!("Progress: {fetch_progress}");
         }
+
         println!("Total deb size for component: {component_deb_size}");
         packages_size += component_deb_size;
+
+        println!("Total dsc size for component: {component_dsc_size}");
+        packages_size += component_dsc_size;
+
         total_progress += fetch_progress;
     }
     println!("Total deb size: {packages_size}");
@@ -782,6 +816,114 @@ pub fn create_snapshot(
         }
     }
 
+    for (basename, references) in source_packages_indices {
+        let total_source_packages = references.source_packages.len();
+        if total_source_packages == 0 {
+            println!("\n{basename} - no files, skipping.");
+            continue;
+        } else {
+            println!("\n{basename} - {total_source_packages} total source package(s)");
+        }
+
+        let mut fetch_progress = Progress::new();
+        let mut skipped_count = 0usize;
+        let mut skipped_bytes = 0usize;
+        for package in references.source_packages {
+            if let Some(ref sections) = &config.skip.skip_sections {
+                if sections
+                    .iter()
+                    .any(|section| package.section.as_ref() == Some(section))
+                {
+                    println!(
+                        "\tskipping {} - {}b (section '{}')",
+                        package.package,
+                        package.size(),
+                        package.section.as_ref().unwrap(),
+                    );
+                    skipped_count += 1;
+                    skipped_bytes += package.size();
+                    continue;
+                }
+            }
+            if let Some(skipped_package_globs) = &skipped_package_globs {
+                let matches = skipped_package_globs.matches(&package.package);
+                if !matches.is_empty() {
+                    // safety, skipped_package_globs is set based on this
+                    let globs = config.skip.skip_packages.as_ref().unwrap();
+                    let matches: Vec<String> = matches.iter().map(|i| globs[*i].clone()).collect();
+                    println!(
+                        "\tskipping {} - {}b (package glob(s): {})",
+                        package.package,
+                        package.size(),
+                        matches.join(", ")
+                    );
+                    skipped_count += 1;
+                    skipped_bytes += package.size();
+                    continue;
+                }
+            }
+
+            for file_reference in package.files.values() {
+                let path = format!("{}/{}", package.directory, file_reference.file);
+                let url = get_repo_url(&config.repository, &path);
+
+                if dry_run {
+                    if config.pool.contains(&file_reference.checksums) {
+                        fetch_progress.update(&FetchResult {
+                            data: vec![],
+                            fetched: 0,
+                        });
+                    } else {
+                        println!("\t(dry-run) GET missing '{url}' ({}b)", file_reference.size);
+                        fetch_progress.update(&FetchResult {
+                            data: vec![],
+                            fetched: file_reference.size,
+                        });
+                    }
+                } else {
+                    let mut full_path = PathBuf::from(prefix);
+                    full_path.push(&path);
+
+                    match fetch_plain_file(
+                        &config,
+                        &url,
+                        &full_path,
+                        file_reference.size,
+                        &file_reference.checksums,
+                        false,
+                        dry_run,
+                    ) {
+                        Ok(res) => fetch_progress.update(&res),
+                        Err(err) if config.ignore_errors => {
+                            let msg = format!(
+                                "{}: failed to fetch package '{}' - {}",
+                                basename, file_reference.file, err,
+                            );
+                            eprintln!("{msg}");
+                            warnings.push(msg);
+                        }
+                        Err(err) => return Err(err),
+                    }
+                }
+
+                if fetch_progress.file_count() % (max(total_source_packages / 100, 1)) == 0 {
+                    println!("\tProgress: {fetch_progress}");
+                }
+            }
+        }
+        println!("\tProgress: {fetch_progress}");
+        if dry_run {
+            dry_run_progress += fetch_progress;
+        } else {
+            total_progress += fetch_progress;
+        }
+        if skipped_count > 0 {
+            total_skipped_count += skipped_count;
+            total_skipped_bytes += skipped_bytes;
+            println!("Skipped downloading {skipped_count} packages totalling {skipped_bytes}b");
+        }
+    }
+
     if dry_run {
         println!("\nDry-run Stats (indices, downloaded but not persisted):\n{total_progress}");
         println!("\nDry-run stats (packages, new == missing):\n{dry_run_progress}");
-- 
2.30.2






* [pve-devel] [PATCH proxmox-offline-mirror 3/4] fix #4264: only require either Release or InRelease
  2022-10-18  9:20 [pve-devel] [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Fabian Grünbichler
                   ` (3 preceding siblings ...)
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 2/4] mirror: implement source packages mirroring Fabian Grünbichler
@ 2022-10-18  9:20 ` Fabian Grünbichler
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 4/4] mirror: refactor fetch_binary/source_packages Fabian Grünbichler
  2022-10-20 12:49 ` [pve-devel] applied-series: [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Thomas Lamprecht
  6 siblings, 0 replies; 8+ messages in thread
From: Fabian Grünbichler @ 2022-10-18  9:20 UTC (permalink / raw)
  To: pve-devel

strictly speaking InRelease is required, and Release optional, but that
might not be true for older repositories. treat failure to fetch either
as non-fatal, provided the other is available.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
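note: the fallback described above can be sketched in isolation roughly as
follows. this is a minimal illustration, not the code from this patch:
`fetch`, `parse`, `pick_release` and `Parsed` are hypothetical stand-ins for
fetch_release(), the parse_release closure and ReleaseFile, and errors are
plain strings instead of anyhow::Error. a failed fetch is modelled as
`Ok(None)`, while a parse failure stays fatal.

```rust
// Hypothetical stand-in for the parsed release file.
#[derive(Debug, PartialEq)]
struct Parsed(String);

/// `Ok(None)` models a non-fatal fetch failure; `Err` would be a fatal error.
fn fetch(name: &str, available: bool) -> Result<Option<String>, String> {
    if available {
        Ok(Some(format!("raw {name}")))
    } else {
        eprintln!("{name} fetch failure");
        Ok(None)
    }
}

/// Parsing failures remain fatal and are propagated to the caller.
fn parse(raw: String) -> Result<Parsed, String> {
    if raw.is_empty() {
        Err("empty release file".to_string())
    } else {
        Ok(Parsed(raw))
    }
}

/// Try both variants, then require that at least one of them is available.
fn pick_release(release_ok: bool, in_release_ok: bool) -> Result<Parsed, String> {
    // Option<Result<..>> -> Result<Option<..>> via transpose(), so `?` only
    // bails out on parse errors, not on a missing file.
    let release = fetch("Release", release_ok)?.map(parse).transpose()?;
    let in_release = fetch("InRelease", in_release_ok)?.map(parse).transpose()?;

    release
        .or(in_release)
        .ok_or_else(|| "Neither Release(.gpg) nor InRelease available!".to_string())
}

fn main() {
    println!("{:?}", pick_release(true, true));
    println!("{:?}", pick_release(false, true));
    println!("{:?}", pick_release(false, false));
}
```

both files are still attempted in every case, so both end up on disk when
both exist. only the final `.or()` decides which parsed result is used.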
 src/mirror.rs | 70 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 51 insertions(+), 19 deletions(-)

diff --git a/src/mirror.rs b/src/mirror.rs
index 37dca97..39b7f47 100644
--- a/src/mirror.rs
+++ b/src/mirror.rs
@@ -144,40 +144,61 @@ fn fetch_repo_file(
 
 /// Helper to fetch InRelease (`detached` == false) or Release/Release.gpg (`detached` == true) files from repository.
 ///
-/// Verifies the contained/detached signature, stores all fetched files under `prefix`, and returns the verified raw release file data.
+/// Verifies the contained/detached signature and stores all fetched files under `prefix`.
+/// 
+/// Returns the verified raw release file data, or None if the "fetch" part itself fails.
 fn fetch_release(
     config: &ParsedMirrorConfig,
     prefix: &Path,
     detached: bool,
     dry_run: bool,
-) -> Result<FetchResult, Error> {
+) -> Result<Option<FetchResult>, Error> {
     let (name, fetched, sig) = if detached {
         println!("Fetching Release/Release.gpg files");
-        let sig = fetch_repo_file(
+        let sig = match fetch_repo_file(
             &config.client,
             &get_dist_url(&config.repository, "Release.gpg"),
             1024 * 1024,
             None,
             config.auth.as_deref(),
-        )?;
-        let mut fetched = fetch_repo_file(
+        ) {
+            Ok(res) => res,
+            Err(err) => {
+                eprintln!("Release.gpg fetch failure: {err}");
+                return Ok(None);
+            }
+        };
+
+        let mut fetched = match fetch_repo_file(
             &config.client,
             &get_dist_url(&config.repository, "Release"),
             256 * 1024 * 1024,
             None,
             config.auth.as_deref(),
-        )?;
+        ) {
+            Ok(res) => res,
+            Err(err) => {
+                eprintln!("Release fetch failure: {err}");
+                return Ok(None);
+            }
+        };
         fetched.fetched += sig.fetched;
         ("Release(.gpg)", fetched, Some(sig.data()))
     } else {
         println!("Fetching InRelease file");
-        let fetched = fetch_repo_file(
+        let fetched = match fetch_repo_file(
             &config.client,
             &get_dist_url(&config.repository, "InRelease"),
             256 * 1024 * 1024,
             None,
             config.auth.as_deref(),
-        )?;
+        ) {
+            Ok(res) => res,
+            Err(err) => {
+                eprintln!("InRelease fetch failure: {err}");
+                return Ok(None);
+            }
+        };
         ("InRelease", fetched, None)
     };
 
@@ -193,10 +214,10 @@ fn fetch_release(
     };
 
     if dry_run {
-        return Ok(FetchResult {
+        return Ok(Some(FetchResult {
             data: verified,
             fetched: fetched.fetched,
-        });
+        }));
     }
 
     let locked = &config.pool.lock()?;
@@ -230,10 +251,10 @@ fn fetch_release(
         )?;
     }
 
-    Ok(FetchResult {
+    Ok(Some(FetchResult {
         data: verified,
         fetched: fetched.fetched,
-    })
+    }))
 }
 
 /// Helper to fetch an index file referenced by a `ReleaseFile`.
@@ -510,14 +531,25 @@ pub fn create_snapshot(
         Ok(parsed)
     };
 
-    // we want both on-disk for compat reasons
-    let res = fetch_release(&config, prefix, true, dry_run)?;
-    total_progress.update(&res);
-    let _release = parse_release(res, "Release")?;
+    // we want both on-disk for compat reasons, if both are available
+    let release = fetch_release(&config, prefix, true, dry_run)?
+        .map(|res| {
+            total_progress.update(&res);
+            parse_release(res, "Release")
+        })
+        .transpose()?;
+
+    let in_release = fetch_release(&config, prefix, false, dry_run)?
+        .map(|res| {
+            total_progress.update(&res);
+            parse_release(res, "InRelease")
+        })
+        .transpose()?;
 
-    let res = fetch_release(&config, prefix, false, dry_run)?;
-    total_progress.update(&res);
-    let release = parse_release(res, "InRelease")?;
+    // at least one must be available to proceed
+    let release = release
+        .or(in_release)
+        .ok_or_else(|| format_err!("Neither Release(.gpg) nor InRelease available!"))?;
 
     let mut per_component = HashMap::new();
     let mut others = Vec::new();
-- 
2.30.2






* [pve-devel] [PATCH proxmox-offline-mirror 4/4] mirror: refactor fetch_binary/source_packages
  2022-10-18  9:20 [pve-devel] [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Fabian Grünbichler
                   ` (4 preceding siblings ...)
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 3/4] fix #4264: only require either Release or InRelease Fabian Grünbichler
@ 2022-10-18  9:20 ` Fabian Grünbichler
  2022-10-20 12:49 ` [pve-devel] applied-series: [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Thomas Lamprecht
  6 siblings, 0 replies; 8+ messages in thread
From: Fabian Grünbichler @ 2022-10-18  9:20 UTC (permalink / raw)
  To: pve-devel

and pull out some of the progress variables into a struct.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
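note: the shape of the refactoring, reduced to a self-contained sketch. the
names here (`Stats`, `Item`, `fetch_group`) are made up for illustration and
only mirror the idea of MirrorProgress: per-run counters live in one struct
that each fetch helper receives as `&mut`, while per-index counters stay
local and are folded in at the end. actual downloading is replaced by flags
on the hypothetical `Item` type.

```rust
/// Hypothetical counterpart of MirrorProgress: state shared across helpers.
#[derive(Default)]
struct Stats {
    warnings: Vec<String>,
    fetched_bytes: usize,
    skip_count: usize,
    skip_bytes: usize,
}

/// Hypothetical package entry; `skip` and `fails` replace filtering and I/O.
struct Item {
    size: usize,
    skip: bool,
    fails: bool,
}

/// Process one group of items, accumulating results into the shared `stats`.
fn fetch_group(
    name: &str,
    items: &[Item],
    ignore_errors: bool,
    stats: &mut Stats,
) -> Result<(), String> {
    // Per-group counters, merged into the shared totals after the loop.
    let mut skip_count = 0usize;
    let mut skip_bytes = 0usize;

    for item in items {
        if item.skip {
            skip_count += 1;
            skip_bytes += item.size;
            continue;
        }
        if item.fails {
            let msg = format!("{name}: failed to fetch package ({}b)", item.size);
            if ignore_errors {
                // Non-fatal: remember the warning and carry on.
                eprintln!("{msg}");
                stats.warnings.push(msg);
                continue;
            }
            return Err(msg);
        }
        stats.fetched_bytes += item.size;
    }

    if skip_count > 0 {
        stats.skip_count += skip_count;
        stats.skip_bytes += skip_bytes;
        println!("Skipped downloading {skip_count} packages totalling {skip_bytes}b");
    }
    Ok(())
}

fn main() -> Result<(), String> {
    let mut stats = Stats::default();
    let binary = [
        Item { size: 10, skip: false, fails: false },
        Item { size: 5, skip: true, fails: false },
    ];
    let source = [Item { size: 7, skip: false, fails: true }];

    fetch_group("binary", &binary, true, &mut stats)?;
    fetch_group("source", &source, true, &mut stats)?;

    println!(
        "fetched {}b, skipped {} ({}b), {} warning(s)",
        stats.fetched_bytes,
        stats.skip_count,
        stats.skip_bytes,
        stats.warnings.len()
    );
    Ok(())
}
```

passing a single `&mut` struct keeps the helper signatures short and avoids
threading four separate mutable references through every call.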
 src/mirror.rs | 520 ++++++++++++++++++++++++++++----------------------
 1 file changed, 287 insertions(+), 233 deletions(-)

diff --git a/src/mirror.rs b/src/mirror.rs
index 39b7f47..faaaa19 100644
--- a/src/mirror.rs
+++ b/src/mirror.rs
@@ -7,7 +7,7 @@ use std::{
 
 use anyhow::{bail, format_err, Error};
 use flate2::bufread::GzDecoder;
-use globset::{Glob, GlobSetBuilder};
+use globset::{Glob, GlobSet, GlobSetBuilder};
 use nix::libc;
 use proxmox_http::{client::sync::Client, HttpClient, HttpOptions};
 use proxmox_sys::fs::file_get_contents;
@@ -145,7 +145,7 @@ fn fetch_repo_file(
 /// Helper to fetch InRelease (`detached` == false) or Release/Release.gpg (`detached` == true) files from repository.
 ///
 /// Verifies the contained/detached signature and stores all fetched files under `prefix`.
-/// 
+///
 /// Returns the verified raw release file data, or None if the "fetch" part itself fails.
 fn fetch_release(
     config: &ParsedMirrorConfig,
@@ -474,6 +474,259 @@ pub fn list_snapshots(config: &MirrorConfig) -> Result<Vec<Snapshot>, Error> {
     Ok(list)
 }
 
+struct MirrorProgress {
+    warnings: Vec<String>,
+    dry_run: Progress,
+    total: Progress,
+    skip_count: usize,
+    skip_bytes: usize,
+}
+
+fn convert_to_globset(config: &ParsedMirrorConfig) -> Result<Option<GlobSet>, Error> {
+    Ok(if let Some(skipped_packages) = &config.skip.skip_packages {
+        let mut globs = GlobSetBuilder::new();
+        for glob in skipped_packages {
+            let glob = Glob::new(glob)?;
+            globs.add(glob);
+        }
+        let globs = globs.build()?;
+        Some(globs)
+    } else {
+        None
+    })
+}
+
+fn fetch_binary_packages(
+    config: &ParsedMirrorConfig,
+    packages_indices: HashMap<&String, PackagesFile>,
+    dry_run: bool,
+    prefix: &Path,
+    progress: &mut MirrorProgress,
+) -> Result<(), Error> {
+    let skipped_package_globs = convert_to_globset(config)?;
+
+    for (basename, references) in packages_indices {
+        let total_files = references.files.len();
+        if total_files == 0 {
+            println!("\n{basename} - no files, skipping.");
+            continue;
+        } else {
+            println!("\n{basename} - {total_files} total file(s)");
+        }
+
+        let mut fetch_progress = Progress::new();
+        let mut skip_count = 0usize;
+        let mut skip_bytes = 0usize;
+        for package in references.files {
+            if let Some(ref sections) = &config.skip.skip_sections {
+                if sections.iter().any(|section| package.section == *section) {
+                    println!(
+                        "\tskipping {} - {}b (section '{}')",
+                        package.package, package.size, package.section
+                    );
+                    skip_count += 1;
+                    skip_bytes += package.size;
+                    continue;
+                }
+            }
+            if let Some(skipped_package_globs) = &skipped_package_globs {
+                let matches = skipped_package_globs.matches(&package.package);
+                if !matches.is_empty() {
+                    // safety, skipped_package_globs is set based on this
+                    let globs = config.skip.skip_packages.as_ref().unwrap();
+                    let matches: Vec<String> = matches.iter().map(|i| globs[*i].clone()).collect();
+                    println!(
+                        "\tskipping {} - {}b (package glob(s): {})",
+                        package.package,
+                        package.size,
+                        matches.join(", ")
+                    );
+                    skip_count += 1;
+                    skip_bytes += package.size;
+                    continue;
+                }
+            }
+            let url = get_repo_url(&config.repository, &package.file);
+
+            if dry_run {
+                if config.pool.contains(&package.checksums) {
+                    fetch_progress.update(&FetchResult {
+                        data: vec![],
+                        fetched: 0,
+                    });
+                } else {
+                    println!("\t(dry-run) GET missing '{url}' ({}b)", package.size);
+                    fetch_progress.update(&FetchResult {
+                        data: vec![],
+                        fetched: package.size,
+                    });
+                }
+            } else {
+                let mut full_path = PathBuf::from(prefix);
+                full_path.push(&package.file);
+
+                match fetch_plain_file(
+                    config,
+                    &url,
+                    &full_path,
+                    package.size,
+                    &package.checksums,
+                    false,
+                    dry_run,
+                ) {
+                    Ok(res) => fetch_progress.update(&res),
+                    Err(err) if config.ignore_errors => {
+                        let msg = format!(
+                            "{}: failed to fetch package '{}' - {}",
+                            basename, package.file, err,
+                        );
+                        eprintln!("{msg}");
+                        progress.warnings.push(msg);
+                    }
+                    Err(err) => return Err(err),
+                }
+            }
+
+            if fetch_progress.file_count() % (max(total_files / 100, 1)) == 0 {
+                println!("\tProgress: {fetch_progress}");
+            }
+        }
+        println!("\tProgress: {fetch_progress}");
+        if dry_run {
+            progress.dry_run += fetch_progress;
+        } else {
+            progress.total += fetch_progress;
+        }
+        if skip_count > 0 {
+            progress.skip_count += skip_count;
+            progress.skip_bytes += skip_bytes;
+            println!("Skipped downloading {skip_count} packages totalling {skip_bytes}b");
+        }
+    }
+
+    Ok(())
+}
+
+fn fetch_source_packages(
+    config: &ParsedMirrorConfig,
+    source_packages_indices: HashMap<&String, SourcesFile>,
+    dry_run: bool,
+    prefix: &Path,
+    progress: &mut MirrorProgress,
+) -> Result<(), Error> {
+    let skipped_package_globs = convert_to_globset(config)?;
+
+    for (basename, references) in source_packages_indices {
+        let total_source_packages = references.source_packages.len();
+        if total_source_packages == 0 {
+            println!("\n{basename} - no files, skipping.");
+            continue;
+        } else {
+            println!("\n{basename} - {total_source_packages} total source package(s)");
+        }
+
+        let mut fetch_progress = Progress::new();
+        let mut skip_count = 0usize;
+        let mut skip_bytes = 0usize;
+        for package in references.source_packages {
+            if let Some(ref sections) = &config.skip.skip_sections {
+                if sections
+                    .iter()
+                    .any(|section| package.section.as_ref() == Some(section))
+                {
+                    println!(
+                        "\tskipping {} - {}b (section '{}')",
+                        package.package,
+                        package.size(),
+                        package.section.as_ref().unwrap(),
+                    );
+                    skip_count += 1;
+                    skip_bytes += package.size();
+                    continue;
+                }
+            }
+            if let Some(skipped_package_globs) = &skipped_package_globs {
+                let matches = skipped_package_globs.matches(&package.package);
+                if !matches.is_empty() {
+                    // safety, skipped_package_globs is set based on this
+                    let globs = config.skip.skip_packages.as_ref().unwrap();
+                    let matches: Vec<String> = matches.iter().map(|i| globs[*i].clone()).collect();
+                    println!(
+                        "\tskipping {} - {}b (package glob(s): {})",
+                        package.package,
+                        package.size(),
+                        matches.join(", ")
+                    );
+                    skip_count += 1;
+                    skip_bytes += package.size();
+                    continue;
+                }
+            }
+
+            for file_reference in package.files.values() {
+                let path = format!("{}/{}", package.directory, file_reference.file);
+                let url = get_repo_url(&config.repository, &path);
+
+                if dry_run {
+                    if config.pool.contains(&file_reference.checksums) {
+                        fetch_progress.update(&FetchResult {
+                            data: vec![],
+                            fetched: 0,
+                        });
+                    } else {
+                        println!("\t(dry-run) GET missing '{url}' ({}b)", file_reference.size);
+                        fetch_progress.update(&FetchResult {
+                            data: vec![],
+                            fetched: file_reference.size,
+                        });
+                    }
+                } else {
+                    let mut full_path = PathBuf::from(prefix);
+                    full_path.push(&path);
+
+                    match fetch_plain_file(
+                        config,
+                        &url,
+                        &full_path,
+                        file_reference.size,
+                        &file_reference.checksums,
+                        false,
+                        dry_run,
+                    ) {
+                        Ok(res) => fetch_progress.update(&res),
+                        Err(err) if config.ignore_errors => {
+                            let msg = format!(
+                                "{}: failed to fetch package '{}' - {}",
+                                basename, file_reference.file, err,
+                            );
+                            eprintln!("{msg}");
+                            progress.warnings.push(msg);
+                        }
+                        Err(err) => return Err(err),
+                    }
+                }
+
+                if fetch_progress.file_count() % (max(total_source_packages / 100, 1)) == 0 {
+                    println!("\tProgress: {fetch_progress}");
+                }
+            }
+        }
+        println!("\tProgress: {fetch_progress}");
+        if dry_run {
+            progress.dry_run += fetch_progress;
+        } else {
+            progress.total += fetch_progress;
+        }
+        if skip_count > 0 {
+            progress.skip_count += skip_count;
+            progress.skip_bytes += skip_bytes;
+            println!("Skipped downloading {skip_count} packages totalling {skip_bytes}b");
+        }
+    }
+
+    Ok(())
+}
+
 /// Create a new snapshot of the remote repository, fetching and storing files as needed.
 ///
 /// Operates in three phases:
@@ -518,8 +771,13 @@ pub fn create_snapshot(
     let prefix = format!("{snapshot}.tmp");
     let prefix = Path::new(&prefix);
 
-    let mut total_progress = Progress::new();
-    let mut warnings = Vec::new();
+    let mut progress = MirrorProgress {
+        warnings: Vec::new(),
+        skip_count: 0,
+        skip_bytes: 0,
+        dry_run: Progress::new(),
+        total: Progress::new(),
+    };
 
     let parse_release = |res: FetchResult, name: &str| -> Result<ReleaseFile, Error> {
         println!("Parsing {name}..");
@@ -534,14 +792,14 @@ pub fn create_snapshot(
     // we want both on-disk for compat reasons, if both are available
     let release = fetch_release(&config, prefix, true, dry_run)?
         .map(|res| {
-            total_progress.update(&res);
+            progress.total.update(&res);
             parse_release(res, "Release")
         })
         .transpose()?;
 
     let in_release = fetch_release(&config, prefix, false, dry_run)?
         .map(|res| {
-            total_progress.update(&res);
+            progress.total.update(&res);
             parse_release(res, "InRelease")
         })
         .transpose()?;
@@ -671,7 +929,7 @@ pub fn create_snapshot(
                             reference.file_type, reference.path
                         );
                         eprintln!("{msg}");
-                        warnings.push(msg);
+                        progress.warnings.push(msg);
                         failed_references.push(reference);
                         continue;
                     }
@@ -723,7 +981,7 @@ pub fn create_snapshot(
         println!("Total dsc size for component: {component_dsc_size}");
         packages_size += component_dsc_size;
 
-        total_progress += fetch_progress;
+        progress.total += fetch_progress;
     }
     println!("Total deb size: {packages_size}");
     if !failed_references.is_empty() {
@@ -733,244 +991,40 @@ pub fn create_snapshot(
         }
     }
 
-    let skipped_package_globs = if let Some(skipped_packages) = &config.skip.skip_packages {
-        let mut globs = GlobSetBuilder::new();
-        for glob in skipped_packages {
-            let glob = Glob::new(glob)?;
-            globs.add(glob);
-        }
-        let globs = globs.build()?;
-        Some(globs)
-    } else {
-        None
-    };
-
     println!("\nFetching packages..");
-    let mut dry_run_progress = Progress::new();
-    let mut total_skipped_count = 0usize;
-    let mut total_skipped_bytes = 0usize;
-    for (basename, references) in packages_indices {
-        let total_files = references.files.len();
-        if total_files == 0 {
-            println!("\n{basename} - no files, skipping.");
-            continue;
-        } else {
-            println!("\n{basename} - {total_files} total file(s)");
-        }
-
-        let mut fetch_progress = Progress::new();
-        let mut skipped_count = 0usize;
-        let mut skipped_bytes = 0usize;
-        for package in references.files {
-            if let Some(ref sections) = &config.skip.skip_sections {
-                if sections.iter().any(|section| package.section == *section) {
-                    println!(
-                        "\tskipping {} - {}b (section '{}')",
-                        package.package, package.size, package.section
-                    );
-                    skipped_count += 1;
-                    skipped_bytes += package.size;
-                    continue;
-                }
-            }
-            if let Some(skipped_package_globs) = &skipped_package_globs {
-                let matches = skipped_package_globs.matches(&package.package);
-                if !matches.is_empty() {
-                    // safety, skipped_package_globs is set based on this
-                    let globs = config.skip.skip_packages.as_ref().unwrap();
-                    let matches: Vec<String> = matches.iter().map(|i| globs[*i].clone()).collect();
-                    println!(
-                        "\tskipping {} - {}b (package glob(s): {})",
-                        package.package,
-                        package.size,
-                        matches.join(", ")
-                    );
-                    skipped_count += 1;
-                    skipped_bytes += package.size;
-                    continue;
-                }
-            }
-            let url = get_repo_url(&config.repository, &package.file);
-
-            if dry_run {
-                if config.pool.contains(&package.checksums) {
-                    fetch_progress.update(&FetchResult {
-                        data: vec![],
-                        fetched: 0,
-                    });
-                } else {
-                    println!("\t(dry-run) GET missing '{url}' ({}b)", package.size);
-                    fetch_progress.update(&FetchResult {
-                        data: vec![],
-                        fetched: package.size,
-                    });
-                }
-            } else {
-                let mut full_path = PathBuf::from(prefix);
-                full_path.push(&package.file);
-
-                match fetch_plain_file(
-                    &config,
-                    &url,
-                    &full_path,
-                    package.size,
-                    &package.checksums,
-                    false,
-                    dry_run,
-                ) {
-                    Ok(res) => fetch_progress.update(&res),
-                    Err(err) if config.ignore_errors => {
-                        let msg = format!(
-                            "{}: failed to fetch package '{}' - {}",
-                            basename, package.file, err,
-                        );
-                        eprintln!("{msg}");
-                        warnings.push(msg);
-                    }
-                    Err(err) => return Err(err),
-                }
-            }
-
-            if fetch_progress.file_count() % (max(total_files / 100, 1)) == 0 {
-                println!("\tProgress: {fetch_progress}");
-            }
-        }
-        println!("\tProgress: {fetch_progress}");
-        if dry_run {
-            dry_run_progress += fetch_progress;
-        } else {
-            total_progress += fetch_progress;
-        }
-        if skipped_count > 0 {
-            total_skipped_count += skipped_count;
-            total_skipped_bytes += skipped_bytes;
-            println!("Skipped downloading {skipped_count} packages totalling {skipped_bytes}b");
-        }
-    }
 
-    for (basename, references) in source_packages_indices {
-        let total_source_packages = references.source_packages.len();
-        if total_source_packages == 0 {
-            println!("\n{basename} - no files, skipping.");
-            continue;
-        } else {
-            println!("\n{basename} - {total_source_packages} total source package(s)");
-        }
+    fetch_binary_packages(&config, packages_indices, dry_run, prefix, &mut progress)?;
 
-        let mut fetch_progress = Progress::new();
-        let mut skipped_count = 0usize;
-        let mut skipped_bytes = 0usize;
-        for package in references.source_packages {
-            if let Some(ref sections) = &config.skip.skip_sections {
-                if sections
-                    .iter()
-                    .any(|section| package.section.as_ref() == Some(section))
-                {
-                    println!(
-                        "\tskipping {} - {}b (section '{}')",
-                        package.package,
-                        package.size(),
-                        package.section.as_ref().unwrap(),
-                    );
-                    skipped_count += 1;
-                    skipped_bytes += package.size();
-                    continue;
-                }
-            }
-            if let Some(skipped_package_globs) = &skipped_package_globs {
-                let matches = skipped_package_globs.matches(&package.package);
-                if !matches.is_empty() {
-                    // safety, skipped_package_globs is set based on this
-                    let globs = config.skip.skip_packages.as_ref().unwrap();
-                    let matches: Vec<String> = matches.iter().map(|i| globs[*i].clone()).collect();
-                    println!(
-                        "\tskipping {} - {}b (package glob(s): {})",
-                        package.package,
-                        package.size(),
-                        matches.join(", ")
-                    );
-                    skipped_count += 1;
-                    skipped_bytes += package.size();
-                    continue;
-                }
-            }
-
-            for file_reference in package.files.values() {
-                let path = format!("{}/{}", package.directory, file_reference.file);
-                let url = get_repo_url(&config.repository, &path);
-
-                if dry_run {
-                    if config.pool.contains(&file_reference.checksums) {
-                        fetch_progress.update(&FetchResult {
-                            data: vec![],
-                            fetched: 0,
-                        });
-                    } else {
-                        println!("\t(dry-run) GET missing '{url}' ({}b)", file_reference.size);
-                        fetch_progress.update(&FetchResult {
-                            data: vec![],
-                            fetched: file_reference.size,
-                        });
-                    }
-                } else {
-                    let mut full_path = PathBuf::from(prefix);
-                    full_path.push(&path);
-
-                    match fetch_plain_file(
-                        &config,
-                        &url,
-                        &full_path,
-                        file_reference.size,
-                        &file_reference.checksums,
-                        false,
-                        dry_run,
-                    ) {
-                        Ok(res) => fetch_progress.update(&res),
-                        Err(err) if config.ignore_errors => {
-                            let msg = format!(
-                                "{}: failed to fetch package '{}' - {}",
-                                basename, file_reference.file, err,
-                            );
-                            eprintln!("{msg}");
-                            warnings.push(msg);
-                        }
-                        Err(err) => return Err(err),
-                    }
-                }
-
-                if fetch_progress.file_count() % (max(total_source_packages / 100, 1)) == 0 {
-                    println!("\tProgress: {fetch_progress}");
-                }
-            }
-        }
-        println!("\tProgress: {fetch_progress}");
-        if dry_run {
-            dry_run_progress += fetch_progress;
-        } else {
-            total_progress += fetch_progress;
-        }
-        if skipped_count > 0 {
-            total_skipped_count += skipped_count;
-            total_skipped_bytes += skipped_bytes;
-            println!("Skipped downloading {skipped_count} packages totalling {skipped_bytes}b");
-        }
-    }
+    fetch_source_packages(
+        &config,
+        source_packages_indices,
+        dry_run,
+        prefix,
+        &mut progress,
+    )?;
 
     if dry_run {
-        println!("\nDry-run Stats (indices, downloaded but not persisted):\n{total_progress}");
-        println!("\nDry-run stats (packages, new == missing):\n{dry_run_progress}");
+        println!(
+            "\nDry-run Stats (indices, downloaded but not persisted):\n{}",
+            progress.total
+        );
+        println!(
+            "\nDry-run stats (packages, new == missing):\n{}",
+            progress.dry_run
+        );
     } else {
-        println!("\nStats: {total_progress}");
+        println!("\nStats: {}", progress.total);
     }
     if total_count > 0 {
         println!(
-            "Skipped downloading {total_skipped_count} packages totalling {total_skipped_bytes}b"
+            "Skipped downloading {} packages totalling {}b",
+            progress.skip_count, progress.skip_bytes,
         );
     }
 
-    if !warnings.is_empty() {
+    if !progress.warnings.is_empty() {
         eprintln!("Warnings:");
-        for msg in warnings {
+        for msg in progress.warnings {
             eprintln!("- {msg}");
         }
     }
-- 
2.30.2
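
Editorial sketch (not part of the patch): the hunk above replaces several loose
counters (total_progress, dry_run_progress, total_skipped_count,
total_skipped_bytes, warnings) with fields on a single `progress` value. The
following is a minimal illustration of what such an aggregate could look like.
The field names total, dry_run, skip_count, skip_bytes and warnings come from
the diff; the type names `Counters` and `MirrorProgress` and the helper methods
are assumptions made for this example and may differ from the actual code in
src/mirror.rs.

```rust
use std::fmt;
use std::ops::AddAssign;

/// Hypothetical stand-in for the per-run fetch counters (assumption).
#[derive(Debug, Default, Clone, PartialEq)]
struct Counters {
    files: usize,
    bytes: usize,
}

impl AddAssign for Counters {
    // Lets callers write `progress.total += fetch_progress`, as the old code
    // did with its separate variables.
    fn add_assign(&mut self, rhs: Self) {
        self.files += rhs.files;
        self.bytes += rhs.bytes;
    }
}

impl fmt::Display for Counters {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} files, {}b", self.files, self.bytes)
    }
}

/// Hypothetical aggregate; only the field names are taken from the diff.
#[derive(Debug, Default)]
struct MirrorProgress {
    total: Counters,
    dry_run: Counters,
    skip_count: usize,
    skip_bytes: usize,
    warnings: Vec<String>,
}

impl MirrorProgress {
    /// Adds one batch of counters to either the dry-run or the real total.
    fn record(&mut self, fetched: Counters, is_dry_run: bool) {
        if is_dry_run {
            self.dry_run += fetched;
        } else {
            self.total += fetched;
        }
    }

    /// Accounts for packages that were skipped by a filter.
    fn record_skipped(&mut self, count: usize, bytes: usize) {
        self.skip_count += count;
        self.skip_bytes += bytes;
    }
}
```

Passing one `&mut` aggregate into each fetch helper keeps the final summary
code short, which matches the shape of the call to fetch_source_packages in the
hunk above.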






* [pve-devel] applied-series: [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support
  2022-10-18  9:20 [pve-devel] [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Fabian Grünbichler
                   ` (5 preceding siblings ...)
  2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 4/4] mirror: refactor fetch_binary/source_packages Fabian Grünbichler
@ 2022-10-20 12:49 ` Thomas Lamprecht
  6 siblings, 0 replies; 8+ messages in thread
From: Thomas Lamprecht @ 2022-10-20 12:49 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

On 18/10/2022 at 11:20, Fabian Grünbichler wrote:
> this series implements filtering based on package section (exact match)
> or package name (glob), and extends mirroring support to source
> packages/deb-src repositories.
> 
> technically the first patch in proxmox-apt is a breaking change, but the
> only user of the changed struct is proxmox-offline-mirror, which doesn't
> do any incompatible initializations.
> 
> proxmox-apt:
> 
> Fabian Grünbichler (2):
>   packages file: add section field
>   deb822: source index support
> 
>  src/deb822/mod.rs                             |      3 +
>  src/deb822/packages_file.rs                   |      2 +
>  src/deb822/release_file.rs                    |      2 +-
>  src/deb822/sources_file.rs                    |    255 +
>  ..._debian_dists_bullseye_main_source_Sources | 858657 +++++++++++++++
>  5 files changed, 858918 insertions(+), 1 deletion(-)
>  create mode 100644 src/deb822/sources_file.rs
>  create mode 100644 tests/deb822/sources/deb.debian.org_debian_dists_bullseye_main_source_Sources
> 
> proxmox-offline-mirror:
> 
> Fabian Grünbichler (4):
>   mirror: add exclusion of packages/sections
>   mirror: implement source packages mirroring
>   fix #4264: only require either Release or InRelease
>   mirror: refactor fetch_binary/source_packages
> 
>  Cargo.toml                                    |   1 +
>  debian/control                                |   2 +
>  src/bin/proxmox-offline-mirror.rs             |   4 +-
>  src/bin/proxmox_offline_mirror_cmds/config.rs |   8 +
>  src/config.rs                                 |  40 +-
>  src/mirror.rs                                 | 483 ++++++++++++++----
>  6 files changed, 437 insertions(+), 101 deletions(-)
> 

applied series, thanks!

I'll wait for some doc patches before bumping. They should describe how to use
this, ideally with common, sensible section filters like 'games' and 'kernel',
as I don't think many people will find this in the rather hidden usage, at
least not until it's "too late" and they have already downloaded way more than
they wanted (in most cases).
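
Editorial sketch (not part of the original message): the cover letter describes
two kinds of filters, an exact match on the package section and a glob on the
package name. The following is a minimal, dependency-free illustration of that
idea. The names `SkipConfig`, `skip_sections`, `skip_packages` and `glob_match`
are assumptions made for this example, the glob matcher only understands `*`
and `?`, and the actual implementation in proxmox-offline-mirror may use a
different configuration layout and a glob library.

```rust
/// Minimal glob matcher supporting `*` (any run of characters) and `?`
/// (exactly one character). Illustrative only.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Position to resume from after the most recent `*`:
    // (pattern index after the star, text index the star currently covers up to).
    let mut star: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            // Remember the star and first try to let it match nothing.
            star = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Mismatch: let the last star swallow one more character.
            pi = sp;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Trailing stars can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Hypothetical filter configuration (names are assumptions).
struct SkipConfig {
    /// Sections to exclude, compared by exact string equality.
    skip_sections: Vec<String>,
    /// Package name globs to exclude.
    skip_packages: Vec<String>,
}

impl SkipConfig {
    /// Returns true if the package should not be mirrored.
    fn excludes(&self, package: &str, section: &str) -> bool {
        self.skip_sections.iter().any(|s| s == section)
            || self.skip_packages.iter().any(|g| glob_match(g, package))
    }
}
```

With skip_sections set to 'games' and skip_packages set to 'linux-image-*', a
package in section 'games' is excluded, as is any package whose name starts
with 'linux-image-'. Because the section comparison in this sketch is exact, a
section such as 'contrib/games' would not be caught by a 'games' entry and
would need its own entry.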





Thread overview: 8+ messages
2022-10-18  9:20 [pve-devel] [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Fabian Grünbichler
2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-apt 1/2] packages file: add section field Fabian Grünbichler
2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-apt 2/2] deb822: source index support Fabian Grünbichler
2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 1/4] mirror: add exclusion of packages/sections Fabian Grünbichler
2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 2/4] mirror: implement source packages mirroring Fabian Grünbichler
2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 3/4] fix #4264: only require either Release or InRelease Fabian Grünbichler
2022-10-18  9:20 ` [pve-devel] [PATCH proxmox-offline-mirror 4/4] mirror: refactor fetch_binary/source_packages Fabian Grünbichler
2022-10-20 12:49 ` [pve-devel] applied-series: [PATCH-SERIES 0/6] proxmox-offline-mirror filtering & deb-src support Thomas Lamprecht
