all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Dominik Csapak <d.csapak@proxmox.com>
To: pmg-devel@lists.proxmox.com
Subject: [PATCH log-tracker v2 3/3] fix #7702: improve postfix QID parsing
Date: Wed, 17 Jun 2026 09:47:50 +0200	[thread overview]
Message-ID: <20260617074934.955079-3-d.csapak@proxmox.com> (raw)
In-Reply-To: <20260617074934.955079-1-d.csapak@proxmox.com>

since adding support for postfix's long QIDs, the criteria was that the
QIDs had to be alphanumeric (since no built-in base52 check exists).

This lead to false positives in some logs that had lines such as:
 postfix/smtpd[xxxx]: warning: hostname example.org does not resolve to address x.y.z.w

Where the 'warning:' was interpreted as QIDs.
(There are other things postfix can print here that were too interpreted
as QIDs, this is just an example).

To fix it, add a simple state machine during parsing that keeps track of
what should be encountered in a long/short QID and abort if there is an
impossible combination (e.g. vowels in a long QID).

Since adding new checks here could be performance critical, I
benchmarked it and it showed no significant performance changes.

Setup:
A ~2.5 GiB large syslog (on ram; otherwise we're likely IO bottlenecked)
filtered for some time range and a search for a from address:

Benchmark 1: current-master
  Time (mean ± σ):      4.017 s ±  0.013 s    [User: 3.765 s, System: 0.252 s]
  Range (min … max):    4.004 s …  4.041 s    10 runs

Benchmark 2: this-patch
  Time (mean ± σ):      3.998 s ±  0.034 s    [User: 3.752 s, System: 0.245 s]
  Range (min … max):    3.938 s …  4.073 s    10 runs

Summary
  this-patch ran
    1.00 ± 0.01 times faster than current-master

Need to adapt one test that was not a valid long qid.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v1:
* adapted to split out parameter removal
* improved comment
* updated commit message
* added a minimum length check at the beginning of parse_qid

 src/main.rs                        | 143 ++++++++++++++++++++++++-----
 tests/test_input_misleading_lines  |  20 ++++
 tests/test_output_misleading_lines |  38 ++++++++
 tests/tests_misleading_lines.rs    |  39 ++++++++
 4 files changed, 219 insertions(+), 21 deletions(-)
 create mode 100644 tests/test_input_misleading_lines
 create mode 100644 tests/test_output_misleading_lines
 create mode 100644 tests/tests_misleading_lines.rs

diff --git a/src/main.rs b/src/main.rs
index 3b815cb..cd5ad7e 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -1,6 +1,7 @@
 use std::cell::RefCell;
 use std::collections::HashMap;
 use std::ffi::CString;
+use std::ops::RangeInclusive;
 use std::rc::{Rc, Weak};
 
 use std::fs::File;
@@ -2123,6 +2124,26 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
     }
 }
 
+fn is_vowel(c: u8) -> bool {
+    c == b'A'
+        || c == b'E'
+        || c == b'a'
+        || c == b'e'
+        || c == b'I'
+        || c == b'i'
+        || c == b'O'
+        || c == b'o'
+        || c == b'U'
+        || c == b'u'
+}
+
+enum QidVariant {
+    Unknown,
+    Short,
+    Long,
+    Invalid,
+}
+
 /// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
 /// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
 /// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
@@ -2130,31 +2151,111 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
 /// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
 const POSTFIX_QID_MAX_LEN: usize = 25;
 
-/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
+/// A short postfix QID can only have between 6 and 21 characters, 5 from usec of the timestamp and
+/// between 1 and 16 from the inode nr printed as '%lX'
+const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
+
+/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
+/// microseconds -> 4 chars + 'z' + at least one char for the inode
+const LONG_QID_MIN: usize = 12;
+
+/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
 ///
 /// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
 /// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
-/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
-/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
-/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
+/// alphabet. The scan tries to determine if it's a short or long QID and cancels
+/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
+/// long QID). It assumes that only non alphanumeric characters are the delimiter,
+/// so e.g. a long QID followed by a vowel would be detected as invalid.
 fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
+    // check the minimum length first
+    if data.len() < *SHORT_QID_RANGE.start() {
+        return None;
+    }
+
     // to simplify limit max to data.len()
     let max = POSTFIX_QID_MAX_LEN.min(data.len());
-    // take at most max, find the first non-alphanumeric byte
-    match data
-        .iter()
-        .take(max)
-        .position(|b| !b.is_ascii_alphanumeric())
-    {
-        // if there were less than 5 return nothing
-        // the QID always has at least 5 characters for the microseconds (see
-        // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
-        Some(n) if n < 5 => None,
-        // otherwise split at the first non-alphanumeric byte
-        Some(n) => Some(data.split_at(n)),
-        // or return 'max' length QID if no non-alphanumeric byte is found
-        None => Some(data.split_at(max)),
-    }
+
+    let mut z_seen = false;
+    let mut variant = QidVariant::Unknown;
+
+    let position = data.iter().take(max).position(|b| match variant {
+        QidVariant::Unknown => {
+            if b.is_ascii_hexdigit() {
+                if is_vowel(*b) {
+                    // since vowels are forbidden in long qids, it must be a short one
+                    variant = QidVariant::Short;
+                }
+                return false;
+            }
+
+            if b.is_ascii_alphanumeric() {
+                if is_vowel(*b) {
+                    // it's a non hexadecimal vowel, so it can't be a short or a long qid
+                    variant = QidVariant::Invalid;
+                    return true;
+                }
+
+                if *b == b'z' {
+                    z_seen = true;
+                }
+
+                variant = QidVariant::Long;
+                return false;
+            }
+
+            true
+        }
+        QidVariant::Short => !b.is_ascii_hexdigit(),
+        QidVariant::Long => {
+            if !b.is_ascii_alphanumeric() {
+                return true;
+            }
+
+            if is_vowel(*b) {
+                // vowel detected in a long qid, must be invalid
+                variant = QidVariant::Invalid;
+                return true;
+            }
+
+            if *b == b'z' {
+                z_seen = true;
+            }
+
+            false
+        }
+        QidVariant::Invalid => true,
+    });
+
+    let pos = match (variant, z_seen, position) {
+        // these must be short variants that don't have any vowels in them
+        (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
+        (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
+
+        // short variants
+        (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
+        (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
+
+        (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
+        (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
+
+        // short variant can't have z -> invalid
+        (QidVariant::Short, true, _) => None,
+
+        // long variant without z is invalid
+        (QidVariant::Long, false, _) => None,
+
+        // this can't happen since we can have only a z_seen in Long or when changing to Long
+        (QidVariant::Unknown, true, _) => {
+            eprintln!("internal error, unknown QID variant with 'z'");
+            None
+        }
+
+        // everything else is also invalid
+        _ => None,
+    };
+
+    pos.map(|pos| data.split_at(pos))
 }
 
 /// Parse a queue ID followed by the `": "` delimiter, returning (qid, remaining_text) or None.
@@ -2450,8 +2551,8 @@ mod tests {
         // postfix long queue ID (enable_long_queue_ids = yes) drawn from the
         // base-52 alphabet, i.e. containing non-hex letters
         assert_eq!(
-            parse_qid(b"4Zk8mP2gqRz: removed"),
-            Some((&b"4Zk8mP2gqRz"[..], &b": removed"[..])),
+            parse_qid(b"4Zk8mP2gqRz1: removed"),
+            Some((&b"4Zk8mP2gqRz1"[..], &b": removed"[..])),
         );
     }
 
diff --git a/tests/test_input_misleading_lines b/tests/test_input_misleading_lines
new file mode 100644
index 0000000..2512ec0
--- /dev/null
+++ b/tests/test_input_misleading_lines
@@ -0,0 +1,20 @@
+2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4
+2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4]
+2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4]
+2026-06-16T02:26:10.588801+02:00 pmg postfix/postscreen[407424]: CONNECT from [1.2.3.4]:44340 to [10.10.10.10]:25
+2026-06-16T02:26:10.589017+02:00 pmg postfix/postscreen[407424]: PASS OLD [1.2.3.4]:44340
+2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4
+2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4]
+2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id=<xyz>
+2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=<test@pmg.local>, size=1337, nrcpt=1 (queue active)
+2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+2026-06-16T02:26:10.673820+02:00 pmg pmg-smtp-filter[4049085]: 2026/06/16-02:26:10 CONNECT TCP Peer: "[127.0.0.1]:33030" Local: "[127.0.0.1]:10024"
+2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id=<xyz>
+2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999)
+2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for <test@pmg.local> to spam quarantine - 432143214321 (rule: Quarantine)
+2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0)
+2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=<test@pmg.local>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234))
+2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed
+2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
diff --git a/tests/test_output_misleading_lines b/tests/test_output_misleading_lines
new file mode 100644
index 0000000..b54ca32
--- /dev/null
+++ b/tests/test_output_misleading_lines
@@ -0,0 +1,38 @@
+# LogReader: 3647914
+# Query options
+# Start: 2020-12-18 15:00:00 (1608300000)
+# End: 2026-06-16 15:40:00 (1781617200)
+# End Query Options
+
+QENTRY: 123412341234
+CTIME: 6A309823
+SIZE: 1337
+CLIENT: unknown[1.2.3.4]
+MSGID: <xyz>
+TO:6A309823:123412341234:Q: from <test@pmg.local> to <test@pmg.local> (432143214321)
+SMTP:
+L00000001 2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4
+L00000002 2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4]
+L00000003 2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+L00000004 2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4]
+L0000000B 2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+FILTER: 123412341234
+L0000000D 2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id=<xyz>
+L0000000F 2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999)
+L00000010 2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for <test@pmg.local> to spam quarantine - 432143214321 (rule: Quarantine)
+L00000011 2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0)
+QMGR:
+L00000009 2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id=<xyz>
+L0000000A 2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=<test@pmg.local>, size=1337, nrcpt=1 (queue active)
+L00000012 2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=<test@pmg.local>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234))
+L00000013 2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed
+
+SMTPD: T6A309822L00000004
+CTIME: 6A309824
+CLIENT: unknown[1.2.3.4]
+LOGS:
+L00000007 2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4
+L00000008 2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4]
+L0000000E 2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+L00000014 2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+
diff --git a/tests/tests_misleading_lines.rs b/tests/tests_misleading_lines.rs
new file mode 100644
index 0000000..fe6a652
--- /dev/null
+++ b/tests/tests_misleading_lines.rs
@@ -0,0 +1,39 @@
+use std::fs::File;
+use std::io::BufReader;
+use std::process::Command;
+mod utils;
+
+// Some postfix lines look like they could be a QID, like 'warning: [..];' but
+// aren't. Test that they don't produce any stray messages on stderr when multiple
+// like this exist (meaning they would have been parsed as QIDs)
+#[test]
+fn misleading_lines_are_ignored() {
+    let output = Command::new("faketime")
+        .env("TZ", "Europe/Vienna")
+        .arg("2020-12-31 23:59:59")
+        .arg(utils::log_tracker_path())
+        .arg("-vv")
+        .arg("-s")
+        .arg("2020-12-18 15:00:00")
+        .arg("-e")
+        .arg("2026-06-16 15:40:00")
+        .arg("-i")
+        .arg("tests/test_input_misleading_lines")
+        .output()
+        .expect("failed to execute pmg-log-tracker");
+
+    assert_eq!(String::from_utf8_lossy(&output.stderr), "".to_string());
+
+    assert!(
+        output.status.success(),
+        "pmg-log-tracker failed: {}",
+        String::from_utf8_lossy(&output.stderr)
+    );
+
+    let expected_file =
+        File::open("tests/test_output_misleading_lines").expect("failed to open test_output");
+
+    let expected_output = BufReader::new(&expected_file);
+    let output_reader = BufReader::new(&output.stdout[..]);
+    utils::compare_output(output_reader, expected_output);
+}
-- 
2.47.3





      parent reply	other threads:[~2026-06-17  7:49 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-17  7:47 [PATCH log-tracker v2 1/3] split out pmg-smtp-filter qid parsing Dominik Csapak
2026-06-17  7:47 ` [PATCH log-tracker v2 2/3] parse_qid{_prefix}: remove 'max' parameter Dominik Csapak
2026-06-17  7:47 ` Dominik Csapak [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260617074934.955079-3-d.csapak@proxmox.com \
    --to=d.csapak@proxmox.com \
    --cc=pmg-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal