public inbox for pmg-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Dominik Csapak <d.csapak@proxmox.com>
To: pmg-devel@lists.proxmox.com
Subject: [PATCH log-tracker v2 3/3] fix #7702: improve postfix QID parsing
Date: Wed, 17 Jun 2026 09:47:50 +0200	[thread overview]
Message-ID: <20260617074934.955079-3-d.csapak@proxmox.com> (raw)
In-Reply-To: <20260617074934.955079-1-d.csapak@proxmox.com>

since adding support for postfix's long QIDs, the criteria was that the
QIDs had to be alphanumeric (since no built-in base52 check exists).

This lead to false positives in some logs that had lines such as:
 postfix/smtpd[xxxx]: warning: hostname example.org does not resolve to address x.y.z.w

Where the 'warning:' was interpreted as QIDs.
(There are other things postfix can print here that were too interpreted
as QIDs, this is just an example).

To fix it, add a simple state machine during parsing that keeps track of
what should be encountered in a long/short QID and abort if there is an
impossible combination (e.g. vowels in a long QID).

Since adding new checks here could be performance critical, I
benchmarked it and it showed no significant performance changes.

Setup:
A ~2.5 GiB large syslog (on ram; otherwise we're likely IO bottlenecked)
filtered for some time range and a search for a from address:

Benchmark 1: current-master
  Time (mean ± σ):      4.017 s ±  0.013 s    [User: 3.765 s, System: 0.252 s]
  Range (min … max):    4.004 s …  4.041 s    10 runs

Benchmark 2: this-patch
  Time (mean ± σ):      3.998 s ±  0.034 s    [User: 3.752 s, System: 0.245 s]
  Range (min … max):    3.938 s …  4.073 s    10 runs

Summary
  this-patch ran
    1.00 ± 0.01 times faster than current-master

Need to adapt one test that was not a valid long qid.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v1:
* adapted to split out parameter removal
* improved comment
* updated commit message
* added a minimum length check at the beginning of parse_qid

 src/main.rs                        | 143 ++++++++++++++++++++++++-----
 tests/test_input_misleading_lines  |  20 ++++
 tests/test_output_misleading_lines |  38 ++++++++
 tests/tests_misleading_lines.rs    |  39 ++++++++
 4 files changed, 219 insertions(+), 21 deletions(-)
 create mode 100644 tests/test_input_misleading_lines
 create mode 100644 tests/test_output_misleading_lines
 create mode 100644 tests/tests_misleading_lines.rs

diff --git a/src/main.rs b/src/main.rs
index 3b815cb..cd5ad7e 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -1,6 +1,7 @@
 use std::cell::RefCell;
 use std::collections::HashMap;
 use std::ffi::CString;
+use std::ops::RangeInclusive;
 use std::rc::{Rc, Weak};
 
 use std::fs::File;
@@ -2123,6 +2124,26 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
     }
 }
 
+fn is_vowel(c: u8) -> bool {
+    c == b'A'
+        || c == b'E'
+        || c == b'a'
+        || c == b'e'
+        || c == b'I'
+        || c == b'i'
+        || c == b'O'
+        || c == b'o'
+        || c == b'U'
+        || c == b'u'
+}
+
+enum QidVariant {
+    Unknown,
+    Short,
+    Long,
+    Invalid,
+}
+
 /// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
 /// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
 /// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
@@ -2130,31 +2151,111 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
 /// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
 const POSTFIX_QID_MAX_LEN: usize = 25;
 
-/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
+/// A short postfix QID can only have between 6 and 21 characters, 5 from usec of the timestamp and
+/// between 1 and 16 from the inode nr printed as '%lX'
+const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
+
+/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
+/// microseconds -> 4 chars + 'z' + at least one char for the inode
+const LONG_QID_MIN: usize = 12;
+
+/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
 ///
 /// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
 /// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
-/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
-/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
-/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
+/// alphabet. The scan tries to determine if it's a short or long QID and cancels
+/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
+/// long QID). It assumes that only non alphanumeric characters are the delimiter,
+/// so e.g. a long QID followed by a vowel would be detected as invalid.
 fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
+    // check the minimum length first
+    if data.len() < *SHORT_QID_RANGE.start() {
+        return None;
+    }
+
     // to simplify limit max to data.len()
     let max = POSTFIX_QID_MAX_LEN.min(data.len());
-    // take at most max, find the first non-alphanumeric byte
-    match data
-        .iter()
-        .take(max)
-        .position(|b| !b.is_ascii_alphanumeric())
-    {
-        // if there were less than 5 return nothing
-        // the QID always has at least 5 characters for the microseconds (see
-        // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
-        Some(n) if n < 5 => None,
-        // otherwise split at the first non-alphanumeric byte
-        Some(n) => Some(data.split_at(n)),
-        // or return 'max' length QID if no non-alphanumeric byte is found
-        None => Some(data.split_at(max)),
-    }
+
+    let mut z_seen = false;
+    let mut variant = QidVariant::Unknown;
+
+    let position = data.iter().take(max).position(|b| match variant {
+        QidVariant::Unknown => {
+            if b.is_ascii_hexdigit() {
+                if is_vowel(*b) {
+                    // since vowels are forbidden in long qids, it must be a short one
+                    variant = QidVariant::Short;
+                }
+                return false;
+            }
+
+            if b.is_ascii_alphanumeric() {
+                if is_vowel(*b) {
+                    // it's a non hexadecimal vowel, so it can't be a short or a long qid
+                    variant = QidVariant::Invalid;
+                    return true;
+                }
+
+                if *b == b'z' {
+                    z_seen = true;
+                }
+
+                variant = QidVariant::Long;
+                return false;
+            }
+
+            true
+        }
+        QidVariant::Short => !b.is_ascii_hexdigit(),
+        QidVariant::Long => {
+            if !b.is_ascii_alphanumeric() {
+                return true;
+            }
+
+            if is_vowel(*b) {
+                // vowel detected in a long qid, must be invalid
+                variant = QidVariant::Invalid;
+                return true;
+            }
+
+            if *b == b'z' {
+                z_seen = true;
+            }
+
+            false
+        }
+        QidVariant::Invalid => true,
+    });
+
+    let pos = match (variant, z_seen, position) {
+        // these must be short variants that don't have any vowels in them
+        (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
+        (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
+
+        // short variants
+        (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
+        (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
+
+        (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
+        (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
+
+        // short variant can't have z -> invalid
+        (QidVariant::Short, true, _) => None,
+
+        // long variant without z is invalid
+        (QidVariant::Long, false, _) => None,
+
+        // this can't happen since we can have only a z_seen in Long or when changing to Long
+        (QidVariant::Unknown, true, _) => {
+            eprintln!("internal error, unknown QID variant with 'z'");
+            None
+        }
+
+        // everything else is also invalid
+        _ => None,
+    };
+
+    pos.map(|pos| data.split_at(pos))
 }
 
 /// Parse a queue ID followed by the `": "` delimiter, returning (qid, remaining_text) or None.
@@ -2450,8 +2551,8 @@ mod tests {
         // postfix long queue ID (enable_long_queue_ids = yes) drawn from the
         // base-52 alphabet, i.e. containing non-hex letters
         assert_eq!(
-            parse_qid(b"4Zk8mP2gqRz: removed"),
-            Some((&b"4Zk8mP2gqRz"[..], &b": removed"[..])),
+            parse_qid(b"4Zk8mP2gqRz1: removed"),
+            Some((&b"4Zk8mP2gqRz1"[..], &b": removed"[..])),
         );
     }
 
diff --git a/tests/test_input_misleading_lines b/tests/test_input_misleading_lines
new file mode 100644
index 0000000..2512ec0
--- /dev/null
+++ b/tests/test_input_misleading_lines
@@ -0,0 +1,20 @@
+2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4
+2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4]
+2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4]
+2026-06-16T02:26:10.588801+02:00 pmg postfix/postscreen[407424]: CONNECT from [1.2.3.4]:44340 to [10.10.10.10]:25
+2026-06-16T02:26:10.589017+02:00 pmg postfix/postscreen[407424]: PASS OLD [1.2.3.4]:44340
+2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4
+2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4]
+2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id=<xyz>
+2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=<test@pmg.local>, size=1337, nrcpt=1 (queue active)
+2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+2026-06-16T02:26:10.673820+02:00 pmg pmg-smtp-filter[4049085]: 2026/06/16-02:26:10 CONNECT TCP Peer: "[127.0.0.1]:33030" Local: "[127.0.0.1]:10024"
+2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id=<xyz>
+2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999)
+2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for <test@pmg.local> to spam quarantine - 432143214321 (rule: Quarantine)
+2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0)
+2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=<test@pmg.local>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234))
+2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed
+2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
diff --git a/tests/test_output_misleading_lines b/tests/test_output_misleading_lines
new file mode 100644
index 0000000..b54ca32
--- /dev/null
+++ b/tests/test_output_misleading_lines
@@ -0,0 +1,38 @@
+# LogReader: 3647914
+# Query options
+# Start: 2020-12-18 15:00:00 (1608300000)
+# End: 2026-06-16 15:40:00 (1781617200)
+# End Query Options
+
+QENTRY: 123412341234
+CTIME: 6A309823
+SIZE: 1337
+CLIENT: unknown[1.2.3.4]
+MSGID: <xyz>
+TO:6A309823:123412341234:Q: from <test@pmg.local> to <test@pmg.local> (432143214321)
+SMTP:
+L00000001 2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4
+L00000002 2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4]
+L00000003 2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+L00000004 2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4]
+L0000000B 2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+FILTER: 123412341234
+L0000000D 2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id=<xyz>
+L0000000F 2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999)
+L00000010 2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for <test@pmg.local> to spam quarantine - 432143214321 (rule: Quarantine)
+L00000011 2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0)
+QMGR:
+L00000009 2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id=<xyz>
+L0000000A 2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=<test@pmg.local>, size=1337, nrcpt=1 (queue active)
+L00000012 2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=<test@pmg.local>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234))
+L00000013 2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed
+
+SMTPD: T6A309822L00000004
+CTIME: 6A309824
+CLIENT: unknown[1.2.3.4]
+LOGS:
+L00000007 2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4
+L00000008 2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4]
+L0000000E 2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+L00000014 2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+
diff --git a/tests/tests_misleading_lines.rs b/tests/tests_misleading_lines.rs
new file mode 100644
index 0000000..fe6a652
--- /dev/null
+++ b/tests/tests_misleading_lines.rs
@@ -0,0 +1,39 @@
+use std::fs::File;
+use std::io::BufReader;
+use std::process::Command;
+mod utils;
+
+// Some postfix lines look like they could be a QID, like 'warning: [..];' but
+// aren't. Test that they don't produce any stray messages on stderr when multiple
+// like this exist (meaning they would have been parsed as QIDs)
+#[test]
+fn misleading_lines_are_ignored() {
+    let output = Command::new("faketime")
+        .env("TZ", "Europe/Vienna")
+        .arg("2020-12-31 23:59:59")
+        .arg(utils::log_tracker_path())
+        .arg("-vv")
+        .arg("-s")
+        .arg("2020-12-18 15:00:00")
+        .arg("-e")
+        .arg("2026-06-16 15:40:00")
+        .arg("-i")
+        .arg("tests/test_input_misleading_lines")
+        .output()
+        .expect("failed to execute pmg-log-tracker");
+
+    assert_eq!(String::from_utf8_lossy(&output.stderr), "".to_string());
+
+    assert!(
+        output.status.success(),
+        "pmg-log-tracker failed: {}",
+        String::from_utf8_lossy(&output.stderr)
+    );
+
+    let expected_file =
+        File::open("tests/test_output_misleading_lines").expect("failed to open test_output");
+
+    let expected_output = BufReader::new(&expected_file);
+    let output_reader = BufReader::new(&output.stdout[..]);
+    utils::compare_output(output_reader, expected_output);
+}
-- 
2.47.3





      parent reply	other threads:[~2026-06-17  7:49 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-17  7:47 [PATCH log-tracker v2 1/3] split out pmg-smtp-filter qid parsing Dominik Csapak
2026-06-17  7:47 ` [PATCH log-tracker v2 2/3] parse_qid{_prefix}: remove 'max' parameter Dominik Csapak
2026-06-17  7:47 ` Dominik Csapak [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260617074934.955079-3-d.csapak@proxmox.com \
    --to=d.csapak@proxmox.com \
    --cc=pmg-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal