From: Dominik Csapak <d.csapak@proxmox.com>
To: pmg-devel@lists.proxmox.com
Subject: [PATCH log-tracker v2 3/3] fix #7702: improve postfix QID parsing
Date: Wed, 17 Jun 2026 09:47:50 +0200 [thread overview]
Message-ID: <20260617074934.955079-3-d.csapak@proxmox.com> (raw)
In-Reply-To: <20260617074934.955079-1-d.csapak@proxmox.com>
since adding support for postfix's long QIDs, the criteria was that the
QIDs had to be alphanumeric (since no built-in base52 check exists).
This lead to false positives in some logs that had lines such as:
postfix/smtpd[xxxx]: warning: hostname example.org does not resolve to address x.y.z.w
Where the 'warning:' was interpreted as QIDs.
(There are other things postfix can print here that were too interpreted
as QIDs, this is just an example).
To fix it, add a simple state machine during parsing that keeps track of
what should be encountered in a long/short QID and abort if there is an
impossible combination (e.g. vowels in a long QID).
Since adding new checks here could be performance critical, I
benchmarked it and it showed no significant performance changes.
Setup:
A ~2.5 GiB large syslog (on ram; otherwise we're likely IO bottlenecked)
filtered for some time range and a search for a from address:
Benchmark 1: current-master
Time (mean ± σ): 4.017 s ± 0.013 s [User: 3.765 s, System: 0.252 s]
Range (min … max): 4.004 s … 4.041 s 10 runs
Benchmark 2: this-patch
Time (mean ± σ): 3.998 s ± 0.034 s [User: 3.752 s, System: 0.245 s]
Range (min … max): 3.938 s … 4.073 s 10 runs
Summary
this-patch ran
1.00 ± 0.01 times faster than current-master
Need to adapt one test that was not a valid long qid.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v1:
* adapted to split out parameter removal
* improved comment
* updated commit message
* added a minimum length check at the beginning of parse_qid
src/main.rs | 143 ++++++++++++++++++++++++-----
tests/test_input_misleading_lines | 20 ++++
tests/test_output_misleading_lines | 38 ++++++++
tests/tests_misleading_lines.rs | 39 ++++++++
4 files changed, 219 insertions(+), 21 deletions(-)
create mode 100644 tests/test_input_misleading_lines
create mode 100644 tests/test_output_misleading_lines
create mode 100644 tests/tests_misleading_lines.rs
diff --git a/src/main.rs b/src/main.rs
index 3b815cb..cd5ad7e 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -1,6 +1,7 @@
use std::cell::RefCell;
use std::collections::HashMap;
use std::ffi::CString;
+use std::ops::RangeInclusive;
use std::rc::{Rc, Weak};
use std::fs::File;
@@ -2123,6 +2124,26 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
}
}
+fn is_vowel(c: u8) -> bool {
+ c == b'A'
+ || c == b'E'
+ || c == b'a'
+ || c == b'e'
+ || c == b'I'
+ || c == b'i'
+ || c == b'O'
+ || c == b'o'
+ || c == b'U'
+ || c == b'u'
+}
+
+enum QidVariant {
+ Unknown,
+ Short,
+ Long,
+ Invalid,
+}
+
/// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
/// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
/// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
@@ -2130,31 +2151,111 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
/// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
const POSTFIX_QID_MAX_LEN: usize = 25;
-/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
+/// A short postfix QID can only have between 6 and 21 characters, 5 from usec of the timestamp and
+/// between 1 and 16 from the inode nr printed as '%lX'
+const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
+
+/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
+/// microseconds -> 4 chars + 'z' + at least one char for the inode
+const LONG_QID_MIN: usize = 12;
+
+/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
///
/// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
/// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
-/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
-/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
-/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
+/// alphabet. The scan tries to determine if it's a short or long QID and cancels
+/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
+/// long QID). It assumes that only non alphanumeric characters are the delimiter,
+/// so e.g. a long QID followed by a vowel would be detected as invalid.
fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
+ // check the minimum length first
+ if data.len() < *SHORT_QID_RANGE.start() {
+ return None;
+ }
+
// to simplify limit max to data.len()
let max = POSTFIX_QID_MAX_LEN.min(data.len());
- // take at most max, find the first non-alphanumeric byte
- match data
- .iter()
- .take(max)
- .position(|b| !b.is_ascii_alphanumeric())
- {
- // if there were less than 5 return nothing
- // the QID always has at least 5 characters for the microseconds (see
- // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
- Some(n) if n < 5 => None,
- // otherwise split at the first non-alphanumeric byte
- Some(n) => Some(data.split_at(n)),
- // or return 'max' length QID if no non-alphanumeric byte is found
- None => Some(data.split_at(max)),
- }
+
+ let mut z_seen = false;
+ let mut variant = QidVariant::Unknown;
+
+ let position = data.iter().take(max).position(|b| match variant {
+ QidVariant::Unknown => {
+ if b.is_ascii_hexdigit() {
+ if is_vowel(*b) {
+ // since vowels are forbidden in long qids, it must be a short one
+ variant = QidVariant::Short;
+ }
+ return false;
+ }
+
+ if b.is_ascii_alphanumeric() {
+ if is_vowel(*b) {
+ // it's a non hexadecimal vowel, so it can't be a short or a long qid
+ variant = QidVariant::Invalid;
+ return true;
+ }
+
+ if *b == b'z' {
+ z_seen = true;
+ }
+
+ variant = QidVariant::Long;
+ return false;
+ }
+
+ true
+ }
+ QidVariant::Short => !b.is_ascii_hexdigit(),
+ QidVariant::Long => {
+ if !b.is_ascii_alphanumeric() {
+ return true;
+ }
+
+ if is_vowel(*b) {
+ // vowel detected in a long qid, must be invalid
+ variant = QidVariant::Invalid;
+ return true;
+ }
+
+ if *b == b'z' {
+ z_seen = true;
+ }
+
+ false
+ }
+ QidVariant::Invalid => true,
+ });
+
+ let pos = match (variant, z_seen, position) {
+ // these must be short variants that don't have any vowels in them
+ (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
+ (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
+
+ // short variants
+ (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
+ (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
+
+ (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
+ (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
+
+ // short variant can't have z -> invalid
+ (QidVariant::Short, true, _) => None,
+
+ // long variant without z is invalid
+ (QidVariant::Long, false, _) => None,
+
+ // this can't happen since we can have only a z_seen in Long or when changing to Long
+ (QidVariant::Unknown, true, _) => {
+ eprintln!("internal error, unknown QID variant with 'z'");
+ None
+ }
+
+ // everything else is also invalid
+ _ => None,
+ };
+
+ pos.map(|pos| data.split_at(pos))
}
/// Parse a queue ID followed by the `": "` delimiter, returning (qid, remaining_text) or None.
@@ -2450,8 +2551,8 @@ mod tests {
// postfix long queue ID (enable_long_queue_ids = yes) drawn from the
// base-52 alphabet, i.e. containing non-hex letters
assert_eq!(
- parse_qid(b"4Zk8mP2gqRz: removed"),
- Some((&b"4Zk8mP2gqRz"[..], &b": removed"[..])),
+ parse_qid(b"4Zk8mP2gqRz1: removed"),
+ Some((&b"4Zk8mP2gqRz1"[..], &b": removed"[..])),
);
}
diff --git a/tests/test_input_misleading_lines b/tests/test_input_misleading_lines
new file mode 100644
index 0000000..2512ec0
--- /dev/null
+++ b/tests/test_input_misleading_lines
@@ -0,0 +1,20 @@
+2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4
+2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4]
+2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4]
+2026-06-16T02:26:10.588801+02:00 pmg postfix/postscreen[407424]: CONNECT from [1.2.3.4]:44340 to [10.10.10.10]:25
+2026-06-16T02:26:10.589017+02:00 pmg postfix/postscreen[407424]: PASS OLD [1.2.3.4]:44340
+2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4
+2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4]
+2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id=<xyz>
+2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=<test@pmg.local>, size=1337, nrcpt=1 (queue active)
+2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+2026-06-16T02:26:10.673820+02:00 pmg pmg-smtp-filter[4049085]: 2026/06/16-02:26:10 CONNECT TCP Peer: "[127.0.0.1]:33030" Local: "[127.0.0.1]:10024"
+2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id=<xyz>
+2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999)
+2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for <test@pmg.local> to spam quarantine - 432143214321 (rule: Quarantine)
+2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0)
+2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=<test@pmg.local>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234))
+2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed
+2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
diff --git a/tests/test_output_misleading_lines b/tests/test_output_misleading_lines
new file mode 100644
index 0000000..b54ca32
--- /dev/null
+++ b/tests/test_output_misleading_lines
@@ -0,0 +1,38 @@
+# LogReader: 3647914
+# Query options
+# Start: 2020-12-18 15:00:00 (1608300000)
+# End: 2026-06-16 15:40:00 (1781617200)
+# End Query Options
+
+QENTRY: 123412341234
+CTIME: 6A309823
+SIZE: 1337
+CLIENT: unknown[1.2.3.4]
+MSGID: <xyz>
+TO:6A309823:123412341234:Q: from <test@pmg.local> to <test@pmg.local> (432143214321)
+SMTP:
+L00000001 2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4
+L00000002 2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4]
+L00000003 2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+L00000004 2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4]
+L0000000B 2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+FILTER: 123412341234
+L0000000D 2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id=<xyz>
+L0000000F 2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999)
+L00000010 2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for <test@pmg.local> to spam quarantine - 432143214321 (rule: Quarantine)
+L00000011 2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0)
+QMGR:
+L00000009 2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id=<xyz>
+L0000000A 2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=<test@pmg.local>, size=1337, nrcpt=1 (queue active)
+L00000012 2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=<test@pmg.local>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234))
+L00000013 2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed
+
+SMTPD: T6A309822L00000004
+CTIME: 6A309824
+CLIENT: unknown[1.2.3.4]
+LOGS:
+L00000007 2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4
+L00000008 2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4]
+L0000000E 2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+L00000014 2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+
diff --git a/tests/tests_misleading_lines.rs b/tests/tests_misleading_lines.rs
new file mode 100644
index 0000000..fe6a652
--- /dev/null
+++ b/tests/tests_misleading_lines.rs
@@ -0,0 +1,39 @@
+use std::fs::File;
+use std::io::BufReader;
+use std::process::Command;
+mod utils;
+
+// Some postfix lines look like they could be a QID, like 'warning: [..];' but
+// aren't. Test that they don't produce any stray messages on stderr when multiple
+// like this exist (meaning they would have been parsed as QIDs)
+#[test]
+fn misleading_lines_are_ignored() {
+ let output = Command::new("faketime")
+ .env("TZ", "Europe/Vienna")
+ .arg("2020-12-31 23:59:59")
+ .arg(utils::log_tracker_path())
+ .arg("-vv")
+ .arg("-s")
+ .arg("2020-12-18 15:00:00")
+ .arg("-e")
+ .arg("2026-06-16 15:40:00")
+ .arg("-i")
+ .arg("tests/test_input_misleading_lines")
+ .output()
+ .expect("failed to execute pmg-log-tracker");
+
+ assert_eq!(String::from_utf8_lossy(&output.stderr), "".to_string());
+
+ assert!(
+ output.status.success(),
+ "pmg-log-tracker failed: {}",
+ String::from_utf8_lossy(&output.stderr)
+ );
+
+ let expected_file =
+ File::open("tests/test_output_misleading_lines").expect("failed to open test_output");
+
+ let expected_output = BufReader::new(&expected_file);
+ let output_reader = BufReader::new(&output.stdout[..]);
+ utils::compare_output(output_reader, expected_output);
+}
--
2.47.3
prev parent reply other threads:[~2026-06-17 7:49 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-17 7:47 [PATCH log-tracker v2 1/3] split out pmg-smtp-filter qid parsing Dominik Csapak
2026-06-17 7:47 ` [PATCH log-tracker v2 2/3] parse_qid{_prefix}: remove 'max' parameter Dominik Csapak
2026-06-17 7:47 ` Dominik Csapak [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260617074934.955079-3-d.csapak@proxmox.com \
--to=d.csapak@proxmox.com \
--cc=pmg-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox