From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 3E6741FF14F for ; Wed, 17 Jun 2026 09:49:45 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id A73EB18472; Wed, 17 Jun 2026 09:49:44 +0200 (CEST) From: Dominik Csapak To: pmg-devel@lists.proxmox.com Subject: [PATCH log-tracker v2 3/3] fix #7702: improve postfix QID parsing Date: Wed, 17 Jun 2026 09:47:50 +0200 Message-ID: <20260617074934.955079-3-d.csapak@proxmox.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20260617074934.955079-1-d.csapak@proxmox.com> References: <20260617074934.955079-1-d.csapak@proxmox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 AWL 0.049 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: OMISACAHGIA4JXRKEYQ5GLBGOGREWSCM X-Message-ID-Hash: OMISACAHGIA4JXRKEYQ5GLBGOGREWSCM X-MailFrom: d.csapak@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox Mail Gateway development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: since adding support for postfix's long QIDs, the criteria was that the QIDs had to be alphanumeric (since no built-in base52 check exists). This lead to false positives in some logs that had lines such as: postfix/smtpd[xxxx]: warning: hostname example.org does not resolve to address x.y.z.w Where the 'warning:' was interpreted as QIDs. (There are other things postfix can print here that were too interpreted as QIDs, this is just an example). To fix it, add a simple state machine during parsing that keeps track of what should be encountered in a long/short QID and abort if there is an impossible combination (e.g. vowels in a long QID). Since adding new checks here could be performance critical, I benchmarked it and it showed no significant performance changes. Setup: A ~2.5 GiB large syslog (on ram; otherwise we're likely IO bottlenecked) filtered for some time range and a search for a from address: Benchmark 1: current-master Time (mean ± σ): 4.017 s ± 0.013 s [User: 3.765 s, System: 0.252 s] Range (min … max): 4.004 s … 4.041 s 10 runs Benchmark 2: this-patch Time (mean ± σ): 3.998 s ± 0.034 s [User: 3.752 s, System: 0.245 s] Range (min … max): 3.938 s … 4.073 s 10 runs Summary this-patch ran 1.00 ± 0.01 times faster than current-master Need to adapt one test that was not a valid long qid. Signed-off-by: Dominik Csapak --- changes from v1: * adapted to split out parameter removal * improved comment * updated commit message * added a minimum length check at the beginning of parse_qid src/main.rs | 143 ++++++++++++++++++++++++----- tests/test_input_misleading_lines | 20 ++++ tests/test_output_misleading_lines | 38 ++++++++ tests/tests_misleading_lines.rs | 39 ++++++++ 4 files changed, 219 insertions(+), 21 deletions(-) create mode 100644 tests/test_input_misleading_lines create mode 100644 tests/test_output_misleading_lines create mode 100644 tests/tests_misleading_lines.rs diff --git a/src/main.rs b/src/main.rs index 3b815cb..cd5ad7e 100644 --- a/src/main.rs +++ b/src/main.rs @@ -1,6 +1,7 @@ use std::cell::RefCell; use std::collections::HashMap; use std::ffi::CString; +use std::ops::RangeInclusive; use std::rc::{Rc, Weak}; use std::fs::File; @@ -2123,6 +2124,26 @@ fn open_logfile(path: &Path) -> std::io::Result> { } } +fn is_vowel(c: u8) -> bool { + c == b'A' + || c == b'E' + || c == b'a' + || c == b'e' + || c == b'I' + || c == b'i' + || c == b'O' + || c == b'o' + || c == b'U' + || c == b'u' +} + +enum QidVariant { + Unknown, + Short, + Long, + Invalid, +} + /// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes` /// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is /// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the @@ -2130,31 +2151,111 @@ fn open_logfile(path: &Path) -> std::io::Result> { /// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first. const POSTFIX_QID_MAX_LEN: usize = 25; -/// Parse a queue ID and return a tuple of (qid, remaining_text) or None. +/// A short postfix QID can only have between 6 and 21 characters, 5 from usec of the timestamp and +/// between 1 and 16 from the inode nr printed as '%lX' +const SHORT_QID_RANGE: RangeInclusive = 6..=21; + +/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in +/// microseconds -> 4 chars + 'z' + at least one char for the inode +const LONG_QID_MIN: usize = 12; + +/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None. /// /// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are /// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52 -/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at -/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows -/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`. +/// alphabet. The scan tries to determine if it's a short or long QID and cancels +/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a +/// long QID). It assumes that only non alphanumeric characters are the delimiter, +/// so e.g. a long QID followed by a vowel would be detected as invalid. fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> { + // check the minimum length first + if data.len() < *SHORT_QID_RANGE.start() { + return None; + } + // to simplify limit max to data.len() let max = POSTFIX_QID_MAX_LEN.min(data.len()); - // take at most max, find the first non-alphanumeric byte - match data - .iter() - .take(max) - .position(|b| !b.is_ascii_alphanumeric()) - { - // if there were less than 5 return nothing - // the QID always has at least 5 characters for the microseconds (see - // http://www.postfix.org/postconf.5.html#enable_long_queue_ids) - Some(n) if n < 5 => None, - // otherwise split at the first non-alphanumeric byte - Some(n) => Some(data.split_at(n)), - // or return 'max' length QID if no non-alphanumeric byte is found - None => Some(data.split_at(max)), - } + + let mut z_seen = false; + let mut variant = QidVariant::Unknown; + + let position = data.iter().take(max).position(|b| match variant { + QidVariant::Unknown => { + if b.is_ascii_hexdigit() { + if is_vowel(*b) { + // since vowels are forbidden in long qids, it must be a short one + variant = QidVariant::Short; + } + return false; + } + + if b.is_ascii_alphanumeric() { + if is_vowel(*b) { + // it's a non hexadecimal vowel, so it can't be a short or a long qid + variant = QidVariant::Invalid; + return true; + } + + if *b == b'z' { + z_seen = true; + } + + variant = QidVariant::Long; + return false; + } + + true + } + QidVariant::Short => !b.is_ascii_hexdigit(), + QidVariant::Long => { + if !b.is_ascii_alphanumeric() { + return true; + } + + if is_vowel(*b) { + // vowel detected in a long qid, must be invalid + variant = QidVariant::Invalid; + return true; + } + + if *b == b'z' { + z_seen = true; + } + + false + } + QidVariant::Invalid => true, + }); + + let pos = match (variant, z_seen, position) { + // these must be short variants that don't have any vowels in them + (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max), + (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n), + + // short variants + (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max), + (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n), + + (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max), + (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n), + + // short variant can't have z -> invalid + (QidVariant::Short, true, _) => None, + + // long variant without z is invalid + (QidVariant::Long, false, _) => None, + + // this can't happen since we can have only a z_seen in Long or when changing to Long + (QidVariant::Unknown, true, _) => { + eprintln!("internal error, unknown QID variant with 'z'"); + None + } + + // everything else is also invalid + _ => None, + }; + + pos.map(|pos| data.split_at(pos)) } /// Parse a queue ID followed by the `": "` delimiter, returning (qid, remaining_text) or None. @@ -2450,8 +2551,8 @@ mod tests { // postfix long queue ID (enable_long_queue_ids = yes) drawn from the // base-52 alphabet, i.e. containing non-hex letters assert_eq!( - parse_qid(b"4Zk8mP2gqRz: removed"), - Some((&b"4Zk8mP2gqRz"[..], &b": removed"[..])), + parse_qid(b"4Zk8mP2gqRz1: removed"), + Some((&b"4Zk8mP2gqRz1"[..], &b": removed"[..])), ); } diff --git a/tests/test_input_misleading_lines b/tests/test_input_misleading_lines new file mode 100644 index 0000000..2512ec0 --- /dev/null +++ b/tests/test_input_misleading_lines @@ -0,0 +1,20 @@ +2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4 +2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4] +2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256 +2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4] +2026-06-16T02:26:10.588801+02:00 pmg postfix/postscreen[407424]: CONNECT from [1.2.3.4]:44340 to [10.10.10.10]:25 +2026-06-16T02:26:10.589017+02:00 pmg postfix/postscreen[407424]: PASS OLD [1.2.3.4]:44340 +2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4 +2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4] +2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id= +2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=, size=1337, nrcpt=1 (queue active) +2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8 +2026-06-16T02:26:10.673820+02:00 pmg pmg-smtp-filter[4049085]: 2026/06/16-02:26:10 CONNECT TCP Peer: "[127.0.0.1]:33030" Local: "[127.0.0.1]:10024" +2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id= +2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256 +2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999) +2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for to spam quarantine - 432143214321 (rule: Quarantine) +2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0) +2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234)) +2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed +2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8 diff --git a/tests/test_output_misleading_lines b/tests/test_output_misleading_lines new file mode 100644 index 0000000..b54ca32 --- /dev/null +++ b/tests/test_output_misleading_lines @@ -0,0 +1,38 @@ +# LogReader: 3647914 +# Query options +# Start: 2020-12-18 15:00:00 (1608300000) +# End: 2026-06-16 15:40:00 (1781617200) +# End Query Options + +QENTRY: 123412341234 +CTIME: 6A309823 +SIZE: 1337 +CLIENT: unknown[1.2.3.4] +MSGID: +TO:6A309823:123412341234:Q: from to (432143214321) +SMTP: +L00000001 2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4 +L00000002 2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4] +L00000003 2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256 +L00000004 2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4] +L0000000B 2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8 +FILTER: 123412341234 +L0000000D 2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id= +L0000000F 2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999) +L00000010 2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for to spam quarantine - 432143214321 (rule: Quarantine) +L00000011 2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0) +QMGR: +L00000009 2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id= +L0000000A 2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=, size=1337, nrcpt=1 (queue active) +L00000012 2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234)) +L00000013 2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed + +SMTPD: T6A309822L00000004 +CTIME: 6A309824 +CLIENT: unknown[1.2.3.4] +LOGS: +L00000007 2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4 +L00000008 2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4] +L0000000E 2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256 +L00000014 2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8 + diff --git a/tests/tests_misleading_lines.rs b/tests/tests_misleading_lines.rs new file mode 100644 index 0000000..fe6a652 --- /dev/null +++ b/tests/tests_misleading_lines.rs @@ -0,0 +1,39 @@ +use std::fs::File; +use std::io::BufReader; +use std::process::Command; +mod utils; + +// Some postfix lines look like they could be a QID, like 'warning: [..];' but +// aren't. Test that they don't produce any stray messages on stderr when multiple +// like this exist (meaning they would have been parsed as QIDs) +#[test] +fn misleading_lines_are_ignored() { + let output = Command::new("faketime") + .env("TZ", "Europe/Vienna") + .arg("2020-12-31 23:59:59") + .arg(utils::log_tracker_path()) + .arg("-vv") + .arg("-s") + .arg("2020-12-18 15:00:00") + .arg("-e") + .arg("2026-06-16 15:40:00") + .arg("-i") + .arg("tests/test_input_misleading_lines") + .output() + .expect("failed to execute pmg-log-tracker"); + + assert_eq!(String::from_utf8_lossy(&output.stderr), "".to_string()); + + assert!( + output.status.success(), + "pmg-log-tracker failed: {}", + String::from_utf8_lossy(&output.stderr) + ); + + let expected_file = + File::open("tests/test_output_misleading_lines").expect("failed to open test_output"); + + let expected_output = BufReader::new(&expected_file); + let output_reader = BufReader::new(&output.stdout[..]); + utils::compare_output(output_reader, expected_output); +} -- 2.47.3