* [PATCH log-tracker 1/2] split out pmg-smtp-filter qid parsing
@ 2026-06-16 15:24 Dominik Csapak
2026-06-16 15:24 ` [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing Dominik Csapak
0 siblings, 1 reply; 3+ messages in thread
From: Dominik Csapak @ 2026-06-16 15:24 UTC (permalink / raw)
To: pmg-devel
By splitting out the parsing of the pmg-smtp-filter QIDs, we can more
rigorously parse them since the parser does not have to overlap with
postfix's QIDs'. We have a bit of code duplication, but more clarity
when reading the code and better checks.
Since we know how long the pmg-smtp-filter QIDs must be, we can also
filter by the length.
Since we know the lengths of a pmg-smtp-filter QID, we don't need to
pass the max-len from outside.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/main.rs | 67 ++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 61 insertions(+), 6 deletions(-)
diff --git a/src/main.rs b/src/main.rs
index 0b6e92f..89d2768 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -111,7 +111,7 @@ fn main() -> Result<(), Error> {
// we match 4 entries, all beginning with a QID
// accept mail, move mail, block mail and the processing time
fn handle_pmg_smtp_filter_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
- let (qid, data) = match parse_qid_prefix(msg, 25) {
+ let (qid, data) = match parse_pmg_smtp_filter_qid_prefix(msg) {
Some((q, m)) => (q, m),
None => return,
};
@@ -167,7 +167,7 @@ fn handle_pmg_smtp_filter_message(msg: &[u8], parser: &mut Parser, complete_line
None => return,
};
let data = &data[qid_index + 13..];
- let (qid, _) = match parse_qid(data, 25) {
+ let (qid, _) = match parse_pmg_smtp_filter_qid(data) {
Some(t) => t,
None => return,
};
@@ -407,7 +407,7 @@ fn handle_lmtp_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
}
// this is the QID of the associated pmg-smtp-filter
- let (qid, _) = match parse_qid(data, 25) {
+ let (qid, _) = match parse_pmg_smtp_filter_qid(data) {
Some(t) => t,
None => return,
};
@@ -543,7 +543,7 @@ fn handle_smtpd_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
return;
}
let data = &data[14..];
- if let Some((qid, data)) = parse_qid(data, 25) {
+ if let Some((qid, data)) = parse_pmg_smtp_filter_qid(data) {
let fe = get_or_create_fentry(&mut parser.fentries, qid);
// set the FEntry to before-queue filtered
fe.borrow_mut().is_bq = true;
@@ -597,7 +597,7 @@ fn handle_smtpd_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
if let Some(qid_index) = find(data, b"(") {
let data = &data[qid_index + 1..];
- if let Some((qid, _)) = parse_qid(data, 25) {
+ if let Some((qid, _)) = parse_pmg_smtp_filter_qid(data) {
let fe = get_or_create_fentry(&mut parser.fentries, qid);
// set the FEntry to before-queue filtered
fe.borrow_mut().is_bq = true;
@@ -2194,6 +2194,42 @@ fn parse_number(data: &[u8], max_digits: usize) -> Option<(usize, &[u8])> {
}
}
+/// pmg-smtp-filter QIDs must be at least 7 hex characters (see src/PMG/MailQueue.pm in pmg-api)
+const PMG_SMTP_FILTER_QID_MIN_LEN: usize = 7;
+
+const PMG_SMTP_FILTER_QID_MAX_LEN: usize = 25;
+
+/// Parse a queue ID followed by the `": "` delimiter, returning (qid, remaining_text) or None.
+///
+/// Requiring the delimiter keeps foreign lines, like the output a custom check script logs under
+/// the pmg-smtp-filter identifier, from panicking the parser or being recorded under a bogus ID.
+fn parse_pmg_smtp_filter_qid_prefix(msg: &[u8]) -> Option<(&[u8], &[u8])> {
+ let (qid, data) = parse_pmg_smtp_filter_qid(msg)?;
+ Some((qid, data.strip_prefix(b": ")?))
+}
+
+/// Parse a queue ID from pmg-smtp-filter and return a tuple of (qid, remaining_text) or None.
+///
+/// Queue IDs are hexadecimal (`[0-9A-Fa-f]`) so only parse until the first non-hexadecimal
+/// character is encountered.
+fn parse_pmg_smtp_filter_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
+ // to simplify limit max to data.len()
+ let max = PMG_SMTP_FILTER_QID_MAX_LEN.min(data.len());
+
+ if max < PMG_SMTP_FILTER_QID_MIN_LEN {
+ return None;
+ }
+ // take at most max, find the first non-hexdigit byte
+ match data.iter().take(max).position(|b| !b.is_ascii_hexdigit()) {
+ // too short
+ Some(n) if n < PMG_SMTP_FILTER_QID_MIN_LEN => None,
+ // otherwise split at the first non-hexadecimal character
+ Some(n) => Some(data.split_at(n)),
+ // or return 'max' length QID if no non-hexadecimal character is found
+ None => Some(data.split_at(max)),
+ }
+}
+
/// Parse time. Returns a tuple of (parsed_time, remaining_text) or None.
fn parse_time(
data: &'_ [u8],
@@ -2396,6 +2432,8 @@ fn find_lowercase(data: &[u8], needle: &[u8]) -> Option<usize> {
#[cfg(test)]
mod tests {
+ use crate::parse_pmg_smtp_filter_qid;
+
use super::{POSTFIX_QID_MAX_LEN, parse_qid, parse_qid_prefix, rotated_logfile};
#[test]
@@ -2427,11 +2465,28 @@ mod tests {
);
}
+ #[test]
+ fn parse_filter_ids() {
+ assert_eq!(
+ parse_pmg_smtp_filter_qid(b"3802E45DFA503808B06"),
+ Some((&b"3802E45DFA503808B06"[..], &b""[..])),
+ );
+ }
+
+ #[test]
+ fn parse_filter_id_failing() {
+ // these tests should all return None so not parsing
+ assert_eq!(parse_pmg_smtp_filter_qid(b"3"), None);
+ assert_eq!(parse_pmg_smtp_filter_qid(b""), None);
+ assert_eq!(parse_pmg_smtp_filter_qid(b"Warning"), None);
+ assert_eq!(parse_pmg_smtp_filter_qid(b"ThisIsAtest"), None);
+ }
+
#[test]
fn parse_filter_id_terminated_by_paren() {
// pmg-smtp-filter ID, terminated by ')'
assert_eq!(
- parse_qid(b"3802E45DFA503808B06)", 25),
+ parse_pmg_smtp_filter_qid(b"3802E45DFA503808B06)"),
Some((&b"3802E45DFA503808B06"[..], &b")"[..])),
);
}
--
2.47.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing
2026-06-16 15:24 [PATCH log-tracker 1/2] split out pmg-smtp-filter qid parsing Dominik Csapak
@ 2026-06-16 15:24 ` Dominik Csapak
2026-06-16 16:26 ` Stoiko Ivanov
0 siblings, 1 reply; 3+ messages in thread
From: Dominik Csapak @ 2026-06-16 15:24 UTC (permalink / raw)
To: pmg-devel
since adding support for postfix's long QIDs, the criteria was that the
QIDs had to be alphanumeric (since no built-in base52 check exists).
This lead to false positives in some logs that had lines such as:
postfix/smtpd[xxxx]: warning: hostname example.org does not resolve to address x.y.z.w
Where the 'warning:' was interpreted as QIDs.
(There are other things postfix can print here that were too interpreted
as QIDs, this is just an example).
To fix it, add a simple state machine during parsing that keeps track of
what should be encountered in a long/short QID and abort if there is an
impossible combination (e.g. vowels in a long QID).
Since adding new checks here could be performance critical, I
benchmarked it and it showed no significant performance changes.
Setup:
A ~2.5 GiB large syslog (on ram; otherwise we're likely IO bottlenecked)
filtered for some time range and a search for a from address:
Benchmark 1: current-master
Time (mean ± σ): 4.017 s ± 0.013 s [User: 3.765 s, System: 0.252 s]
Range (min … max): 4.004 s … 4.041 s 10 runs
Benchmark 2: this-patch
Time (mean ± σ): 3.998 s ± 0.034 s [User: 3.752 s, System: 0.245 s]
Range (min … max): 3.938 s … 4.073 s 10 runs
Summary
this-patch ran
1.00 ± 0.01 times faster than current-master
While at it, we can remove the max length parameter from the
parse_qid{_prefix} functions, since they were all called with the same
length limit anyway.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/main.rs | 173 ++++++++++++++++++++++-------
tests/test_input_misleading_lines | 20 ++++
tests/test_output_misleading_lines | 38 +++++++
tests/tests_misleading_lines.rs | 39 +++++++
4 files changed, 230 insertions(+), 40 deletions(-)
create mode 100644 tests/test_input_misleading_lines
create mode 100644 tests/test_output_misleading_lines
create mode 100644 tests/tests_misleading_lines.rs
diff --git a/src/main.rs b/src/main.rs
index 89d2768..db74648 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -1,6 +1,7 @@
use std::cell::RefCell;
use std::collections::HashMap;
use std::ffi::CString;
+use std::ops::RangeInclusive;
use std::rc::{Rc, Weak};
use std::fs::File;
@@ -269,7 +270,7 @@ fn handle_postscreen_message(msg: &[u8], parser: &mut Parser, complete_line: &[u
// these only appear in the 'after-queue filter' case or when the mail is
// 'accepted' in the 'before-queue filter' case
fn handle_qmgr_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
- let (qid, data) = match parse_qid_prefix(msg, POSTFIX_QID_MAX_LEN) {
+ let (qid, data) = match parse_qid_prefix(msg) {
Some(t) => t,
None => return,
};
@@ -329,7 +330,7 @@ fn handle_lmtp_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
return;
}
- let (qid, data) = match parse_qid_prefix(msg, POSTFIX_QID_MAX_LEN) {
+ let (qid, data) = match parse_qid_prefix(msg) {
Some((q, t)) => (q, t),
None => return,
};
@@ -638,7 +639,7 @@ fn handle_smtpd_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
// with none of the other messages matching, we try for a QID to match the
// corresponding QEntry to the SEntry
- let (qid, data) = match parse_qid_prefix(msg, POSTFIX_QID_MAX_LEN) {
+ let (qid, data) = match parse_qid_prefix(msg) {
Some(t) => t,
None => return,
};
@@ -668,7 +669,7 @@ fn handle_smtpd_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
// happens before the mail is passed to qmgr (after-queue or before-queue
// accepted only)
fn handle_cleanup_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
- let (qid, data) = match parse_qid_prefix(msg, POSTFIX_QID_MAX_LEN) {
+ let (qid, data) = match parse_qid_prefix(msg) {
Some(t) => t,
None => return,
};
@@ -2123,6 +2124,26 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
}
}
+fn is_vowel(c: u8) -> bool {
+ c == b'A'
+ || c == b'E'
+ || c == b'a'
+ || c == b'e'
+ || c == b'I'
+ || c == b'i'
+ || c == b'O'
+ || c == b'o'
+ || c == b'U'
+ || c == b'u'
+}
+
+enum QidVariant {
+ Unknown,
+ Short,
+ Long,
+ Invalid,
+}
+
/// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
/// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
/// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
@@ -2130,39 +2151,114 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
/// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
const POSTFIX_QID_MAX_LEN: usize = 25;
-/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
+/// A short postfix QID can only have between 6 and 21 characters, 5 from the timestamp in usec and
+/// between 1 and 16 from the inode nr printed as '%lX'
+const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
+
+/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
+/// microseconds -> 4 chars + 'z' + at least one char for the inode
+const LONG_QID_MIN: usize = 12;
+
+/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
///
/// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
/// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
-/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
-/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
-/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
-fn parse_qid(data: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
+/// alphabet. The scan tries to determine if it's a short or long QID and cancels
+/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
+/// long QID). It assumes that only non alphanumeric characters are the delimiter,
+/// so e.g. a long QID followed by a vowel would be detected as invalid.
+fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
// to simplify limit max to data.len()
- let max = max.min(data.len());
- // take at most max, find the first non-alphanumeric byte
- match data
- .iter()
- .take(max)
- .position(|b| !b.is_ascii_alphanumeric())
- {
- // if there were less than 5 return nothing
- // the QID always has at least 5 characters for the microseconds (see
- // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
- Some(n) if n < 5 => None,
- // otherwise split at the first non-alphanumeric byte
- Some(n) => Some(data.split_at(n)),
- // or return 'max' length QID if no non-alphanumeric byte is found
- None => Some(data.split_at(max)),
- }
+ let max = POSTFIX_QID_MAX_LEN.min(data.len());
+
+ let mut z_seen = false;
+ let mut variant = QidVariant::Unknown;
+
+ let position = data.iter().take(max).position(|b| match variant {
+ QidVariant::Unknown => {
+ if b.is_ascii_hexdigit() {
+ if is_vowel(*b) {
+ // since vowels are forbidden in long qids, it must be a short one
+ variant = QidVariant::Short;
+ }
+ return false;
+ }
+
+ if b.is_ascii_alphanumeric() {
+ if is_vowel(*b) {
+ // it's a non hexadecimal vowel, so it can't be a short or a long qid
+ variant = QidVariant::Invalid;
+ return true;
+ }
+
+ if *b == b'z' {
+ z_seen = true;
+ }
+
+ variant = QidVariant::Long;
+ return false;
+ }
+
+ true
+ }
+ QidVariant::Short => !b.is_ascii_hexdigit(),
+ QidVariant::Long => {
+ if !b.is_ascii_alphanumeric() {
+ return true;
+ }
+
+ if is_vowel(*b) {
+ // vowel detected in a long qid, must be invalid
+ variant = QidVariant::Invalid;
+ return true;
+ }
+
+ if *b == b'z' {
+ z_seen = true;
+ }
+
+ false
+ }
+ QidVariant::Invalid => true,
+ });
+
+ let pos = match (variant, z_seen, position) {
+ // these must be short variants that don't have any vowels in them
+ (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
+ (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
+
+ // short variants
+ (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
+ (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
+
+ (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
+ (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
+
+ // short variant can't have z -> invalid
+ (QidVariant::Short, true, _) => None,
+
+ // long variant without z is invalid
+ (QidVariant::Long, false, _) => None,
+
+ // this can't happen since we can have only a z_seen in Long or when changing to Long
+ (QidVariant::Unknown, true, _) => {
+ eprintln!("internal error, unknown QID variant with 'z'");
+ None
+ }
+
+ // everything else is also invalid
+ _ => None,
+ };
+
+ pos.map(|pos| data.split_at(pos))
}
/// Parse a queue ID followed by the `": "` delimiter, returning (qid, remaining_text) or None.
///
/// Requiring the delimiter keeps foreign lines, like the output a custom check script logs under
/// the pmg-smtp-filter identifier, from panicking the parser or being recorded under a bogus ID.
-fn parse_qid_prefix(msg: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
- let (qid, data) = parse_qid(msg, max)?;
+fn parse_qid_prefix(msg: &[u8]) -> Option<(&[u8], &[u8])> {
+ let (qid, data) = parse_qid(msg)?;
Some((qid, data.strip_prefix(b": ")?))
}
@@ -2434,13 +2530,13 @@ fn find_lowercase(data: &[u8], needle: &[u8]) -> Option<usize> {
mod tests {
use crate::parse_pmg_smtp_filter_qid;
- use super::{POSTFIX_QID_MAX_LEN, parse_qid, parse_qid_prefix, rotated_logfile};
+ use super::{parse_qid, parse_qid_prefix, rotated_logfile};
#[test]
fn parse_short_hex_qid() {
// legacy, short hexadecimal postfix queue ID
assert_eq!(
- parse_qid(b"0022C3801B5: removed", POSTFIX_QID_MAX_LEN),
+ parse_qid(b"0022C3801B5: removed"),
Some((&b"0022C3801B5"[..], &b": removed"[..])),
);
}
@@ -2450,8 +2546,8 @@ mod tests {
// postfix long queue ID (enable_long_queue_ids = yes) drawn from the
// base-52 alphabet, i.e. containing non-hex letters
assert_eq!(
- parse_qid(b"4Zk8mP2gqRz: removed", POSTFIX_QID_MAX_LEN),
- Some((&b"4Zk8mP2gqRz"[..], &b": removed"[..])),
+ parse_qid(b"4Zk8mP2gqRz1: removed"),
+ Some((&b"4Zk8mP2gqRz1"[..], &b": removed"[..])),
);
}
@@ -2460,7 +2556,7 @@ mod tests {
// worst case long queue ID: 7 sec + 4 usec + 'z' + 12-char base-51 inode
// (64-bit) = 24 chars; it must not be truncated by POSTFIX_QID_MAX_LEN
assert_eq!(
- parse_qid(b"4Zk8mP72gqRzLp7Wn3Yt8Kc5: removed", POSTFIX_QID_MAX_LEN),
+ parse_qid(b"4Zk8mP72gqRzLp7Wn3Yt8Kc5: removed"),
Some((&b"4Zk8mP72gqRzLp7Wn3Yt8Kc5"[..], &b": removed"[..])),
);
}
@@ -2494,13 +2590,13 @@ mod tests {
#[test]
fn reject_too_short_qid() {
// fewer than 5 leading queue-id characters is not a valid queue ID
- assert_eq!(parse_qid(b"ab: x", POSTFIX_QID_MAX_LEN), None);
+ assert_eq!(parse_qid(b"ab: x"), None);
}
#[test]
fn qid_prefix_parses_regular_entries() {
assert_eq!(
- parse_qid_prefix(b"0022C3801B5: removed", POSTFIX_QID_MAX_LEN),
+ parse_qid_prefix(b"0022C3801B5: removed"),
Some((&b"0022C3801B5"[..], &b"removed"[..])),
);
}
@@ -2509,14 +2605,11 @@ mod tests {
fn qid_prefix_rejects_lines_without_delimiter() {
// foreign lines under a matched syslog identifier, like the output of a custom check
// script, must not be mistaken for an entry: a bare delimiter at the end of the line ...
- assert_eq!(parse_qid_prefix(b"DEBUG:", 25), None);
+ assert_eq!(parse_qid_prefix(b"DEBUG:"), None);
// ... a delimiter without the following space ...
- assert_eq!(
- parse_qid_prefix(b"DEBUG:Module loader, version 1088", 25),
- None
- );
+ assert_eq!(parse_qid_prefix(b"DEBUG:Module loader, version 1088"), None);
// ... and no delimiter at all
- assert_eq!(parse_qid_prefix(b"DEBUGOUTPUT", 25), None);
+ assert_eq!(parse_qid_prefix(b"DEBUGOUTPUT"), None);
}
#[test]
diff --git a/tests/test_input_misleading_lines b/tests/test_input_misleading_lines
new file mode 100644
index 0000000..2512ec0
--- /dev/null
+++ b/tests/test_input_misleading_lines
@@ -0,0 +1,20 @@
+2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4
+2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4]
+2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4]
+2026-06-16T02:26:10.588801+02:00 pmg postfix/postscreen[407424]: CONNECT from [1.2.3.4]:44340 to [10.10.10.10]:25
+2026-06-16T02:26:10.589017+02:00 pmg postfix/postscreen[407424]: PASS OLD [1.2.3.4]:44340
+2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4
+2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4]
+2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id=<xyz>
+2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=<test@pmg.local>, size=1337, nrcpt=1 (queue active)
+2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+2026-06-16T02:26:10.673820+02:00 pmg pmg-smtp-filter[4049085]: 2026/06/16-02:26:10 CONNECT TCP Peer: "[127.0.0.1]:33030" Local: "[127.0.0.1]:10024"
+2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id=<xyz>
+2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999)
+2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for <test@pmg.local> to spam quarantine - 432143214321 (rule: Quarantine)
+2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0)
+2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=<test@pmg.local>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234))
+2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed
+2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
diff --git a/tests/test_output_misleading_lines b/tests/test_output_misleading_lines
new file mode 100644
index 0000000..b54ca32
--- /dev/null
+++ b/tests/test_output_misleading_lines
@@ -0,0 +1,38 @@
+# LogReader: 3647914
+# Query options
+# Start: 2020-12-18 15:00:00 (1608300000)
+# End: 2026-06-16 15:40:00 (1781617200)
+# End Query Options
+
+QENTRY: 123412341234
+CTIME: 6A309823
+SIZE: 1337
+CLIENT: unknown[1.2.3.4]
+MSGID: <xyz>
+TO:6A309823:123412341234:Q: from <test@pmg.local> to <test@pmg.local> (432143214321)
+SMTP:
+L00000001 2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4
+L00000002 2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4]
+L00000003 2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+L00000004 2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4]
+L0000000B 2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+FILTER: 123412341234
+L0000000D 2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id=<xyz>
+L0000000F 2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999)
+L00000010 2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for <test@pmg.local> to spam quarantine - 432143214321 (rule: Quarantine)
+L00000011 2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0)
+QMGR:
+L00000009 2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id=<xyz>
+L0000000A 2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=<test@pmg.local>, size=1337, nrcpt=1 (queue active)
+L00000012 2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=<test@pmg.local>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234))
+L00000013 2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed
+
+SMTPD: T6A309822L00000004
+CTIME: 6A309824
+CLIENT: unknown[1.2.3.4]
+LOGS:
+L00000007 2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4
+L00000008 2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4]
+L0000000E 2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+L00000014 2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+
diff --git a/tests/tests_misleading_lines.rs b/tests/tests_misleading_lines.rs
new file mode 100644
index 0000000..fe6a652
--- /dev/null
+++ b/tests/tests_misleading_lines.rs
@@ -0,0 +1,39 @@
+use std::fs::File;
+use std::io::BufReader;
+use std::process::Command;
+mod utils;
+
+// Some postfix lines look like they could be a QID, like 'warning: [..];' but
+// aren't. Test that they don't produce any stray messages on stderr when multiple
+// like this exist (meaning they would have been parsed as QIDs)
+#[test]
+fn misleading_lines_are_ignored() {
+ let output = Command::new("faketime")
+ .env("TZ", "Europe/Vienna")
+ .arg("2020-12-31 23:59:59")
+ .arg(utils::log_tracker_path())
+ .arg("-vv")
+ .arg("-s")
+ .arg("2020-12-18 15:00:00")
+ .arg("-e")
+ .arg("2026-06-16 15:40:00")
+ .arg("-i")
+ .arg("tests/test_input_misleading_lines")
+ .output()
+ .expect("failed to execute pmg-log-tracker");
+
+ assert_eq!(String::from_utf8_lossy(&output.stderr), "".to_string());
+
+ assert!(
+ output.status.success(),
+ "pmg-log-tracker failed: {}",
+ String::from_utf8_lossy(&output.stderr)
+ );
+
+ let expected_file =
+ File::open("tests/test_output_misleading_lines").expect("failed to open test_output");
+
+ let expected_output = BufReader::new(&expected_file);
+ let output_reader = BufReader::new(&output.stdout[..]);
+ utils::compare_output(output_reader, expected_output);
+}
--
2.47.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing
2026-06-16 15:24 ` [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing Dominik Csapak
@ 2026-06-16 16:26 ` Stoiko Ivanov
0 siblings, 0 replies; 3+ messages in thread
From: Stoiko Ivanov @ 2026-06-16 16:26 UTC (permalink / raw)
To: Dominik Csapak; +Cc: pmg-devel
Thanks for tackling this and the analysis and benchmarks!
fwiw the code reads nicer to me with the added parsing!- a few
questions/suggestions inline:
On Tue, 16 Jun 2026 17:24:07 +0200
Dominik Csapak <d.csapak@proxmox.com> wrote:
> since adding support for postfix's long QIDs, the criteria was that the
> QIDs had to be alphanumeric (since no built-in base52 check exists).
this could be added - but to use it we'd need to split short-qid and
long-qid parsing as you did here anyways!
>..snip..
> While at it, we can remove the max length parameter from the
> parse_qid{_prefix} functions, since they were all called with the same
> length limit anyway.
the dropping of the parameter could be a separate patch imho (but fine to
apply as is if it's cumbersome to split)
>
> /// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
> /// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
> /// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
> @@ -2130,39 +2151,114 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
> /// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
> const POSTFIX_QID_MAX_LEN: usize = 25;
>
> -/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
<> +/// A short postfix QID can only have between 6 and 21 characters, 5 from the timestamp in usec and
nit - that's the usec portion of the timestamp (not the whole), based on
the comment in:
https://github.com/vdukhovni/postfix/blob/5f26cb870afd39245203703984f54643a001cf2e/postfix/src/global/mail_queue.h#L153
> +/// between 1 and 16 from the inode nr printed as '%lX'
> +const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
> +
> +/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
> +/// microseconds -> 4 chars + 'z' + at least one char for the inode
> +const LONG_QID_MIN: usize = 12;
not sure - but would it be more consistent/readable if we use a range here
as well (despite the upper bound already being enforced by
POSTFIX_QID_MAX_LEN) - at least I wondered for a second why we use 2
bounds for the short id's, and only a lower for the long ones below?
> +
> +/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
> ///
> /// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
> /// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
> -/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
> -/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
> -/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
> -fn parse_qid(data: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
> +/// alphabet. The scan tries to determine if it's a short or long QID and cancels
> +/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
> +/// long QID). It assumes that only non alphanumeric characters are the delimiter,
> +/// so e.g. a long QID followed by a vowel would be detected as invalid.
> +fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
> // to simplify limit max to data.len()
> - let max = max.min(data.len());
> - // take at most max, find the first non-alphanumeric byte
> - match data
> - .iter()
> - .take(max)
> - .position(|b| !b.is_ascii_alphanumeric())
> - {
> - // if there were less than 5 return nothing
> - // the QID always has at least 5 characters for the microseconds (see
> - // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
> - Some(n) if n < 5 => None,
> - // otherwise split at the first non-alphanumeric byte
> - Some(n) => Some(data.split_at(n)),
> - // or return 'max' length QID if no non-alphanumeric byte is found
> - None => Some(data.split_at(max)),
> - }
> + let max = POSTFIX_QID_MAX_LEN.min(data.len());
> +
> + let mut z_seen = false;
> + let mut variant = QidVariant::Unknown;
> +
> + let position = data.iter().take(max).position(|b| match variant {
> + QidVariant::Unknown => {
> + if b.is_ascii_hexdigit() {
pre-existing - and also in your patch 1/2 (pmg-smtp-filter also uses
%X as format-placeholder) - but the hexdigits only use upper-case A-F -
could make sense to restrict this here too (and add is_uppercase_hexdigit
above)?
> + if is_vowel(*b) {
> + // since vowels are forbidden in long qids, it must be a short one
> + variant = QidVariant::Short;
> + }
> + return false;
> + }
> +
> + if b.is_ascii_alphanumeric() {
> + if is_vowel(*b) {
> + // it's a non hexadecimal vowel, so it can't be a short or a long qid
> + variant = QidVariant::Invalid;
> + return true;
> + }
> +
> + if *b == b'z' {
> + z_seen = true;
> + }
this would only happen if 'z' is the first character in a queue-id? - that
would be invalid as well?
> +
> + variant = QidVariant::Long;
> + return false;
> + }
> +
> + true
> + }
> + QidVariant::Short => !b.is_ascii_hexdigit(),
> + QidVariant::Long => {
> + if !b.is_ascii_alphanumeric() {
> + return true;
> + }
> +
> + if is_vowel(*b) {
> + // vowel detected in a long qid, must be invalid
> + variant = QidVariant::Invalid;
> + return true;
> + }
> +
> + if *b == b'z' {
> + z_seen = true;
> + }
> +
> + false
> + }
> + QidVariant::Invalid => true,
> + });
> +
> + let pos = match (variant, z_seen, position) {
> + // these must be short variants that don't have any vowels in them
> + (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
> + (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
> +
> + // short variants
> + (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
> + (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
> +
> + (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
> + (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
> +
> + // short variant can't have z -> invalid
> + (QidVariant::Short, true, _) => None,
> +
> + // long variant without z is invalid
> + (QidVariant::Long, false, _) => None,
> +
> + // this can't happen since we can have only a z_seen in Long or when changing to Long
> + (QidVariant::Unknown, true, _) => {
> + eprintln!("internal error, unknown QID variant with 'z'");
> + None
> + }
> +
> + // everything else is also invalid
> + _ => None,
> + };
> +
> + pos.map(|pos| data.split_at(pos))
> }
>
>..snip..
the rest looks ok to me - and from a quick check the test-case explains
the issue quite well.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-16 16:27 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-16 15:24 [PATCH log-tracker 1/2] split out pmg-smtp-filter qid parsing Dominik Csapak
2026-06-16 15:24 ` [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing Dominik Csapak
2026-06-16 16:26 ` Stoiko Ivanov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.