* [PATCH log-tracker 1/2] split out pmg-smtp-filter qid parsing
@ 2026-06-16 15:24 Dominik Csapak
2026-06-16 15:24 ` [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing Dominik Csapak
0 siblings, 1 reply; 4+ messages in thread
From: Dominik Csapak @ 2026-06-16 15:24 UTC (permalink / raw)
To: pmg-devel
By splitting out the parsing of the pmg-smtp-filter QIDs, we can more
rigorously parse them since the parser does not have to overlap with
postfix's QIDs'. We have a bit of code duplication, but more clarity
when reading the code and better checks.
Since we know how long the pmg-smtp-filter QIDs must be, we can also
filter by the length.
Since we know the lengths of a pmg-smtp-filter QID, we don't need to
pass the max-len from outside.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/main.rs | 67 ++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 61 insertions(+), 6 deletions(-)
diff --git a/src/main.rs b/src/main.rs
index 0b6e92f..89d2768 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -111,7 +111,7 @@ fn main() -> Result<(), Error> {
// we match 4 entries, all beginning with a QID
// accept mail, move mail, block mail and the processing time
fn handle_pmg_smtp_filter_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
- let (qid, data) = match parse_qid_prefix(msg, 25) {
+ let (qid, data) = match parse_pmg_smtp_filter_qid_prefix(msg) {
Some((q, m)) => (q, m),
None => return,
};
@@ -167,7 +167,7 @@ fn handle_pmg_smtp_filter_message(msg: &[u8], parser: &mut Parser, complete_line
None => return,
};
let data = &data[qid_index + 13..];
- let (qid, _) = match parse_qid(data, 25) {
+ let (qid, _) = match parse_pmg_smtp_filter_qid(data) {
Some(t) => t,
None => return,
};
@@ -407,7 +407,7 @@ fn handle_lmtp_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
}
// this is the QID of the associated pmg-smtp-filter
- let (qid, _) = match parse_qid(data, 25) {
+ let (qid, _) = match parse_pmg_smtp_filter_qid(data) {
Some(t) => t,
None => return,
};
@@ -543,7 +543,7 @@ fn handle_smtpd_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
return;
}
let data = &data[14..];
- if let Some((qid, data)) = parse_qid(data, 25) {
+ if let Some((qid, data)) = parse_pmg_smtp_filter_qid(data) {
let fe = get_or_create_fentry(&mut parser.fentries, qid);
// set the FEntry to before-queue filtered
fe.borrow_mut().is_bq = true;
@@ -597,7 +597,7 @@ fn handle_smtpd_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
if let Some(qid_index) = find(data, b"(") {
let data = &data[qid_index + 1..];
- if let Some((qid, _)) = parse_qid(data, 25) {
+ if let Some((qid, _)) = parse_pmg_smtp_filter_qid(data) {
let fe = get_or_create_fentry(&mut parser.fentries, qid);
// set the FEntry to before-queue filtered
fe.borrow_mut().is_bq = true;
@@ -2194,6 +2194,42 @@ fn parse_number(data: &[u8], max_digits: usize) -> Option<(usize, &[u8])> {
}
}
+/// pmg-smtp-filter QIDs must be at least 7 hex characters (see src/PMG/MailQueue.pm in pmg-api)
+const PMG_SMTP_FILTER_QID_MIN_LEN: usize = 7;
+
+const PMG_SMTP_FILTER_QID_MAX_LEN: usize = 25;
+
+/// Parse a queue ID followed by the `": "` delimiter, returning (qid, remaining_text) or None.
+///
+/// Requiring the delimiter keeps foreign lines, like the output a custom check script logs under
+/// the pmg-smtp-filter identifier, from panicking the parser or being recorded under a bogus ID.
+fn parse_pmg_smtp_filter_qid_prefix(msg: &[u8]) -> Option<(&[u8], &[u8])> {
+ let (qid, data) = parse_pmg_smtp_filter_qid(msg)?;
+ Some((qid, data.strip_prefix(b": ")?))
+}
+
+/// Parse a queue ID from pmg-smtp-filter and return a tuple of (qid, remaining_text) or None.
+///
+/// Queue IDs are hexadecimal (`[0-9A-Fa-f]`) so only parse until the first non-hexadecimal
+/// character is encountered.
+fn parse_pmg_smtp_filter_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
+ // to simplify limit max to data.len()
+ let max = PMG_SMTP_FILTER_QID_MAX_LEN.min(data.len());
+
+ if max < PMG_SMTP_FILTER_QID_MIN_LEN {
+ return None;
+ }
+ // take at most max, find the first non-hexdigit byte
+ match data.iter().take(max).position(|b| !b.is_ascii_hexdigit()) {
+ // too short
+ Some(n) if n < PMG_SMTP_FILTER_QID_MIN_LEN => None,
+ // otherwise split at the first non-hexadecimal character
+ Some(n) => Some(data.split_at(n)),
+ // or return 'max' length QID if no non-hexadecimal character is found
+ None => Some(data.split_at(max)),
+ }
+}
+
/// Parse time. Returns a tuple of (parsed_time, remaining_text) or None.
fn parse_time(
data: &'_ [u8],
@@ -2396,6 +2432,8 @@ fn find_lowercase(data: &[u8], needle: &[u8]) -> Option<usize> {
#[cfg(test)]
mod tests {
+ use crate::parse_pmg_smtp_filter_qid;
+
use super::{POSTFIX_QID_MAX_LEN, parse_qid, parse_qid_prefix, rotated_logfile};
#[test]
@@ -2427,11 +2465,28 @@ mod tests {
);
}
+ #[test]
+ fn parse_filter_ids() {
+ assert_eq!(
+ parse_pmg_smtp_filter_qid(b"3802E45DFA503808B06"),
+ Some((&b"3802E45DFA503808B06"[..], &b""[..])),
+ );
+ }
+
+ #[test]
+ fn parse_filter_id_failing() {
+ // these tests should all return None so not parsing
+ assert_eq!(parse_pmg_smtp_filter_qid(b"3"), None);
+ assert_eq!(parse_pmg_smtp_filter_qid(b""), None);
+ assert_eq!(parse_pmg_smtp_filter_qid(b"Warning"), None);
+ assert_eq!(parse_pmg_smtp_filter_qid(b"ThisIsAtest"), None);
+ }
+
#[test]
fn parse_filter_id_terminated_by_paren() {
// pmg-smtp-filter ID, terminated by ')'
assert_eq!(
- parse_qid(b"3802E45DFA503808B06)", 25),
+ parse_pmg_smtp_filter_qid(b"3802E45DFA503808B06)"),
Some((&b"3802E45DFA503808B06"[..], &b")"[..])),
);
}
--
2.47.3
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing
2026-06-16 15:24 [PATCH log-tracker 1/2] split out pmg-smtp-filter qid parsing Dominik Csapak
@ 2026-06-16 15:24 ` Dominik Csapak
2026-06-16 16:26 ` Stoiko Ivanov
0 siblings, 1 reply; 4+ messages in thread
From: Dominik Csapak @ 2026-06-16 15:24 UTC (permalink / raw)
To: pmg-devel
since adding support for postfix's long QIDs, the criteria was that the
QIDs had to be alphanumeric (since no built-in base52 check exists).
This lead to false positives in some logs that had lines such as:
postfix/smtpd[xxxx]: warning: hostname example.org does not resolve to address x.y.z.w
Where the 'warning:' was interpreted as QIDs.
(There are other things postfix can print here that were too interpreted
as QIDs, this is just an example).
To fix it, add a simple state machine during parsing that keeps track of
what should be encountered in a long/short QID and abort if there is an
impossible combination (e.g. vowels in a long QID).
Since adding new checks here could be performance critical, I
benchmarked it and it showed no significant performance changes.
Setup:
A ~2.5 GiB large syslog (on ram; otherwise we're likely IO bottlenecked)
filtered for some time range and a search for a from address:
Benchmark 1: current-master
Time (mean ± σ): 4.017 s ± 0.013 s [User: 3.765 s, System: 0.252 s]
Range (min … max): 4.004 s … 4.041 s 10 runs
Benchmark 2: this-patch
Time (mean ± σ): 3.998 s ± 0.034 s [User: 3.752 s, System: 0.245 s]
Range (min … max): 3.938 s … 4.073 s 10 runs
Summary
this-patch ran
1.00 ± 0.01 times faster than current-master
While at it, we can remove the max length parameter from the
parse_qid{_prefix} functions, since they were all called with the same
length limit anyway.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/main.rs | 173 ++++++++++++++++++++++-------
tests/test_input_misleading_lines | 20 ++++
tests/test_output_misleading_lines | 38 +++++++
tests/tests_misleading_lines.rs | 39 +++++++
4 files changed, 230 insertions(+), 40 deletions(-)
create mode 100644 tests/test_input_misleading_lines
create mode 100644 tests/test_output_misleading_lines
create mode 100644 tests/tests_misleading_lines.rs
diff --git a/src/main.rs b/src/main.rs
index 89d2768..db74648 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -1,6 +1,7 @@
use std::cell::RefCell;
use std::collections::HashMap;
use std::ffi::CString;
+use std::ops::RangeInclusive;
use std::rc::{Rc, Weak};
use std::fs::File;
@@ -269,7 +270,7 @@ fn handle_postscreen_message(msg: &[u8], parser: &mut Parser, complete_line: &[u
// these only appear in the 'after-queue filter' case or when the mail is
// 'accepted' in the 'before-queue filter' case
fn handle_qmgr_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
- let (qid, data) = match parse_qid_prefix(msg, POSTFIX_QID_MAX_LEN) {
+ let (qid, data) = match parse_qid_prefix(msg) {
Some(t) => t,
None => return,
};
@@ -329,7 +330,7 @@ fn handle_lmtp_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
return;
}
- let (qid, data) = match parse_qid_prefix(msg, POSTFIX_QID_MAX_LEN) {
+ let (qid, data) = match parse_qid_prefix(msg) {
Some((q, t)) => (q, t),
None => return,
};
@@ -638,7 +639,7 @@ fn handle_smtpd_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
// with none of the other messages matching, we try for a QID to match the
// corresponding QEntry to the SEntry
- let (qid, data) = match parse_qid_prefix(msg, POSTFIX_QID_MAX_LEN) {
+ let (qid, data) = match parse_qid_prefix(msg) {
Some(t) => t,
None => return,
};
@@ -668,7 +669,7 @@ fn handle_smtpd_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
// happens before the mail is passed to qmgr (after-queue or before-queue
// accepted only)
fn handle_cleanup_message(msg: &[u8], parser: &mut Parser, complete_line: &[u8]) {
- let (qid, data) = match parse_qid_prefix(msg, POSTFIX_QID_MAX_LEN) {
+ let (qid, data) = match parse_qid_prefix(msg) {
Some(t) => t,
None => return,
};
@@ -2123,6 +2124,26 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
}
}
+fn is_vowel(c: u8) -> bool {
+ c == b'A'
+ || c == b'E'
+ || c == b'a'
+ || c == b'e'
+ || c == b'I'
+ || c == b'i'
+ || c == b'O'
+ || c == b'o'
+ || c == b'U'
+ || c == b'u'
+}
+
+enum QidVariant {
+ Unknown,
+ Short,
+ Long,
+ Invalid,
+}
+
/// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
/// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
/// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
@@ -2130,39 +2151,114 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
/// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
const POSTFIX_QID_MAX_LEN: usize = 25;
-/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
+/// A short postfix QID can only have between 6 and 21 characters, 5 from the timestamp in usec and
+/// between 1 and 16 from the inode nr printed as '%lX'
+const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
+
+/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
+/// microseconds -> 4 chars + 'z' + at least one char for the inode
+const LONG_QID_MIN: usize = 12;
+
+/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
///
/// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
/// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
-/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
-/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
-/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
-fn parse_qid(data: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
+/// alphabet. The scan tries to determine if it's a short or long QID and cancels
+/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
+/// long QID). It assumes that only non alphanumeric characters are the delimiter,
+/// so e.g. a long QID followed by a vowel would be detected as invalid.
+fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
// to simplify limit max to data.len()
- let max = max.min(data.len());
- // take at most max, find the first non-alphanumeric byte
- match data
- .iter()
- .take(max)
- .position(|b| !b.is_ascii_alphanumeric())
- {
- // if there were less than 5 return nothing
- // the QID always has at least 5 characters for the microseconds (see
- // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
- Some(n) if n < 5 => None,
- // otherwise split at the first non-alphanumeric byte
- Some(n) => Some(data.split_at(n)),
- // or return 'max' length QID if no non-alphanumeric byte is found
- None => Some(data.split_at(max)),
- }
+ let max = POSTFIX_QID_MAX_LEN.min(data.len());
+
+ let mut z_seen = false;
+ let mut variant = QidVariant::Unknown;
+
+ let position = data.iter().take(max).position(|b| match variant {
+ QidVariant::Unknown => {
+ if b.is_ascii_hexdigit() {
+ if is_vowel(*b) {
+ // since vowels are forbidden in long qids, it must be a short one
+ variant = QidVariant::Short;
+ }
+ return false;
+ }
+
+ if b.is_ascii_alphanumeric() {
+ if is_vowel(*b) {
+ // it's a non hexadecimal vowel, so it can't be a short or a long qid
+ variant = QidVariant::Invalid;
+ return true;
+ }
+
+ if *b == b'z' {
+ z_seen = true;
+ }
+
+ variant = QidVariant::Long;
+ return false;
+ }
+
+ true
+ }
+ QidVariant::Short => !b.is_ascii_hexdigit(),
+ QidVariant::Long => {
+ if !b.is_ascii_alphanumeric() {
+ return true;
+ }
+
+ if is_vowel(*b) {
+ // vowel detected in a long qid, must be invalid
+ variant = QidVariant::Invalid;
+ return true;
+ }
+
+ if *b == b'z' {
+ z_seen = true;
+ }
+
+ false
+ }
+ QidVariant::Invalid => true,
+ });
+
+ let pos = match (variant, z_seen, position) {
+ // these must be short variants that don't have any vowels in them
+ (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
+ (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
+
+ // short variants
+ (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
+ (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
+
+ (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
+ (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
+
+ // short variant can't have z -> invalid
+ (QidVariant::Short, true, _) => None,
+
+ // long variant without z is invalid
+ (QidVariant::Long, false, _) => None,
+
+ // this can't happen since we can have only a z_seen in Long or when changing to Long
+ (QidVariant::Unknown, true, _) => {
+ eprintln!("internal error, unknown QID variant with 'z'");
+ None
+ }
+
+ // everything else is also invalid
+ _ => None,
+ };
+
+ pos.map(|pos| data.split_at(pos))
}
/// Parse a queue ID followed by the `": "` delimiter, returning (qid, remaining_text) or None.
///
/// Requiring the delimiter keeps foreign lines, like the output a custom check script logs under
/// the pmg-smtp-filter identifier, from panicking the parser or being recorded under a bogus ID.
-fn parse_qid_prefix(msg: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
- let (qid, data) = parse_qid(msg, max)?;
+fn parse_qid_prefix(msg: &[u8]) -> Option<(&[u8], &[u8])> {
+ let (qid, data) = parse_qid(msg)?;
Some((qid, data.strip_prefix(b": ")?))
}
@@ -2434,13 +2530,13 @@ fn find_lowercase(data: &[u8], needle: &[u8]) -> Option<usize> {
mod tests {
use crate::parse_pmg_smtp_filter_qid;
- use super::{POSTFIX_QID_MAX_LEN, parse_qid, parse_qid_prefix, rotated_logfile};
+ use super::{parse_qid, parse_qid_prefix, rotated_logfile};
#[test]
fn parse_short_hex_qid() {
// legacy, short hexadecimal postfix queue ID
assert_eq!(
- parse_qid(b"0022C3801B5: removed", POSTFIX_QID_MAX_LEN),
+ parse_qid(b"0022C3801B5: removed"),
Some((&b"0022C3801B5"[..], &b": removed"[..])),
);
}
@@ -2450,8 +2546,8 @@ mod tests {
// postfix long queue ID (enable_long_queue_ids = yes) drawn from the
// base-52 alphabet, i.e. containing non-hex letters
assert_eq!(
- parse_qid(b"4Zk8mP2gqRz: removed", POSTFIX_QID_MAX_LEN),
- Some((&b"4Zk8mP2gqRz"[..], &b": removed"[..])),
+ parse_qid(b"4Zk8mP2gqRz1: removed"),
+ Some((&b"4Zk8mP2gqRz1"[..], &b": removed"[..])),
);
}
@@ -2460,7 +2556,7 @@ mod tests {
// worst case long queue ID: 7 sec + 4 usec + 'z' + 12-char base-51 inode
// (64-bit) = 24 chars; it must not be truncated by POSTFIX_QID_MAX_LEN
assert_eq!(
- parse_qid(b"4Zk8mP72gqRzLp7Wn3Yt8Kc5: removed", POSTFIX_QID_MAX_LEN),
+ parse_qid(b"4Zk8mP72gqRzLp7Wn3Yt8Kc5: removed"),
Some((&b"4Zk8mP72gqRzLp7Wn3Yt8Kc5"[..], &b": removed"[..])),
);
}
@@ -2494,13 +2590,13 @@ mod tests {
#[test]
fn reject_too_short_qid() {
// fewer than 5 leading queue-id characters is not a valid queue ID
- assert_eq!(parse_qid(b"ab: x", POSTFIX_QID_MAX_LEN), None);
+ assert_eq!(parse_qid(b"ab: x"), None);
}
#[test]
fn qid_prefix_parses_regular_entries() {
assert_eq!(
- parse_qid_prefix(b"0022C3801B5: removed", POSTFIX_QID_MAX_LEN),
+ parse_qid_prefix(b"0022C3801B5: removed"),
Some((&b"0022C3801B5"[..], &b"removed"[..])),
);
}
@@ -2509,14 +2605,11 @@ mod tests {
fn qid_prefix_rejects_lines_without_delimiter() {
// foreign lines under a matched syslog identifier, like the output of a custom check
// script, must not be mistaken for an entry: a bare delimiter at the end of the line ...
- assert_eq!(parse_qid_prefix(b"DEBUG:", 25), None);
+ assert_eq!(parse_qid_prefix(b"DEBUG:"), None);
// ... a delimiter without the following space ...
- assert_eq!(
- parse_qid_prefix(b"DEBUG:Module loader, version 1088", 25),
- None
- );
+ assert_eq!(parse_qid_prefix(b"DEBUG:Module loader, version 1088"), None);
// ... and no delimiter at all
- assert_eq!(parse_qid_prefix(b"DEBUGOUTPUT", 25), None);
+ assert_eq!(parse_qid_prefix(b"DEBUGOUTPUT"), None);
}
#[test]
diff --git a/tests/test_input_misleading_lines b/tests/test_input_misleading_lines
new file mode 100644
index 0000000..2512ec0
--- /dev/null
+++ b/tests/test_input_misleading_lines
@@ -0,0 +1,20 @@
+2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4
+2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4]
+2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4]
+2026-06-16T02:26:10.588801+02:00 pmg postfix/postscreen[407424]: CONNECT from [1.2.3.4]:44340 to [10.10.10.10]:25
+2026-06-16T02:26:10.589017+02:00 pmg postfix/postscreen[407424]: PASS OLD [1.2.3.4]:44340
+2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4
+2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4]
+2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id=<xyz>
+2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=<test@pmg.local>, size=1337, nrcpt=1 (queue active)
+2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+2026-06-16T02:26:10.673820+02:00 pmg pmg-smtp-filter[4049085]: 2026/06/16-02:26:10 CONNECT TCP Peer: "[127.0.0.1]:33030" Local: "[127.0.0.1]:10024"
+2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id=<xyz>
+2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999)
+2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for <test@pmg.local> to spam quarantine - 432143214321 (rule: Quarantine)
+2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0)
+2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=<test@pmg.local>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234))
+2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed
+2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
diff --git a/tests/test_output_misleading_lines b/tests/test_output_misleading_lines
new file mode 100644
index 0000000..b54ca32
--- /dev/null
+++ b/tests/test_output_misleading_lines
@@ -0,0 +1,38 @@
+# LogReader: 3647914
+# Query options
+# Start: 2020-12-18 15:00:00 (1608300000)
+# End: 2026-06-16 15:40:00 (1781617200)
+# End Query Options
+
+QENTRY: 123412341234
+CTIME: 6A309823
+SIZE: 1337
+CLIENT: unknown[1.2.3.4]
+MSGID: <xyz>
+TO:6A309823:123412341234:Q: from <test@pmg.local> to <test@pmg.local> (432143214321)
+SMTP:
+L00000001 2026-06-16T02:26:09.196080+02:00 pmg postfix/smtpd[3971220]: warning: hostname example.com does not resolve to address 1.2.3.4
+L00000002 2026-06-16T02:26:09.196262+02:00 pmg postfix/smtpd[3971220]: connect from unknown[1.2.3.4]
+L00000003 2026-06-16T02:26:10.026205+02:00 pmg postfix/smtpd[3971220]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+L00000004 2026-06-16T02:26:10.464588+02:00 pmg postfix/smtpd[3971220]: 123412341234: client=unknown[1.2.3.4]
+L0000000B 2026-06-16T02:26:10.673044+02:00 pmg postfix/smtpd[3971220]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+FILTER: 123412341234
+L0000000D 2026-06-16T02:26:10.727199+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: new mail message-id=<xyz>
+L0000000F 2026-06-16T02:26:11.753241+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: SA score=9999/5 time=0.328 bayes=undefined autolearn=disabled hits=SOME_RULE(9999)
+L00000010 2026-06-16T02:26:11.767261+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: moved mail for <test@pmg.local> to spam quarantine - 432143214321 (rule: Quarantine)
+L00000011 2026-06-16T02:26:11.770191+02:00 pmg pmg-smtp-filter[4049085]: 123412341234: processing time: 2.054 seconds (0.908, 0.079, 0)
+QMGR:
+L00000009 2026-06-16T02:26:10.662240+02:00 pmg postfix/cleanup[3949936]: 123412341234: message-id=<xyz>
+L0000000A 2026-06-16T02:26:10.672845+02:00 pmg postfix/qmgr[407346]: 123412341234: from=<test@pmg.local>, size=1337, nrcpt=1 (queue active)
+L00000012 2026-06-16T02:26:11.770749+02:00 pmg postfix/lmtp[3918143]: 123412341234: to=<test@pmg.local>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.3, delays=0.21/0/0.04/1.1, dsn=2.5.0, status=sent (250 2.5.0 OK (123412341234))
+L00000013 2026-06-16T02:26:11.771119+02:00 pmg postfix/qmgr[407346]: 123412341234: removed
+
+SMTPD: T6A309822L00000004
+CTIME: 6A309824
+CLIENT: unknown[1.2.3.4]
+LOGS:
+L00000007 2026-06-16T02:26:10.591164+02:00 pmg postfix/smtpd[3894722]: warning: hostname example.com does not resolve to address 1.2.3.4
+L00000008 2026-06-16T02:26:10.591339+02:00 pmg postfix/smtpd[3894722]: connect from unknown[1.2.3.4]
+L0000000E 2026-06-16T02:26:11.460707+02:00 pmg postfix/smtpd[3894722]: Anonymous TLS connection established from unknown[1.2.3.4]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange secp256r1 server-signature RSA-PSS (4096 bits) server-digest SHA256
+L00000014 2026-06-16T02:26:12.123422+02:00 pmg postfix/smtpd[3894722]: disconnect from unknown[1.2.3.4] ehlo=2 starttls=1 mail=1 rcpt=1 bdat=2 quit=1 commands=8
+
diff --git a/tests/tests_misleading_lines.rs b/tests/tests_misleading_lines.rs
new file mode 100644
index 0000000..fe6a652
--- /dev/null
+++ b/tests/tests_misleading_lines.rs
@@ -0,0 +1,39 @@
+use std::fs::File;
+use std::io::BufReader;
+use std::process::Command;
+mod utils;
+
+// Some postfix lines look like they could be a QID, like 'warning: [..];' but
+// aren't. Test that they don't produce any stray messages on stderr when multiple
+// like this exist (meaning they would have been parsed as QIDs)
+#[test]
+fn misleading_lines_are_ignored() {
+ let output = Command::new("faketime")
+ .env("TZ", "Europe/Vienna")
+ .arg("2020-12-31 23:59:59")
+ .arg(utils::log_tracker_path())
+ .arg("-vv")
+ .arg("-s")
+ .arg("2020-12-18 15:00:00")
+ .arg("-e")
+ .arg("2026-06-16 15:40:00")
+ .arg("-i")
+ .arg("tests/test_input_misleading_lines")
+ .output()
+ .expect("failed to execute pmg-log-tracker");
+
+ assert_eq!(String::from_utf8_lossy(&output.stderr), "".to_string());
+
+ assert!(
+ output.status.success(),
+ "pmg-log-tracker failed: {}",
+ String::from_utf8_lossy(&output.stderr)
+ );
+
+ let expected_file =
+ File::open("tests/test_output_misleading_lines").expect("failed to open test_output");
+
+ let expected_output = BufReader::new(&expected_file);
+ let output_reader = BufReader::new(&output.stdout[..]);
+ utils::compare_output(output_reader, expected_output);
+}
--
2.47.3
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing
2026-06-16 15:24 ` [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing Dominik Csapak
@ 2026-06-16 16:26 ` Stoiko Ivanov
2026-06-17 6:44 ` Dominik Csapak
0 siblings, 1 reply; 4+ messages in thread
From: Stoiko Ivanov @ 2026-06-16 16:26 UTC (permalink / raw)
To: Dominik Csapak; +Cc: pmg-devel
Thanks for tackling this and the analysis and benchmarks!
fwiw the code reads nicer to me with the added parsing!- a few
questions/suggestions inline:
On Tue, 16 Jun 2026 17:24:07 +0200
Dominik Csapak <d.csapak@proxmox.com> wrote:
> since adding support for postfix's long QIDs, the criteria was that the
> QIDs had to be alphanumeric (since no built-in base52 check exists).
this could be added - but to use it we'd need to split short-qid and
long-qid parsing as you did here anyways!
>..snip..
> While at it, we can remove the max length parameter from the
> parse_qid{_prefix} functions, since they were all called with the same
> length limit anyway.
the dropping of the parameter could be a separate patch imho (but fine to
apply as is if it's cumbersome to split)
>
> /// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
> /// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
> /// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
> @@ -2130,39 +2151,114 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
> /// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
> const POSTFIX_QID_MAX_LEN: usize = 25;
>
> -/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
<> +/// A short postfix QID can only have between 6 and 21 characters, 5 from the timestamp in usec and
nit - that's the usec portion of the timestamp (not the whole), based on
the comment in:
https://github.com/vdukhovni/postfix/blob/5f26cb870afd39245203703984f54643a001cf2e/postfix/src/global/mail_queue.h#L153
> +/// between 1 and 16 from the inode nr printed as '%lX'
> +const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
> +
> +/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
> +/// microseconds -> 4 chars + 'z' + at least one char for the inode
> +const LONG_QID_MIN: usize = 12;
not sure - but would it be more consistent/readable if we use a range here
as well (despite the upper bound already being enforced by
POSTFIX_QID_MAX_LEN) - at least I wondered for a second why we use 2
bounds for the short id's, and only a lower for the long ones below?
> +
> +/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
> ///
> /// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
> /// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
> -/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
> -/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
> -/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
> -fn parse_qid(data: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
> +/// alphabet. The scan tries to determine if it's a short or long QID and cancels
> +/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
> +/// long QID). It assumes that only non alphanumeric characters are the delimiter,
> +/// so e.g. a long QID followed by a vowel would be detected as invalid.
> +fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
> // to simplify limit max to data.len()
> - let max = max.min(data.len());
> - // take at most max, find the first non-alphanumeric byte
> - match data
> - .iter()
> - .take(max)
> - .position(|b| !b.is_ascii_alphanumeric())
> - {
> - // if there were less than 5 return nothing
> - // the QID always has at least 5 characters for the microseconds (see
> - // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
> - Some(n) if n < 5 => None,
> - // otherwise split at the first non-alphanumeric byte
> - Some(n) => Some(data.split_at(n)),
> - // or return 'max' length QID if no non-alphanumeric byte is found
> - None => Some(data.split_at(max)),
> - }
> + let max = POSTFIX_QID_MAX_LEN.min(data.len());
> +
> + let mut z_seen = false;
> + let mut variant = QidVariant::Unknown;
> +
> + let position = data.iter().take(max).position(|b| match variant {
> + QidVariant::Unknown => {
> + if b.is_ascii_hexdigit() {
pre-existing - and also in your patch 1/2 (pmg-smtp-filter also uses
%X as format-placeholder) - but the hexdigits only use upper-case A-F -
could make sense to restrict this here too (and add is_uppercase_hexdigit
above)?
> + if is_vowel(*b) {
> + // since vowels are forbidden in long qids, it must be a short one
> + variant = QidVariant::Short;
> + }
> + return false;
> + }
> +
> + if b.is_ascii_alphanumeric() {
> + if is_vowel(*b) {
> + // it's a non hexadecimal vowel, so it can't be a short or a long qid
> + variant = QidVariant::Invalid;
> + return true;
> + }
> +
> + if *b == b'z' {
> + z_seen = true;
> + }
this would only happen if 'z' is the first character in a queue-id? - that
would be invalid as well?
> +
> + variant = QidVariant::Long;
> + return false;
> + }
> +
> + true
> + }
> + QidVariant::Short => !b.is_ascii_hexdigit(),
> + QidVariant::Long => {
> + if !b.is_ascii_alphanumeric() {
> + return true;
> + }
> +
> + if is_vowel(*b) {
> + // vowel detected in a long qid, must be invalid
> + variant = QidVariant::Invalid;
> + return true;
> + }
> +
> + if *b == b'z' {
> + z_seen = true;
> + }
> +
> + false
> + }
> + QidVariant::Invalid => true,
> + });
> +
> + let pos = match (variant, z_seen, position) {
> + // these must be short variants that don't have any vowels in them
> + (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
> + (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
> +
> + // short variants
> + (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
> + (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
> +
> + (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
> + (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
> +
> + // short variant can't have z -> invalid
> + (QidVariant::Short, true, _) => None,
> +
> + // long variant without z is invalid
> + (QidVariant::Long, false, _) => None,
> +
> + // this can't happen since we can have only a z_seen in Long or when changing to Long
> + (QidVariant::Unknown, true, _) => {
> + eprintln!("internal error, unknown QID variant with 'z'");
> + None
> + }
> +
> + // everything else is also invalid
> + _ => None,
> + };
> +
> + pos.map(|pos| data.split_at(pos))
> }
>
>..snip..
the rest looks ok to me - and from a quick check the test-case explains
the issue quite well.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing
2026-06-16 16:26 ` Stoiko Ivanov
@ 2026-06-17 6:44 ` Dominik Csapak
0 siblings, 0 replies; 4+ messages in thread
From: Dominik Csapak @ 2026-06-17 6:44 UTC (permalink / raw)
To: Stoiko Ivanov; +Cc: pmg-devel
On 6/16/26 6:25 PM, Stoiko Ivanov wrote:
> Thanks for tackling this and the analysis and benchmarks!
>
> fwiw the code reads nicer to me with the added parsing!- a few
> questions/suggestions inline:
>
> On Tue, 16 Jun 2026 17:24:07 +0200
> Dominik Csapak <d.csapak@proxmox.com> wrote:
>
>> since adding support for postfix's long QIDs, the criteria was that the
>> QIDs had to be alphanumeric (since no built-in base52 check exists).
> this could be added - but to use it we'd need to split short-qid and
> long-qid parsing as you did here anyways!
>> ..snip..
>> While at it, we can remove the max length parameter from the
>> parse_qid{_prefix} functions, since they were all called with the same
>> length limit anyway.
> the dropping of the parameter could be a separate patch imho (but fine to
> apply as is if it's cumbersome to split)
sure, if i'll send a new version i split that out
>
>
>>
>> /// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
>> /// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
>> /// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
>> @@ -2130,39 +2151,114 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
>> /// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
>> const POSTFIX_QID_MAX_LEN: usize = 25;
>>
>> -/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
> <> +/// A short postfix QID can only have between 6 and 21 characters, 5 from the timestamp in usec and
> nit - that's the usec portion of the timestamp (not the whole), based on
> the comment in:
> https://github.com/vdukhovni/postfix/blob/5f26cb870afd39245203703984f54643a001cf2e/postfix/src/global/mail_queue.h#L153
sure, that's what i meant, just not very well expressed^^
>> +/// between 1 and 16 from the inode nr printed as '%lX'
>> +const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
>> +
>> +/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
>> +/// microseconds -> 4 chars + 'z' + at least one char for the inode
>> +const LONG_QID_MIN: usize = 12;
> not sure - but would it be more consistent/readable if we use a range here
> as well (despite the upper bound already being enforced by
> POSTFIX_QID_MAX_LEN) - at least I wondered for a second why we use 2
> bounds for the short id's, and only a lower for the long ones below?
since the maximum of the long qid is the overall limit, it makes no
sense to check it later again? (the position can't be greater than
that?)
>
>> +
>> +/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
>> ///
>> /// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
>> /// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
>> -/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
>> -/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
>> -/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
>> -fn parse_qid(data: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
>> +/// alphabet. The scan tries to determine if it's a short or long QID and cancels
>> +/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
>> +/// long QID). It assumes that only non alphanumeric characters are the delimiter,
>> +/// so e.g. a long QID followed by a vowel would be detected as invalid.
>> +fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
>> // to simplify limit max to data.len()
>> - let max = max.min(data.len());
>> - // take at most max, find the first non-alphanumeric byte
>> - match data
>> - .iter()
>> - .take(max)
>> - .position(|b| !b.is_ascii_alphanumeric())
>> - {
>> - // if there were less than 5 return nothing
>> - // the QID always has at least 5 characters for the microseconds (see
>> - // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
>> - Some(n) if n < 5 => None,
>> - // otherwise split at the first non-alphanumeric byte
>> - Some(n) => Some(data.split_at(n)),
>> - // or return 'max' length QID if no non-alphanumeric byte is found
>> - None => Some(data.split_at(max)),
>> - }
>> + let max = POSTFIX_QID_MAX_LEN.min(data.len());
>> +
>> + let mut z_seen = false;
>> + let mut variant = QidVariant::Unknown;
>> +
>> + let position = data.iter().take(max).position(|b| match variant {
>> + QidVariant::Unknown => {
>> + if b.is_ascii_hexdigit() {
> pre-existing - and also in your patch 1/2 (pmg-smtp-filter also uses
> %X as format-placeholder) - but the hexdigits only use upper-case A-F -
> could make sense to restrict this here too (and add is_uppercase_hexdigit
> above)?
there is no 'is_uppercase_hexdigit' in the std lib, so we'd have to add
our own... but yeah we could do it. i don't think it gains us much though
>
>> + if is_vowel(*b) {
>> + // since vowels are forbidden in long qids, it must be a short one
>> + variant = QidVariant::Short;
>> + }
>> + return false;
>> + }
>> +
>> + if b.is_ascii_alphanumeric() {
>> + if is_vowel(*b) {
>> + // it's a non hexadecimal vowel, so it can't be a short or a long qid
>> + variant = QidVariant::Invalid;
>> + return true;
>> + }
>> +
>> + if *b == b'z' {
>> + z_seen = true;
>> + }
> this would only happen if 'z' is the first character in a queue-id? - that
> would be invalid as well?
>
no, because as long as it's not clear if it's a short or long qid we
stay in the unknown state, for example:
1111111111z222
would be a valid long qid, where the first non hexdigit char is the 'z'
>> +
>> + variant = QidVariant::Long;
>> + return false;
>> + }
>> +
>> + true
>> + }
>> + QidVariant::Short => !b.is_ascii_hexdigit(),
>> + QidVariant::Long => {
>> + if !b.is_ascii_alphanumeric() {
>> + return true;
>> + }
>> +
>> + if is_vowel(*b) {
>> + // vowel detected in a long qid, must be invalid
>> + variant = QidVariant::Invalid;
>> + return true;
>> + }
>> +
>> + if *b == b'z' {
>> + z_seen = true;
>> + }
>> +
>> + false
>> + }
>> + QidVariant::Invalid => true,
>> + });
>> +
>> + let pos = match (variant, z_seen, position) {
>> + // these must be short variants that don't have any vowels in them
>> + (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
>> + (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
>> +
>> + // short variants
>> + (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
>> + (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
>> +
>> + (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
>> + (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
>> +
>> + // short variant can't have z -> invalid
>> + (QidVariant::Short, true, _) => None,
>> +
>> + // long variant without z is invalid
>> + (QidVariant::Long, false, _) => None,
>> +
>> + // this can't happen since we can have only a z_seen in Long or when changing to Long
>> + (QidVariant::Unknown, true, _) => {
>> + eprintln!("internal error, unknown QID variant with 'z'");
>> + None
>> + }
>> +
>> + // everything else is also invalid
>> + _ => None,
>> + };
>> +
>> + pos.map(|pos| data.split_at(pos))
>> }
>>
>> ..snip..
>
> the rest looks ok to me - and from a quick check the test-case explains
> the issue quite well.
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-06-17 6:45 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-16 15:24 [PATCH log-tracker 1/2] split out pmg-smtp-filter qid parsing Dominik Csapak
2026-06-16 15:24 ` [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing Dominik Csapak
2026-06-16 16:26 ` Stoiko Ivanov
2026-06-17 6:44 ` Dominik Csapak
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.