From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pmg-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 9CC451FF141
	for <inbox@lore.proxmox.com>; Tue, 16 Jun 2026 18:27:28 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 4D629EC36;
	Tue, 16 Jun 2026 18:27:28 +0200 (CEST)
Date: Tue, 16 Jun 2026 18:26:50 +0200
From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>
Subject: Re: [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing
Message-ID: <20260616182650.61fcee21@rosa.proxmox.com>
In-Reply-To: <20260616152451.3684124-2-d.csapak@proxmox.com>
References: <20260616152451.3684124-1-d.csapak@proxmox.com>
	<20260616152451.3684124-2-d.csapak@proxmox.com>
X-Mailer: Claws Mail 4.3.1 (GTK 3.24.49; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1781627155812
X-SPAM-LEVEL: Spam detection results:  0
	AWL                     0.083 Adjusted score from AWL reputation of From:
 address
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING             0.1 Missing DMARC policy
	KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict
 Alignment
	SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS               -0.001 SPF: sender matches SPF record
Message-ID-Hash: 2MZDUYM3CZXSYIFWVHYIIDRHYBKE4C6U
X-Message-ID-Hash: 2MZDUYM3CZXSYIFWVHYIIDRHYBKE4C6U
X-MailFrom: s.ivanov@proxmox.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; emergency; member-moderation; nonmember-moderation;
 administrivia; implicit-dest; max-recipients; max-size; news-moderation;
 no-subject; digests; suspicious-header
CC: pmg-devel@lists.proxmox.com
X-Mailman-Version: 3.3.10
Precedence: list
List-Id: Proxmox Mail Gateway development discussion
 <pmg-devel.lists.proxmox.com>
List-Help: <mailto:pmg-devel-request@lists.proxmox.com?subject=help>
List-Owner: <mailto:pmg-devel-owner@lists.proxmox.com>
List-Post: <mailto:pmg-devel@lists.proxmox.com>
List-Subscribe: <mailto:pmg-devel-join@lists.proxmox.com>
List-Unsubscribe: <mailto:pmg-devel-leave@lists.proxmox.com>

Thanks for tackling this and the analysis and benchmarks!

fwiw the code reads nicer to me with the added parsing!- a few
questions/suggestions inline:

On Tue, 16 Jun 2026 17:24:07 +0200
Dominik Csapak <d.csapak@proxmox.com> wrote:

> since adding support for postfix's long QIDs, the criteria was that the
> QIDs had to be alphanumeric (since no built-in base52 check exists).
this could be added - but to use it we'd need to split short-qid and
long-qid parsing as you did here anyways!
>..snip..
> While at it, we can remove the max length parameter from the
> parse_qid{_prefix} functions, since they were all called with the same
> length limit anyway.
the dropping of the parameter could be a separate patch imho (but fine to
apply as is if it's cumbersome to split)


> 
>  /// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
>  /// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
>  /// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
> @@ -2130,39 +2151,114 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
>  /// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
>  const POSTFIX_QID_MAX_LEN: usize = 25;
>  
> -/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
<> +/// A short postfix QID can only have between 6 and 21 characters, 5 from the timestamp in usec and
nit - that's the usec portion of the timestamp (not the whole), based on
the comment in:
https://github.com/vdukhovni/postfix/blob/5f26cb870afd39245203703984f54643a001cf2e/postfix/src/global/mail_queue.h#L153
> +/// between 1 and 16 from the inode nr printed as '%lX'
> +const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
> +
> +/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
> +/// microseconds -> 4 chars + 'z' + at least one char for the inode
> +const LONG_QID_MIN: usize = 12;
not sure - but would it be more consistent/readable if we use a range here
as well (despite the upper bound already being enforced by
POSTFIX_QID_MAX_LEN) - at least I wondered for a second why we use 2
bounds for the short id's, and only a lower for the long ones below? 

> +
> +/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
>  ///
>  /// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
>  /// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
> -/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
> -/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
> -/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
> -fn parse_qid(data: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
> +/// alphabet. The scan tries to determine if it's a short or long QID and cancels
> +/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
> +/// long QID). It assumes that only non alphanumeric characters are the delimiter,
> +/// so e.g. a long QID followed by a vowel would be detected as invalid.
> +fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
>      // to simplify limit max to data.len()
> -    let max = max.min(data.len());
> -    // take at most max, find the first non-alphanumeric byte
> -    match data
> -        .iter()
> -        .take(max)
> -        .position(|b| !b.is_ascii_alphanumeric())
> -    {
> -        // if there were less than 5 return nothing
> -        // the QID always has at least 5 characters for the microseconds (see
> -        // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
> -        Some(n) if n < 5 => None,
> -        // otherwise split at the first non-alphanumeric byte
> -        Some(n) => Some(data.split_at(n)),
> -        // or return 'max' length QID if no non-alphanumeric byte is found
> -        None => Some(data.split_at(max)),
> -    }
> +    let max = POSTFIX_QID_MAX_LEN.min(data.len());
> +
> +    let mut z_seen = false;
> +    let mut variant = QidVariant::Unknown;
> +
> +    let position = data.iter().take(max).position(|b| match variant {
> +        QidVariant::Unknown => {
> +            if b.is_ascii_hexdigit() {
pre-existing - and also in your patch 1/2 (pmg-smtp-filter also uses
%X as format-placeholder) - but the hexdigits only use upper-case A-F -
could make sense to restrict this here too (and add is_uppercase_hexdigit
above)?

> +                if is_vowel(*b) {
> +                    // since vowels are forbidden in long qids, it must be a short one
> +                    variant = QidVariant::Short;
> +                }
> +                return false;
> +            }
> +
> +            if b.is_ascii_alphanumeric() {
> +                if is_vowel(*b) {
> +                    // it's a non hexadecimal vowel, so it can't be a short or a long qid
> +                    variant = QidVariant::Invalid;
> +                    return true;
> +                }
> +
> +                if *b == b'z' {
> +                    z_seen = true;
> +                }
this would only happen if 'z' is the first character in a queue-id? - that
would be invalid as well?

> +
> +                variant = QidVariant::Long;
> +                return false;
> +            }
> +
> +            true
> +        }
> +        QidVariant::Short => !b.is_ascii_hexdigit(),
> +        QidVariant::Long => {
> +            if !b.is_ascii_alphanumeric() {
> +                return true;
> +            }
> +
> +            if is_vowel(*b) {
> +                // vowel detected in a long qid, must be invalid
> +                variant = QidVariant::Invalid;
> +                return true;
> +            }
> +
> +            if *b == b'z' {
> +                z_seen = true;
> +            }
> +
> +            false
> +        }
> +        QidVariant::Invalid => true,
> +    });
> +
> +    let pos = match (variant, z_seen, position) {
> +        // these must be short variants that don't have any vowels in them
> +        (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
> +        (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
> +
> +        // short variants
> +        (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
> +        (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
> +
> +        (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
> +        (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
> +
> +        // short variant can't have z -> invalid
> +        (QidVariant::Short, true, _) => None,
> +
> +        // long variant without z is invalid
> +        (QidVariant::Long, false, _) => None,
> +
> +        // this can't happen since we can have only a z_seen in Long or when changing to Long
> +        (QidVariant::Unknown, true, _) => {
> +            eprintln!("internal error, unknown QID variant with 'z'");
> +            None
> +        }
> +
> +        // everything else is also invalid
> +        _ => None,
> +    };
> +
> +    pos.map(|pos| data.split_at(pos))
>  }
>  
>..snip..

the rest looks ok to me - and from a quick check the test-case explains
the issue quite well.