From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pmg-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 248551FF153
	for <inbox@lore.proxmox.com>; Wed, 17 Jun 2026 08:45:40 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 7DFA016D2D;
	Wed, 17 Jun 2026 08:45:39 +0200 (CEST)
Message-ID: <b7ccd925-5779-4d5e-8e1c-a073d37cbac7@proxmox.com>
Date: Wed, 17 Jun 2026 08:44:51 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird Beta
Subject: Re: [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing
To: Stoiko Ivanov <s.ivanov@proxmox.com>
References: <20260616152451.3684124-1-d.csapak@proxmox.com>
 <20260616152451.3684124-2-d.csapak@proxmox.com>
 <20260616182650.61fcee21@rosa.proxmox.com>
Content-Language: en-US
From: Dominik Csapak <d.csapak@proxmox.com>
In-Reply-To: <20260616182650.61fcee21@rosa.proxmox.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1781678645244
X-SPAM-LEVEL: Spam detection results:  0
	AWL                     0.049 Adjusted score from AWL reputation of From:
 address
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING             0.1 Missing DMARC policy
	KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict
 Alignment
	SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS               -0.001 SPF: sender matches SPF record
	URIBL_BLOCKED           0.001 ADMINISTRATOR NOTICE: The query to URIBL was
 blocked.  See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
 for more information. [postfix.org]
Message-ID-Hash: XY4JXX42TKV32VFNVIVCQSKYGLESAUIP
X-Message-ID-Hash: XY4JXX42TKV32VFNVIVCQSKYGLESAUIP
X-MailFrom: d.csapak@proxmox.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; emergency; member-moderation; nonmember-moderation;
 administrivia; implicit-dest; max-recipients; max-size; news-moderation;
 no-subject; digests; suspicious-header
CC: pmg-devel@lists.proxmox.com
X-Mailman-Version: 3.3.10
Precedence: list
List-Id: Proxmox Mail Gateway development discussion
 <pmg-devel.lists.proxmox.com>
List-Help: <mailto:pmg-devel-request@lists.proxmox.com?subject=help>
List-Owner: <mailto:pmg-devel-owner@lists.proxmox.com>
List-Post: <mailto:pmg-devel@lists.proxmox.com>
List-Subscribe: <mailto:pmg-devel-join@lists.proxmox.com>
List-Unsubscribe: <mailto:pmg-devel-leave@lists.proxmox.com>



On 6/16/26 6:25 PM, Stoiko Ivanov wrote:
> Thanks for tackling this and the analysis and benchmarks!
> 
> fwiw the code reads nicer to me with the added parsing!- a few
> questions/suggestions inline:
> 
> On Tue, 16 Jun 2026 17:24:07 +0200
> Dominik Csapak <d.csapak@proxmox.com> wrote:
> 
>> since adding support for postfix's long QIDs, the criteria was that the
>> QIDs had to be alphanumeric (since no built-in base52 check exists).
> this could be added - but to use it we'd need to split short-qid and
> long-qid parsing as you did here anyways!
>> ..snip..
>> While at it, we can remove the max length parameter from the
>> parse_qid{_prefix} functions, since they were all called with the same
>> length limit anyway.
> the dropping of the parameter could be a separate patch imho (but fine to
> apply as is if it's cumbersome to split)


sure, if i'll send a new version i split that out

> 
> 
>>
>>   /// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
>>   /// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
>>   /// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
>> @@ -2130,39 +2151,114 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
>>   /// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
>>   const POSTFIX_QID_MAX_LEN: usize = 25;
>>   
>> -/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
> <> +/// A short postfix QID can only have between 6 and 21 characters, 5 from the timestamp in usec and
> nit - that's the usec portion of the timestamp (not the whole), based on
> the comment in:
> https://github.com/vdukhovni/postfix/blob/5f26cb870afd39245203703984f54643a001cf2e/postfix/src/global/mail_queue.h#L153

sure, that's what i meant, just not very well expressed^^

>> +/// between 1 and 16 from the inode nr printed as '%lX'
>> +const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
>> +
>> +/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
>> +/// microseconds -> 4 chars + 'z' + at least one char for the inode
>> +const LONG_QID_MIN: usize = 12;
> not sure - but would it be more consistent/readable if we use a range here
> as well (despite the upper bound already being enforced by
> POSTFIX_QID_MAX_LEN) - at least I wondered for a second why we use 2
> bounds for the short id's, and only a lower for the long ones below?

since the maximum of the long qid is the overall limit, it makes no
sense to check it later again? (the position can't be greater than
that?)

> 
>> +
>> +/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
>>   ///
>>   /// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
>>   /// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
>> -/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
>> -/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
>> -/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
>> -fn parse_qid(data: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
>> +/// alphabet. The scan tries to determine if it's a short or long QID and cancels
>> +/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
>> +/// long QID). It assumes that only non alphanumeric characters are the delimiter,
>> +/// so e.g. a long QID followed by a vowel would be detected as invalid.
>> +fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
>>       // to simplify limit max to data.len()
>> -    let max = max.min(data.len());
>> -    // take at most max, find the first non-alphanumeric byte
>> -    match data
>> -        .iter()
>> -        .take(max)
>> -        .position(|b| !b.is_ascii_alphanumeric())
>> -    {
>> -        // if there were less than 5 return nothing
>> -        // the QID always has at least 5 characters for the microseconds (see
>> -        // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
>> -        Some(n) if n < 5 => None,
>> -        // otherwise split at the first non-alphanumeric byte
>> -        Some(n) => Some(data.split_at(n)),
>> -        // or return 'max' length QID if no non-alphanumeric byte is found
>> -        None => Some(data.split_at(max)),
>> -    }
>> +    let max = POSTFIX_QID_MAX_LEN.min(data.len());
>> +
>> +    let mut z_seen = false;
>> +    let mut variant = QidVariant::Unknown;
>> +
>> +    let position = data.iter().take(max).position(|b| match variant {
>> +        QidVariant::Unknown => {
>> +            if b.is_ascii_hexdigit() {
> pre-existing - and also in your patch 1/2 (pmg-smtp-filter also uses
> %X as format-placeholder) - but the hexdigits only use upper-case A-F -
> could make sense to restrict this here too (and add is_uppercase_hexdigit
> above)?

there is no 'is_uppercase_hexdigit' in the std lib, so we'd have to add 
our own... but yeah we could do it. i don't think it gains us much though

> 
>> +                if is_vowel(*b) {
>> +                    // since vowels are forbidden in long qids, it must be a short one
>> +                    variant = QidVariant::Short;
>> +                }
>> +                return false;
>> +            }
>> +
>> +            if b.is_ascii_alphanumeric() {
>> +                if is_vowel(*b) {
>> +                    // it's a non hexadecimal vowel, so it can't be a short or a long qid
>> +                    variant = QidVariant::Invalid;
>> +                    return true;
>> +                }
>> +
>> +                if *b == b'z' {
>> +                    z_seen = true;
>> +                }
> this would only happen if 'z' is the first character in a queue-id? - that
> would be invalid as well?
> 

no, because as long as it's not clear if it's a short or long qid we
stay in the unknown state, for example:

1111111111z222

would be a valid long qid, where the first non hexdigit char is the 'z'

>> +
>> +                variant = QidVariant::Long;
>> +                return false;
>> +            }
>> +
>> +            true
>> +        }
>> +        QidVariant::Short => !b.is_ascii_hexdigit(),
>> +        QidVariant::Long => {
>> +            if !b.is_ascii_alphanumeric() {
>> +                return true;
>> +            }
>> +
>> +            if is_vowel(*b) {
>> +                // vowel detected in a long qid, must be invalid
>> +                variant = QidVariant::Invalid;
>> +                return true;
>> +            }
>> +
>> +            if *b == b'z' {
>> +                z_seen = true;
>> +            }
>> +
>> +            false
>> +        }
>> +        QidVariant::Invalid => true,
>> +    });
>> +
>> +    let pos = match (variant, z_seen, position) {
>> +        // these must be short variants that don't have any vowels in them
>> +        (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
>> +        (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
>> +
>> +        // short variants
>> +        (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
>> +        (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
>> +
>> +        (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
>> +        (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
>> +
>> +        // short variant can't have z -> invalid
>> +        (QidVariant::Short, true, _) => None,
>> +
>> +        // long variant without z is invalid
>> +        (QidVariant::Long, false, _) => None,
>> +
>> +        // this can't happen since we can have only a z_seen in Long or when changing to Long
>> +        (QidVariant::Unknown, true, _) => {
>> +            eprintln!("internal error, unknown QID variant with 'z'");
>> +            None
>> +        }
>> +
>> +        // everything else is also invalid
>> +        _ => None,
>> +    };
>> +
>> +    pos.map(|pos| data.split_at(pos))
>>   }
>>   
>> ..snip..
> 
> the rest looks ok to me - and from a quick check the test-case explains
> the issue quite well.
>