Re: [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing

public inbox for pmg-devel@lists.proxmox.com
 help / color / mirror / Atom feed

From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>
Cc: pmg-devel@lists.proxmox.com
Subject: Re: [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing
Date: Tue, 16 Jun 2026 18:26:50 +0200	[thread overview]
Message-ID: <20260616182650.61fcee21@rosa.proxmox.com> (raw)
In-Reply-To: <20260616152451.3684124-2-d.csapak@proxmox.com>

Thanks for tackling this and the analysis and benchmarks!

fwiw the code reads nicer to me with the added parsing!- a few
questions/suggestions inline:

On Tue, 16 Jun 2026 17:24:07 +0200
Dominik Csapak <d.csapak@proxmox.com> wrote:

> since adding support for postfix's long QIDs, the criteria was that the
> QIDs had to be alphanumeric (since no built-in base52 check exists).
this could be added - but to use it we'd need to split short-qid and
long-qid parsing as you did here anyways!
>..snip..
> While at it, we can remove the max length parameter from the
> parse_qid{_prefix} functions, since they were all called with the same
> length limit anyway.
the dropping of the parameter could be a separate patch imho (but fine to
apply as is if it's cumbersome to split)


> 
>  /// Maximum length of a postfix queue ID. With `enable_long_queue_ids = yes`
>  /// (see http://www.postfix.org/postconf.5.html#enable_long_queue_ids) the ID is
>  /// the Unix-epoch seconds (6 base-52 chars, a 7th from year 2596+), 4 for the
> @@ -2130,39 +2151,114 @@ fn open_logfile(path: &Path) -> std::io::Result<Box<dyn BufRead>> {
>  /// inode), so up to 24 chars; 25 has headroom and the delimiter ends it first.
>  const POSTFIX_QID_MAX_LEN: usize = 25;
>  
> -/// Parse a queue ID and return a tuple of (qid, remaining_text) or None.
<> +/// A short postfix QID can only have between 6 and 21 characters, 5 from the timestamp in usec and
nit - that's the usec portion of the timestamp (not the whole), based on
the comment in:
https://github.com/vdukhovni/postfix/blob/5f26cb870afd39245203703984f54643a001cf2e/postfix/src/global/mail_queue.h#L153
> +/// between 1 and 16 from the inode nr printed as '%lX'
> +const SHORT_QID_RANGE: RangeInclusive<usize> = 6..=21;
> +
> +/// A long postfix QID must be at least 12 characters: time in seconds -> 6+ chars + time in
> +/// microseconds -> 4 chars + 'z' + at least one char for the inode
> +const LONG_QID_MIN: usize = 12;
not sure - but would it be more consistent/readable if we use a range here
as well (despite the upper bound already being enforced by
POSTFIX_QID_MAX_LEN) - at least I wondered for a second why we use 2
bounds for the short id's, and only a lower for the long ones below? 

> +
> +/// Parse a queue ID from postfix and return a tuple of (qid, remaining_text) or None.
>  ///
>  /// Queue IDs are alphanumeric (`[0-9A-Za-z]`): legacy postfix queue IDs are
>  /// hexadecimal, postfix long queue IDs (`enable_long_queue_ids`) use a base-52
> -/// alphabet, and pmg-smtp-filter IDs are likewise alphanumeric. The scan stops at
> -/// the first non-alphanumeric byte (the `:` or `)` delimiter that normally follows
> -/// the queue ID); a run longer than `max` with no delimiter is truncated to `max`.
> -fn parse_qid(data: &[u8], max: usize) -> Option<(&[u8], &[u8])> {
> +/// alphabet. The scan tries to determine if it's a short or long QID and cancels
> +/// the scan if it detects either the end or an invalid state. (e.g. a vowel in a
> +/// long QID). It assumes that only non alphanumeric characters are the delimiter,
> +/// so e.g. a long QID followed by a vowel would be detected as invalid.
> +fn parse_qid(data: &[u8]) -> Option<(&[u8], &[u8])> {
>      // to simplify limit max to data.len()
> -    let max = max.min(data.len());
> -    // take at most max, find the first non-alphanumeric byte
> -    match data
> -        .iter()
> -        .take(max)
> -        .position(|b| !b.is_ascii_alphanumeric())
> -    {
> -        // if there were less than 5 return nothing
> -        // the QID always has at least 5 characters for the microseconds (see
> -        // http://www.postfix.org/postconf.5.html#enable_long_queue_ids)
> -        Some(n) if n < 5 => None,
> -        // otherwise split at the first non-alphanumeric byte
> -        Some(n) => Some(data.split_at(n)),
> -        // or return 'max' length QID if no non-alphanumeric byte is found
> -        None => Some(data.split_at(max)),
> -    }
> +    let max = POSTFIX_QID_MAX_LEN.min(data.len());
> +
> +    let mut z_seen = false;
> +    let mut variant = QidVariant::Unknown;
> +
> +    let position = data.iter().take(max).position(|b| match variant {
> +        QidVariant::Unknown => {
> +            if b.is_ascii_hexdigit() {
pre-existing - and also in your patch 1/2 (pmg-smtp-filter also uses
%X as format-placeholder) - but the hexdigits only use upper-case A-F -
could make sense to restrict this here too (and add is_uppercase_hexdigit
above)?

> +                if is_vowel(*b) {
> +                    // since vowels are forbidden in long qids, it must be a short one
> +                    variant = QidVariant::Short;
> +                }
> +                return false;
> +            }
> +
> +            if b.is_ascii_alphanumeric() {
> +                if is_vowel(*b) {
> +                    // it's a non hexadecimal vowel, so it can't be a short or a long qid
> +                    variant = QidVariant::Invalid;
> +                    return true;
> +                }
> +
> +                if *b == b'z' {
> +                    z_seen = true;
> +                }
this would only happen if 'z' is the first character in a queue-id? - that
would be invalid as well?

> +
> +                variant = QidVariant::Long;
> +                return false;
> +            }
> +
> +            true
> +        }
> +        QidVariant::Short => !b.is_ascii_hexdigit(),
> +        QidVariant::Long => {
> +            if !b.is_ascii_alphanumeric() {
> +                return true;
> +            }
> +
> +            if is_vowel(*b) {
> +                // vowel detected in a long qid, must be invalid
> +                variant = QidVariant::Invalid;
> +                return true;
> +            }
> +
> +            if *b == b'z' {
> +                z_seen = true;
> +            }
> +
> +            false
> +        }
> +        QidVariant::Invalid => true,
> +    });
> +
> +    let pos = match (variant, z_seen, position) {
> +        // these must be short variants that don't have any vowels in them
> +        (QidVariant::Unknown, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
> +        (QidVariant::Unknown, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
> +
> +        // short variants
> +        (QidVariant::Short, false, None) if SHORT_QID_RANGE.contains(&max) => Some(max),
> +        (QidVariant::Short, false, Some(n)) if SHORT_QID_RANGE.contains(&n) => Some(n),
> +
> +        (QidVariant::Long, true, None) if max >= LONG_QID_MIN => Some(max),
> +        (QidVariant::Long, true, Some(n)) if n >= LONG_QID_MIN => Some(n),
> +
> +        // short variant can't have z -> invalid
> +        (QidVariant::Short, true, _) => None,
> +
> +        // long variant without z is invalid
> +        (QidVariant::Long, false, _) => None,
> +
> +        // this can't happen since we can have only a z_seen in Long or when changing to Long
> +        (QidVariant::Unknown, true, _) => {
> +            eprintln!("internal error, unknown QID variant with 'z'");
> +            None
> +        }
> +
> +        // everything else is also invalid
> +        _ => None,
> +    };
> +
> +    pos.map(|pos| data.split_at(pos))
>  }
>  
>..snip..

the rest looks ok to me - and from a quick check the test-case explains
the issue quite well.

next prev parent reply	other threads:[~2026-06-16 16:27 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-16 15:24 [PATCH log-tracker 1/2] split out pmg-smtp-filter qid parsing Dominik Csapak
2026-06-16 15:24 ` [PATCH log-tracker 2/2] fix #7702: improve postfix QID parsing Dominik Csapak
2026-06-16 16:26   ` Stoiko Ivanov [this message]
2026-06-17  6:44     ` Dominik Csapak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260616182650.61fcee21@rosa.proxmox.com \
    --to=s.ivanov@proxmox.com \
    --cc=d.csapak@proxmox.com \
    --cc=pmg-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox

Service provided by Proxmox Server Solutions GmbH | Privacy | Legal