From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <s.ivanov@proxmox.com>
From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: pmg-devel@lists.proxmox.com
Date: Thu, 17 Nov 2022 16:06:03 +0100
Message-Id: <20221117150611.253644-1-s.ivanov@proxmox.com>
X-Mailer: git-send-email 2.30.2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [pmg-devel] [PATCH pmg-api v2 0/8] ruledb - improve experience for
 non-ascii tests and mails
X-BeenThere: pmg-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Mail Gateway development discussion
 <pmg-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pmg-devel>, 
 <mailto:pmg-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pmg-devel/>
List-Post: <mailto:pmg-devel@lists.proxmox.com>
List-Help: <mailto:pmg-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pmg-devel>, 
 <mailto:pmg-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Thu, 17 Nov 2022 15:06:50 -0000

v1->v2:
* dropped already applied patches
* added a patch for one further glitch in ModField/Notify actions (when
  parsing/replacing non-ascii characters) - patch 1/5+2/5
* added support for utf-8 data in the mailflow additionally for:
** quarantine API handling
** user BL/WL (the GUI still needs adaptation to parse e-mail addresses
   more liberally - but otherwise it seems to work)
** pmgqm (spamreports)
** statistics

still missing support for:
* LDAP
* Who Objects

huge thanks to Dominik for taking the time to review and test the v1!

original cover-letter for v1:
this patch series partially fixes #2465 and #2541, two frequently reported
issues, which cause quite a disappointing experience for users in
environments that are not ascii-only

the main assumptions of the patches are:
* envelope addresses are either ascii or utf-8 (latter only with smtputf8)
* thus we can unconditionally de-/encode envelope addresses for database
  results/lookups
* the matching in the rule-objects will see the relevant parts of the mail
  as properly encoded perl-strings (with multi-byte characters - e.g. the
  euro sign as \x{20ac} instead of \x{e2}\x{82}\x{ac})
(I did a bit of testing to verify them, by e.g. sending an ISO-8859-1
encoded mail and matching for an umlaut in the subject)
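
To illustrate the third assumption, here is a minimal sketch in Python
(not Perl, and not code from this series) of the difference between raw
bytes and a decoded character string. The helper name to_text is
hypothetical; Python's str plays the role of a decoded perl-string here.

```python
def to_text(raw: bytes, charset: str) -> str:
    """Decode raw bytes from the wire into a character string."""
    return raw.decode(charset)


# The euro sign is three bytes in UTF-8, but a single character once decoded.
euro_bytes = b"\xe2\x82\xac"
euro_text = to_text(euro_bytes, "utf-8")
assert len(euro_bytes) == 3
assert len(euro_text) == 1 and ord(euro_text) == 0x20AC

# An ISO-8859-1 encoded subject: matching on the decoded string finds the
# umlaut, while a search for its UTF-8 byte sequence in the raw bytes fails.
latin1_subject = b"Gr\xfc\xdfe"
assert "ü" in to_text(latin1_subject, "iso-8859-1")
assert "ü".encode("utf-8") not in latin1_subject
```

The point is only that a rule which matches on decoded strings does not
need to care about the charset the mail was originally sent in.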

While going through the RuleDB classes I remembered that we have a few
legacy objects (Attach, ReportSpam, Counter actions) there, and went ahead
with deprecating them (initially I simply deleted them, but decided to be
more cautious and just log the deprecation until 8.0, when we can drop
them explicitly). They cannot be instantiated currently (short of a direct
insert into the database) - but I don't know if they were ever used in
pre-5.0 times in their current form. - patch 2/5.

Out of scope of the series for now:
* utf-8 support in the LDAP subsystem (deployments with a configured LDAP
  profile still won't be able to process smtputf8 mails) - mostly until I
  get around to creating a test environment with the appropriate schema
  for non-ascii mail addresses
* Domain/Email objects - did not find the time to consider how to store
  them most sensibly (punycode or utf-8) and if the choice should be
  carried over to all of our 'email' formats (it probably shouldn't)
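
For the storage question on Domain objects, a rough sketch of the two
candidate representations, again in Python rather than Perl. It uses
Python's built-in "idna" codec (IDNA 2003); the function names are
hypothetical and nothing here reflects a decision for PMG. Note that
punycode only covers the domain part, so the local part of an smtputf8
address would still need utf-8.

```python
def domain_to_ascii(domain: str) -> str:
    """Return the punycode (ASCII-compatible) form of a domain name."""
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(domain: str) -> str:
    """Return the utf-8/unicode form of a punycode domain name."""
    return domain.encode("ascii").decode("idna")


stored = domain_to_ascii("bücher.example")
assert stored.isascii() and stored.startswith("xn--")
# The conversion is reversible, so either form could be the stored one.
assert domain_to_unicode(stored) == "bücher.example"
```

Storing the ascii form keeps lookups byte-comparable, while storing utf-8
keeps the database readable; both convert into each other losslessly for
valid names.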

patches 1/5 and 4/5 address two small bugs I ran into while testing

Given that I quite often miss a few fine points or use-cases, I'd be very
grateful for some more experimenting/testing!

Stoiko Ivanov (8):
  utils: return perl string from decode_rfc1522
  ruledb: properly substitute prox_vars in headers
  fix #2541 ruledb: encode relevant values as utf-8 in database
  ruledb: encode e-mail addresses for syslog
  partially fix #2465: handle smtputf8 addresses in the rule-system
  quarantine: handle utf8 data
  pmgqm: handle smtputf8 data
  statistics: handle utf8 data.

 src/PMG/API2/Quarantine.pm      | 16 ++++----
 src/PMG/CLI/pmgqm.pm            | 24 ++++++------
 src/PMG/HTMLMail.pm             |  7 ++--
 src/PMG/MailQueue.pm            | 10 +++--
 src/PMG/Quarantine.pm           | 13 ++++---
 src/PMG/RuleDB.pm               | 24 ++++++++----
 src/PMG/RuleDB/Accept.pm        |  2 +-
 src/PMG/RuleDB/BCC.pm           | 23 +++++++++--
 src/PMG/RuleDB/Block.pm         |  2 +-
 src/PMG/RuleDB/Disclaimer.pm    |  2 +-
 src/PMG/RuleDB/Group.pm         |  4 +-
 src/PMG/RuleDB/MatchField.pm    |  8 +++-
 src/PMG/RuleDB/MatchFilename.pm |  5 ++-
 src/PMG/RuleDB/ModField.pm      | 19 +++-------
 src/PMG/RuleDB/Notify.pm        | 24 +++++++++---
 src/PMG/RuleDB/Quarantine.pm    | 19 ++++++++--
 src/PMG/RuleDB/Remove.pm        | 20 +++++++---
 src/PMG/RuleDB/Rule.pm          |  2 +-
 src/PMG/RuleDB/Spam.pm          | 17 +++++----
 src/PMG/RuleDB/WhoRegex.pm      |  5 ++-
 src/PMG/Statistic.pm            | 67 ++++++++++++++++++++++++---------
 src/PMG/Utils.pm                | 39 ++++++++++++++++---
 src/bin/pmg-smtp-filter         |  7 ++--
 23 files changed, 243 insertions(+), 116 deletions(-)

-- 
2.30.2