From: Stoiko Ivanov
To: pmg-devel@lists.proxmox.com
Date: Thu, 17 Nov 2022 16:06:03 +0100
Subject: [pmg-devel] [PATCH pmg-api v2 0/8] ruledb - improve experience for non-ascii tests and mails
Message-Id: <20221117150611.253644-1-s.ivanov@proxmox.com>
X-Mailer: git-send-email 2.30.2
List-Id: Proxmox Mail Gateway development discussion

v1->v2:
* dropped already applied patches
* added a patch for one further glitch in ModField/Notify actions (when
  parsing/replacing non-ascii characters) - patch 1/5+2/5
* added support for utf-8 data in the mailflow additionally for:
** quarantine API handling
** user BL/WL (the GUI still needs adaptation to parse e-mail addresses
   more liberally - but otherwise it seems to work)
** pmgqm (spamreports)
** statistics

still missing support for:
* LDAP
* Who Objects

huge thanks to Dominik for taking the time to review and test the v1!

original cover-letter for v1:
this patchseries partially fixes #2465 and #2541, two frequently reported
issues, which cause quite a disappointing experience for users in
environments that are not ascii-only

the main assumptions of the patches are:
* envelope addresses are either ascii or utf-8 (the latter only with
  smtputf8)
* thus we can unconditionally de-/encode envelope addresses for database
  results/lookups
* the matching in the rule-objects will see the relevant parts of the mail
  as properly encoded perl-strings (with multi-byte characters - e.g. the
  euro sign as \x{20ac} instead of \x{e2}\x{82}\x{ac})

(I did a bit of testing to verify them, e.g. by sending an ISO-8859-1
encoded mail and matching for an umlaut in the subject)

While going through the RuleDB classes I remembered that we have a few
legacy objects (Attach, ReportSpam, Counter actions) there, and went
ahead with deprecating them (initially I simply deleted them, but decided
to be more cautious and just log the deprecation until 8.0, when we can
drop them explicitly). They cannot be instantiated currently (short of a
direct insert into the database) - but I don't know if they were ever used
in pre-5.0 times in their current form. - patch 2/5.
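
The euro-sign example from the assumptions above can be illustrated with a minimal sketch that uses only the core Encode module. The variable names are made up for illustration and are not taken from the patches; the sketch only shows the difference between raw utf-8 octets and a decoded perl-string, not how the series itself performs the conversion.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 octets as they arrive on the wire: the euro sign is three bytes.
my $octets = "\xe2\x82\xac";

# Decoding yields a perl character string with a single code point, U+20AC.
my $chars = decode('UTF-8', $octets);

# Encoding again (e.g. before a database lookup or a syslog line)
# restores the original octets.
my $roundtrip = encode('UTF-8', $chars);

printf "octets: %d, characters: %d\n", length($octets), length($chars);
printf "code point: U+%04X\n", ord($chars);
print "round trip ok\n" if $roundtrip eq $octets;

# A rule-style regex for the euro sign matches the decoded string,
# but not the undecoded octets.
print "decoded string matches\n" if $chars  =~ /\x{20ac}/;
print "raw octets do not match\n" if $octets !~ /\x{20ac}/;
```

This is why a rule that tests for a non-ascii character only works reliably when the matched data has been decoded first: the pattern is compared against characters, not against the bytes of their utf-8 representation.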

Out of scope of the series for now:
* utf-8 support in the LDAP subsystem (deployments with a configured LDAP
  profile still won't be able to process smtputf8 mails) - mostly until I
  get around to creating a test environment with the appropriate schema
  for having non-ascii mail addresses
* Domain/Email objects - did not find the time to consider how to store
  them most sensibly (punycode, utf-8) and if the choice should be carried
  over to all of our 'email' formats (it probably shouldn't)

patches 1/5 and 4/5 address two small bugs I ran into while testing

Given that I quite often miss a few fine points or use-cases I'd be very
grateful for some more experimenting/testing!

Stoiko Ivanov (8):
  utils: return perl string from decode_rfc1522
  ruledb: properly substitute prox_vars in headers
  fix #2541 ruledb: encode relevant values as utf-8 in database
  ruledb: encode e-mail addresses for syslog
  partially fix #2465: handle smtputf8 addresses in the rule-system
  quarantine: handle utf8 data
  pmgqm: handle smtputf8 data
  statistics: handle utf8 data.

 src/PMG/API2/Quarantine.pm      | 16 ++++----
 src/PMG/CLI/pmgqm.pm            | 24 ++++++------
 src/PMG/HTMLMail.pm             |  7 ++--
 src/PMG/MailQueue.pm            | 10 +++--
 src/PMG/Quarantine.pm           | 13 ++++---
 src/PMG/RuleDB.pm               | 24 ++++++++----
 src/PMG/RuleDB/Accept.pm        |  2 +-
 src/PMG/RuleDB/BCC.pm           | 23 +++++++++--
 src/PMG/RuleDB/Block.pm         |  2 +-
 src/PMG/RuleDB/Disclaimer.pm    |  2 +-
 src/PMG/RuleDB/Group.pm         |  4 +-
 src/PMG/RuleDB/MatchField.pm    |  8 +++-
 src/PMG/RuleDB/MatchFilename.pm |  5 ++-
 src/PMG/RuleDB/ModField.pm      | 19 +++------
 src/PMG/RuleDB/Notify.pm        | 24 +++++++++---
 src/PMG/RuleDB/Quarantine.pm    | 19 ++++++++--
 src/PMG/RuleDB/Remove.pm        | 20 +++++++---
 src/PMG/RuleDB/Rule.pm          |  2 +-
 src/PMG/RuleDB/Spam.pm          | 17 +++++----
 src/PMG/RuleDB/WhoRegex.pm      |  5 ++-
 src/PMG/Statistic.pm            | 67 ++++++++++++++++++++++++---------
 src/PMG/Utils.pm                | 39 ++++++++++++++++---
 src/bin/pmg-smtp-filter         |  7 ++--
 23 files changed, 243 insertions(+), 116 deletions(-)

--
2.30.2