* [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

SpamAssassin 4.0.0 was released back in December 2022 and seems to work
nicely for the most part.

This patchset adapts pmg-api where necessary (patch 1/7), syncs the
templates with upstream to reduce the diff (2/7), and enables the new
features depending on the spam section of pmg.conf (mostly only enabling
modules which do DNSBL lookups based on the rbl_checks setting).

To test the ExtractText plugin, I put the GTUBE test string in a
LibreOffice document and saved it as .doc, .docx, .pdf, .odt, and .rtf.
As this is still quite new functionality, I added an explicit config
option for it (also to provide some visibility).

pmg-api:
Stoiko Ivanov (7):
  ruledb: spam: adapt to spamassassin 4.0.0
  templates: sync spamassassin templates with 4.0.0 upstream
  templates: add template for spamassassin's v342.pre
  templates: add template for spamassassin's v400.pre
  config: add spam option for extract_text
  templates: enable DecodeShortUrls for SpamAssassin 4.0.0
  templates: enable DMARC plugin in v400.pre.in

 debian/control            |  9 +++++-
 src/Makefile              |  2 ++
 src/PMG/Config.pm         | 12 +++++++
 src/PMG/RuleDB/Spam.pm    | 10 +++---
 src/templates/init.pre.in | 26 ++++++++++++---
 src/templates/v310.pre.in | 18 +++++------
 src/templates/v320.pre.in |  5 +++
 src/templates/v342.pre.in | 39 ++++++++++++++++++++++
 src/templates/v400.pre.in | 68 +++++++++++++++++++++++++++++++++++++++
 9 files changed, 170 insertions(+), 19 deletions(-)
 create mode 100644 src/templates/v342.pre.in
 create mode 100644 src/templates/v400.pre.in

pmg-gui:
Stoiko Ivanov (1):
  spamdetector: add extract_text option

 js/SpamDetectorOptions.js | 2 ++
 1 file changed, 2 insertions(+)

-- 
2.30.2





* [pmg-devel] [PATCH pmg-api 1/7] ruledb: spam: adapt to spamassassin 4.0.0
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

find_all_addrs_in_line was changed to require an instantiated
Mail::SpamAssassin instance in:
https://github.com/apache/spamassassin/commit/139adfb5901b27fa13dccbf3a66c53ca7613f733
(read-only git mirror of the authoritative SVN)

Noticed while bouncing mails with `mutt`, which adds Resent headers.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
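Note (not part of the patch): a minimal sketch of why the call site has
to change. Demo::Scanner and its tld_re field are hypothetical stand-ins,
not SpamAssassin code; the only assumption is that the upstream method
now reads state from $self, as the commit message describes.

```perl
use strict;
use warnings;

# Toy stand-in for a class whose method reads per-instance state.
# Demo::Scanner and tld_re are hypothetical, not SpamAssassin internals.
package Demo::Scanner;

sub new {
    my ($class) = @_;
    return bless { tld_re => qr/(?:com|org)/ }, $class;
}

sub find_all_addrs_in_line {
    my ($self, $line) = @_;
    my $tld = $self->{tld_re};    # needs a blessed hash, not a class name
    return $line =~ /([\w.+-]+\@(?:[\w-]+\.)+$tld)\b/g;
}

package main;

my $line = 'Alice <alice@example.com>, bob@example.org';

# Instance call: $self is an object, so the lookup works.
my @addrs = Demo::Scanner->new()->find_all_addrs_in_line($line);
print "instance call: @addrs\n";

# Class-method call: $self is the string "Demo::Scanner", so the hash
# dereference dies under "strict refs".
my @broken = eval { Demo::Scanner->find_all_addrs_in_line($line) };
print "class call failed: $@" if $@;
```

In the patch, the object already exists as $queue->{sa}, so it is passed
down to __all_from_addrs instead of creating a new one.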
 src/PMG/RuleDB/Spam.pm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/PMG/RuleDB/Spam.pm b/src/PMG/RuleDB/Spam.pm
index 14d7bea..a0b8f26 100644
--- a/src/PMG/RuleDB/Spam.pm
+++ b/src/PMG/RuleDB/Spam.pm
@@ -300,13 +300,13 @@ sub __get_addr {
 # because we do not call spamassassin in canes of commtouch match
 # see Mail::Spamassassin:PerMsgStatus for details
 sub __all_from_addrs {
-    my ($head) = @_;
+    my ($head, $spamtest) = @_;
 
     my @addrs;
 
     my $resent = $head->get('Resent-From');
     if (defined($resent) && $resent =~ /\S/) {
-	@addrs = Mail::SpamAssassin->find_all_addrs_in_line($resent);
+	@addrs = $spamtest->find_all_addrs_in_line($resent);
     } else {
 	@addrs = map { tr/././s; $_ } grep { $_ ne '' }
         (__get_addr($head, 'From'),		# std
@@ -330,6 +330,8 @@ sub analyze_spam {
 
     $maxspamsize = 200*1024 if !$maxspamsize;
 
+    my $spamtest = $queue->{sa};
+
     my ($sa_score, $sa_max, $sa_scores, $sa_sumary, $list, $autolearn, $bayes, $loglist);
     $list = '';
     $loglist = '';
@@ -345,7 +347,7 @@ sub analyze_spam {
     }
 
     my $fromhash = { $queue->{from} => 1 }; 
-    foreach my $f (__all_from_addrs($entity->head())) {
+    foreach my $f (__all_from_addrs($entity->head(), $spamtest)) {
 	$fromhash->{$f} = 1;
     }
     $queue->{all_from_addrs} = [ keys %$fromhash ];
@@ -373,8 +375,6 @@ sub analyze_spam {
 
     my ($csec, $usec) = gettimeofday ();
 
-    my $spamtest = $queue->{sa};
-
     # only run SA in testmode or when clamav_heuristic did not confirm spam (score < 5)
     if ($msginfo->{testmode} || ($sa_score < 5)) {
 
-- 
2.30.2





* [pmg-devel] [PATCH pmg-api 2/7] templates: sync spamassassin templates with 4.0.0 upstream
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

Sync the templates with upstream to minimize the diff, and drop
references to plugins that vanished upstream (Hashcash, DomainKeys).

No functional change intended.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/templates/init.pre.in | 26 ++++++++++++++++++++++----
 src/templates/v310.pre.in | 18 +++++++++---------
 src/templates/v320.pre.in |  5 +++++
 3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/src/templates/init.pre.in b/src/templates/init.pre.in
index 04ca4d6..98d17b4 100644
--- a/src/templates/init.pre.in
+++ b/src/templates/init.pre.in
@@ -7,24 +7,42 @@
 # in SpamAssassin 3.0.x releases.  It will not be installed if you
 # already have a file in place called "init.pre".
 #
+# There are now multiple files read to enable plugins in the 
+# /etc/mail/spamassassin directory; previously only one, "init.pre" was 
+# read.  Now both "init.pre", "v310.pre", and any other files ending in
+# ".pre" will be read.  As future releases are made, new plugins will be
+# added to new files, named according to the release they're added in.
 ###########################################################################
 
+# Version compatibility - Welcomelist/Blocklist
+# In SpamAssassin 4.0, rules containing "whitelist" or "blacklist" have been
+# renamed to contain more racially neutral "welcomelist" and "blocklist"
+# terms.  When this compatibility flag is enabled, old rule names from stock
+# rules will not hit anymore alongside the new ones.  For more information,
+# see: https://wiki.apache.org/spamassassin/WelcomelistBlocklist
+#
+enable_compat welcomelist_blocklist
+
 # RelayCountry - add metadata for Bayes learning, marking the countries
 # a message was relayed through
 #
+# Note: This requires the Geo::IP Perl module
+#
 # loadplugin Mail::SpamAssassin::Plugin::RelayCountry
 
 [% IF pmg.spam.rbl_checks %]
+# URIDNSBL - look up URLs found in the message against several DNS
+# blocklists.
+#
 loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
 [% END %]
 
-# Hashcash - perform hashcash verification.
-#
-loadplugin Mail::SpamAssassin::Plugin::Hashcash
 
 [% IF pmg.spam.rbl_checks %]
+# SPF - perform SPF verification.
+#
 loadplugin Mail::SpamAssassin::Plugin::SPF
 [% END %]
 
 # always load dkim to improve accuracy
-loadplugin Mail::SpamAssassin::Plugin::DKIM
\ No newline at end of file
+loadplugin Mail::SpamAssassin::Plugin::DKIM
diff --git a/src/templates/v310.pre.in b/src/templates/v310.pre.in
index d72c347..696142d 100644
--- a/src/templates/v310.pre.in
+++ b/src/templates/v310.pre.in
@@ -9,6 +9,11 @@
 # so you can modify it to enable some disabled-by-default plugins below,
 # if you so wish.
 #
+# There are now multiple files read to enable plugins in the
+# /etc/mail/spamassassin directory; previously only one, "init.pre" was
+# read.  Now both "init.pre", "v310.pre", and any other files ending in
+# ".pre" will be read.  As future releases are made, new plugins will be
+# added to new files, named according to the release they're added in.
 ###########################################################################
 
 [% IF pmg.spam.rbl_checks %]
@@ -40,18 +45,13 @@ loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
 #
 loadplugin Mail::SpamAssassin::Plugin::TextCat
 
-# WhitelistSubject - Whitelist/Blacklist certain subject regular expressions
+# AccessDB - lookup from-addresses in access database
 #
-loadplugin Mail::SpamAssassin::Plugin::WhiteListSubject
+#loadplugin Mail::SpamAssassin::Plugin::AccessDB
 
-###########################################################################
-# experimental plugins
-
-# DomainKeys - perform DomainKeys verification
-#
-# External modules required for use, see INSTALL for more information.
+# WelcomelistSubject - Welcomelist/Blocklist certain subject regular expressions
 #
-#loadplugin Mail::SpamAssassin::Plugin::DomainKeys
+loadplugin Mail::SpamAssassin::Plugin::WelcomeListSubject
 
 # MIMEHeader - apply regexp rules against MIME headers in the message
 #
diff --git a/src/templates/v320.pre.in b/src/templates/v320.pre.in
index db49b07..846c73a 100644
--- a/src/templates/v320.pre.in
+++ b/src/templates/v320.pre.in
@@ -9,6 +9,11 @@
 # so you can modify it to enable some disabled-by-default plugins below,
 # if you so wish.
 #
+# There are now multiple files read to enable plugins in the
+# /etc/mail/spamassassin directory; previously only one, "init.pre" was
+# read.  Now both "init.pre", "v310.pre", and any other files ending in
+# ".pre" will be read.  As future releases are made, new plugins will be
+# added to new files, named according to the release they're added in.
 ###########################################################################
 
 # Check - Provides main check functionality
-- 
2.30.2





* [pmg-devel] [PATCH pmg-api 3/7] templates: add template for spamassassin's v342.pre
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

The file is taken from upstream; the only change is that the HashBL
module is enabled only if rbl_checks is enabled in pmg.conf.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
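Note (not part of the patch): a small sketch of how the conditional in
the template behaves, assuming the .in files are rendered with Template
Toolkit (which the [% IF %] syntax suggests) and that the Template Perl
module is installed. render() is a hypothetical helper that feeds a
hand-built variable hash to the template instead of a parsed pmg.conf.

```perl
use strict;
use warnings;
use Template;    # Template Toolkit (libtemplate-perl on Debian)

# Fragment in the style of v342.pre.in; only the conditional part.
my $fragment = <<'EOT';
[% IF pmg.spam.rbl_checks %]
loadplugin Mail::SpamAssassin::Plugin::HashBL
[% END %]
EOT

# Hypothetical helper: render the fragment for a given rbl_checks value.
sub render {
    my ($rbl_checks) = @_;
    my $vars = { pmg => { spam => { rbl_checks => $rbl_checks } } };
    my $out = '';
    my $tt = Template->new();
    $tt->process(\$fragment, $vars, \$out) or die $tt->error();
    return $out;
}

for my $rbl (0, 1) {
    my $state = render($rbl) =~ /^loadplugin/m ? 'loaded' : 'not loaded';
    print "rbl_checks=$rbl: HashBL $state\n";
}
```

The same pattern is used for the other plugins in this series that do
network lookups.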
 src/Makefile              |  1 +
 src/PMG/Config.pm         |  3 +++
 src/templates/v342.pre.in | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)
 create mode 100644 src/templates/v342.pre.in

diff --git a/src/Makefile b/src/Makefile
index 49c7974..414b1ef 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -38,6 +38,7 @@ TEMPLATES =				\
 	local.cf.in			\
 	v310.pre.in			\
 	v320.pre.in			\
+	v342.pre.in			\
 	razor-agent.conf.in		\
 	freshclam.conf.in		\
 	clamd.conf.in 			\
diff --git a/src/PMG/Config.pm b/src/PMG/Config.pm
index a0b1866..dce1513 100755
--- a/src/PMG/Config.pm
+++ b/src/PMG/Config.pm
@@ -1515,6 +1515,9 @@ sub rewrite_config_spam {
     $changes = 1 if $self->rewrite_config_file(
 	'v320.pre.in', '/etc/mail/spamassassin/v320.pre');
 
+    $changes = 1 if $self->rewrite_config_file(
+	'v342.pre.in', '/etc/mail/spamassassin/v342.pre');
+
     if ($use_razor) {
 	mkdir "/root/.razor";
 
diff --git a/src/templates/v342.pre.in b/src/templates/v342.pre.in
new file mode 100644
index 0000000..10dcaa1
--- /dev/null
+++ b/src/templates/v342.pre.in
@@ -0,0 +1,39 @@
+# This is the right place to customize your installation of SpamAssassin.
+#
+# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
+# tweaked.
+#
+# This file was installed during the installation of SpamAssassin 3.4.2,
+# and contains plugin loading commands for the new plugins added in that
+# release.  It will not be overwritten during future SpamAssassin installs,
+# so you can modify it to enable some disabled-by-default plugins below,
+# if you so wish.
+#
+# There are now multiple files read to enable plugins in the
+# /etc/mail/spamassassin directory; previously only one, "init.pre" was
+# read.  Now both "init.pre", "v310.pre", and any other files ending in
+# ".pre" will be read.  As future releases are made, new plugins will be
+# added to new files, named according to the release they're added in.
+###########################################################################
+
+# HashBL - Query hashed/unhashed strings, emails, uris etc from DNS lists
+#
+[% IF pmg.spam.rbl_checks %]
+loadplugin Mail::SpamAssassin::Plugin::HashBL
+[% END %]
+
+# ResourceLimits - assure your spamd child processes
+# do not exceed specified CPU or memory limit
+#
+# loadplugin Mail::SpamAssassin::Plugin::ResourceLimits
+
+# FromNameSpoof - help stop spam that tries to spoof other domains using 
+# the from name
+#
+# loadplugin Mail::SpamAssassin::Plugin::FromNameSpoof
+
+# Phishing - finds uris used in phishing campaigns detected by
+# OpenPhish or PhishTank feeds.
+#
+# loadplugin Mail::SpamAssassin::Plugin::Phishing
+
-- 
2.30.2





* [pmg-devel] [PATCH pmg-api 4/7] templates: add template for spamassassin's v400.pre
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

The individual new features will be enabled in separate commits.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/Makefile              |  1 +
 src/PMG/Config.pm         |  3 +++
 src/templates/v400.pre.in | 37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)
 create mode 100644 src/templates/v400.pre.in

diff --git a/src/Makefile b/src/Makefile
index 414b1ef..0b424e9 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -39,6 +39,7 @@ TEMPLATES =				\
 	v310.pre.in			\
 	v320.pre.in			\
 	v342.pre.in			\
+	v400.pre.in			\
 	razor-agent.conf.in		\
 	freshclam.conf.in		\
 	clamd.conf.in 			\
diff --git a/src/PMG/Config.pm b/src/PMG/Config.pm
index dce1513..5dcffb7 100755
--- a/src/PMG/Config.pm
+++ b/src/PMG/Config.pm
@@ -1518,6 +1518,9 @@ sub rewrite_config_spam {
     $changes = 1 if $self->rewrite_config_file(
 	'v342.pre.in', '/etc/mail/spamassassin/v342.pre');
 
+    $changes = 1 if $self->rewrite_config_file(
+	'v400.pre.in', '/etc/mail/spamassassin/v400.pre');
+
     if ($use_razor) {
 	mkdir "/root/.razor";
 
diff --git a/src/templates/v400.pre.in b/src/templates/v400.pre.in
new file mode 100644
index 0000000..052e73e
--- /dev/null
+++ b/src/templates/v400.pre.in
@@ -0,0 +1,37 @@
+# This is the right place to customize your installation of SpamAssassin.
+#
+# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
+# tweaked.
+#
+# This file was installed during the installation of SpamAssassin 4.0.0,
+# and contains plugin loading commands for the new plugins added in that
+# release.  It will not be overwritten during future SpamAssassin installs,
+# so you can modify it to enable some disabled-by-default plugins below,
+# if you so wish.
+#
+# There are now multiple files read to enable plugins in the
+# /etc/mail/spamassassin directory; previously only one, "init.pre" was
+# read.  Now both "init.pre", "v310.pre", and any other files ending in
+# ".pre" will be read.  As future releases are made, new plugins will be
+# added to new files, named according to the release they're added in.
+###########################################################################
+
+# ExtractText - Extract text from documents or images for matching
+#
+# Requires manual configuration, see plugin documentation.
+#
+# loadplugin Mail::SpamAssassin::Plugin::ExtractText
+
+# DecodeShortUrl - Check for shortened URLs
+#
+# Note that this plugin will send HTTP requests to different URL shortener
+# services.  Enabling caching is recommended, see plugin documentation.
+#
+# loadplugin Mail::SpamAssassin::Plugin::DecodeShortURLs
+
+# DMARC - Check DMARC compliance
+#
+# Requires Mail::DMARC module and working SPF and DKIM Plugins.
+#
+# loadplugin Mail::SpamAssassin::Plugin::DMARC
+
-- 
2.30.2





* [pmg-devel] [PATCH pmg-api 5/7] config: add spam option for extract_text
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

Add a configuration option that toggles the ExtractText SA plugin (see
[0]).

The config is copied from the module itself. The informational headers
were not added, as I don't see much gain from them apart from verifying
that the plugin is working.

The external dependencies the plugin needs are added as Recommends, as
it is a valid setup to not have them installed and simply leave the
option disabled.

[0] https://metacpan.org/pod/Mail::SpamAssassin::Plugin::ExtractText
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
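Note (not part of the patch): since the extractors are only Recommends,
a quick check like the following can show which ones are present before
enabling the option. The paths are the ones used by the
extracttext_external lines in v400.pre.in below; missing_tools() is a
hypothetical helper, not part of pmg-api.

```perl
use strict;
use warnings;

# Paths as used by the extracttext_external lines in v400.pre.in.
my @tools = qw(
    /usr/bin/pdftotext
    /usr/bin/docx2txt
    /usr/bin/antiword
    /usr/bin/unrtf
    /usr/bin/odt2txt
    /usr/bin/tesseract
);

# Hypothetical helper: return the paths that are not executable files.
sub missing_tools {
    return grep { !-x $_ } @_;
}

my @missing = missing_tools(@tools);
if (@missing) {
    print "missing extractor: $_\n" for @missing;
} else {
    print "all extractors found\n";
}
```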
 debian/control            |  9 ++++++++-
 src/PMG/Config.pm         |  6 ++++++
 src/templates/v400.pre.in | 34 ++++++++++++++++++++++++++++++----
 3 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/debian/control b/debian/control
index 93ad72c..d2ed7da 100644
--- a/debian/control
+++ b/debian/control
@@ -98,7 +98,14 @@ Depends: apt (>= 2~),
          ucf,
          ${misc:Depends},
          ${perl:Depends},
-Recommends: ifupdown2, proxmox-offline-mirror-helper
+Recommends: antiword,
+            docx2txt,
+            ifupdown2,
+            odt2txt,
+            poppler-utils,
+            proxmox-offline-mirror-helper,
+            tesseract-ocr,
+            unrtf
 Suggests: zfsutils-linux
 Description: Proxmox Mailgateway API Server Implementation
  This implements a REST API to configure Proxmox Mailgateway.
diff --git a/src/PMG/Config.pm b/src/PMG/Config.pm
index 5dcffb7..699a622 100755
--- a/src/PMG/Config.pm
+++ b/src/PMG/Config.pm
@@ -211,6 +211,11 @@ sub properties {
 	    minimum => 64,
 	    default => 256*1024,
 	},
+	extract_text => {
+	    description => "Extract text from attachments (doc, pdf, rtf, images) and scan for spam.",
+	    type => 'boolean',
+	    default => 0,
+	},
     };
 }
 
@@ -225,6 +230,7 @@ sub options {
 	bounce_score => { optional => 1 },
 	rbl_checks => { optional => 1 },
 	maxspamsize => { optional => 1 },
+	extract_text => { optional => 1 },
     };
 }
 
diff --git a/src/templates/v400.pre.in b/src/templates/v400.pre.in
index 052e73e..4d68d6c 100644
--- a/src/templates/v400.pre.in
+++ b/src/templates/v400.pre.in
@@ -16,11 +16,37 @@
 # added to new files, named according to the release they're added in.
 ###########################################################################
 
+
+[% IF pmg.spam.extract_text %]
 # ExtractText - Extract text from documents or images for matching
-#
-# Requires manual configuration, see plugin documentation.
-#
-# loadplugin Mail::SpamAssassin::Plugin::ExtractText
+# informational headers and hits not configured
+loadplugin Mail::SpamAssassin::Plugin::ExtractText
+
+ifplugin Mail::SpamAssassin::Plugin::ExtractText
+
+  extracttext_external  pdftotext  /usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -
+  extracttext_use       pdftotext  .pdf application/pdf
+
+  # http://docx2txt.sourceforge.net
+  extracttext_external  docx2txt   /usr/bin/docx2txt {} -
+  extracttext_use       docx2txt   .docx application/docx
+
+  extracttext_external  antiword   /usr/bin/antiword -t -w 0 -m UTF-8.txt {}
+  extracttext_use       antiword   .doc application/(?:vnd\.?)?ms-?word.*
+
+  extracttext_external  unrtf      /usr/bin/unrtf --nopict {}
+  extracttext_use       unrtf      .doc .rtf application/rtf text/rtf
+
+  extracttext_external  odt2txt    /usr/bin/odt2txt --encoding=UTF-8 {}
+  extracttext_use       odt2txt    .odt .ott application/.*?opendocument.*text
+  extracttext_use       odt2txt    .sdw .stw application/(?:x-)?soffice application/(?:x-)?starwriter
+
+  extracttext_external  tesseract  {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -c page_separator= {} -
+  extracttext_use       tesseract  .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)
+
+endif
+
+[% END %]
 
 # DecodeShortUrl - Check for shortened URLs
 #
-- 
2.30.2





* [pmg-devel] [PATCH pmg-api 6/7] templates: enable DecodeShortUrls for SpamAssassin 4.0.0
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

Enabled if the system has rbl_checks enabled.
The module resolves URL-shortener (e.g. bit.ly) chains.
The KAM ruleset has a number of URL shorteners configured
(KAM_urlshorteners.cf).

The functionality also works without a caching module, but the
configured one worked well in my tests.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
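Note (not part of the patch): a hedged sketch for checking that the
cache is actually being filled, assuming DBI and DBD::SQLite are
installed. It makes no assumption about the plugin's schema and only
counts rows in whatever tables exist; table_counts() is a hypothetical
helper, and the path is the one from url_shortener_cache_dsn below.

```perl
use strict;
use warnings;
use DBI;    # needs DBD::SQLite for the dbi:SQLite driver

# Hypothetical helper: map each table name in the file to its row count,
# or return undef if the database file does not exist yet.
sub table_counts {
    my ($file) = @_;
    return undef if !-e $file;

    my $dbh = DBI->connect("dbi:SQLite:dbname=$file", '', '',
        { RaiseError => 1, PrintError => 0 });
    my $names = $dbh->selectcol_arrayref(
        "SELECT name FROM sqlite_master WHERE type = 'table'");

    my %counts;
    for my $name (@$names) {
        my $quoted = $dbh->quote_identifier($name);
        ($counts{$name}) = $dbh->selectrow_array("SELECT COUNT(*) FROM $quoted");
    }
    $dbh->disconnect();
    return \%counts;
}

my $counts = table_counts('/var/lib/pmg/decode_short_urls.db');
if (!defined $counts) {
    print "cache database does not exist yet\n";
} else {
    print "$_: $counts->{$_} rows\n" for sort keys %$counts;
}
```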
 src/templates/v400.pre.in | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/templates/v400.pre.in b/src/templates/v400.pre.in
index 4d68d6c..233493d 100644
--- a/src/templates/v400.pre.in
+++ b/src/templates/v400.pre.in
@@ -48,12 +48,17 @@ endif
 
 [% END %]
 
+
+[% IF pmg.spam.rbl_checks %]
 # DecodeShortUrl - Check for shortened URLs
 #
 # Note that this plugin will send HTTP requests to different URL shortener
 # services.  Enabling caching is recommended, see plugin documentation.
 #
-# loadplugin Mail::SpamAssassin::Plugin::DecodeShortURLs
+loadplugin Mail::SpamAssassin::Plugin::DecodeShortURLs
+url_shortener_cache_type dbi
+url_shortener_cache_dsn dbi:SQLite:dbname=/var/lib/pmg/decode_short_urls.db
+[% END %]
 
 # DMARC - Check DMARC compliance
 #
-- 
2.30.2





* [pmg-devel] [PATCH pmg-api 7/7] templates: enable DMARC plugin in v400.pre.in
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

This module needs Mail::DMARC (libmail-dmarc-perl) as a prerequisite.
It is currently only available in sid and bookworm, but can be
trivially rebuilt for bullseye.

The DMARC tests are skipped if only internal relays are used/present
in the headers, so I could not explicitly test this.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/templates/v400.pre.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/templates/v400.pre.in b/src/templates/v400.pre.in
index 233493d..e09e807 100644
--- a/src/templates/v400.pre.in
+++ b/src/templates/v400.pre.in
@@ -64,5 +64,5 @@ url_shortener_cache_dsn dbi:SQLite:dbname=/var/lib/pmg/decode_short_urls.db
 #
 # Requires Mail::DMARC module and working SPF and DKIM Plugins.
 #
-# loadplugin Mail::SpamAssassin::Plugin::DMARC
+loadplugin Mail::SpamAssassin::Plugin::DMARC
 
-- 
2.30.2





* [pmg-devel] [PATCH pmg-gui 1/1] spamdetector: add extract_text option
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 js/SpamDetectorOptions.js | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/js/SpamDetectorOptions.js b/js/SpamDetectorOptions.js
index 2a4059c..58eaee9 100644
--- a/js/SpamDetectorOptions.js
+++ b/js/SpamDetectorOptions.js
@@ -19,6 +19,8 @@ Ext.define('PMG.SpamDetectorOptions', {
 	me.add_boolean_row('use_razor', gettext('Use Razor2 checks'),
 			   { defaultValue: 1 });
 
+	me.add_boolean_row('extract_text', gettext('Extract Text from Attachments'));
+
 	me.add_integer_row('maxspamsize', gettext('Max Spam Size (bytes)'),
 			   {
  defaultValue: 256*1024,
-- 
2.30.2





* [pmg-devel] applied: [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0
From: Thomas Lamprecht @ 2023-03-15 15:55 UTC
  To: Stoiko Ivanov, pmg-devel

On 13/03/2023 at 22:23, Stoiko Ivanov wrote:
> SpamAssassin 4.0.0 was released back in December 2022 and seems to work
> nicely for the most part.
> 
> This patchset adapts pmg-api where necessary (patch 1/7), updates the
> templates for less diff (2/7), and enables the new features depending
> on the spam section of pmg.conf (mostly only enabling modules which do
> DNSBL lookups based on the rbl_checks setting)
> 
> To test the ExtractText plugin, I put the GTUBE test string in a
> LibreOffice document and saved it as .doc, .docx, .pdf, .odt, and .rtf.
> As this is still quite new functionality, I added an explicit config
> option for it (also to provide some visibility).
> 
> pmg-api:
> Stoiko Ivanov (7):
>   ruledb: spam: adapt to spamassassin 4.0.0
>   templates: sync spamassassin templates with 4.0.0 upstream
>   templates: add template for spamassassin's v342.pre
>   templates: add template for spamassassin's v400.pre
>   config: add spam option for extract_text
>   templates: enable DecodeShortUrls for SpamAssassin 4.0.0
>   templates: enable DMARC plugin in v400.pre.in
> 
>  debian/control            |  9 +++++-
>  src/Makefile              |  2 ++
>  src/PMG/Config.pm         | 12 +++++++
>  src/PMG/RuleDB/Spam.pm    | 10 +++---
>  src/templates/init.pre.in | 26 ++++++++++++---
>  src/templates/v310.pre.in | 18 +++++------
>  src/templates/v320.pre.in |  5 +++
>  src/templates/v342.pre.in | 39 ++++++++++++++++++++++
>  src/templates/v400.pre.in | 68 +++++++++++++++++++++++++++++++++++++++
>  9 files changed, 170 insertions(+), 19 deletions(-)
>  create mode 100644 src/templates/v342.pre.in
>  create mode 100644 src/templates/v400.pre.in
> 
> pmg-gui:
> Stoiko Ivanov (1):
>   spamdetector: add extract_text option
> 
>  js/SpamDetectorOptions.js | 2 ++
>  1 file changed, 2 insertions(+)
> 


applied series, dependency/breaks commits not yet pushed, thanks!




* [pmg-devel] applied: [PATCH pmg-gui 1/1] spamdetector: add extract_text option
From: Thomas Lamprecht @ 2023-03-27 18:06 UTC
  To: Stoiko Ivanov, pmg-devel

On 13/03/2023 at 22:23, Stoiko Ivanov wrote:
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
>  js/SpamDetectorOptions.js | 2 ++
>  1 file changed, 2 insertions(+)
> 
>

applied, thanks!



