* [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0
@ 2023-03-13 21:23 Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 1/7] ruledb: spam: adapt to spamassassin 4.0.0 Stoiko Ivanov
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

SpamAssassin 4.0.0 was released back in December 2022 and seems to work
nicely for the most part.

This patchset adapts pmg-api where necessary (patch 1/7), syncs the
templates with upstream to reduce the diff (2/7), and enables the new
features depending on the spam section of pmg.conf (mostly by only
enabling modules that do DNSBL lookups if rbl_checks is set).

For testing the ExtractText plugin I used the GTUBE test string in a
LibreOffice document (saved as .doc, .docx, .pdf, .odt and .rtf).
As it's still quite new functionality, I added an explicit config
option for it (also to provide some visibility).
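The test documents above were created with LibreOffice. As a rough, hypothetical alternative for the .rtf case only, the following Perl sketch writes a minimal hand-written RTF file containing the GTUBE string. The file name gtube.rtf and the RTF skeleton are assumptions, and whether ExtractText (via unrtf) handles it exactly like a LibreOffice export was not verified.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# GTUBE: the standard test string designed to trigger SpamAssassin's GTUBE rule.
my $gtube = 'XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X';

# Wrap the text in a minimal RTF document. GTUBE contains no RTF control
# characters (\, {, }), so it can be embedded verbatim.
sub gtube_rtf {
    my ($text) = @_;
    return "{\\rtf1\\ansi\\deff0 {\\fonttbl {\\f0 Courier;}}\\f0 $text\\par}\n";
}

# Write to a throwaway directory; attach the resulting file to a test mail.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'gtube.rtf');
open(my $fh, '>', $path) or die "cannot write $path: $!\n";
print {$fh} gtube_rtf($gtube);
close($fh) or die "cannot close $path: $!\n";
print "wrote $path\n";
```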

pmg-api:
Stoiko Ivanov (7):
  ruledb: spam: adapt to spamassassin 4.0.0
  templates: sync spamassassin templates with 4.0.0 upstream
  templates: add template for spamassassin's v342.pre
  templates: add template for spamassassin's v400.pre
  config: add spam option for extract_text
  templates: enable DecodeShortUrls for SpamAssassin 4.0.0
  templates: enable DMARC plugin in v400.pre.in

 debian/control            |  9 +++++-
 src/Makefile              |  2 ++
 src/PMG/Config.pm         | 12 +++++++
 src/PMG/RuleDB/Spam.pm    | 10 +++---
 src/templates/init.pre.in | 26 ++++++++++++---
 src/templates/v310.pre.in | 18 +++++------
 src/templates/v320.pre.in |  5 +++
 src/templates/v342.pre.in | 39 ++++++++++++++++++++++
 src/templates/v400.pre.in | 68 +++++++++++++++++++++++++++++++++++++++
 9 files changed, 170 insertions(+), 19 deletions(-)
 create mode 100644 src/templates/v342.pre.in
 create mode 100644 src/templates/v400.pre.in

pmg-gui:
Stoiko Ivanov (1):
  spamdetector: add extract_text option

 js/SpamDetectorOptions.js | 2 ++
 1 file changed, 2 insertions(+)

-- 
2.30.2


* [pmg-devel] [PATCH pmg-api 1/7] ruledb: spam: adapt to spamassassin 4.0.0
  2023-03-13 21:23 [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Stoiko Ivanov
@ 2023-03-13 21:23 ` Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 2/7] templates: sync spamassassin templates with 4.0.0 upstream Stoiko Ivanov
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

find_all_addrs_in_line was changed to require an instantiated
Mail::SpamAssassin object (instead of being callable as a class method)
in:
https://github.com/apache/spamassassin/commit/139adfb5901b27fa13dccbf3a66c53ca7613f733
(read-only git mirror of the authoritative SVN repository)

Noticed while bouncing mails with `mutt`, which adds Resent-* headers
(e.g. Resent-From).
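To illustrate the failure mode only (the actual change in SpamAssassin is in the commit linked above), here is a hypothetical Perl sketch. HypotheticalScanner and find_addrs_in_line are invented names; the method reads per-object configuration, so calling it on the class name dies under `use strict`, analogous to why the old `Mail::SpamAssassin->find_all_addrs_in_line(...)` call had to be replaced by a call on `$spamtest`.

```perl
use strict;
use warnings;

# Hypothetical stand-in (not SpamAssassin code): a method that reads
# per-object configuration and therefore needs an instance.
package HypotheticalScanner;

sub new {
    my ($class, %opts) = @_;
    return bless { conf => { lowercase => $opts{lowercase} // 1 } }, $class;
}

sub find_addrs_in_line {
    my ($self, $line) = @_;
    # For a class-method call $self is just the string 'HypotheticalScanner',
    # so this dereference dies under "use strict".
    my $lowercase = $self->{conf}->{lowercase};
    my @addrs = $line =~ /([^\s<>,]+\@[^\s<>,]+)/g;
    return $lowercase ? (map { lc($_) } @addrs) : @addrs;
}

package main;

# Instance call, analogous to $spamtest->find_all_addrs_in_line($resent)
my $scanner = HypotheticalScanner->new();
my @ok = $scanner->find_addrs_in_line('Alice <Alice@Example.com>, bob@example.org');
print join(',', @ok), "\n";

# Class-method call, analogous to the old Mail::SpamAssassin->find_all_addrs_in_line(...)
my @bad = eval { HypotheticalScanner->find_addrs_in_line('x@example.com') };
print 'class-method call failed: ', ($@ ? 'yes' : 'no'), "\n";
```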

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/PMG/RuleDB/Spam.pm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/PMG/RuleDB/Spam.pm b/src/PMG/RuleDB/Spam.pm
index 14d7bea..a0b8f26 100644
--- a/src/PMG/RuleDB/Spam.pm
+++ b/src/PMG/RuleDB/Spam.pm
@@ -300,13 +300,13 @@ sub __get_addr {
 # because we do not call spamassassin in canes of commtouch match
 # see Mail::Spamassassin:PerMsgStatus for details
 sub __all_from_addrs {
-    my ($head) = @_;
+    my ($head, $spamtest) = @_;
 
     my @addrs;
 
     my $resent = $head->get('Resent-From');
     if (defined($resent) && $resent =~ /\S/) {
-	@addrs = Mail::SpamAssassin->find_all_addrs_in_line($resent);
+	@addrs = $spamtest->find_all_addrs_in_line($resent);
     } else {
 	@addrs = map { tr/././s; $_ } grep { $_ ne '' }
         (__get_addr($head, 'From'),		# std
@@ -330,6 +330,8 @@ sub analyze_spam {
 
     $maxspamsize = 200*1024 if !$maxspamsize;
 
+    my $spamtest = $queue->{sa};
+
     my ($sa_score, $sa_max, $sa_scores, $sa_sumary, $list, $autolearn, $bayes, $loglist);
     $list = '';
     $loglist = '';
@@ -345,7 +347,7 @@ sub analyze_spam {
     }
 
     my $fromhash = { $queue->{from} => 1 }; 
-    foreach my $f (__all_from_addrs($entity->head())) {
+    foreach my $f (__all_from_addrs($entity->head(), $spamtest)) {
 	$fromhash->{$f} = 1;
     }
     $queue->{all_from_addrs} = [ keys %$fromhash ];
@@ -373,8 +375,6 @@ sub analyze_spam {
 
     my ($csec, $usec) = gettimeofday ();
 
-    my $spamtest = $queue->{sa};
-
     # only run SA in testmode or when clamav_heuristic did not confirm spam (score < 5)
     if ($msginfo->{testmode} || ($sa_score < 5)) {
 
-- 
2.30.2


* [pmg-devel] [PATCH pmg-api 2/7] templates: sync spamassassin templates with 4.0.0 upstream
  2023-03-13 21:23 [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 1/7] ruledb: spam: adapt to spamassassin 4.0.0 Stoiko Ivanov
@ 2023-03-13 21:23 ` Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 3/7] templates: add template for spamassassin's v342.pre Stoiko Ivanov
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

Sync the templates with upstream to minimize the diff, and disable
modules that vanished in 4.0.0 (e.g. Hashcash).

No functional change intended.
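The welcomelist/blocklist renaming mentioned in the synced init.pre comment can also affect local customizations. Below is a minimal, hypothetical Perl sketch of a plain substring rename for such a config snippet. modernize_sa_names is an invented helper, and whether each renamed directive or rule name exists in 4.0.0 should be checked against the upstream documentation rather than assumed from this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: rename "whitelist" -> "welcomelist" and
# "blacklist" -> "blocklist" in a config snippet. A plain substring rename;
# it does not know which names upstream actually provides.
sub modernize_sa_names {
    my ($text) = @_;
    my %map = (whitelist => 'welcomelist', blacklist => 'blocklist');
    # Case-insensitive match; all-uppercase input (rule names) stays uppercase,
    # anything else is emitted in lowercase.
    $text =~ s{(whitelist|blacklist)}{
        my $old = $1;
        my $new = $map{lc $old};
        $old eq uc($old) ? uc($new) : $new;
    }gie;
    return $text;
}

my $cfg = <<'EOF';
whitelist_from *@example.com
blacklist_from spammer@example.net
score USER_IN_WHITELIST -100
EOF

print modernize_sa_names($cfg);
```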

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/templates/init.pre.in | 26 ++++++++++++++++++++++----
 src/templates/v310.pre.in | 18 +++++++++---------
 src/templates/v320.pre.in |  5 +++++
 3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/src/templates/init.pre.in b/src/templates/init.pre.in
index 04ca4d6..98d17b4 100644
--- a/src/templates/init.pre.in
+++ b/src/templates/init.pre.in
@@ -7,24 +7,42 @@
 # in SpamAssassin 3.0.x releases.  It will not be installed if you
 # already have a file in place called "init.pre".
 #
+# There are now multiple files read to enable plugins in the 
+# /etc/mail/spamassassin directory; previously only one, "init.pre" was 
+# read.  Now both "init.pre", "v310.pre", and any other files ending in
+# ".pre" will be read.  As future releases are made, new plugins will be
+# added to new files, named according to the release they're added in.
 ###########################################################################
 
+# Version compatibility - Welcomelist/Blocklist
+# In SpamAssassin 4.0, rules containing "whitelist" or "blacklist" have been
+# renamed to contain more racially neutral "welcomelist" and "blocklist"
+# terms.  When this compatibility flag is enabled, old rule names from stock
+# rules will not hit anymore alongside the new ones.  For more information,
+# see: https://wiki.apache.org/spamassassin/WelcomelistBlocklist
+#
+enable_compat welcomelist_blocklist
+
 # RelayCountry - add metadata for Bayes learning, marking the countries
 # a message was relayed through
 #
+# Note: This requires the Geo::IP Perl module
+#
 # loadplugin Mail::SpamAssassin::Plugin::RelayCountry
 
 [% IF pmg.spam.rbl_checks %]
+# URIDNSBL - look up URLs found in the message against several DNS
+# blocklists.
+#
 loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
 [% END %]
 
-# Hashcash - perform hashcash verification.
-#
-loadplugin Mail::SpamAssassin::Plugin::Hashcash
 
 [% IF pmg.spam.rbl_checks %]
+# SPF - perform SPF verification.
+#
 loadplugin Mail::SpamAssassin::Plugin::SPF
 [% END %]
 
 # always load dkim to improve accuracy
-loadplugin Mail::SpamAssassin::Plugin::DKIM
\ No newline at end of file
+loadplugin Mail::SpamAssassin::Plugin::DKIM
diff --git a/src/templates/v310.pre.in b/src/templates/v310.pre.in
index d72c347..696142d 100644
--- a/src/templates/v310.pre.in
+++ b/src/templates/v310.pre.in
@@ -9,6 +9,11 @@
 # so you can modify it to enable some disabled-by-default plugins below,
 # if you so wish.
 #
+# There are now multiple files read to enable plugins in the
+# /etc/mail/spamassassin directory; previously only one, "init.pre" was
+# read.  Now both "init.pre", "v310.pre", and any other files ending in
+# ".pre" will be read.  As future releases are made, new plugins will be
+# added to new files, named according to the release they're added in.
 ###########################################################################
 
 [% IF pmg.spam.rbl_checks %]
@@ -40,18 +45,13 @@ loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
 #
 loadplugin Mail::SpamAssassin::Plugin::TextCat
 
-# WhitelistSubject - Whitelist/Blacklist certain subject regular expressions
+# AccessDB - lookup from-addresses in access database
 #
-loadplugin Mail::SpamAssassin::Plugin::WhiteListSubject
+#loadplugin Mail::SpamAssassin::Plugin::AccessDB
 
-###########################################################################
-# experimental plugins
-
-# DomainKeys - perform DomainKeys verification
-#
-# External modules required for use, see INSTALL for more information.
+# WelcomelistSubject - Welcomelist/Blocklist certain subject regular expressions
 #
-#loadplugin Mail::SpamAssassin::Plugin::DomainKeys
+loadplugin Mail::SpamAssassin::Plugin::WelcomeListSubject
 
 # MIMEHeader - apply regexp rules against MIME headers in the message
 #
diff --git a/src/templates/v320.pre.in b/src/templates/v320.pre.in
index db49b07..846c73a 100644
--- a/src/templates/v320.pre.in
+++ b/src/templates/v320.pre.in
@@ -9,6 +9,11 @@
 # so you can modify it to enable some disabled-by-default plugins below,
 # if you so wish.
 #
+# There are now multiple files read to enable plugins in the
+# /etc/mail/spamassassin directory; previously only one, "init.pre" was
+# read.  Now both "init.pre", "v310.pre", and any other files ending in
+# ".pre" will be read.  As future releases are made, new plugins will be
+# added to new files, named according to the release they're added in.
 ###########################################################################
 
 # Check - Provides main check functionality
-- 
2.30.2


* [pmg-devel] [PATCH pmg-api 3/7] templates: add template for spamassassin's v342.pre
  2023-03-13 21:23 [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 1/7] ruledb: spam: adapt to spamassassin 4.0.0 Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 2/7] templates: sync spamassassin templates with 4.0.0 upstream Stoiko Ivanov
@ 2023-03-13 21:23 ` Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 4/7] templates: add template for spamassassin's v400.pre Stoiko Ivanov
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

The file is taken from upstream; the only change is that the HashBL
module is only enabled if rbl_checks is enabled in pmg.conf.
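The `[% IF pmg.spam.rbl_checks %] ... [% END %]` guard is Template Toolkit syntax. The following Perl sketch is a deliberately tiny, hypothetical stand-in for the template engine: it handles only non-nested IF blocks on pmg.spam options, just to show what the rendered v342.pre section looks like with rbl_checks on and off. render_spam_template is an invented name.

```perl
use strict;
use warnings;

# Hypothetical mini-renderer: handles only non-nested
# "[% IF pmg.spam.<option> %] ... [% END %]" blocks, nothing else.
sub render_spam_template {
    my ($template, $spam) = @_;
    (my $out = $template) =~ s{
        \[%\s*IF\s+pmg\.spam\.(\w+)\s*%\]\n?   # opening tag, captures the option name
        (.*?)                                  # guarded lines, non-greedy
        \[%\s*END\s*%\]\n?                     # closing tag
    }{ $spam->{$1} ? $2 : '' }gsex;
    return $out;
}

my $template = <<'EOF';
# HashBL - Query hashed/unhashed strings, emails, uris etc from DNS lists
#
[% IF pmg.spam.rbl_checks %]
loadplugin Mail::SpamAssassin::Plugin::HashBL
[% END %]
EOF

my $with    = render_spam_template($template, { rbl_checks => 1 });
my $without = render_spam_template($template, { rbl_checks => 0 });
print $with, "---\n", $without;
```

With rbl_checks set, the loadplugin line survives; without it, only the comment header remains.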

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/Makefile              |  1 +
 src/PMG/Config.pm         |  3 +++
 src/templates/v342.pre.in | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)
 create mode 100644 src/templates/v342.pre.in

diff --git a/src/Makefile b/src/Makefile
index 49c7974..414b1ef 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -38,6 +38,7 @@ TEMPLATES =				\
 	local.cf.in			\
 	v310.pre.in			\
 	v320.pre.in			\
+	v342.pre.in			\
 	razor-agent.conf.in		\
 	freshclam.conf.in		\
 	clamd.conf.in 			\
diff --git a/src/PMG/Config.pm b/src/PMG/Config.pm
index a0b1866..dce1513 100755
--- a/src/PMG/Config.pm
+++ b/src/PMG/Config.pm
@@ -1515,6 +1515,9 @@ sub rewrite_config_spam {
     $changes = 1 if $self->rewrite_config_file(
 	'v320.pre.in', '/etc/mail/spamassassin/v320.pre');
 
+    $changes = 1 if $self->rewrite_config_file(
+	'v342.pre.in', '/etc/mail/spamassassin/v342.pre');
+
     if ($use_razor) {
 	mkdir "/root/.razor";
 
diff --git a/src/templates/v342.pre.in b/src/templates/v342.pre.in
new file mode 100644
index 0000000..10dcaa1
--- /dev/null
+++ b/src/templates/v342.pre.in
@@ -0,0 +1,39 @@
+# This is the right place to customize your installation of SpamAssassin.
+#
+# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
+# tweaked.
+#
+# This file was installed during the installation of SpamAssassin 3.4.2,
+# and contains plugin loading commands for the new plugins added in that
+# release.  It will not be overwritten during future SpamAssassin installs,
+# so you can modify it to enable some disabled-by-default plugins below,
+# if you so wish.
+#
+# There are now multiple files read to enable plugins in the
+# /etc/mail/spamassassin directory; previously only one, "init.pre" was
+# read.  Now both "init.pre", "v310.pre", and any other files ending in
+# ".pre" will be read.  As future releases are made, new plugins will be
+# added to new files, named according to the release they're added in.
+###########################################################################
+
+# HashBL - Query hashed/unhashed strings, emails, uris etc from DNS lists
+#
+[% IF pmg.spam.rbl_checks %]
+loadplugin Mail::SpamAssassin::Plugin::HashBL
+[% END %]
+
+# ResourceLimits - assure your spamd child processes
+# do not exceed specified CPU or memory limit
+#
+# loadplugin Mail::SpamAssassin::Plugin::ResourceLimits
+
+# FromNameSpoof - help stop spam that tries to spoof other domains using 
+# the from name
+#
+# loadplugin Mail::SpamAssassin::Plugin::FromNameSpoof
+
+# Phishing - finds uris used in phishing campaigns detected by
+# OpenPhish or PhishTank feeds.
+#
+# loadplugin Mail::SpamAssassin::Plugin::Phishing
+
-- 
2.30.2


* [pmg-devel] [PATCH pmg-api 4/7] templates: add template for spamassassin's v400.pre
  2023-03-13 21:23 [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Stoiko Ivanov
                   ` (2 preceding siblings ...)
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 3/7] templates: add template for spamassassin's v342.pre Stoiko Ivanov
@ 2023-03-13 21:23 ` Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 5/7] config: add spam option for extract_text Stoiko Ivanov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

The individual new features will be enabled in separate commits.
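Since SpamAssassin reads every file ending in .pre from /etc/mail/spamassassin (as the header of the templates states), the new v400.pre is picked up without further configuration. A small, hypothetical Perl sketch of collecting such files follows. It runs against a temporary directory instead of the real one, and the sorted order is only for deterministic output, not a claim about the order SpamAssassin itself uses.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Collect every regular "*.pre" file in a SpamAssassin-style config directory.
# Sorting is only for deterministic output (an assumption, see above).
sub find_pre_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!\n";
    my @pre = sort grep { /\.pre\z/ && -f File::Spec->catfile($dir, $_) } readdir($dh);
    closedir($dh);
    return @pre;
}

# Demo against a throwaway directory instead of /etc/mail/spamassassin.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(init.pre v310.pre v320.pre v342.pre v400.pre local.cf)) {
    open(my $fh, '>', File::Spec->catfile($dir, $name)) or die "cannot create $name: $!\n";
    close($fh);
}

my @pre = find_pre_files($dir);
print join(' ', @pre), "\n";   # local.cf is not a .pre file and is skipped
```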

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/Makefile              |  1 +
 src/PMG/Config.pm         |  3 +++
 src/templates/v400.pre.in | 37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)
 create mode 100644 src/templates/v400.pre.in

diff --git a/src/Makefile b/src/Makefile
index 414b1ef..0b424e9 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -39,6 +39,7 @@ TEMPLATES =				\
 	v310.pre.in			\
 	v320.pre.in			\
 	v342.pre.in			\
+	v400.pre.in			\
 	razor-agent.conf.in		\
 	freshclam.conf.in		\
 	clamd.conf.in 			\
diff --git a/src/PMG/Config.pm b/src/PMG/Config.pm
index dce1513..5dcffb7 100755
--- a/src/PMG/Config.pm
+++ b/src/PMG/Config.pm
@@ -1518,6 +1518,9 @@ sub rewrite_config_spam {
     $changes = 1 if $self->rewrite_config_file(
 	'v342.pre.in', '/etc/mail/spamassassin/v342.pre');
 
+    $changes = 1 if $self->rewrite_config_file(
+	'v400.pre.in', '/etc/mail/spamassassin/v400.pre');
+
     if ($use_razor) {
 	mkdir "/root/.razor";
 
diff --git a/src/templates/v400.pre.in b/src/templates/v400.pre.in
new file mode 100644
index 0000000..052e73e
--- /dev/null
+++ b/src/templates/v400.pre.in
@@ -0,0 +1,37 @@
+# This is the right place to customize your installation of SpamAssassin.
+#
+# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
+# tweaked.
+#
+# This file was installed during the installation of SpamAssassin 4.0.0,
+# and contains plugin loading commands for the new plugins added in that
+# release.  It will not be overwritten during future SpamAssassin installs,
+# so you can modify it to enable some disabled-by-default plugins below,
+# if you so wish.
+#
+# There are now multiple files read to enable plugins in the
+# /etc/mail/spamassassin directory; previously only one, "init.pre" was
+# read.  Now both "init.pre", "v310.pre", and any other files ending in
+# ".pre" will be read.  As future releases are made, new plugins will be
+# added to new files, named according to the release they're added in.
+###########################################################################
+
+# ExtractText - Extract text from documents or images for matching
+#
+# Requires manual configuration, see plugin documentation.
+#
+# loadplugin Mail::SpamAssassin::Plugin::ExtractText
+
+# DecodeShortUrl - Check for shortened URLs
+#
+# Note that this plugin will send HTTP requests to different URL shortener
+# services.  Enabling caching is recommended, see plugin documentation.
+#
+# loadplugin Mail::SpamAssassin::Plugin::DecodeShortURLs
+
+# DMARC - Check DMARC compliance
+#
+# Requires Mail::DMARC module and working SPF and DKIM Plugins.
+#
+# loadplugin Mail::SpamAssassin::Plugin::DMARC
+
-- 
2.30.2


* [pmg-devel] [PATCH pmg-api 5/7] config: add spam option for extract_text
  2023-03-13 21:23 [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Stoiko Ivanov
                   ` (3 preceding siblings ...)
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 4/7] templates: add template for spamassassin's v400.pre Stoiko Ivanov
@ 2023-03-13 21:23 ` Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 6/7] templates: enable DecodeShortUrls for SpamAssassin 4.0.0 Stoiko Ivanov
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

Add an option for toggling the configuration of the ExtractText SA
plugin (see [0]).

The config is copied from the module itself. The informational headers
were not added, as I don't see much gain in them apart from verifying
that the plugin is working.

The external dependencies the plugin needs to work are added as
Recommends, since not having them installed and simply keeping the
option disabled is a valid configuration.
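To make the extracttext_use lines in the template easier to read, here is a hypothetical, simplified Perl model of the mapping from attachment file name or MIME type to external tool. It covers only a subset of the lines, anchors the MIME regexes itself, and lets the first matching entry win (note that .doc is listed for both antiword and unrtf). How the real plugin resolves such overlaps is not modelled here, and pick_extractor is an invented name.

```perl
use strict;
use warnings;

# Hypothetical, simplified model of a subset of the extracttext_use lines:
# each tool lists file extensions and MIME-type regexes it handles.
# In this sketch the first matching entry wins.
my @uses = (
    [ pdftotext => [qw(.pdf)],      [qr{^application/pdf$}] ],
    [ docx2txt  => [qw(.docx)],     [qr{^application/docx$}] ],
    [ antiword  => [qw(.doc)],      [qr{^application/(?:vnd\.?)?ms-?word.*$}] ],
    [ unrtf     => [qw(.doc .rtf)], [qr{^application/rtf$}, qr{^text/rtf$}] ],
    [ odt2txt   => [qw(.odt .ott)], [qr{^application/.*?opendocument.*text$}] ],
    [ tesseract => [qw(.jpg .png .bmp .tif .tiff)],
                   [qr{^image/(?:jpeg|png|x-ms-bmp|tiff)$}] ],
);

# Return the tool name for an attachment, or undef if nothing matches.
sub pick_extractor {
    my ($filename, $mime) = @_;
    for my $entry (@uses) {
        my ($tool, $exts, $types) = @$entry;
        for my $ext (@$exts) {
            return $tool if lc($filename) =~ /\Q$ext\E\z/;
        }
        for my $re (@$types) {
            return $tool if defined $mime && $mime =~ $re;
        }
    }
    return;
}

for my $case (['invoice.PDF', undef], ['memo.doc', 'application/msword'],
              ['scan.png', 'image/png'], ['notes', 'text/rtf'],
              ['archive.zip', 'application/zip']) {
    my $tool = pick_extractor(@$case);
    printf "%-12s -> %s\n", $case->[0], $tool // '(none)';
}
```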

[0] https://metacpan.org/pod/Mail::SpamAssassin::Plugin::ExtractText
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 debian/control            |  9 ++++++++-
 src/PMG/Config.pm         |  6 ++++++
 src/templates/v400.pre.in | 34 ++++++++++++++++++++++++++++++----
 3 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/debian/control b/debian/control
index 93ad72c..d2ed7da 100644
--- a/debian/control
+++ b/debian/control
@@ -98,7 +98,14 @@ Depends: apt (>= 2~),
          ucf,
          ${misc:Depends},
          ${perl:Depends},
-Recommends: ifupdown2, proxmox-offline-mirror-helper
+Recommends: antiword,
+            docx2txt,
+            ifupdown2,
+            odt2txt,
+            poppler-utils,
+            proxmox-offline-mirror-helper,
+            tesseract-ocr,
+            unrtf
 Suggests: zfsutils-linux
 Description: Proxmox Mailgateway API Server Implementation
  This implements a REST API to configure Proxmox Mailgateway.
diff --git a/src/PMG/Config.pm b/src/PMG/Config.pm
index 5dcffb7..699a622 100755
--- a/src/PMG/Config.pm
+++ b/src/PMG/Config.pm
@@ -211,6 +211,11 @@ sub properties {
 	    minimum => 64,
 	    default => 256*1024,
 	},
+	extract_text => {
+	    description => "Extract text from attachments (doc, pdf, rtf, images) and scan for spam.",
+	    type => 'boolean',
+	    default => 0,
+	},
     };
 }
 
@@ -225,6 +230,7 @@ sub options {
 	bounce_score => { optional => 1 },
 	rbl_checks => { optional => 1 },
 	maxspamsize => { optional => 1 },
+	extract_text => { optional => 1 },
     };
 }
 
diff --git a/src/templates/v400.pre.in b/src/templates/v400.pre.in
index 052e73e..4d68d6c 100644
--- a/src/templates/v400.pre.in
+++ b/src/templates/v400.pre.in
@@ -16,11 +16,37 @@
 # added to new files, named according to the release they're added in.
 ###########################################################################
 
+
+[% IF pmg.spam.extract_text %]
 # ExtractText - Extract text from documents or images for matching
-#
-# Requires manual configuration, see plugin documentation.
-#
-# loadplugin Mail::SpamAssassin::Plugin::ExtractText
+# informational headers and hits not configured
+loadplugin Mail::SpamAssassin::Plugin::ExtractText
+
+ifplugin Mail::SpamAssassin::Plugin::ExtractText
+
+  extracttext_external  pdftotext  /usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -
+  extracttext_use       pdftotext  .pdf application/pdf
+
+  # http://docx2txt.sourceforge.net
+  extracttext_external  docx2txt   /usr/bin/docx2txt {} -
+  extracttext_use       docx2txt   .docx application/docx
+
+  extracttext_external  antiword   /usr/bin/antiword -t -w 0 -m UTF-8.txt {}
+  extracttext_use       antiword   .doc application/(?:vnd\.?)?ms-?word.*
+
+  extracttext_external  unrtf      /usr/bin/unrtf --nopict {}
+  extracttext_use       unrtf      .doc .rtf application/rtf text/rtf
+
+  extracttext_external  odt2txt    /usr/bin/odt2txt --encoding=UTF-8 {}
+  extracttext_use       odt2txt    .odt .ott application/.*?opendocument.*text
+  extracttext_use       odt2txt    .sdw .stw application/(?:x-)?soffice application/(?:x-)?starwriter
+
+  extracttext_external  tesseract  {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -c page_separator= {} -
+  extracttext_use       tesseract  .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)
+
+endif
+
+[% END %]
 
 # DecodeShortUrl - Check for shortened URLs
 #
-- 
2.30.2


* [pmg-devel] [PATCH pmg-api 6/7] templates: enable DecodeShortUrls for SpamAssassin 4.0.0
  2023-03-13 21:23 [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Stoiko Ivanov
                   ` (4 preceding siblings ...)
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 5/7] config: add spam option for extract_text Stoiko Ivanov
@ 2023-03-13 21:23 ` Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 7/7] templates: enable DMARC plugin in v400.pre.in Stoiko Ivanov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

The plugin is enabled if the system has rbl_checks enabled.
It resolves URL-shortener chains (e.g. bit.ly); the KAM ruleset has a
number of URL shorteners configured (KAM_urlshorteners.cf).

While the functionality also works without a cache, the configured
SQLite-backed caching worked well in my tests.
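The following Perl sketch illustrates the idea of resolving a shortener chain with a cache. It is hypothetical: a static redirect table replaces the HTTP requests the plugin actually sends, an in-memory hash replaces the SQLite-backed cache configured in the template, and resolve_short_url as well as the hop limit are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical redirect table standing in for the HTTP responses of
# shortener services; the real plugin sends HTTP requests instead.
my %redirects = (
    'https://bit.ly/abc'      => 'https://tinyurl.com/xyz',
    'https://tinyurl.com/xyz' => 'https://example.com/landing',
    'https://bit.ly/loop'     => 'https://bit.ly/loop',
);

my %cache;                    # short URL -> final URL
my $redirects_followed = 0;   # stands in for the number of HTTP requests made

# Follow a shortener chain for at most $max_hops redirects and cache the result.
sub resolve_short_url {
    my ($url, $max_hops) = @_;
    $max_hops //= 10;
    return $cache{$url} if exists $cache{$url};

    my ($current, %seen) = ($url);
    for (1 .. $max_hops) {
        last if $seen{$current}++;                  # redirect loop detected
        my $next = $redirects{$current} or last;    # no further redirect
        $redirects_followed++;
        $current = $next;
    }
    return $cache{$url} = $current;
}

my $final  = resolve_short_url('https://bit.ly/abc');
my $before = $redirects_followed;
resolve_short_url('https://bit.ly/abc');    # answered from %cache
print "$final ($redirects_followed redirects followed, second call cached: ",
    ($redirects_followed == $before ? 'yes' : 'no'), ")\n";
```

The cache is what keeps repeated mails with the same short URL from triggering the same outbound lookups again.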

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/templates/v400.pre.in | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/templates/v400.pre.in b/src/templates/v400.pre.in
index 4d68d6c..233493d 100644
--- a/src/templates/v400.pre.in
+++ b/src/templates/v400.pre.in
@@ -48,12 +48,17 @@ endif
 
 [% END %]
 
+
+[% IF pmg.spam.rbl_checks %]
 # DecodeShortUrl - Check for shortened URLs
 #
 # Note that this plugin will send HTTP requests to different URL shortener
 # services.  Enabling caching is recommended, see plugin documentation.
 #
-# loadplugin Mail::SpamAssassin::Plugin::DecodeShortURLs
+loadplugin Mail::SpamAssassin::Plugin::DecodeShortURLs
+url_shortener_cache_type dbi
+url_shortener_cache_dsn dbi:SQLite:dbname=/var/lib/pmg/decode_short_urls.db
+[% END %]
 
 # DMARC - Check DMARC compliance
 #
-- 
2.30.2


* [pmg-devel] [PATCH pmg-api 7/7] templates: enable DMARC plugin in v400.pre.in
  2023-03-13 21:23 [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Stoiko Ivanov
                   ` (5 preceding siblings ...)
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 6/7] templates: enable DecodeShortUrls for SpamAssassin 4.0.0 Stoiko Ivanov
@ 2023-03-13 21:23 ` Stoiko Ivanov
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-gui 1/1] spamdetector: add extract_text option Stoiko Ivanov
  2023-03-15 15:55 ` [pmg-devel] applied: [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Thomas Lamprecht
  8 siblings, 0 replies; 11+ messages in thread
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

This plugin requires Mail::DMARC (libmail-dmarc-perl). The package is
currently only available in sid and bookworm, but can be trivially
rebuilt for bullseye.

The DMARC checks are skipped if only internal relays are present in the
headers, so I could not test this explicitly.
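As background on why the plugin needs working SPF and DKIM results: per RFC 7489, a message passes DMARC if SPF or DKIM passes and the passing domain is aligned with the From: header domain. The hypothetical Perl sketch below models relaxed alignment with a crude "last two labels" organizational-domain rule instead of the Public Suffix List, so it is wrong for domains like example.co.uk; it is not how Mail::DMARC is implemented, and all function names are invented.

```perl
use strict;
use warnings;

# Crude organizational-domain approximation: keep the last two labels.
# Real DMARC implementations use the Public Suffix List; this does not.
sub org_domain {
    my ($domain) = @_;
    my @labels = split /\./, lc $domain;
    return join '.', (@labels > 2 ? @labels[-2, -1] : @labels);
}

# Relaxed identifier alignment: both domains share an organizational domain.
sub aligned {
    my ($d1, $d2) = @_;
    return org_domain($d1) eq org_domain($d2);
}

# DMARC passes if SPF or DKIM passes AND that mechanism's domain aligns
# with the From: header domain.
sub dmarc_pass {
    my (%r) = @_;
    my $spf_ok  = $r{spf}  eq 'pass' && aligned($r{spf_domain},  $r{from_domain});
    my $dkim_ok = $r{dkim} eq 'pass' && aligned($r{dkim_domain}, $r{from_domain});
    return ($spf_ok || $dkim_ok) ? 1 : 0;
}

my $verdict = dmarc_pass(
    from_domain => 'example.com',
    spf  => 'pass', spf_domain  => 'bounces.example.com',
    dkim => 'none', dkim_domain => '',
);
print "$verdict\n";
```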

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/templates/v400.pre.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/templates/v400.pre.in b/src/templates/v400.pre.in
index 233493d..e09e807 100644
--- a/src/templates/v400.pre.in
+++ b/src/templates/v400.pre.in
@@ -64,5 +64,5 @@ url_shortener_cache_dsn dbi:SQLite:dbname=/var/lib/pmg/decode_short_urls.db
 #
 # Requires Mail::DMARC module and working SPF and DKIM Plugins.
 #
-# loadplugin Mail::SpamAssassin::Plugin::DMARC
+loadplugin Mail::SpamAssassin::Plugin::DMARC
 
-- 
2.30.2


* [pmg-devel] [PATCH pmg-gui 1/1] spamdetector: add extract_text option
  2023-03-13 21:23 [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Stoiko Ivanov
                   ` (6 preceding siblings ...)
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 7/7] templates: enable DMARC plugin in v400.pre.in Stoiko Ivanov
@ 2023-03-13 21:23 ` Stoiko Ivanov
  2023-03-27 18:06   ` [pmg-devel] applied: " Thomas Lamprecht
  2023-03-15 15:55 ` [pmg-devel] applied: [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Thomas Lamprecht
  8 siblings, 1 reply; 11+ messages in thread
From: Stoiko Ivanov @ 2023-03-13 21:23 UTC
  To: pmg-devel

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 js/SpamDetectorOptions.js | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/js/SpamDetectorOptions.js b/js/SpamDetectorOptions.js
index 2a4059c..58eaee9 100644
--- a/js/SpamDetectorOptions.js
+++ b/js/SpamDetectorOptions.js
@@ -19,6 +19,8 @@ Ext.define('PMG.SpamDetectorOptions', {
 	me.add_boolean_row('use_razor', gettext('Use Razor2 checks'),
 			   { defaultValue: 1 });
 
+	me.add_boolean_row('extract_text', gettext('Extract Text from Attachments'));
+
 	me.add_integer_row('maxspamsize', gettext('Max Spam Size (bytes)'),
 			   {
  defaultValue: 256*1024,
-- 
2.30.2


* [pmg-devel] applied: [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0
  2023-03-13 21:23 [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Stoiko Ivanov
                   ` (7 preceding siblings ...)
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-gui 1/1] spamdetector: add extract_text option Stoiko Ivanov
@ 2023-03-15 15:55 ` Thomas Lamprecht
  8 siblings, 0 replies; 11+ messages in thread
From: Thomas Lamprecht @ 2023-03-15 15:55 UTC
  To: Stoiko Ivanov, pmg-devel

On 13/03/2023 at 22:23, Stoiko Ivanov wrote:
> SpamAssassin 4.0.0 was released back in December 2022 and seems to work
> [...]


applied series, dependency/breaks commits not yet pushed, thanks!


* [pmg-devel] applied: [PATCH pmg-gui 1/1] spamdetector: add extract_text option
  2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-gui 1/1] spamdetector: add extract_text option Stoiko Ivanov
@ 2023-03-27 18:06   ` Thomas Lamprecht
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Lamprecht @ 2023-03-27 18:06 UTC
  To: Stoiko Ivanov, pmg-devel

On 13/03/2023 at 22:23, Stoiko Ivanov wrote:
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
>  js/SpamDetectorOptions.js | 2 ++
>  1 file changed, 2 insertions(+)
> 
>

applied, thanks!


Thread overview: 11+ messages
2023-03-13 21:23 [pmg-devel] [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Stoiko Ivanov
2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 1/7] ruledb: spam: adapt to spamassassin 4.0.0 Stoiko Ivanov
2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 2/7] templates: sync spamassassin templates with 4.0.0 upstream Stoiko Ivanov
2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 3/7] templates: add template for spamassassin's v342.pre Stoiko Ivanov
2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 4/7] templates: add template for spamassassin's v400.pre Stoiko Ivanov
2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 5/7] config: add spam option for extract_text Stoiko Ivanov
2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 6/7] templates: enable DecodeShortUrls for SpamAssassin 4.0.0 Stoiko Ivanov
2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-api 7/7] templates: enable DMARC plugin in v400.pre.in Stoiko Ivanov
2023-03-13 21:23 ` [pmg-devel] [PATCH pmg-gui 1/1] spamdetector: add extract_text option Stoiko Ivanov
2023-03-27 18:06   ` [pmg-devel] applied: " Thomas Lamprecht
2023-03-15 15:55 ` [pmg-devel] applied: [PATCH pmg-api 0/7] adapt to SpamAssassin 4.0.0 Thomas Lamprecht

Service provided by Proxmox Server Solutions GmbH