all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Gabriel Goller <g.goller@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH pve-kernel 5/5] kernel: backport: netfilter: nf_tables: restart set lookup on base_seq change
Date: Thu, 11 Sep 2025 12:05:46 +0200	[thread overview]
Message-ID: <20250911100555.63174-6-g.goller@proxmox.com> (raw)
In-Reply-To: <20250911100555.63174-1-g.goller@proxmox.com>

If a set lookup fails, retry again if the base_seq has changed. This
means that if a lookup does not return a value, check if there was a
generation change in the meantime. If there was, redo the lookup. Exact
explanation is inside the patch.

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
 ...les-restart-set-lookup-on-base_seq-c.patch | 148 ++++++++++++++++++
 1 file changed, 148 insertions(+)
 create mode 100644 patches/kernel/0018-netfilter-nf_tables-restart-set-lookup-on-base_seq-c.patch

diff --git a/patches/kernel/0018-netfilter-nf_tables-restart-set-lookup-on-base_seq-c.patch b/patches/kernel/0018-netfilter-nf_tables-restart-set-lookup-on-base_seq-c.patch
new file mode 100644
index 000000000000..26b28ffc0aa1
--- /dev/null
+++ b/patches/kernel/0018-netfilter-nf_tables-restart-set-lookup-on-base_seq-c.patch
@@ -0,0 +1,148 @@
+From 35dd20f6e27b4a6b0b6c2cbf6ebf8eab33121378 Mon Sep 17 00:00:00 2001
+From: Gabriel Goller <g.goller@proxmox.com>
+Date: Wed, 10 Sep 2025 12:18:18 +0200
+Subject: [PATCH 5/5] netfilter: nf_tables: restart set lookup on base_seq
+ change
+
+The hash, hash_fast, rhash and bitwise sets may indicate no result even
+though a matching element exists during a short time window while other
+cpu is finalizing the transaction.
+
+This happens when the hash lookup/bitwise lookup function has picked up
+the old genbit, right before it was toggled by nf_tables_commit(), but
+then the same cpu managed to unlink the matching old element from the
+hash table:
+
+cpu0					cpu1
+  has added new elements to clone
+  has marked elements as being
+  inactive in new generation
+					perform lookup in the set
+  enters commit phase:
+					A) observes old genbit
+   increments base_seq
+I) increments the genbit
+II) removes old element from the set
+					B) finds matching element
+					C) returns no match: found
+					element is not valid in old
+					generation
+
+					Next lookup observes new genbit and
+					finds matching e2.
+
+Consider a packet matching element e1, e2.
+
+cpu0 processes following transaction:
+1. remove e1
+2. adds e2, which has same key as e1.
+
+P matches both e1 and e2.  Therefore, cpu1 should always find a match
+for P. Due to above race, this is not the case:
+
+cpu1 observed the old genbit.  e2 will not be considered once it is found.
+The element e1 is not found anymore if cpu0 managed to unlink it from the
+hlist before cpu1 found it during list traversal.
+
+The situation only occurs for a brief time period, lookups happening
+after I) observe new genbit and return e2.
+
+This problem exists in all set types except nft_set_pipapo, so fix it once
+in nft_lookup rather than each set ops individually.
+
+Sample the base sequence counter, which gets incremented right before the
+genbit is changed.
+
+Then, if no match is found, retry the lookup if the base sequence was
+altered in between.
+
+If the base sequence hasn't changed:
+ - No update took place: no-match result is expected.
+   This is the common case.  or:
+ - nf_tables_commit() hasn't progressed to genbit update yet.
+   Old elements were still visible and nomatch result is expected, or:
+ - nf_tables_commit updated the genbit:
+   We picked up the new base_seq, so the lookup function also picked
+   up the new genbit, no-match result is expected.
+
+If the old genbit was observed, then nft_lookup also picked up the old
+base_seq: nft_lookup_should_retry() returns true and relookup is performed
+in the new generation.
+
+Thanks to Pablo Neira Ayuso for reviewing an earlier version of this
+patchset, for suggesting re-use of existing base_seq and placement of
+the restart loop in nft_set_do_lookup().
+
+Signed-off-by: Florian Westphal <fw@strlen.de>
+[GG: this patch needed a bigger change because __nft_set_do_lookup
+doesn't ext pointer yet, but still returns a bool, so we just check if
+the bool is false, so if no element has been found and retry.]
+Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
+---
+ net/netfilter/nf_tables_api.c |  3 ++-
+ net/netfilter/nft_lookup.c    | 32 +++++++++++++++++++++++++++++++-
+ 2 files changed, 33 insertions(+), 2 deletions(-)
+
+diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
+index f9e7f056ea5b..40a32ef86470 100644
+--- a/net/netfilter/nf_tables_api.c
++++ b/net/netfilter/nf_tables_api.c
+@@ -10733,7 +10733,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
+ 	while (++base_seq == 0)
+ 		;
+ 
+-	WRITE_ONCE(net->nft.base_seq, base_seq);
++	/* pairs with smp_load_acquire in nft_lookup_eval */
++	smp_store_release(&net->nft.base_seq, base_seq);
+ 
+ 	gc_seq = nft_gc_seq_begin(nft_net);
+ 
+diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
+index 7d0add1041bb..fc0eb91d94c0 100644
+--- a/net/netfilter/nft_lookup.c
++++ b/net/netfilter/nft_lookup.c
+@@ -53,10 +53,40 @@ static bool __nft_set_do_lookup(const struct net *net, const struct nft_set *set
+ #endif
+ 	return set->ops->lookup(net, set, key, ext);
+ }
++
++static unsigned int nft_base_seq(const struct net *net)
++{
++	/* pairs with smp_store_release() in nf_tables_commit() */
++	return smp_load_acquire(&net->nft.base_seq);
++}
++
++static bool nft_lookup_should_retry(const struct net *net, unsigned int seq)
++{
++	return unlikely(seq != nft_base_seq(net));
++}
++
+ bool nft_set_do_lookup(const struct net *net, const struct nft_set *set,
+ 		       const u32 *key, const struct nft_set_ext **ext)
+ {
+-	return __nft_set_do_lookup(net, set, key, ext);
++	bool result;
++	unsigned int base_seq;
++
++	do {
++		base_seq = nft_base_seq(net);
++
++		result = __nft_set_do_lookup(net, set, key, ext);
++		if (result)
++			break;
++		/* No match?  There is a small chance that lookup was
++		 * performed in the old generation, but nf_tables_commit()
++		 * already unlinked a (matching) element.
++		 *
++		 * We need to repeat the lookup to make sure that we didn't
++		 * miss a matching element in the new generation.
++		 */
++	} while (nft_lookup_should_retry(net, base_seq));
++
++	return result;
+ }
+ EXPORT_SYMBOL_GPL(nft_set_do_lookup);
+ 
+-- 
+2.47.3
+
-- 
2.47.3



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


      parent reply	other threads:[~2025-09-11 10:06 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-11 10:05 [pve-devel] [PATCH kernel 0/5] backport nftables atomicity fix Gabriel Goller
2025-09-11 10:05 ` [pve-devel] [PATCH pve-kernel 1/5] kernel: backport: netfilter: nft_set_pipapo: don't check genbit from packetpath lookups Gabriel Goller
2025-09-11 10:05 ` [pve-devel] [PATCH pve-kernel 2/5] kernel: backport: netfilter: nft_set_rbtree: continue traversal if element is inactive Gabriel Goller
2025-09-11 10:05 ` [pve-devel] [PATCH pve-kernel 3/5] kernel: backport: netfilter: nf_tables: place base_seq in struct net Gabriel Goller
2025-09-11 10:05 ` [pve-devel] [PATCH pve-kernel 4/5] kernel: backport: netfilter: nf_tables: make nft_set_do_lookup available unconditionally Gabriel Goller
2025-09-11 10:05 ` Gabriel Goller [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250911100555.63174-6-g.goller@proxmox.com \
    --to=g.goller@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal