From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 515921FF165 for ; Thu, 11 Sep 2025 12:06:15 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id F0CC1FE40; Thu, 11 Sep 2025 12:06:04 +0200 (CEST) From: Gabriel Goller To: pve-devel@lists.proxmox.com Date: Thu, 11 Sep 2025 12:05:42 +0200 Message-ID: <20250911100555.63174-2-g.goller@proxmox.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20250911100555.63174-1-g.goller@proxmox.com> References: <20250911100555.63174-1-g.goller@proxmox.com> MIME-Version: 1.0 X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1757585157213 X-SPAM-LEVEL: Spam detection results: 0 AWL -0.005 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pve-devel] [PATCH pve-kernel 1/5] kernel: backport: netfilter: nft_set_pipapo: don't check genbit from packetpath lookups X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox VE development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pve-devel-bounces@lists.proxmox.com Sender: "pve-devel" The pipapo set is a specific nftables set which holds ip-ranges. When updating the set, the new rules are not visible (to the live ruleset) until the next generation, so we don't need to check the genbit on lookup. Signed-off-by: Gabriel Goller --- ...t_pipapo-don-t-check-genbit-from-pac.patch | 160 ++++++++++++++++++ 1 file changed, 160 insertions(+) create mode 100644 patches/kernel/0014-netfilter-nft_set_pipapo-don-t-check-genbit-from-pac.patch diff --git a/patches/kernel/0014-netfilter-nft_set_pipapo-don-t-check-genbit-from-pac.patch b/patches/kernel/0014-netfilter-nft_set_pipapo-don-t-check-genbit-from-pac.patch new file mode 100644 index 000000000000..be34c0efa7e9 --- /dev/null +++ b/patches/kernel/0014-netfilter-nft_set_pipapo-don-t-check-genbit-from-pac.patch @@ -0,0 +1,160 @@ +From da2f3da272763d4382e4c82d2a6d1358aed10123 Mon Sep 17 00:00:00 2001 +From: Gabriel Goller +Date: Wed, 10 Sep 2025 12:08:09 +0200 +Subject: [PATCH 1/5] netfilter: nft_set_pipapo: don't check genbit from + packetpath lookups + +The pipapo set type is special in that it has two copies of its +datastructure: one live copy containing only valid elements and one +on-demand clone used during transaction where adds/deletes happen. + +This clone is not visible to the datapath. + +This is unlike all other set types in nftables, those all link new +elements into their live hlist/tree. + +For those sets, the lookup functions must skip the new elements while the +transaction is ongoing to ensure consistency. + +As the clone is shallow, removal does have an effect on the packet path: +once the transaction enters the commit phase the 'gencursor' bit that +determines which elements are active and which elements should be ignored +(because they are no longer valid) is flipped. + +This causes the datapath lookup to ignore these elements if they are found +during lookup. + +This opens up a small race window where pipapo has an inconsistent view of +the dataset from when the transaction-cpu flipped the genbit until the +transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers: + +cpu0 cpu1 + has added new elements to clone + has marked elements as being + inactive in new generation + perform lookup in the set + enters commit phase: + +I) increments the genbit + A) observes new genbit + removes elements from the clone so + they won't be found anymore + B) lookup in datastructure + can't see new elements yet, + but old elements are ignored + -> Only matches elements that + were not changed in the + transaction +II) calls nft_pipapo_commit(), clone + and live pointers are swapped. + C New nft_lookup happening now + will find matching elements. + +Consider a packet matching range r1-r2: + +cpu0 processes following transaction: +1. remove r1-r2 +2. add r1-r3 + +P is contained in both ranges. Therefore, cpu1 should always find a match +for P. Due to above race, this is not the case: + +cpu1 does find r1-r2, but then ignores it due to the genbit indicating +the range has been removed. + +At the same time, r1-r3 is not visible yet, because it can only be found +in the clone. + +The situation persists for all lookups until after cpu0 hits II). + +The fix is easy: Don't check the genbit from pipapo lookup functions. +This is possible because unlike the other set types, the new elements are +not reachable from the live copy of the dataset. + +The clone/live pointer swap is enough to avoid matching on old elements +while at the same time all new elements are exposed in one go. + +After this change, step B above returns a match in r1-r2. +This is fine: r1-r2 only becomes truly invalid the moment they get freed. +This happens after a synchronize_rcu() call and rcu read lock is held +via netfilter hook traversal (nf_hook_slow()). + +Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") +Signed-off-by: Florian Westphal +Signed-off-by: Gabriel Goller +--- + net/netfilter/nft_set_pipapo.c | 21 +++++++++++++++++++-- + net/netfilter/nft_set_pipapo_avx2.c | 4 +--- + 2 files changed, 20 insertions(+), 5 deletions(-) + +diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c +index 0529e4ef7520..a386c3761cac 100644 +--- a/net/netfilter/nft_set_pipapo.c ++++ b/net/netfilter/nft_set_pipapo.c +@@ -405,6 +405,24 @@ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules, + * + * For more details, see DOC: Theory of Operation. + * ++ * Unlike other set types, this uses NFT_GENMASK_ANY instead of ++ * nft_genmask_cur(). ++ * ++ * This is because new (future) elements are not reachable from ++ * priv->match, they get added to priv->clone instead. ++ * When the commit phase flips the generation bitmask, the ++ * 'now old' entries are skipped but without the 'now current' ++ * elements becoming visible. Using nft_genmask_cur() thus creates ++ * inconsistent state: matching old entries get skipped but thew ++ * newly matching entries are unreachable. ++ * ++ * GENMASK will still find the 'now old' entries which ensures consistent ++ * priv->match view. ++ * ++ * nft_pipapo_commit swaps ->clone and ->match shortly after the ++ * genbit flip. As ->clone doesn't contain the old entries in the first ++ * place, lookup will only find the now-current ones. ++ * + * Return: true on match, false otherwise. + */ + bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, +@@ -413,7 +431,6 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, + struct nft_pipapo *priv = nft_set_priv(set); + struct nft_pipapo_scratch *scratch; + unsigned long *res_map, *fill_map; +- u8 genmask = nft_genmask_cur(net); + const struct nft_pipapo_match *m; + const struct nft_pipapo_field *f; + const u8 *rp = (const u8 *)key; +@@ -471,7 +488,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, + if (last) { + *ext = &f->mt[b].e->ext; + if (unlikely(nft_set_elem_expired(*ext) || +- !nft_set_elem_active(*ext, genmask))) ++ !nft_set_elem_active(*ext, NFT_GENMASK_ANY))) + goto next_match; + + /* Last field: we're just returning the key without +diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c +index be7c16c79f71..675c0a9634b6 100644 +--- a/net/netfilter/nft_set_pipapo_avx2.c ++++ b/net/netfilter/nft_set_pipapo_avx2.c +@@ -1151,7 +1151,6 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, + { + struct nft_pipapo *priv = nft_set_priv(set); + struct nft_pipapo_scratch *scratch; +- u8 genmask = nft_genmask_cur(net); + const struct nft_pipapo_match *m; + const struct nft_pipapo_field *f; + const u8 *rp = (const u8 *)key; +@@ -1245,8 +1244,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, + + if (last) { + *ext = &f->mt[ret].e->ext; +- if (unlikely(nft_set_elem_expired(*ext) || +- !nft_set_elem_active(*ext, genmask))) { ++ if (unlikely(nft_set_elem_expired(*ext))) { + ret = 0; + goto next_match; + } +-- +2.47.3 + -- 2.47.3 _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel