public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH cluster 1/2] clusterlog: segfault reproducer
@ 2021-12-14 10:19 Fabian Grünbichler
  2021-12-14 10:19 ` [pve-devel] [PATCH cluster 2/2] clusterlog: fix segfault / wrong iteration bounds Fabian Grünbichler
  2021-12-15 14:55 ` [pve-devel] applied: [PATCH cluster 1/2] clusterlog: segfault reproducer Thomas Lamprecht
  0 siblings, 2 replies; 4+ messages in thread
From: Fabian Grünbichler @ 2021-12-14 10:19 UTC (permalink / raw)
  To: pve-devel

see next commit for details.

get_state mimics the code path triggered in the wild, the other two are
affected just the same.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    doesn't need to be committed, just to have an easy way to check affected parts
    and the fix..

 data/src/Makefile   |   3 ++
 data/src/logtest2.c | 103 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+)
 create mode 100644 data/src/logtest2.c

diff --git a/data/src/Makefile b/data/src/Makefile
index 3d39201..1f39f67 100644
--- a/data/src/Makefile
+++ b/data/src/Makefile
@@ -35,6 +35,9 @@ create_pmxcfs_db: create_pmxcfs_db.o libpmxcfs.a
 logtest: logtest.o libpmxcfs.a
 	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)
 
+logtest2: logtest2.o libpmxcfs.a
+	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)
+
 check_memdb: check_memdb.o libpmxcfs.a
 	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS) $(shell pkg-config --libs check)
 
diff --git a/data/src/logtest2.c b/data/src/logtest2.c
new file mode 100644
index 0000000..5f8f8f8
--- /dev/null
+++ b/data/src/logtest2.c
@@ -0,0 +1,103 @@
+/*
+  Copyright (C) 2010 - 2020 Proxmox Server Solutions GmbH
+
+  This program is free software: you can redistribute it and/or modify
+  it under the terms of the GNU Affero General Public License as published by
+  the Free Software Foundation, either version 3 of the License, or
+  (at your option) any later version.
+
+  This program is distributed in the hope that it will be useful,
+  but WITHOUT ANY WARRANTY; without even the implied warranty of
+  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+  GNU Affero General Public License for more details.
+
+  You should have received a copy of the GNU Affero General Public License
+  along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+  Author: Dietmar Maurer <dietmar@proxmox.com>
+
+*/
+
+#define _XOPEN_SOURCE /* glibc2 needs this */
+#include <time.h> /* for strptime */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <glib.h>
+#include <sys/types.h>
+#include <sys/syslog.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <unistd.h>
+#include <string.h>
+#include <ctype.h>
+
+#include "cfs-utils.h"
+#include "logger.h"
+
+struct clog_base {
+	uint32_t size;
+	uint32_t cpos;
+	char data[];
+};
+
+struct clusterlog {
+	GHashTable *dedup;
+	GMutex mutex;
+	clog_base_t *base;
+};
+
+cfs_t cfs = {
+        .debug = 0,
+	.nodename = "testnode",
+};
+
+void get_state(clusterlog_t *cl) {
+    unsigned int res_len;
+    clusterlog_get_state(cl, &res_len);
+}
+
+
+void insert(clusterlog_t *cl) {
+	uint32_t pid = getpid();
+	clog_entry_t *entry = (clog_entry_t *)alloca(CLOG_MAX_ENTRY_SIZE);
+	clog_pack(entry, cfs.nodename, "root", "cluster", pid, time(NULL), LOG_INFO, "short");
+	clusterlog_insert(cl, entry);
+}
+
+void insert2(clusterlog_t *cl) {
+	uint32_t pid = getpid();
+	clog_entry_t *entry = (clog_entry_t *)alloca(CLOG_MAX_ENTRY_SIZE);
+	clog_pack(entry, cfs.nodename, "root", "cluster", pid, time(NULL), LOG_INFO, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
+	clusterlog_insert(cl, entry);
+}
+
+int
+main(void)
+{
+	uint32_t pid = getpid();
+
+	clusterlog_t *cl3 = clusterlog_new();
+
+	clog_entry_t *entry = (clog_entry_t *)alloca(CLOG_MAX_ENTRY_SIZE);
+	clog_pack(entry, cfs.nodename, "root", "cluster", pid, time(NULL), LOG_INFO, "starting cluster log");
+	clusterlog_insert(cl3, entry);
+
+	for (int i = 0; i < 184; i++) {
+	       insert2(cl3);
+	}
+
+	for (int i = 0; i < 1629; i++) {
+	       insert(cl3);
+	}
+
+	GString *outbuf = g_string_new(NULL);
+
+	// all of these segfault if they don't handle wrap-arounds pointing to already overwritten entries
+	clusterlog_dump(cl3, outbuf, NULL, 8192);
+	clog_dump(cl3->base);
+	get_state(cl3);
+
+	clusterlog_destroy(cl3);
+}
-- 
2.30.2





^ permalink raw reply	[flat|nested] 4+ messages in thread

* [pve-devel] [PATCH cluster 2/2] clusterlog: fix segfault / wrong iteration bounds
  2021-12-14 10:19 [pve-devel] [PATCH cluster 1/2] clusterlog: segfault reproducer Fabian Grünbichler
@ 2021-12-14 10:19 ` Fabian Grünbichler
  2021-12-15 14:56   ` [pve-devel] applied: " Thomas Lamprecht
  2021-12-15 14:55 ` [pve-devel] applied: [PATCH cluster 1/2] clusterlog: segfault reproducer Thomas Lamprecht
  1 sibling, 1 reply; 4+ messages in thread
From: Fabian Grünbichler @ 2021-12-14 10:19 UTC (permalink / raw)
  To: pve-devel

the clusterlog struct is a basic ring buffer:

struct clog_base {
    uint32_t size; // total size of this clog_base
    uint32_t cpos; // index into data, starts counting at start of clog_base, initially 0
    char data[];
};

an entry consists of indices of the next and previous entries and
various fields (fixed-length ones omitted here):

typedef struct {
	uint32_t prev; // index of previous entry, or 0 if none exists
	uint32_t next; // index of next entry
	[..] // fixed-length fields
	uint8_t node_len;
	uint8_t ident_len;
	uint8_t tag_len;
	uint32_t msg_len;
	char data[]; // node+ident+tag+msg - variable-length fields
} clog_entry_t;

the next and prev indices are calculated when allocating a new entry,
and the position of the current entry 'cpos' is updated accordingly
(clog_alloc_entry):
- size of the entry is padded with up to 7 bytes
- first entry goes to index 8
- second and subsequent entries go to the current entry's 'next' index
- if the current entry's 'next' index is out of bonds, the first entry
  is overwritten => wrap-around
- the 'prev' index of the new entry is set to cpos
- cpos is set to the index of the new entry
- the 'next' index of the new entry is set to its index+padded size

when iterating over the entries, the following bounds are used to follow
the 'prev' links starting at the current entry:

	while (cpos && (cpos <= clog->cpos || cpos > (clog->cpos + CLOG_MAX_ENTRY_SIZE))) {

while this handles a not-yet-wrapped around ring buffer (cpos would be 0
when reaching the first entry), and tries to handle wrap-arounds by
terminating when reaching a 'red-zone' of 'CLOG_MAX_ENTRY_SIZE' starting
at the current entry (this covers the current entry which was already
visited as first entry during the iteration, and the next entry after it
which might have been overwritten) - but it's possible that entries line
up so that the wrap-around 'prev' index of the first entry points to a
location *before* the current entry.

for example, looking at clog_base with S being the size field, C being
the cpos field, followed by the actual data. N/P are the next/prev
indices of the entry at C, Q denotes the 'prev' index of the first entry
in the data array, and 'R' the red zone used for the loop check in case
of wrap-around.

first, fill up the buffer with six large entries:

Q                               P      C      N
|                               |      |      |
|                               |      |      |
v                               v      v      v
+-+-+------+------+------+------+------+------+-+
| | |      |      |      |      |      |      |x|
| | |   1  |   2  |   3  |   4  |   5  |   6  |x|
| | |      |      |      |      |      |      |x|
+-+-+------+------+------+------+------+------+-+
 S C                                    RRRRRRRRRRR

iterating from C backwards ends up at Q being 0, terminating the loop
without a wrap-around after having visit 6->1

now the next (in this example, smaller) entry that gets allocated/insert
needs to wrap around, because the empty space at the end (denoted by
XXX) is too small:

    C      N                          QP
    |      |                          ||
    |      |                          ||
    v      v                          vv
+-+-+------+------+------+------+------+------+-+
| | |      |      |      |      |      |      |x|
| | |   7  |   2  |   3  |   4  |   5  |   6  |x|
| | |      |      |      |      |      |      |x|
+-+-+------+------+------+------+------+------+-+
 S C RRRRRRRRRRR

iterating backwards from C terminates the loop when reaching the red
zone, with the (second) entry no longer being considered since it partly
overlaps it. only 7->3 are visited.

adding more entries we end up with the following layout:

                                   P  QC   N
                                   |  ||   |
                                   |  ||   |
                                   v  vv   v
+-+-+------+---+---+---+---+---+---+---+---+--+-+
| | |      |   |   |   |   |   |   |   |   |##|x|
| | |   7  | 8 | 9 |10 |11 |12 |13 |14 |15 |#6|x|
| | |      |   |   |   |   |   |   |   |   |##|x|
+-+-+------+---+---+---+---+---+---+---+---+--+-+
 S C                                    RRRRRRRRRRR

with # denoting space previously occupied the last large entry (#6)
which is still unmodified (the rest of that entry's data has been
overwritten by entries #14 and #15).

iterating from C (to the left/P) the loop ends up at entry #7, follows
the link to Q (which satisfies the loop bounds as Q < C), and the data
starting at (invalid index) Q gets interpreted as an entry. it is
possible (though even more unlikely than the partial overwrite case)
that Q and C line up perfectly, which would cause the loop to become an
infinite loop. the loop *should* terminate after having visited 15-7,
without wrapping around.

note that the actual sizes of the entries are not relevant, the
requirements are:
- entry before last wrap-around must be big enough that entry of current
  index can overtake it without another wrap-around
- method that does iteration must be called before next wrap-around

the fix is obviously trivial once the issue became apparent - when
wrapping around during iteration, additionally check that we are not
jumping across the red zone into already invalidated parts of data.

clusterlog_merge is technically not affected since it aborts before a
wrap-around anyway, but it doesn't hurt to have the checks consistently
in case this ever changes.

thanks to @kev1904 on our community forums for reporting and providing the data
to nail the cause down fast!

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 data/src/logger.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/data/src/logger.c b/data/src/logger.c
index 4cf9cce..619e1f6 100644
--- a/data/src/logger.c
+++ b/data/src/logger.c
@@ -136,6 +136,11 @@ clog_dump(clog_base_t *clog)
 	while (cpos && (cpos <= clog->cpos || cpos > (clog->cpos + CLOG_MAX_ENTRY_SIZE))) {
 		clog_entry_t *cur = (clog_entry_t *)((char *)clog + cpos);
 		clog_dump_entry(cur, cpos);
+
+		// wrap-around has to land after initial position
+		if (cpos < cur->prev && cur->prev <= clog->cpos) {
+			break;
+		}
 		cpos = cur->prev;
 	}
 }
@@ -165,6 +170,11 @@ clog_dump_json(
 	guint count = 0;
 	while (cpos && (cpos <= clog->cpos || cpos > (clog->cpos + CLOG_MAX_ENTRY_SIZE))) {
 		clog_entry_t *cur = (clog_entry_t *)((char *)clog + cpos);
+
+		// wrap-around has to land after initial position
+		if (cpos < cur->prev && cur->prev <= clog->cpos) {
+			break;
+		}
 		cpos = cur->prev;
 
 		if (count >= max_entries)
@@ -358,6 +368,11 @@ clog_sort(clog_base_t *clog)
 
 		g_tree_insert(tree, cur, cur);
 
+		// wrap-around has to land after initial position
+		if (cpos < cur->prev && cur->prev <= clog->cpos) {
+			break;
+		}
+
 		cpos = cur->prev;
 	}
 
@@ -509,10 +524,12 @@ clusterlog_merge(
 				break;
 		}
 
-		if (!cur->prev) {
+		// no previous entry or wrap-around into already overwritten entry
+		if (!cur->prev || (cpos[found] < cur->prev && cur->prev <= clog[found]->cpos)) {
 			cpos[found] = 0;
 		} else {
 			cpos[found] = cur->prev;
+			// wrap-around into current entry
 			if (!(cpos[found] <= clog[found]->cpos || 
 			      cpos[found] > (clog[found]->cpos + CLOG_MAX_ENTRY_SIZE))) {
 				cpos[found] = 0;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 4+ messages in thread

* [pve-devel] applied: [PATCH cluster 1/2] clusterlog: segfault reproducer
  2021-12-14 10:19 [pve-devel] [PATCH cluster 1/2] clusterlog: segfault reproducer Fabian Grünbichler
  2021-12-14 10:19 ` [pve-devel] [PATCH cluster 2/2] clusterlog: fix segfault / wrong iteration bounds Fabian Grünbichler
@ 2021-12-15 14:55 ` Thomas Lamprecht
  1 sibling, 0 replies; 4+ messages in thread
From: Thomas Lamprecht @ 2021-12-15 14:55 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

On 14.12.21 11:19, Fabian Grünbichler wrote:
> see next commit for details.
> 
> get_state mimics the code path triggered in the wild, the other two are
> affected just the same.
> 
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> 
> Notes:
>     doesn't need to be committed, just to have an easy way to check affected parts
>     and the fix..
> 

we should maybe move them to a test dir and see if we can get some mire unit tests#
or regression tests in general, but I think it's good to have in the git history.

>  data/src/Makefile   |   3 ++
>  data/src/logtest2.c | 103 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 106 insertions(+)
>  create mode 100644 data/src/logtest2.c
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 4+ messages in thread

* [pve-devel] applied: [PATCH cluster 2/2] clusterlog: fix segfault / wrong iteration bounds
  2021-12-14 10:19 ` [pve-devel] [PATCH cluster 2/2] clusterlog: fix segfault / wrong iteration bounds Fabian Grünbichler
@ 2021-12-15 14:56   ` Thomas Lamprecht
  0 siblings, 0 replies; 4+ messages in thread
From: Thomas Lamprecht @ 2021-12-15 14:56 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

On 14.12.21 11:19, Fabian Grünbichler wrote:
...
> note that the actual sizes of the entries are not relevant, the
> requirements are:
> - entry before last wrap-around must be big enough that entry of current
>   index can overtake it without another wrap-around
> - method that does iteration must be called before next wrap-around
> 
> the fix is obviously trivial once the issue became apparent - when
> wrapping around during iteration, additionally check that we are not
> jumping across the red zone into already invalidated parts of data.
> 
> clusterlog_merge is technically not affected since it aborts before a
> wrap-around anyway, but it doesn't hurt to have the checks consistently
> in case this ever changes.
> 
> thanks to @kev1904 on our community forums for reporting and providing the data
> to nail the cause down fast!
> 
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
>  data/src/logger.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
>

applied, many thanks!!




^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-12-15 14:56 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-14 10:19 [pve-devel] [PATCH cluster 1/2] clusterlog: segfault reproducer Fabian Grünbichler
2021-12-14 10:19 ` [pve-devel] [PATCH cluster 2/2] clusterlog: fix segfault / wrong iteration bounds Fabian Grünbichler
2021-12-15 14:56   ` [pve-devel] applied: " Thomas Lamprecht
2021-12-15 14:55 ` [pve-devel] applied: [PATCH cluster 1/2] clusterlog: segfault reproducer Thomas Lamprecht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal