* [pve-devel] [PATCH pve-kernel] cherry-pick/backport amd{gpu, _sfh} fixes from ubuntu-jammy
@ 2021-12-10  9:24 Fabian Ebner
From: Fabian Ebner @ 2021-12-10  9:24 UTC
  To: pve-devel

Some users reported boot failures after updating to the latest 5.13
kernel[0] because of a crash in amdgpu.

The patch
    drm/amdgpu: fix uvd crash on Polaris12 during driver unloading
fixes
    d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver
    unload")
which is present as a backport 838dfb5888ff in the impish tree. Since
it supplements the original fix and addresses a crash with a backtrace
similar to the ones in the forum thread[0], it seems to be the most
promising candidate.

The patch
    drm/amd/pm: avoid duplicate powergate/ungate setting
is related as it fixes
    bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12
    UVD/VCE on suspend")
which is the same commit that was fixed by 838dfb5888ff and has a Cc
for stable. A very slight adaptation of the surrounding code was
necessary for the patch to apply.
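
For illustration only, the idea behind this patch (skip the request if
the IP block is already in the desired powergate/ungate state) can be
sketched in user space as below. This is a simplified sketch, not the
kernel code: struct fake_dev, fake_hw_set_powergating() and the
two-entry ip_block enum are hypothetical stand-ins, and C11 atomics
replace the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the real amdgpu definitions. */
enum ip_block { IP_BLOCK_UVD, IP_BLOCK_VCE, IP_BLOCK_NUM };
enum ip_power_state { POWER_STATE_UNKNOWN, POWER_STATE_ON, POWER_STATE_OFF };

struct fake_dev {
	atomic_int pwr_state[IP_BLOCK_NUM]; /* cached state per IP block */
	int hw_calls;                       /* how often the "hardware" was touched */
};

/* Every block starts out in the unknown state, as in the patch. */
static void fake_dev_init(struct fake_dev *dev)
{
	for (int i = 0; i < IP_BLOCK_NUM; i++)
		atomic_store(&dev->pwr_state[i], POWER_STATE_UNKNOWN);
	dev->hw_calls = 0;
}

/* Stand-in for the real powergating request; always succeeds here. */
static int fake_hw_set_powergating(struct fake_dev *dev, enum ip_block block,
				   bool gate)
{
	(void)block;
	(void)gate;
	dev->hw_calls++;
	return 0;
}

static int set_powergating(struct fake_dev *dev, enum ip_block block, bool gate)
{
	int target = gate ? POWER_STATE_OFF : POWER_STATE_ON;
	int ret;

	assert(block < IP_BLOCK_NUM);

	/* Bail out early if the block is already in the requested state. */
	if (atomic_load(&dev->pwr_state[block]) == target)
		return 0;

	ret = fake_hw_set_powergating(dev, block, gate);
	if (!ret) /* only cache the new state if the request succeeded */
		atomic_store(&dev->pwr_state[block], target);

	return ret;
}
```

Calling set_powergating() twice in a row with the same arguments only
reaches fake_hw_set_powergating() once; the second call returns early.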

The patch
    drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works
    on vga and dvi connectors
is likely not related, but it seems simple enough, has a Cc for stable
and applies cleanly.

The patch (with the same title as the one it fixes)
    HID: amd_sfh: Fix potential NULL pointer dereference
fixes
    d46ef750ed58 ("HID: amd_sfh: Fix potential NULL pointer
    dereference")
which is present as a backport 56559d7910e7 in the impish tree and
seems like the most likely culprit for a different issue reported in
the same forum thread[1]. A very slight adaptation of the surrounding
code was necessary for the patch to apply.
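
For illustration only, the ordering issue this patch addresses
(cl_data has to be allocated before the client init function uses it)
can be sketched in user space as below. This is a simplified sketch,
not the driver code: struct cl_data, struct privdata, client_init()
and probe() are hypothetical stand-ins, calloc() replaces
devm_kzalloc(), and the NULL check returns an error where the kernel
code would dereference the pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the driver structures. */
struct cl_data {
	int num_hid_devices;
};

struct privdata {
	struct cl_data *cl_data;
};

/* Stand-in for the client init step: it relies on cl_data being set. */
static int client_init(struct privdata *p)
{
	assert(p != NULL);

	if (!p->cl_data)
		return -EINVAL; /* the kernel code would hit a NULL pointer here */

	p->cl_data->num_hid_devices = 1;
	return 0;
}

/* Order after the fix: allocate cl_data first, then initialise the client. */
static int probe(struct privdata *p)
{
	p->cl_data = calloc(1, sizeof(*p->cl_data));
	if (!p->cl_data)
		return -ENOMEM;

	return client_init(p);
}
```

Calling client_init() before probe() has allocated cl_data corresponds
to the order before the fix and fails in this sketch with -EINVAL.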

[0]: https://forum.proxmox.com/threads/100825/
[1]: https://forum.proxmox.com/threads/100825/post-435329

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 ...x-potential-NULL-pointer-dereference.patch |  52 ++++++++
 ...vd-crash-on-Polaris12-during-driver-.patch |  71 +++++++++++
 ...et-scaling-mode-Full-Full-aspect-Cen.patch |  45 +++++++
 ...d-duplicate-powergate-ungate-setting.patch | 119 ++++++++++++++++++
 4 files changed, 287 insertions(+)
 create mode 100644 patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch
 create mode 100644 patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch
 create mode 100644 patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch
 create mode 100644 patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch

diff --git a/patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch b/patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch
new file mode 100644
index 0000000..993328e
--- /dev/null
+++ b/patches/kernel/0011-HID-amd_sfh-Fix-potential-NULL-pointer-dereference.patch
@@ -0,0 +1,52 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
+Date: Thu, 23 Sep 2021 17:59:27 +0530
+Subject: [PATCH] HID: amd_sfh: Fix potential NULL pointer dereference
+
+The cl_data field of a privdata must be allocated and updated before
+using in amd_sfh_hid_client_init() function.
+
+Hence handling NULL pointer cl_data accordingly.
+
+Fixes: d46ef750ed58 ("HID: amd_sfh: Fix potential NULL pointer dereference")
+Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
+Signed-off-by: Jiri Kosina <jkosina@suse.cz>
+[trivial backport]
+Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
+---
+ drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 12 ++++--------
+ 1 file changed, 4 insertions(+), 8 deletions(-)
+
+diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
+index 9a1824757aae..05c007b213f2 100644
+--- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
++++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
+@@ -235,21 +235,17 @@ static int amd_mp2_pci_probe(struct pci_dev *pdev, const struct pci_device_id *i
+ 		return rc;
+ 	}
+ 
+-	rc = amd_sfh_hid_client_init(privdata);
+-	if (rc)
+-		return rc;
+-
+ 	privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
+ 	if (!privdata->cl_data)
+ 		return -ENOMEM;
+ 
+-	rc = devm_add_action_or_reset(&pdev->dev, amd_mp2_pci_remove, privdata);
++	mp2_select_ops(privdata);
++
++	rc = amd_sfh_hid_client_init(privdata);
+ 	if (rc)
+ 		return rc;
+ 
+-	mp2_select_ops(privdata);
+-
+-	return 0;
++	return devm_add_action_or_reset(&pdev->dev, amd_mp2_pci_remove, privdata);
+ }
+ 
+ static const struct pci_device_id amd_mp2_pci_tbl[] = {
+-- 
+2.30.2
+
diff --git a/patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch b/patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch
new file mode 100644
index 0000000..59a4f57
--- /dev/null
+++ b/patches/kernel/0012-drm-amdgpu-fix-uvd-crash-on-Polaris12-during-driver-.patch
@@ -0,0 +1,71 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Evan Quan <evan.quan@amd.com>
+Date: Sat, 9 Oct 2021 17:35:36 +0800
+Subject: [PATCH] drm/amdgpu: fix uvd crash on Polaris12 during driver
+ unloading
+
+BugLink: https://bugs.launchpad.net/bugs/1951822
+
+[ Upstream commit 4fc30ea780e0a5c1c019bc2e44f8523e1eed9051 ]
+
+There was a change(below) target for such issue:
+d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver unload")
+But the fix for VI ASICs was missing there. This is a supplement for
+that.
+
+Fixes: d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver unload")
+
+Signed-off-by: Evan Quan <evan.quan@amd.com>
+Acked-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
+---
+ drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 24 +++++++++++++-----------
+ 1 file changed, 13 insertions(+), 11 deletions(-)
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+index bc571833632e..72f876290768 100644
+--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+@@ -543,6 +543,19 @@ static int uvd_v6_0_hw_fini(void *handle)
+ {
+ 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ 
++	cancel_delayed_work_sync(&adev->uvd.idle_work);
++
++	if (RREG32(mmUVD_STATUS) != 0)
++		uvd_v6_0_stop(adev);
++
++	return 0;
++}
++
++static int uvd_v6_0_suspend(void *handle)
++{
++	int r;
++	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
++
+ 	/*
+ 	 * Proper cleanups before halting the HW engine:
+ 	 *   - cancel the delayed idle work
+@@ -567,17 +580,6 @@ static int uvd_v6_0_hw_fini(void *handle)
+ 						       AMD_CG_STATE_GATE);
+ 	}
+ 
+-	if (RREG32(mmUVD_STATUS) != 0)
+-		uvd_v6_0_stop(adev);
+-
+-	return 0;
+-}
+-
+-static int uvd_v6_0_suspend(void *handle)
+-{
+-	int r;
+-	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+-
+ 	r = uvd_v6_0_hw_fini(adev);
+ 	if (r)
+ 		return r;
+-- 
+2.30.2
+
diff --git a/patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch b/patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch
new file mode 100644
index 0000000..b904bbd
--- /dev/null
+++ b/patches/kernel/0013-drm-amdgpu-fix-set-scaling-mode-Full-Full-aspect-Cen.patch
@@ -0,0 +1,45 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: hongao <hongao@uniontech.com>
+Date: Thu, 11 Nov 2021 11:32:07 +0800
+Subject: [PATCH] drm/amdgpu: fix set scaling mode Full/Full aspect/Center not
+ works on vga and dvi connectors
+
+BugLink: https://bugs.launchpad.net/bugs/1952579
+
+commit bf552083916a7f8800477b5986940d1c9a31b953 upstream.
+
+amdgpu_connector_vga_get_modes missed function amdgpu_get_native_mode
+which assign amdgpu_encoder->native_mode with *preferred_mode result in
+amdgpu_encoder->native_mode.clock always be 0. That will cause
+amdgpu_connector_set_property returned early on:
+if ((rmx_type != DRM_MODE_SCALE_NONE) &&
+	(amdgpu_encoder->native_mode.clock == 0))
+when we try to set scaling mode Full/Full aspect/Center.
+Add the missing function to amdgpu_connector_vga_get_mode can fix this.
+It also works on dvi connectors because
+amdgpu_connector_dvi_helper_funcs.get_mode use the same method.
+
+Signed-off-by: hongao <hongao@uniontech.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Cc: stable@vger.kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
+---
+ drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+index b9c11c2b2885..0de66f59adb8 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+@@ -827,6 +827,7 @@ static int amdgpu_connector_vga_get_modes(struct drm_connector *connector)
+ 
+ 	amdgpu_connector_get_edid(connector);
+ 	ret = amdgpu_connector_ddc_get_modes(connector);
++	amdgpu_get_native_mode(connector);
+ 
+ 	return ret;
+ }
+-- 
+2.30.2
+
diff --git a/patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch b/patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch
new file mode 100644
index 0000000..8e638ae
--- /dev/null
+++ b/patches/kernel/0014-drm-amd-pm-avoid-duplicate-powergate-ungate-setting.patch
@@ -0,0 +1,119 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Evan Quan <evan.quan@amd.com>
+Date: Fri, 5 Nov 2021 15:25:30 +0800
+Subject: [PATCH] drm/amd/pm: avoid duplicate powergate/ungate setting
+
+BugLink: https://bugs.launchpad.net/bugs/1952579
+
+commit 6ee27ee27ba8b2e725886951ba2d2d87f113bece upstream.
+
+Just bail out if the target IP block is already in the desired
+powergate/ungate state. This can avoid some duplicate settings
+which sometimes may cause unexpected issues.
+
+Link: https://lore.kernel.org/all/YV81vidWQLWvATMM@zn.tnic/
+Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214921
+Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215025
+Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1789
+Fixes: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")
+Signed-off-by: Evan Quan <evan.quan@amd.com>
+Tested-by: Borislav Petkov <bp@suse.de>
+Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Cc: stable@vger.kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
+[trivial backport]
+Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
+---
+ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
+ drivers/gpu/drm/amd/include/amd_shared.h   |  3 ++-
+ drivers/gpu/drm/amd/pm/amdgpu_dpm.c        | 10 ++++++++++
+ drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h    |  8 ++++++++
+ 4 files changed, 23 insertions(+), 1 deletion(-)
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+index c1e34aa5925b..96ca42bcfdbf 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+@@ -3387,6 +3387,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
+ 		adev->rmmio_size = pci_resource_len(adev->pdev, 2);
+ 	}
+ 
++	for (i = 0; i < AMD_IP_BLOCK_TYPE_NUM; i++)
++		atomic_set(&adev->pm.pwr_state[i], POWER_STATE_UNKNOWN);
++
+ 	adev->rmmio = ioremap(adev->rmmio_base, adev->rmmio_size);
+ 	if (adev->rmmio == NULL) {
+ 		return -ENOMEM;
+diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h
+index 257f280d3d53..bd077ea224a4 100644
+--- a/drivers/gpu/drm/amd/include/amd_shared.h
++++ b/drivers/gpu/drm/amd/include/amd_shared.h
+@@ -97,7 +97,8 @@ enum amd_ip_block_type {
+ 	AMD_IP_BLOCK_TYPE_ACP,
+ 	AMD_IP_BLOCK_TYPE_VCN,
+ 	AMD_IP_BLOCK_TYPE_MES,
+-	AMD_IP_BLOCK_TYPE_JPEG
++	AMD_IP_BLOCK_TYPE_JPEG,
++	AMD_IP_BLOCK_TYPE_NUM,
+ };
+ 
+ enum amd_clockgating_state {
+diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+index 03581d5b1836..08362d506534 100644
+--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
++++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+@@ -927,6 +927,13 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block
+ {
+ 	int ret = 0;
+ 	const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
++	enum ip_power_state pwr_state = gate ? POWER_STATE_OFF : POWER_STATE_ON;
++
++	if (atomic_read(&adev->pm.pwr_state[block_type]) == pwr_state) {
++		dev_dbg(adev->dev, "IP block%d already in the target %s state!",
++				block_type, gate ? "gate" : "ungate");
++		return 0;
++	}
+ 
+ 	switch (block_type) {
+ 	case AMD_IP_BLOCK_TYPE_UVD:
+@@ -979,6 +986,9 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block
+ 		break;
+ 	}
+ 
++	if (!ret)
++		atomic_set(&adev->pm.pwr_state[block_type], pwr_state);
++
+ 	return ret;
+ }
+ 
+diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+index 98f1b3d8c1d5..16e3f72d31b9 100644
+--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
++++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+@@ -417,6 +417,12 @@ struct amdgpu_dpm {
+ 	enum amd_dpm_forced_level forced_level;
+ };
+ 
++enum ip_power_state {
++	POWER_STATE_UNKNOWN,
++	POWER_STATE_ON,
++	POWER_STATE_OFF,
++};
++
+ struct amdgpu_pm {
+ 	struct mutex		mutex;
+ 	u32                     current_sclk;
+@@ -451,6 +457,8 @@ struct amdgpu_pm {
+ 	/* Used for I2C access to various EEPROMs on relevant ASICs */
+ 	struct i2c_adapter smu_i2c;
+ 	struct list_head	pm_attr_list;
++
++	atomic_t		pwr_state[AMD_IP_BLOCK_TYPE_NUM];
+ };
+ 
+ #define R600_SSTU_DFLT                               0
+-- 
+2.30.2
+
-- 
2.30.2




