AMD GPU
[https://en.wikipedia.org/wiki/AMDgpu_(Linux_kernel_module) AMDGPU] is an open-source graphics driver for AMD Radeon graphics cards. It supports AMD GPUs based on the [https://en.wikipedia.org/wiki/Graphics_Core_Next Graphics Core Next (GCN)] architecture and later, covering hardware released from approximately 2012 onward. This guide explains how to configure NixOS to correctly use AMD GPUs supported by the AMDGPU driver.


== Basic Setup ==
  enable = true;
  enable32Bit = true;
};</syntaxhighlight>There is also a [https://search.nixos.org/options?channel=unstable&query=hardware.amdgpu <code>hardware.amdgpu</code> NixOS module] for common configuration options, such as enabling OpenCL, legacy support, overdrive/overclocking, and loading the driver during the initrd phase.
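As a rough sketch, a configuration that uses several of these module options might look like the following. The option names are taken from the <code>hardware.amdgpu</code> module. The <code>overdrive</code> option is an assumption on older NixOS releases, so check the option search for your channel. Enable only what you actually need.

<syntaxhighlight lang="nix">
{ ... }:
{
  hardware.amdgpu = {
    # Load amdgpu in the initial ramdisk (early modesetting, correct resolution during boot).
    initrd.enable = true;
    # OpenCL support through the ROCm runtime.
    opencl.enable = true;
    # Only for Southern Islands / Sea Islands (GCN 1 / GCN 2) cards.
    legacySupport.enable = false;
    # Unlocks overclocking/undervolting controls (assumption: available on your release).
    overdrive.enable = false;
  };
}
</syntaxhighlight>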


== Special Configuration ==

The following configurations are only required if you have a specific reason for needing them. They are not expected to be necessary for a typical desktop or gaming setup.

=== Enable Southern Islands (SI) and Sea Islands (CIK) support (e.g. HD 7000/8000) ===
The oldest architectures that AMDGPU supports are [[wikipedia:Radeon_HD_7000_series|Southern Islands (SI, i.e. GCN 1)]] and [[wikipedia:Radeon_HD_8000_series|Sea Islands (CIK, i.e. GCN 2)]], but support for them is disabled by default. To use AMDGPU instead of the <code>radeon</code> driver, enable the <code>legacySupport</code> option of the amdgpu module:<syntaxhighlight lang="nix">
hardware.amdgpu.legacySupport.enable = true;
</syntaxhighlight>This option sets the following kernel parameters, so there is no need to add them yourself if you use it:


<syntaxhighlight lang="nix">
boot.kernelParams = [
  # For Southern Islands (SI, i.e. GCN 1) cards
  "amdgpu.si_support=1"
  "radeon.si_support=0"
  # For Sea Islands (CIK, i.e. GCN 2) cards
  "amdgpu.cik_support=1"
  "radeon.cik_support=0"
];
</syntaxhighlight>


Doing this is required to use [[#Vulkan|Vulkan]] on these cards, as the <code>radeon</code> driver doesn't support it.

Please note this also removes support for analog video outputs, which is only available with the <code>radeon</code> driver.
 
 


=== HIP ===


=== OpenCL ===
OpenCL support using the ROCm runtime library can be enabled via the amdgpu module:<syntaxhighlight lang="nix">
hardware.amdgpu.opencl.enable = true;
</syntaxhighlight>
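Earlier revisions of this page registered the ROCm OpenCL ICD directly in <code>hardware.graphics.extraPackages</code>. The sketch below shows that manual form. It is assumed to be roughly what the module option does, so check the module source if the exact package list matters. It also adds <code>clinfo</code>, which lists the OpenCL platforms and devices that applications can see once the system has been rebuilt.

<syntaxhighlight lang="nix">
{ pkgs, ... }:
{
  # Manual alternative to hardware.amdgpu.opencl.enable: register the ROCm OpenCL ICD.
  hardware.graphics.extraPackages = with pkgs; [ rocmPackages.clr.icd ];

  # clinfo prints the detected OpenCL platforms and devices, useful for checking the setup.
  environment.systemPackages = with pkgs; [ clinfo ];
}
</syntaxhighlight>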


== Troubleshooting ==


=== Low resolution during initramfs phase ===
If the display runs at a low resolution during the early boot phase, you can load the amdgpu module in the initial ramdisk:<syntaxhighlight lang="nix">
hardware.amdgpu.initrd.enable = true; # sets boot.initrd.kernelModules = ["amdgpu"];
</syntaxhighlight>
 
=== Dual Monitors ===
 
If you encounter problems with multiple monitors connected to your GPU, adding a <code>video</code> parameter for each connector to the kernel command line sometimes helps.
 
For example:
 
<syntaxhighlight lang="nix">
boot.kernelParams = [
  "video=DP-1:2560x1440@144"
  "video=DP-2:2560x1440@144"
];
</syntaxhighlight>
 
Adjust the connector names (such as <code>DP-1</code>), the resolution, and the refresh rate to match your monitors.

To find the connector names, run the following command while your monitors are connected:
 
<syntaxhighlight lang="bash">
head /sys/class/drm/*/status
</syntaxhighlight>
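The entries under <code>/sys/class/drm</code> carry a card prefix (for example <code>card0-DP-1</code>), while the <code>video</code> parameter takes only the connector name after it (<code>DP-1</code>). With several monitors, the parameters can also be generated from a list. Here is a minimal sketch, where the <code>monitors</code> list and its attribute names are hypothetical placeholders to adapt:

<syntaxhighlight lang="nix">
{ ... }:
let
  # Hypothetical monitor list: connector names as shown in /sys/class/drm
  # without the "cardN-" prefix, modes written as WIDTHxHEIGHT@REFRESH.
  monitors = [
    { connector = "DP-1"; mode = "2560x1440@144"; }
    { connector = "DP-2"; mode = "2560x1440@144"; }
  ];
in
{
  # Evaluates to [ "video=DP-1:2560x1440@144" "video=DP-2:2560x1440@144" ]
  boot.kernelParams = map (m: "video=${m.connector}:${m.mode}") monitors;
}
</syntaxhighlight>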
 
=== System Hang with Vega Graphics (and select GPUs) ===
 
On the latest kernel and Mesa at the time of writing (6.13 and 24.3.4, respectively), Vega integrated graphics (and some other GPUs, such as the RX 6600<ref>https://bbs.archlinux.org/viewtopic.php?pid=2224147#p2224147</ref>) may hang due to context switching between graphics and compute workloads.<ref>https://bbs.archlinux.org/viewtopic.php?id=301798</ref> Two sets of patches are currently available, depending on whether you prefer stability or speed: [https://github.com/SeryogaBrigada/linux/commits/v6.13-amdgpu amdgpu-stable] and [https://github.com/SeryogaBrigada/linux/commits/v6.13-amdgpu-testing amdgpu-testing].
 
See [[Linux Kernel#Patching a single In-tree kernel module]] for how to patch a single module, and [https://stackoverflow.com/a/23525893 how to create patch diffs from GitHub commits]. An example configuration:<syntaxhighlight lang="nix">
{ config, pkgs, ... }:
let
  amdgpu-kernel-module = pkgs.callPackage ./packages/amdgpu-kernel-module.nix {
    # Make sure the module targets the same kernel as your system is using.
    kernel = config.boot.kernelPackages.kernel;
  };
  # linuxPackages_latest 6.13 (or linuxPackages_zen 6.13)
  amdgpu-stability-patch = pkgs.fetchpatch {
    name = "amdgpu-stability-patch";
    url = "https://github.com/torvalds/linux/compare/ffd294d346d185b70e28b1a28abe367bbfe53c04...SeryogaBrigada:linux:4c55a12d64d769f925ef049dd6a92166f7841453.diff";
    hash = "sha256-q/gWUPmKHFBHp7V15BW4ixfUn1kaeJhgDs0okeOGG9c=";
  };
  /*
  # linuxPackages_zen 6.12
  amdgpu-stability-patch = pkgs.fetchpatch {
    name = "amdgpu-stability-patch-zen";
    url = "https://github.com/zen-kernel/zen-kernel/compare/fd00d197bb0a82b25e28d26d4937f917969012aa...WhiteHusky:zen-kernel:f4c32ca166ad55d7e2bbf9adf121113500f3b42b.diff";
    hash = "sha256-bMT5OqBCyILwspWJyZk0j0c8gbxtcsEI53cQMbhbkL8=";
  };
  */
in
{
  # amdgpu instability with context switching between compute and graphics
  # https://bbs.archlinux.org/viewtopic.php?id=301798
  # side-effects: plymouth fails to show at boot, but does not interfere with booting
  boot.extraModulePackages = [
    (amdgpu-kernel-module.overrideAttrs (_: {
      patches = [
        amdgpu-stability-patch
      ];
    }))
  ];
}
</syntaxhighlight>
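The configuration above imports <code>./packages/amdgpu-kernel-module.nix</code>, which is not shown here. The following sketch shows what such a file might contain, assuming the approach from [[Linux Kernel#Patching a single In-tree kernel module]]. It rebuilds only <code>drivers/gpu/drm/amd/amdgpu</code> against the build tree of the kernel passed in, so that the <code>patches</code> added through <code>overrideAttrs</code> are applied to the kernel source before compiling. The build steps are an assumption modeled on that page; compare with it before use.

<syntaxhighlight lang="nix">
# packages/amdgpu-kernel-module.nix (hypothetical file name, matching the example above)
{ pkgs, lib, kernel ? pkgs.linuxPackages_latest.kernel }:

pkgs.stdenv.mkDerivation {
  pname = "amdgpu-kernel-module";
  # Reuse the kernel's own source, version and build tooling.
  inherit (kernel) src version postPatch nativeBuildInputs;

  kernel_dev = kernel.dev;
  kernelVersion = kernel.modDirVersion;

  # Only this in-tree module is rebuilt.
  modulePath = "drivers/gpu/drm/amd/amdgpu";

  buildPhase = ''
    BUILT_KERNEL=$kernel_dev/lib/modules/$kernelVersion/build

    # Take the configuration and symbol versions from the already-built kernel.
    cp $BUILT_KERNEL/Module.symvers .
    cp $BUILT_KERNEL/.config        .
    cp $kernel_dev/vmlinux          .

    make "-j$NIX_BUILD_CORES" modules_prepare
    make "-j$NIX_BUILD_CORES" M=$modulePath modules
  '';

  installPhase = ''
    make \
      INSTALL_MOD_PATH="$out" \
      XZ="xz -T$NIX_BUILD_CORES" \
      M="$modulePath" \
      modules_install
  '';

  meta = {
    description = "AMD GPU kernel module";
    license = lib.licenses.gpl2Only;
  };
}
</syntaxhighlight>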
 
=== Sporadic Crashes ===
 
If <code>dmesg</code> shows error messages containing <code>page fault</code> or <code>GCVM_L2_PROTECTION_FAULT_STATUS</code>, the GPU may be boosting to clock speeds that its voltage cannot sustain.

Use a tool such as LACT to raise the power limit by 15%, undervolt by a moderate amount (e.g. -50 mV for a 7900 XTX), and optionally lower the maximum GPU clock.
 
* https://wiki.gentoo.org/wiki/AMDGPU#Frequent_and_Sporadic_Crashes
* https://gitlab.freedesktop.org/mesa/mesa/-/issues/11532
* https://gitlab.freedesktop.org/drm/amd/-/issues/3067
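Here is a minimal sketch of installing LACT on NixOS. It assumes the <code>lact</code> package ships its <code>lactd</code> systemd unit; newer NixOS releases may also offer a dedicated module, so check the option search. Voltage and clock adjustments generally also require overdrive to be unlocked, which is assumed here to be available through <code>hardware.amdgpu.overdrive.enable</code>.

<syntaxhighlight lang="nix">
{ pkgs, ... }:
{
  # LACT GUI and CLI.
  environment.systemPackages = [ pkgs.lact ];

  # Register the lactd unit shipped by the package and start it at boot (assumption: unit name lactd).
  systemd.packages = [ pkgs.lact ];
  systemd.services.lactd.wantedBy = [ "multi-user.target" ];

  # Unlock the overdrive controls used for undervolting and clock changes
  # (assumption: option available on your NixOS release).
  hardware.amdgpu.overdrive.enable = true;
}
</syntaxhighlight>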
 
=== Error: <code>amdgpu: Failed to get gpu_info firmware</code> ===
 
Solution: add <code>linux-firmware</code> to <code>hardware.firmware</code>:<syntaxhighlight lang="nix">
hardware.firmware = [ pkgs.linux-firmware ];
</syntaxhighlight>
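Alternatively, <code>hardware.enableRedistributableFirmware</code> pulls in <code>linux-firmware</code> (which contains the amdgpu firmware files) together with other redistributable firmware packages. This is a sketch of that option on its own:

<syntaxhighlight lang="nix">
{ ... }:
{
  # Adds linux-firmware (including the amdgpu blobs) plus other redistributable firmware.
  hardware.enableRedistributableFirmware = true;
}
</syntaxhighlight>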


== See Also ==

* https://wiki.archlinux.org/title/AMDGPU
* https://wiki.gentoo.org/wiki/AMDGPU


== References ==

[[Category:Video]]