AMD GPU

Updated page to reflect the amdgpu module in NixOS 25.05 and unstable. I tried to retain useful information such as what these settings actually do, since they are kind of educational in and of themselves.
Smudgebun
Blender: Converted to a general section on ROCm and HIP enabling for packages, linked to the Blender page HIP section for Blender-specific information.
 
[https://en.wikipedia.org/wiki/AMDgpu_(Linux_kernel_module) AMDGPU] is an open source graphics driver for AMD Radeon graphics cards. It supports AMD GPUs based on the [https://en.wikipedia.org/wiki/Graphics_Core_Next Graphics Core Next (GCN)] architecture and later, covering hardware released from approximately 2012 onward. This guide covers configuring NixOS to correctly use AMD GPUs supported by the AMDGPU driver.


== Basic Setup ==
};</syntaxhighlight>NixOS also provides an [https://search.nixos.org/options?channel=unstable&query=hardware.amdgpu amdgpu module] with options for common configuration, such as enabling OpenCL, legacy support, overdrive/overclocking, and loading the driver during initrd.
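As a rough sketch, assuming the option names currently listed under <code>hardware.amdgpu</code> on search.nixos.org (check them against your NixOS release), these features map to options like the following. Enable only what you need:

<syntaxhighlight lang="nix">
hardware.amdgpu = {
  # Install the ROCm-based OpenCL runtime
  opencl.enable = true;
  # Load amdgpu in the initial ramdisk (sets boot.initrd.kernelModules = [ "amdgpu" ])
  # initrd.enable = true;
  # Use amdgpu instead of radeon for Southern Islands / Sea Islands GPUs
  # legacySupport.enable = true;
  # Allow overclocking/undervolting (overdrive mode)
  # overdrive.enable = true;
};
</syntaxhighlight>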


== Special Configuration ==
 
The following configurations are only required if you have a specific reason for needing them. They are not expected to be necessary for a typical desktop / gaming setup.
=== AMD iGPU with a large amount of RAM (use case: large language models) ===
An iGPU has no dedicated VRAM and uses system RAM instead, up to the full amount available, which is useful for example for running large language models. On many systems the amount of RAM reserved as VRAM can be set in the BIOS; "Auto" or the lowest setting is enough, because the driver extends the usable memory with GTT.


Documentation:
* [https://docs.kernel.org/gpu/amdgpu/module-parameters.html amdgpu gttsize]
* [https://www.kernel.org/doc/html/v4.14/gpu/drm-mm.html#the-translation-table-manager-ttm ttm pages_limit and ttm page_pool_size]
Example for a system with 128 GiB of RAM, letting the LLM use up to 120 GiB as "VRAM" (GTT):
<syntaxhighlight lang="nix">
boot.kernelParams = [
  # GTT size in MiB. This amdgpu module parameter is deprecated and will be removed in the future.
  "amdgpu.gttsize=120000"
  # Limit for the pages TTM may allocate, in 4 KiB pages: 31457280 pages = 120 GiB of GTT
  "ttm.pages_limit=31457280"
  # Size of the TTM page pool, in 4 KiB pages: 15728640 pages = 60 GiB
  "ttm.page_pool_size=15728640"
];</syntaxhighlight>
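The TTM limits are given in pages rather than bytes. A small sketch for converting a size in GiB into a page count, assuming 4 KiB pages as in the comments of the example (the usual page size on x86_64); the helper name <code>gib_to_pages</code> is hypothetical:

<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: convert a size in GiB into a number of 4 KiB pages.
# 1 GiB = 1024 * 1024 * 1024 bytes; one page is assumed to be 4096 bytes.
gib_to_pages() {
  local gib="$1"
  echo $(( gib * 1024 * 1024 * 1024 / 4096 ))
}

gib_to_pages 120  # value for ttm.pages_limit, prints 31457280
gib_to_pages 60   # value for ttm.page_pool_size, prints 15728640
</syntaxhighlight>

After rebuilding and rebooting, the GTT size the driver actually uses is reported in bytes by <code>/sys/class/drm/card*/device/mem_info_gtt_total</code> (the card number differs between systems).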
 
 
 


=== Enable Southern Islands (SI) and Sea Islands (CIK) support (eg. HD 7000/8000) ===
</syntaxhighlight>


=== Enabling ROCm & HIP For Packages ===
Whether or not a package is built with ROCm support is controlled by the <code>rocmSupport</code> nixpkgs config variable. As HIP is a component of ROCm, anything that needs HIP support (e.g. Blender) gets that enabled through <code>rocmSupport</code> too.
 
You can enable it globally:
 
<syntaxhighlight lang="nix">
nixpkgs.config.rocmSupport = true;
</syntaxhighlight>
 
Or enable it only for specific packages:
 
<syntaxhighlight lang="nix">
environment.systemPackages = with pkgs; [
  # Override a single package:
  (ffmpeg-full.override { config.rocmSupport = true; })
  # Or, for packages in the ROCm Release attrPaths, use the equivalent pkgsRocm variant instead:
  # pkgsRocm.ffmpeg-full
];
</syntaxhighlight>
 
Most, if not all, packages that support ROCm should be listed in the [https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/rocm-modules/release-attrPaths.json ROCm Release attrPaths], and are therefore built by Hydra and cached on cache.nixos.org with <code>rocmSupport</code> enabled. Still, enabling it globally carries a small chance that some packages get compiled locally instead of being fetched from the cache.


For Blender-specific information on setting up HIP support, see: [[Blender#HIP]].
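To check whether the ROCm runtime detects the GPU, one option (assuming the <code>rocmPackages.rocminfo</code> package) is to install <code>rocminfo</code> and run it; it lists the detected agents, including the GPU's <code>gfx</code> target name:

<syntaxhighlight lang="nix">
environment.systemPackages = with pkgs; [
  # Provides the `rocminfo` command, which lists ROCm agents (CPU and GPU)
  rocmPackages.rocminfo
];
</syntaxhighlight>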


=== OpenCL ===


<syntaxhighlight lang="nix">
hardware.graphics.extraPackages = with pkgs; [
  amdvlk
];
# For 32 bit applications
hardware.graphics.extraPackages32 = with pkgs; [
  driversi686Linux.amdvlk
];
</syntaxhighlight>
== Troubleshooting ==


=== Low resolution during initramfs phase ===
If the display runs at a low resolution during the early boot phase, you can load the amdgpu module in the initial ramdisk:<syntaxhighlight lang="nix">
hardware.amdgpu.initrd.enable = true; # sets boot.initrd.kernelModules = ["amdgpu"];
</syntaxhighlight>
 
=== Dual Monitors ===
 
If you encounter problems with multiple monitors connected to your GPU, adding a <code>video</code> parameter for each connector to the kernel command line sometimes helps.
 
For example:
 
<syntaxhighlight lang="nix">
boot.kernelParams = [
  "video=DP-1:2560x1440@144"
  "video=DP-2:2560x1440@144"
];
</syntaxhighlight>
 
Adjust the connector names (like <code>DP-1</code>), resolution and refresh rate to match your monitors.
 
To figure out the connector names, execute the following command while your monitors are connected:
 
<syntaxhighlight lang="bash">
head /sys/class/drm/*/status
</syntaxhighlight>
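The <code>status</code> files live in directories named like <code>card0-DP-1</code>, while the <code>video</code> parameter expects only the connector part (<code>DP-1</code>). A sketch that prints just the connected connector names, assuming that <code>cardN-CONNECTOR</code> naming scheme; the function name and its optional directory argument are hypothetical:

<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: print connected DRM connectors in the form used by video=,
# e.g. "card1-DP-1" -> "DP-1".
# $1: optional sysfs directory (defaults to /sys/class/drm).
connected_connectors() {
  local root="${1:-/sys/class/drm}"
  local status_file name
  for status_file in "$root"/card*-*/status; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$status_file" ] || continue
    if [ "$(cat "$status_file")" = "connected" ]; then
      name="$(basename "$(dirname "$status_file")")"
      # Strip the shortest "cardN-" prefix.
      echo "${name#card*-}"
    fi
  done
}

# Usage: connected_connectors
</syntaxhighlight>

Each printed name can then be used in a parameter such as <code>"video=DP-1:2560x1440@144"</code>.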
 
=== System Hang with Vega Graphics (and select GPUs) ===
 
As of kernel 6.13 and Mesa 24.3.4, Vega integrated graphics (and other GPUs such as the RX 6600<ref>https://bbs.archlinux.org/viewtopic.php?pid=2224147#p2224147</ref>) may hang due to context switching between graphics and compute workloads.<ref>https://bbs.archlinux.org/viewtopic.php?id=301798</ref> Two sets of patches are available, one favoring stability and one favoring speed: [https://github.com/SeryogaBrigada/linux/commits/v6.13-amdgpu amdgpu-stable] and [https://github.com/SeryogaBrigada/linux/commits/v6.13-amdgpu-testing amdgpu-testing].
 
To apply one of them, see [[Linux Kernel#Patching a single In-tree kernel module]] and how to [https://stackoverflow.com/a/23525893 create patch diffs from GitHub commits]. An example configuration:<syntaxhighlight lang="nix">
{ config, pkgs, ... }:
let
  amdgpu-kernel-module = pkgs.callPackage ./packages/amdgpu-kernel-module.nix {
    # Make sure the module targets the same kernel as your system is using.
    kernel = config.boot.kernelPackages.kernel;
  };
  # linuxPackages_latest 6.13 (or linuxPackages_zen 6.13)
  amdgpu-stability-patch = pkgs.fetchpatch {
    name = "amdgpu-stability-patch";
    url = "https://github.com/torvalds/linux/compare/ffd294d346d185b70e28b1a28abe367bbfe53c04...SeryogaBrigada:linux:4c55a12d64d769f925ef049dd6a92166f7841453.diff";
    hash = "sha256-q/gWUPmKHFBHp7V15BW4ixfUn1kaeJhgDs0okeOGG9c=";
  };
  /*
  # linuxPackages_zen 6.12
  amdgpu-stability-patch = pkgs.fetchpatch {
    name = "amdgpu-stability-patch-zen";
    url = "https://github.com/zen-kernel/zen-kernel/compare/fd00d197bb0a82b25e28d26d4937f917969012aa...WhiteHusky:zen-kernel:f4c32ca166ad55d7e2bbf9adf121113500f3b42b.diff";
    hash = "sha256-bMT5OqBCyILwspWJyZk0j0c8gbxtcsEI53cQMbhbkL8=";
  };
  */
in
{
  # amdgpu instability with context switching between compute and graphics
  # https://bbs.archlinux.org/viewtopic.php?id=301798
  # side-effects: plymouth fails to show at boot, but does not interfere with booting
  boot.extraModulePackages = [
    (amdgpu-kernel-module.overrideAttrs (_: {
      patches = [
        amdgpu-stability-patch
      ];
    }))
  ];
}
</syntaxhighlight>
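The <code>hash</code> of a <code>fetchpatch</code> call must match the patch for your kernel version. When adapting the example to another kernel, a common approach is to start with a placeholder hash, build once, and copy the correct hash from the hash-mismatch error. A sketch of the <code>amdgpu-stability-patch</code> binding in the <code>let</code> block, where the URL is a hypothetical placeholder:

<syntaxhighlight lang="nix">
amdgpu-stability-patch = pkgs.fetchpatch {
  name = "amdgpu-stability-patch";
  # Hypothetical placeholder: replace with the compare/commit .diff URL for your kernel
  url = "https://github.com/OWNER/linux/compare/BASE...OWNER:linux:HEAD.diff";
  # Placeholder hash: the first build fails and reports the real hash after "got:"
  hash = pkgs.lib.fakeHash;
};
</syntaxhighlight>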
 
=== Sporadic Crashes ===
 
If <code>dmesg</code> shows errors containing <code>page fault</code> or <code>GCVM_L2_PROTECTION_FAULT_STATUS</code>, the GPU may be boosting its clock too high without sufficient voltage.
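To look for these messages in the kernel log, one option is to filter it with <code>grep</code>. A minimal sketch; the function name is hypothetical and it only matches the two strings mentioned above:

<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: print kernel log lines containing the amdgpu fault messages.
# Reads the log from standard input.
amdgpu_fault_lines() {
  grep -E 'page fault|GCVM_L2_PROTECTION_FAULT_STATUS'
}

# Usage (kernel log of the current boot):
#   journalctl -k -b | amdgpu_fault_lines
</syntaxhighlight>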
 
Use a tool like LACT to increase the power limit by 15%, apply a moderate undervolt (e.g. -50 mV for a 7900 XTX), and optionally lower the maximum GPU clock.
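A possible NixOS setup for LACT, as a sketch assuming the <code>lact</code> package in nixpkgs ships the <code>lactd</code> systemd service, and that overdrive must be enabled before voltage and power limit changes take effect (check the LACT documentation for your version):

<syntaxhighlight lang="nix">
{ pkgs, ... }:
{
  # Assumption: overdrive unlocks clock/voltage/power limit changes
  hardware.amdgpu.overdrive.enable = true;

  # LACT GUI plus its background daemon
  environment.systemPackages = [ pkgs.lact ];
  systemd.packages = [ pkgs.lact ];
  systemd.services.lactd.wantedBy = [ "multi-user.target" ];
}
</syntaxhighlight>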
 
* https://wiki.gentoo.org/wiki/AMDGPU#Frequent_and_Sporadic_Crashes
* https://gitlab.freedesktop.org/mesa/mesa/-/issues/11532
* https://gitlab.freedesktop.org/drm/amd/-/issues/3067
 
=== Error: <code>amdgpu: Failed to get gpu_info firmware</code> ===
 
Solution:<syntaxhighlight lang="nix">
hardware.firmware = [ pkgs.linux-firmware ];
</syntaxhighlight>
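Alternatively, <code>hardware.enableRedistributableFirmware</code> adds <code>linux-firmware</code> together with other firmware packages that may be redistributed. A one-line sketch:

<syntaxhighlight lang="nix">
hardware.enableRedistributableFirmware = true;
</syntaxhighlight>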


== See Also ==


* https://wiki.archlinux.org/title/AMDGPU
* https://wiki.gentoo.org/wiki/AMDGPU


== References ==


[[Category:Video]]