AMD GPU: Difference between revisions

Restored revision 19852 from User:CarlenWhite (Reason: Another user seems to have mistakenly reverted the page to the nixos.wiki version, which is out of date and contains incorrect information)
Tag: Manual revert
Klinger (talk | contribs)
Information about iGPU usage of VRAM/GTT for large amounts of VRAM/GTT for LLM (source: based on a forum comment https://community.frame.work/t/igpu-vram-how-much-can-be-assigned/73081/6 of https://community.frame.work/u/lhl/summary )
 
(13 intermediate revisions by 10 users not shown)
Line 1: Line 1:
This guide is about setting up NixOS to correctly use your AMD Graphics card if it is relatively new (aka, after the GCN architecture).
[https://en.wikipedia.org/wiki/AMDgpu_(Linux_kernel_module) AMDGPU] is an open source graphics driver for AMD Radeon graphics cards. It supports AMD GPUs based on the [https://en.wikipedia.org/wiki/Graphics_Core_Next GCN architecture Graphics Core Next (GCN)] architecture and later, covering hardware released from approximately 2012 onward. This guide is about configuration of NixOS to correctly use AMD GPUs supported by the AMDGPU driver.


== Basic Setup ==
== Basic Setup ==
For ordinary desktop / gaming usage, AMD GPUs are expected to work out of the box. As with any desktop configuration though, graphics acceleration does need to be enabled.
For ordinary desktop / gaming usage, AMD GPUs are expected to work out of the box. As with any desktop configuration though, graphics acceleration does need to be enabled.
<syntaxhighlight lang="nix">
<syntaxhighlight lang="nix">hardware.graphics = {
# as of 24.11
hardware.graphics = {
   enable = true;
   enable = true;
   enable32Bit = true;
   enable32Bit = true;
};
};</syntaxhighlight>There is also the [https://search.nixos.org/options?channel=unstable&query=hardware.amdgpu amdgpu nixos module available for common configuration options], such as enabling opencl, legacy support, overdrive/overclocking and loading during initrd.


# pre 24.11
== Special Configuration ==
hardware.opengl = {
The following configurations are only required if you have a specific reason for needing them. They are not expected to be necessary for a typical desktop / gaming setup.
  enable = true;
  driSupport = true;
  driSupport32Bit = true;
};


</syntaxhighlight>
=== AMD iGPU with high amount of RAM (usecase: large language models) ===
The iGPU uses system RAM and has no dedicated VRAM. It can use up to the full available system RAM for example for large LLM models. On many systems its possible to set the amount of VRAM in BIOS: „Auto“ or the lowest amount is enough. The driver knows, to expand with GTT.


== Problems ==
Documentation:
* [https://docs.kernel.org/gpu/amdgpu/module-parameters.html amdgpu gttsize]
* [https://www.kernel.org/doc/html/v4.14/gpu/drm-mm.html#the-translation-table-manager-ttm ttm pages_limit and ttm pages_pool]
Example for 128GB system RAM, in this example the LLM can use 120 GB „VRAM/GTT“:
<syntaxhighlight lang="nix">
boot.kernelParams = [
  # The kernel module parameter gttsize is a is deprecated and will be removed in the future.
  options amdgpu gttsize=120000


=== Dual Monitors ===
  # specified as 4KiB pages: 120 GB GTT
  options ttm pages_limit=31457280
  # specified as 4KiB pages: 60 GB pre-allocated
  options ttm page_pool_size=15728640
];</syntaxhighlight>


If you encounter problems having multiple monitors connected to your GPU, adding `video` parameters for each connector to the kernel command line sometimes helps.
=== Enable Southern Islands (SI) and Sea Islands (CIK) support (eg. HD 7000/8000) ===
 
The oldest architectures that AMDGPU supports are [[wikipedia:Radeon_HD_7000_series|Southern Islands (SI, i.e. GCN 1)]] and [[wikipedia:Radeon_HD_8000_series|Sea Islands (CIK, i.e. GCN 2)]], but support for them is disabled by default. To use AMDGPU instead of the <code>radeon</code> driver, you can set the legacySupport option in the amdgpu module.<syntaxhighlight lang="nix">
For example:
hardware.amdgpu.legacySupport.enable = true;
</syntaxhighlight>This will set the kernel parameters as follows (this is redundant if you set the above option)


<syntaxhighlight lang="nix">
<syntaxhighlight lang="nix">
boot.kernelParams = [
boot.kernelParams = [
  "video=DP-1:2560x1440@144"
    # For Southern Islands (SI i.e. GCN 1) cards
  "video=DP-2:2560x1440@144"
    "amdgpu.si_support=1"
    "radeon.si_support=0"
    # For Sea Islands (CIK i.e. GCN 2) cards
    "amdgpu.cik_support=1"
    "radeon.cik_support=0"
];
];
</syntaxhighlight>
</syntaxhighlight>


With the connector names (like `DP-1`), the resolution and frame rate adjusted accordingly.
Doing this is required to use [[#Vulkan|Vulkan]] on these cards, as the <code>radeon</code> driver doesn't support it.
 
To figure out the connector names, execute the following command while your monitors are connected:
 
<syntaxhighlight lang="bash">
head /sys/class/drm/*/status
</syntaxhighlight>


=== System Hang with Vega Graphics (and select GPUs) ===
Please note this also removes support for analog video outputs, which is only available with the <code>radeon</code> driver.
Currently on the latest kernel/mesa (currently 6.13 and 24.3.4 respectively), Vega integrated graphics (and other GPUs like the RX 6600<ref>https://bbs.archlinux.org/viewtopic.php?pid=2224147#p2224147</ref>) will have a possibility to hang due to context-switching between Graphics and Compute.<ref>https://bbs.archlinux.org/viewtopic.php?id=301798</ref> There are currently two sets of patches to choose between stability or speed that can be applied: [https://github.com/SeryogaBrigada/linux/commits/v6.13-amdgpu amdgpu-stable] and [https://github.com/SeryogaBrigada/linux/commits/v6.13-amdgpu-testing amdgpu-testing].
 
See [[Linux Kernel#Patching a single In-tree kernel module]], keep in mind how to make [https://stackoverflow.com/a/23525893 patch diffs from commits from GitHub], and consider this example configuration:<syntaxhighlight lang="nix">
{ config, pkgs, ... }:
let
  amdgpu-kernel-module = pkgs.callPackage ./packages/amdgpu-kernel-module.nix {
    # Make sure the module targets the same kernel as your system is using.
    kernel = config.boot.kernelPackages.kernel;
  };
  # linuxPackages_latest 6.13 (or linuxPackages_zen 6.13)
  amdgpu-stability-patch = pkgs.fetchpatch {
    name = "amdgpu-stability-patch";
    url = "https://github.com/torvalds/linux/compare/ffd294d346d185b70e28b1a28abe367bbfe53c04...SeryogaBrigada:linux:4c55a12d64d769f925ef049dd6a92166f7841453.diff";
    hash = "sha256-q/gWUPmKHFBHp7V15BW4ixfUn1kaeJhgDs0okeOGG9c=";
  };
  /*
  # linuxPackages_zen 6.12
  amdgpu-stability-patch = pkgs.fetchpatch {
    name = "amdgpu-stability-patch-zen";
    url = "https://github.com/zen-kernel/zen-kernel/compare/fd00d197bb0a82b25e28d26d4937f917969012aa...WhiteHusky:zen-kernel:f4c32ca166ad55d7e2bbf9adf121113500f3b42b.diff";
    hash = "sha256-bMT5OqBCyILwspWJyZk0j0c8gbxtcsEI53cQMbhbkL8=";
  };
  */
in
{
  # amdgpu instability with context switching between compute and graphics
  # https://bbs.archlinux.org/viewtopic.php?id=301798
  # side-effects: plymouth fails to show at boot, but does not interfere with booting
  boot.extraModulePackages = [
    (amdgpu-kernel-module.overrideAttrs (_: {
      patches = [
        amdgpu-stability-patch
      ];
    }))
  ];
}
</syntaxhighlight>
 
== Special Configuration ==
The following configurations are only required if you have a specific reason for needing them. They are not expected to be necessary for a typical desktop / gaming setup.
 
=== Enable Southern Islands (SI) and Sea Islands (CIK) support ===
The oldest architectures that AMDGPU supports are [[wikipedia:Radeon_HD_7000_series|Southern Islands (SI, i.e. GCN 1)]] and [[wikipedia:Radeon_HD_8000_series|Sea Islands (CIK, i.e. GCN 2)]], but support for them is disabled by default. To use AMDGPU instead of the <code>radeon</code> driver, you can set the kernel parameters:
 
<syntaxhighlight lang="nix">
# For Southern Islands (SI i.e. GCN 1) cards
boot.kernelParams = [ "radeon.si_support=0" "amdgpu.si_support=1" ];
# For Sea Islands (CIK i.e. GCN 2) cards
boot.kernelParams = [ "radeon.cik_support=0" "amdgpu.cik_support=1" ];
</syntaxhighlight>
 
Doing this is required to use [[#Vulkan|Vulkan]] on these cards, as the <code>radeon</code> driver doesn't support it.


=== HIP ===
=== HIP ===
Line 122: Line 76:


=== OpenCL ===
=== OpenCL ===
<syntaxhighlight lang="nix">
OpenCL support using the ROCM runtime library can be enabled via the amdgpu module.<syntaxhighlight lang="nix">
hardware.graphics.extraPackages = with pkgs; [ rocmPackages.clr.icd ];
hardware.amdgpu.opencl.enable = true;
</syntaxhighlight>
</syntaxhighlight>


Line 162: Line 116:


=== Vulkan ===
=== Vulkan ===
Vulkan is already enabled by default (using Mesa RADV) on 64 bit applications. The settings to control it are:
Vulkan is already enabled by default (using Mesa RADV) on 64 bit applications. The settings to control it are now the same as those shown in the basic setup:


<syntaxhighlight lang="nix">
<syntaxhighlight lang="nix">hardware.graphics.enable = true;
hardware.opengl.driSupport = true; # This is already enabled by default
hardware.graphics.enable32Bit = true; # Replaced 'driSupport32Bit'
hardware.opengl.driSupport32Bit = true; # For 32 bit applications
</syntaxhighlight>
</syntaxhighlight>


==== AMDVLK ====
==== AMDVLK ====
The AMDVLK drivers can be used in addition to the Mesa RADV drivers. The program will choose which one to use:
 
{{Warning|1=AMDVLK is being discontinued, so the only reason you may want to use it is if you have issues with RADV. See: https://github.com/GPUOpen-Drivers/AMDVLK/discussions/416.}}
 
The AMDVLK drivers can be used in addition to the Mesa RADV drivers. The program will choose which one to use (mostly defaulting to AMDVLK):


<syntaxhighlight lang="nix">
<syntaxhighlight lang="nix">
#24.11
hardware.graphics.extraPackages = with pkgs; [
hardware.graphics.extraPackages = with pkgs; [
   amdvlk
   amdvlk
Line 181: Line 136:
   driversi686Linux.amdvlk
   driversi686Linux.amdvlk
];
];
</syntaxhighlight>
More information can be found here: https://nixos.org/manual/nixos/unstable/index.html#sec-gpu-accel-vulkan
One way to avoid installing it system-wide, is creating a wrapper like this:


#24.05 and below
<syntaxhighlight lang="nix">
hardware.opengl.extraPackages = with pkgs; [
with pkgs; writeShellScriptBin "amdvlk-run" ''
   amdvlk
   export VK_DRIVER_FILES="${amdvlk}/share/vulkan/icd.d/amd_icd64.json:${pkgsi686Linux.amdvlk}/share/vulkan/icd.d/amd_icd32.json"
];
   exec "$@"
# For 32 bit applications
'';
hardware.opengl.extraPackages32 = with pkgs; [
   driversi686Linux.amdvlk
];
</syntaxhighlight>
</syntaxhighlight>


More information can be found here: https://nixos.org/manual/nixos/unstable/index.html#sec-gpu-accel-vulkan
This wrapper can be used per-game inside Steam (<syntaxhighlight lang="sh" inline>amdvlk-run %command%</syntaxhighlight>) and lets RADV to be the system's default.
 
===== Performance Issues with AMDVLK =====
Some games choose AMDVLK over RADV, which can cause noticeable performance issues (e.g. <50% less FPS in games)
 
To force RADV<syntaxhighlight lang="nix">
environment.variables.AMD_VULKAN_ICD = "RADV";
</syntaxhighlight>


=== GUI tools ===
=== GUI tools ===


==== LACT - Linux AMDGPU Controller ====
==== LACT - Linux AMDGPU Controller ====
This application allows you to overclock, undervolt, set fans curves of AMD GPUs on a Linux system.
This application allows you to overclock, undervolt, set fans curves of AMD GPUs on a Linux system.


In order to install the daemon service you need to add the package to <code>systemd.packages</code>. Also the <code>wantedBy</code> field should be set to <code>multi-user.target</code> to start the service during boot.
In order to install the daemon service you need to add the package to <code>systemd.packages</code>. Also the <code>wantedBy</code> field should be set to <code>multi-user.target</code> to start the service during boot.
<syntaxhighlight lang="nix">
<syntaxhighlight lang="nix">
environment.systemPackages = with pkgs; [ lact ];
environment.systemPackages = with pkgs; [ lact ];
systemd.packages = with pkgs; [ lact ];
systemd.packages = with pkgs; [ lact ];
systemd.services.lactd.wantedBy = ["multi-user.target"];
systemd.services.lactd.wantedBy = ["multi-user.target"];
</syntaxhighlight>Simple version is:<syntaxhighlight lang="nix">
services.lact.enable = true;
# https://github.com/NixOS/nixpkgs/blob/nixos-unstable/nixos/modules/services/hardware/lact.nix
</syntaxhighlight>
== Troubleshooting ==
=== Low resolution during initramfs phase ===
If you encounter a low resolution output during early boot phases, you can load the amdgpu module in the initial ramdisk<syntaxhighlight lang="nix">
hardware.amdgpu.initrd.enable = true; # sets boot.initrd.kernelModules = ["amdgpu"];
</syntaxhighlight>
=== Dual Monitors ===
If you encounter problems having multiple monitors connected to your GPU, adding `video` parameters for each connector to the kernel command line sometimes helps.
For example:
<syntaxhighlight lang="nix">
boot.kernelParams = [
  "video=DP-1:2560x1440@144"
  "video=DP-2:2560x1440@144"
];
</syntaxhighlight>
With the connector names (like `DP-1`), the resolution and frame rate adjusted accordingly.
To figure out the connector names, execute the following command while your monitors are connected:
<syntaxhighlight lang="bash">
head /sys/class/drm/*/status
</syntaxhighlight>
=== System Hang with Vega Graphics (and select GPUs) ===
Currently on the latest kernel/mesa (currently 6.13 and 24.3.4 respectively), Vega integrated graphics (and other GPUs like the RX 6600<ref>https://bbs.archlinux.org/viewtopic.php?pid=2224147#p2224147</ref>) will have a possibility to hang due to context-switching between Graphics and Compute.<ref>https://bbs.archlinux.org/viewtopic.php?id=301798</ref> There are currently two sets of patches to choose between stability or speed that can be applied: [https://github.com/SeryogaBrigada/linux/commits/v6.13-amdgpu amdgpu-stable] and [https://github.com/SeryogaBrigada/linux/commits/v6.13-amdgpu-testing amdgpu-testing].
See [[Linux Kernel#Patching a single In-tree kernel module]], keep in mind how to make [https://stackoverflow.com/a/23525893 patch diffs from commits from GitHub], and consider this example configuration:<syntaxhighlight lang="nix">
{ config, pkgs, ... }:
let
  amdgpu-kernel-module = pkgs.callPackage ./packages/amdgpu-kernel-module.nix {
    # Make sure the module targets the same kernel as your system is using.
    kernel = config.boot.kernelPackages.kernel;
  };
  # linuxPackages_latest 6.13 (or linuxPackages_zen 6.13)
  amdgpu-stability-patch = pkgs.fetchpatch {
    name = "amdgpu-stability-patch";
    url = "https://github.com/torvalds/linux/compare/ffd294d346d185b70e28b1a28abe367bbfe53c04...SeryogaBrigada:linux:4c55a12d64d769f925ef049dd6a92166f7841453.diff";
    hash = "sha256-q/gWUPmKHFBHp7V15BW4ixfUn1kaeJhgDs0okeOGG9c=";
  };
  /*
  # linuxPackages_zen 6.12
  amdgpu-stability-patch = pkgs.fetchpatch {
    name = "amdgpu-stability-patch-zen";
    url = "https://github.com/zen-kernel/zen-kernel/compare/fd00d197bb0a82b25e28d26d4937f917969012aa...WhiteHusky:zen-kernel:f4c32ca166ad55d7e2bbf9adf121113500f3b42b.diff";
    hash = "sha256-bMT5OqBCyILwspWJyZk0j0c8gbxtcsEI53cQMbhbkL8=";
  };
  */
in
{
  # amdgpu instability with context switching between compute and graphics
  # https://bbs.archlinux.org/viewtopic.php?id=301798
  # side-effects: plymouth fails to show at boot, but does not interfere with booting
  boot.extraModulePackages = [
    (amdgpu-kernel-module.overrideAttrs (_: {
      patches = [
        amdgpu-stability-patch
      ];
    }))
  ];
}
</syntaxhighlight>
</syntaxhighlight>
=== Sporadic Crashes ===
If getting error messages in <code>dmesg</code> with <code>page fault</code> or <code>GCVM_L2_PROTECTION_FAULT_STATUS</code> it might be from AMD GPU boosting too high without enough voltage
Use a tool like LACT to increase power usage limit to 15%, undervolt by moderate amount (e.g. -50mV for 7900 XTX) and optionally decrease maximum GPU clock.
* https://wiki.gentoo.org/wiki/AMDGPU#Frequent_and_Sporadic_Crashes
* https://gitlab.freedesktop.org/mesa/mesa/-/issues/11532
* https://gitlab.freedesktop.org/drm/amd/-/issues/3067
=== Error: <code>amdgpu: Failed to get gpu_info firmware</code> ===
Solution:
hardware.firmware = [ pkgs.linux-firmware ];
== See Also ==
* https://wiki.archlinux.org/title/AMDGPU
* https://wiki.gentoo.org/wiki/AMDGPU
== References ==
[[Category:Video]]
[[Category:Video]]