Llama-cpp: Difference between revisions

Woile (talk | contribs)
Notes about AMD ROCm
Woile (talk | contribs)
improve styles and added troubleshooting
Line 19: Line 19:
==== in NixOS ====
==== in NixOS ====


After enable Unfree software in NixOS add CUDA to your packages
After enable Unfree software in NixOS add CUDA to your packages<syntaxhighlight lang="nixos">
 
{
<pre>
  environment.systemPackages = [
environment.systemPackages = [
    (pkgs.llama-cpp.override { cudaSupport = true; })
  (pkgs.llama-cpp.override { cudaSupport = true; })
  ];
];
}
</pre>
</syntaxhighlight>And do a switch to the new configuration
 
And do a switch to the new configuration
  sudo nixos-rebuild switch
  sudo nixos-rebuild switch


==== in a shell ====
==== in a shell ====


If you want take the CUDA package for a spin, before adding it to your system, you can open it in a shell:
If you want take the CUDA package for a spin, before adding it to your system, you can open it in a shell:<syntaxhighlight lang="bash">
 
<pre>
export NIXPKGS_ALLOW_UNFREE=1
export NIXPKGS_ALLOW_UNFREE=1
nix shell --impure --expr '(import (builtins.getFlake "nixpkgs") {}).llama-cpp.override { cudaSupport = true; }'
nix shell --impure --expr '(import (builtins.getFlake "nixpkgs") {}).llama-cpp.override { cudaSupport = true; }'
</pre>
</syntaxhighlight>


=== BLAS Support ===
=== BLAS Support ===


BLAS support is automatically enabled if none of the GPU accelerators are enabled. You can still manually enable it in your nix configuration by doing:
BLAS support is automatically enabled if none of the GPU accelerators are enabled. You can still manually enable it in your nix configuration by doing:<syntaxhighlight lang="nixos">
 
{
<pre>
  environment.systemPackages = [
environment.systemPackages = [
    (pkgs.llama-cpp.override { blasSupport = true; })
  (pkgs.llama-cpp.override { blasSupport = true; })
  ];
];
}
</pre>
</syntaxhighlight>


=== AMD ROCm ===
=== AMD ROCm ===
Line 143: Line 139:
Once you've made <code>llama-cpp</code> available in your system. You can use <code>llama-cli</code>, which is a straightforward to use tool.  
Once you've made <code>llama-cpp</code> available in your system. You can use <code>llama-cli</code>, which is a straightforward to use tool.  


In your shell:
In your shell:<syntaxhighlight lang="bash">
 
llama-cli \  
<pre>
  -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \  
llama-cli \
  --temp 1.0 --top-p 0.95 --top-k 40 \  
    -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \
  -p "briefly explain journalctl in one paragraph"
    --temp 1.0 --top-p 0.95 --top-k 40 \
</syntaxhighlight>
    -p "briefly explain journalctl in one paragraph"
</pre>


== llama-server ==
== llama-server ==
Line 156: Line 150:
<code>llama-server</code> runs a server, and it can run models on demand. It's quite similar to [[Ollama]].
<code>llama-server</code> runs a server, and it can run models on demand. It's quite similar to [[Ollama]].


You can manually start the server from your terminal, it's usage, is not that different from <code>llama-cli</code>, but we are going to see the integration with NixOS as a service.
You can manually start the server from your terminal, it's usage, is not that different from <code>llama-cli</code>, but we are going to see the integration with NixOS as a service.<syntaxhighlight lang="nixos">
services.llama-cpp = {
{
  enable = true;
  services.llama-cpp = {
  package = pkgs.llama-cpp-vulkan;
    enable = true;
    package = pkgs.llama-cpp-vulkan;
  # Takes care of downloading if model not present
    # Takes care of downloading if model not present
  modelsPreset = {
    modelsPreset = {
    "Qwen3-Coder-Next" = {
      "Qwen3-Coder-Next" = {
      hf-repo = "unsloth/Qwen3-Coder-Next-GGUF";
        hf-repo = "unsloth/Qwen3-Coder-Next-GGUF";
      hf-file = "Qwen3-Coder-Next-UD-Q4_K_XL.gguf";
        hf-file = "Qwen3-Coder-Next-UD-Q4_K_XL.gguf";
      alias = "unsloth/Qwen3-Coder-Next";
        alias = "unsloth/Qwen3-Coder-Next";
      temp = "1.0";
        temp = "1.0";
      top-p = "0.95";
        top-p = "0.95";
      top-k = "40";
        top-k = "40";
    };
      };
  };
    };
};
  };
And do a switch to the new configuration
}
 
</syntaxhighlight>And do a switch to the new configuration


<pre>
<pre>
sudo nixos-rebuild switch
sudo nixos-rebuild switch
</pre>
</pre>
=== Troubleshooting ===
==== Failed to create //.cache for shader cache ====
This is a known issue ([https://github.com/NixOS/nixpkgs/issues/441531 441531]), until it gets fixed, you can add to your conf:<syntaxhighlight lang="nix">
systemd.services.llama-cpp = {
  environment = {
    XDG_CACHE_HOME = "/var/cache/llama-cpp";
    MESA_SHADER_CACHE_DIR = "/var/cache/llama-cpp";
  };
};
</syntaxhighlight>