Llama-cpp: Difference between revisions
Notes about AMD ROCm |
improve styles and added troubleshooting |
||
| Line 19: | Line 19: | ||
==== in NixOS ==== | ==== in NixOS ==== | ||
After enable Unfree software in NixOS add CUDA to your packages | After enable Unfree software in NixOS add CUDA to your packages<syntaxhighlight lang="nixos"> | ||
{ | |||
< | environment.systemPackages = [ | ||
environment.systemPackages = [ | (pkgs.llama-cpp.override { cudaSupport = true; }) | ||
]; | |||
]; | } | ||
</ | </syntaxhighlight>And do a switch to the new configuration | ||
And do a switch to the new configuration | |||
sudo nixos-rebuild switch | sudo nixos-rebuild switch | ||
==== in a shell ==== | ==== in a shell ==== | ||
If you want take the CUDA package for a spin, before adding it to your system, you can open it in a shell: | If you want take the CUDA package for a spin, before adding it to your system, you can open it in a shell:<syntaxhighlight lang="bash"> | ||
< | |||
export NIXPKGS_ALLOW_UNFREE=1 | export NIXPKGS_ALLOW_UNFREE=1 | ||
nix shell --impure --expr '(import (builtins.getFlake "nixpkgs") {}).llama-cpp.override { cudaSupport = true; }' | nix shell --impure --expr '(import (builtins.getFlake "nixpkgs") {}).llama-cpp.override { cudaSupport = true; }' | ||
</ | </syntaxhighlight> | ||
=== BLAS Support === | === BLAS Support === | ||
BLAS support is automatically enabled if none of the GPU accelerators are enabled. You can still manually enable it in your nix configuration by doing: | BLAS support is automatically enabled if none of the GPU accelerators are enabled. You can still manually enable it in your nix configuration by doing:<syntaxhighlight lang="nixos"> | ||
{ | |||
< | environment.systemPackages = [ | ||
environment.systemPackages = [ | (pkgs.llama-cpp.override { blasSupport = true; }) | ||
]; | |||
]; | } | ||
</ | </syntaxhighlight> | ||
=== AMD ROCm === | === AMD ROCm === | ||
| Line 143: | Line 139: | ||
Once you've made <code>llama-cpp</code> available in your system. You can use <code>llama-cli</code>, which is a straightforward to use tool. | Once you've made <code>llama-cpp</code> available in your system. You can use <code>llama-cli</code>, which is a straightforward to use tool. | ||
In your shell: | In your shell:<syntaxhighlight lang="bash"> | ||
llama-cli \ | |||
< | -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \ | ||
llama-cli \ | --temp 1.0 --top-p 0.95 --top-k 40 \ | ||
-p "briefly explain journalctl in one paragraph" | |||
</syntaxhighlight> | |||
</ | |||
== llama-server == | == llama-server == | ||
| Line 156: | Line 150: | ||
<code>llama-server</code> runs a server, and it can run models on demand. It's quite similar to [[Ollama]]. | <code>llama-server</code> runs a server, and it can run models on demand. It's quite similar to [[Ollama]]. | ||
You can manually start the server from your terminal, it's usage, is not that different from <code>llama-cli</code>, but we are going to see the integration with NixOS as a service. | You can manually start the server from your terminal, it's usage, is not that different from <code>llama-cli</code>, but we are going to see the integration with NixOS as a service.<syntaxhighlight lang="nixos"> | ||
{ | |||
services.llama-cpp = { | |||
enable = true; | |||
package = pkgs.llama-cpp-vulkan; | |||
# Takes care of downloading if model not present | |||
modelsPreset = { | |||
"Qwen3-Coder-Next" = { | |||
hf-repo = "unsloth/Qwen3-Coder-Next-GGUF"; | |||
hf-file = "Qwen3-Coder-Next-UD-Q4_K_XL.gguf"; | |||
alias = "unsloth/Qwen3-Coder-Next"; | |||
temp = "1.0"; | |||
top-p = "0.95"; | |||
top-k = "40"; | |||
}; | |||
}; | |||
}; | |||
And do a switch to the new configuration | } | ||
</syntaxhighlight>And do a switch to the new configuration | |||
<pre> | <pre> | ||
sudo nixos-rebuild switch | sudo nixos-rebuild switch | ||
</pre> | </pre> | ||
=== Troubleshooting === | |||
==== Failed to create //.cache for shader cache ==== | |||
This is a known issue ([https://github.com/NixOS/nixpkgs/issues/441531 441531]), until it gets fixed, you can add to your conf:<syntaxhighlight lang="nix"> | |||
systemd.services.llama-cpp = { | |||
environment = { | |||
XDG_CACHE_HOME = "/var/cache/llama-cpp"; | |||
MESA_SHADER_CACHE_DIR = "/var/cache/llama-cpp"; | |||
}; | |||
}; | |||
</syntaxhighlight> | |||