Llama-cpp: Difference between revisions

Line 19:

==== in NixOS ====

After enabling unfree software in NixOS, add the CUDA-enabled package to your system packages:<syntaxhighlight lang="nixos">
{
  environment.systemPackages = [
    (pkgs.llama-cpp.override { cudaSupport = true; })
  ];
}
</syntaxhighlight>
Then switch to the new configuration:

 sudo nixos-rebuild switch
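
The unfree prerequisite mentioned above can be met in more than one way. The following is a minimal sketch of the simplest one, assuming you are happy to allow every unfree package system-wide. It is an illustration rather than the only supported setup.<syntaxhighlight lang="nix">
{
  # Assumption: allow all unfree packages, which covers the CUDA toolkit.
  # A narrower nixpkgs.config.allowUnfreePredicate can be used instead.
  nixpkgs.config.allowUnfree = true;
}
</syntaxhighlight>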

==== in a shell ====

If you want to take the CUDA package for a spin before adding it to your system, you can open it in a shell:<syntaxhighlight lang="bash">
export NIXPKGS_ALLOW_UNFREE=1
nix shell --impure --expr '(import (builtins.getFlake "nixpkgs") {}).llama-cpp.override { cudaSupport = true; }'
</syntaxhighlight>
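
For setups that do not use flakes, roughly the same temporary environment can be described in a file. This is only a sketch: the file name <code>shell.nix</code> and the use of the <code><nixpkgs></code> channel are assumptions, and the file would be entered with <code>nix-shell</code>.<syntaxhighlight lang="nix">
# shell.nix (hypothetical file name)
# Assumption: <nixpkgs> resolves to a channel that provides llama-cpp.
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.mkShell {
  # Same override as in the nix shell command, with CUDA enabled.
  packages = [
    (pkgs.llama-cpp.override { cudaSupport = true; })
  ];
}
</syntaxhighlight>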

=== BLAS Support ===

BLAS support is automatically enabled if none of the GPU accelerators are enabled. You can still enable it manually in your Nix configuration:<syntaxhighlight lang="nixos">
{
  environment.systemPackages = [
    (pkgs.llama-cpp.override { blasSupport = true; })
  ];
}
</syntaxhighlight>

=== AMD ROCm ===

Line 143:

Line 139:

Once you've made <code>llama-cpp</code> available in your system, you can use <code>llama-cli</code>, which is straightforward to use.

In your shell:<syntaxhighlight lang="bash">
llama-cli \
  -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \
  --temp 1.0 --top-p 0.95 --top-k 40 \
  -p "briefly explain journalctl in one paragraph"
</syntaxhighlight>

== llama-server ==

Line 156:

Line 150:

<code>llama-server</code> runs a server, and it can run models on demand. It's quite similar to [[Ollama]].

You can start the server manually from your terminal; its usage is not that different from <code>llama-cli</code>. Here we look at the integration with NixOS as a service instead.<syntaxhighlight lang="nixos">
{
  services.llama-cpp = {
    enable = true;
    package = pkgs.llama-cpp-vulkan;
    # Takes care of downloading the model if it is not present
    modelsPreset = {
      "Qwen3-Coder-Next" = {
        hf-repo = "unsloth/Qwen3-Coder-Next-GGUF";
        hf-file = "Qwen3-Coder-Next-UD-Q4_K_XL.gguf";
        alias = "unsloth/Qwen3-Coder-Next";
        temp = "1.0";
        top-p = "0.95";
        top-k = "40";
      };
    };
  };
}
</syntaxhighlight>
Then switch to the new configuration:

<pre>
sudo nixos-rebuild switch
</pre>
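
The service configuration above leaves the listen address at its defaults. As a hedged sketch, the fragment below shows how it could be pinned explicitly. The option names <code>host</code>, <code>port</code> and <code>openFirewall</code>, and the values shown, are assumptions about the <code>services.llama-cpp</code> module and should be checked against the options reference for your nixpkgs revision.<syntaxhighlight lang="nix">
{
  services.llama-cpp = {
    # Assumed option names; verify them for your nixpkgs revision.
    host = "127.0.0.1";   # listen on loopback only
    port = 8080;          # TCP port the server binds to
    openFirewall = false; # keep the port closed to other machines
  };
}
</syntaxhighlight>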

=== Troubleshooting ===

==== Failed to create //.cache for shader cache ====

This is a known issue ([https://github.com/NixOS/nixpkgs/issues/441531 441531]). Until it gets fixed, you can add the following to your configuration:<syntaxhighlight lang="nix">
systemd.services.llama-cpp = {
  environment = {
    XDG_CACHE_HOME = "/var/cache/llama-cpp";
    MESA_SHADER_CACHE_DIR = "/var/cache/llama-cpp";
  };
};
</syntaxhighlight>
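
The workaround points both variables at <code>/var/cache/llama-cpp</code>, so that directory has to exist and be writable by the service. A minimal sketch, assuming the module does not already create it, is to let systemd manage the directory through <code>CacheDirectory</code>:<syntaxhighlight lang="nix">
{
  # Assumption: the service does not already declare a cache directory.
  # systemd creates /var/cache/llama-cpp and makes it writable by the service user.
  systemd.services.llama-cpp.serviceConfig.CacheDirectory = "llama-cpp";
}
</syntaxhighlight>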

@@ Line 19: / Line 19: @@
 ==== in NixOS ====
-After enable Unfree software in NixOS add CUDA to your packages
+After enable Unfree software in NixOS add CUDA to your packages<syntaxhighlight lang="nixos">
+{
-<pre>
+  environment.systemPackages = [
-environment.systemPackages = [
+    (pkgs.llama-cpp.override { cudaSupport = true; })
-  (pkgs.llama-cpp.override { cudaSupport = true; })
+  ];
-];
+}
-</pre>
+</syntaxhighlight>And do a switch to the new configuration
-And do a switch to the new configuration
   sudo nixos-rebuild switch
 ==== in a shell ====
-If you want take the CUDA package for a spin, before adding it to your system, you can open it in a shell:
+If you want take the CUDA package for a spin, before adding it to your system, you can open it in a shell:<syntaxhighlight lang="bash">
-<pre>
 export NIXPKGS_ALLOW_UNFREE=1
 nix shell --impure --expr '(import (builtins.getFlake "nixpkgs") {}).llama-cpp.override { cudaSupport = true; }'
-</pre>
+</syntaxhighlight>
 === BLAS Support ===
-BLAS support is automatically enabled if none of the GPU accelerators are enabled. You can still manually enable it in your nix configuration by doing:
+BLAS support is automatically enabled if none of the GPU accelerators are enabled. You can still manually enable it in your nix configuration by doing:<syntaxhighlight lang="nixos">
+{
-<pre>
+  environment.systemPackages = [
-environment.systemPackages = [
+    (pkgs.llama-cpp.override { blasSupport = true; })
-  (pkgs.llama-cpp.override { blasSupport = true; })
+  ];
-];
+}
-</pre>
+</syntaxhighlight>
 === AMD ROCm ===
@@ Line 143: / Line 139: @@
 Once you've made <code>llama-cpp</code> available in your system. You can use <code>llama-cli</code>, which is a straightforward to use tool.
-In your shell:
+In your shell:<syntaxhighlight lang="bash">
+llama-cli \
-<pre>
+  -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \
-llama-cli \
+  --temp 1.0 --top-p 0.95 --top-k 40 \
-    -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \
+  -p "briefly explain journalctl in one paragraph"
-    --temp 1.0 --top-p 0.95 --top-k 40 \
+</syntaxhighlight>
-    -p "briefly explain journalctl in one paragraph"
-</pre>
 == llama-server ==
@@ Line 156: / Line 150: @@
 <code>llama-server</code> runs a server, and it can run models on demand. It's quite similar to [[Ollama]].
-You can manually start the server from your terminal, it's usage, is not that different from <code>llama-cli</code>, but we are going to see the integration with NixOS as a service.
+You can manually start the server from your terminal, it's usage, is not that different from <code>llama-cli</code>, but we are going to see the integration with NixOS as a service.<syntaxhighlight lang="nixos">
- services.llama-cpp = {
+{
-   enable = true;
+  services.llama-cpp = {
-   package = pkgs.llama-cpp-vulkan;
+    enable = true;
+    package = pkgs.llama-cpp-vulkan;
-   # Takes care of downloading if model not present
+    # Takes care of downloading if model not present
-   modelsPreset = {
+    modelsPreset = {
-     "Qwen3-Coder-Next" = {
+      "Qwen3-Coder-Next" = {
-       hf-repo = "unsloth/Qwen3-Coder-Next-GGUF";
+        hf-repo = "unsloth/Qwen3-Coder-Next-GGUF";
-       hf-file = "Qwen3-Coder-Next-UD-Q4_K_XL.gguf";
+        hf-file = "Qwen3-Coder-Next-UD-Q4_K_XL.gguf";
-       alias = "unsloth/Qwen3-Coder-Next";
+        alias = "unsloth/Qwen3-Coder-Next";
-       temp = "1.0";
+        temp = "1.0";
-       top-p = "0.95";
+        top-p = "0.95";
-       top-k = "40";
+        top-k = "40";
-     };
+      };
-   };
+    };
- };
+  };
-And do a switch to the new configuration
+}
+</syntaxhighlight>And do a switch to the new configuration
 <pre>
 sudo nixos-rebuild switch
 </pre>
+=== Troubleshooting ===
+==== Failed to create //.cache for shader cache ====
+This is a known issue ([https://github.com/NixOS/nixpkgs/issues/441531 441531]), until it gets fixed, you can add to your conf:<syntaxhighlight lang="nix">
+systemd.services.llama-cpp = {
+  environment = {
+    XDG_CACHE_HOME = "/var/cache/llama-cpp";
+    MESA_SHADER_CACHE_DIR = "/var/cache/llama-cpp";
+  };
+};
+</syntaxhighlight>