Ollama


== Setup ==
You can add Ollama in two ways to your system configuration.
 
As a standalone package:
<syntaxhighlight lang="nix">
environment.systemPackages = [ pkgs.ollama ];
</syntaxhighlight>
 
As a systemd service:
<syntaxhighlight lang="nix">
services.ollama = {
   enable = true;
   # Optional: preload models, see https://ollama.com/library
   loadModels = [ "llama3.2:3b" "deepseek-r1:1.5b" ];
};
</syntaxhighlight>
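After activating the configuration, a quick sanity check (generic commands, not specific to this module) is to confirm that the service is running, the API answers, and the preloaded models are present:
<syntaxhighlight lang="bash">
# systemd unit created by services.ollama
$ systemctl status ollama

# the API listens on port 11434 by default
$ curl http://localhost:11434/api/version

# list locally available models, including the preloaded ones
$ ollama list
</syntaxhighlight>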




Example: Enable GPU acceleration for Nvidia graphics cards
 
As a standalone package:
<syntaxhighlight lang="nix">
environment.systemPackages = [
  (pkgs.ollama.override {
    acceleration = "cuda";
  })
];
</syntaxhighlight>
 
As a systemd service:
<syntaxhighlight lang="nix">
services.ollama = {
   enable = true;
   acceleration = "cuda";
};
</syntaxhighlight>
To find out whether a model is running on CPU or GPU, you can either
look at the logs of
<syntaxhighlight lang="bash">
$ ollama serve
</syntaxhighlight>
and search for "looking for compatible GPUs" and "new model will fit in available VRAM in single GPU, loading",
or, while a model is answering, run in another terminal:
<syntaxhighlight lang="bash">
$ ollama ps
NAME        ID              SIZE      PROCESSOR    UNTIL
gemma3:4b    c0494fe00251    4.7 GB    100% GPU    4 minutes from now
</syntaxhighlight>
In this example the PROCESSOR column shows "100% GPU", i.e. the model is fully loaded on the GPU.


== Usage via CLI ==
=== Download a model and run interactive prompt ===
Example: Download and run the Mistral LLM as an interactive prompt<syntaxhighlight lang="bash">
$ ollama run mistral
</syntaxhighlight>For other models see the [https://ollama.ai/library Ollama library].


Example: To download and run codellama with 13 billion parameters in the "instruct" variant and send a prompt:
<syntaxhighlight lang="bash">
$ ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."
</syntaxhighlight>
 
=== See usage and speed statistics ===
Add "--verbose" to see statistics after each prompt:
<syntaxhighlight lang="bash">
$ ollama run codellama:13b-instruct --verbose "Write an extended Python program..."
...
total duration:      50.302071991s
load duration:        50.912267ms
prompt eval count:    49 token(s)
prompt eval duration: 4.654s
prompt eval rate:    10.53 tokens/s <- how fast it processed your input prompt
eval count:          182 token(s)
eval duration:        45.595s
eval rate:            3.99 tokens/s  <- how fast it printed a response
</syntaxhighlight>


== Usage via web API ==
Other software can use the web API (default at: http://localhost:11434 ) to query Ollama. This works well e.g. in IntelliJ IDEs with the "ProxyAI" and the "Ollama Commit Summarizer" plugins.
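For a quick manual test of the API you can send a request with curl; this is a minimal sketch and assumes the mistral model from the CLI section above has already been downloaded:
<syntaxhighlight lang="bash">
$ curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
</syntaxhighlight>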
 
Alternatively, enabling open-webui makes a web portal available at http://localhost:8080/:
<syntaxhighlight lang="nix">
services.open-webui.enable = true;
</syntaxhighlight>
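If the default port is already in use, the module also lets you choose the listen address and port; the exact option names below (services.open-webui.host/port) are an assumption about the NixOS module and should be checked against the options search:
<syntaxhighlight lang="nix">
services.open-webui = {
  enable = true;
  # assumed module options; verify in the NixOS options search
  host = "127.0.0.1";
  port = 8080;
};
</syntaxhighlight>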


== Troubleshooting ==
=== AMD GPU with open source driver ===


In certain cases Ollama might not allow your system to use GPU acceleration if it cannot be sure your GPU/driver is compatible.


However, you can attempt to force-enable the usage of your GPU by overriding the LLVM target.<ref>https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides</ref>
First, determine the LLVM target of your GPU with rocminfo:


<syntaxhighlight lang="bash">
# classical
$ nix-shell -p "rocmPackages.rocminfo" --run "rocminfo" | grep "gfx"
Name:                    gfx1031
# flakes
$ nix run nixpkgs#rocmPackages.rocminfo | grep "gfx"
Name:                    gfx1031
</syntaxhighlight>


In this example the LLVM target is "gfx1031", i.e. version "10.3.1"; by convention the override value "10.3.0" is used for such devices. You can then set that value for Ollama in the systemd service:
<syntaxhighlight lang="nix">
services.ollama = {
   enable = true;
   acceleration = "rocm";
   environmentVariables = {
     HCC_AMDGPU_TARGET = "gfx1031"; # used to be necessary, but doesn't seem to anymore
   };
   # results in environment variable "HSA_OVERRIDE_GFX_VERSION=10.3.0"
   rocmOverrideGfx = "10.3.0";
};
</syntaxhighlight>
or via an environment variable when running the standalone package:
<syntaxhighlight lang="bash">
HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
</syntaxhighlight>
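To check that the override actually reaches Ollama, you can inspect the unit's environment and follow its logs; these are plain systemd commands, not something from the Ollama documentation:
<syntaxhighlight lang="bash">
# environment variables passed to the service, should include HSA_OVERRIDE_GFX_VERSION
$ systemctl show ollama --property=Environment

# follow the service logs while it probes the GPU
$ journalctl -u ollama -f
</syntaxhighlight>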
If there are still errors, you can attempt to set a similar value that is listed [https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides here].