Ollama: Difference between revisions

Retrieved from "https://wiki.nixos.org/wiki/Ollama"

@@ Line 2: / Line 2: @@
 == Setup ==
-Add following line to your system configuration<syntaxhighlight lang="nix">
+You can add Ollama to your system configuration in two ways.
-services.ollama.enable = true;
+As a standalone package:
+<syntaxhighlight lang="nix">
+environment.systemPackages = [ pkgs.ollama ];
+</syntaxhighlight>
+As a systemd service:
+<syntaxhighlight lang="nix">
+services.ollama = {
+  enable = true;
+  # Optional: load models on startup
+  loadModels = [ ... ];
+};
+</syntaxhighlight>
+== Configuration of GPU acceleration ==
+It's possible to use the following values for acceleration:
+* false: disable GPU, only use CPU
+* "rocm": supported by most modern AMD GPUs
+* "cuda": supported by most modern NVIDIA GPUs
+Example: Enable GPU acceleration for NVIDIA graphics cards
+As a standalone package:
+<syntaxhighlight lang="nix">
+environment.systemPackages = [
+  (pkgs.ollama.override {
+    acceleration = "cuda";
+  })
+];
 </syntaxhighlight>
-== Configuration ==
+As a systemd service:
-Enable GPU acceleration for Nvidia graphic cards<syntaxhighlight lang="nix">
+<syntaxhighlight lang="nix">
 services.ollama = {
    enable = true;
@@ Line 14: / Line 44: @@
 </syntaxhighlight>
-== Usage ==
+To find out whether a model is running on CPU or GPU, you can either
-Download and run Mistral LLM model as an interactive prompt<syntaxhighlight lang="bash">
+look at the logs of
-ollama run mistral
+<syntaxhighlight lang="bash">
+$ ollama serve
 </syntaxhighlight>
+and search for "looking for compatible GPUs" and "new model will fit in available VRAM in single GPU, loading",
+or, while a model is answering, run the following in another terminal:
+<syntaxhighlight lang="bash">
+$ ollama ps
+NAME         ID              SIZE      PROCESSOR    UNTIL
+gemma3:4b    c0494fe00251    4.7 GB    100% GPU     4 minutes from now
+</syntaxhighlight>
+In this example, the PROCESSOR column shows "100% GPU", meaning the model runs entirely on the GPU.
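To check the processor column without reading the table by eye, the output can be filtered. The following is a minimal sketch, not part of Ollama: the helper name `ps_processor` is hypothetical, and it assumes the columns of `ollama ps` are separated by two or more spaces, as in the sample output above.

```shell
# Print "<model>: <processor>" for every loaded model.
# Reads the output of `ollama ps` on standard input.
# Assumption: columns are separated by at least two spaces.
ps_processor() {
  awk -F '  +' 'NR > 1 && NF >= 4 { print $1 ": " $4 }'
}

# Usage (needs a running Ollama instance):
#   ollama ps | ps_processor
```

Splitting on runs of two or more spaces keeps values such as "4.7 GB" and "100% GPU" together as single fields.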
+== Usage via CLI ==
+=== Download a model and run interactive prompt ===
+Example: Download the Mistral LLM model and run it as an interactive prompt:<syntaxhighlight lang="bash">
+$ ollama run mistral
+</syntaxhighlight>For other models, see the [https://ollama.ai/library Ollama library].
+=== Send a prompt to ollama ===
+Example: Download and run the "instruct" variant of codellama with 13 billion parameters and send it a prompt:
+<syntaxhighlight lang="bash">
+$ ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."
+</syntaxhighlight>
+=== See usage and speed statistics ===
+Add "--verbose" to see statistics after each prompt:
+<syntaxhighlight lang="bash">
+$ ollama run codellama:13b-instruct --verbose "Write an extended Python program..."
+...
+total duration:       50.302071991s
+load duration:        50.912267ms
+prompt eval count:    49 token(s)
+prompt eval duration: 4.654s
+prompt eval rate:     10.53 tokens/s <- how fast it processed your input prompt
+eval count:           182 token(s)
+eval duration:        45.595s
+eval rate:            3.99 tokens/s  <- how fast it printed a response
+</syntaxhighlight>
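The rates in this output appear to be the token count divided by the duration in seconds. A small sketch to reproduce them from the numbers above; the helper name `tokens_per_second` is hypothetical and not an Ollama command:

```shell
# Compute a rate in tokens per second, rounded to two decimals.
# $1 = token count, $2 = duration in seconds (must be non-zero).
tokens_per_second() {
  awk -v count="$1" -v seconds="$2" 'BEGIN {
    if (seconds == 0) exit 1          # avoid division by zero
    printf "%.2f\n", count / seconds
  }'
}

# Values taken from the sample output above:
tokens_per_second 49 4.654     # prompt eval rate
tokens_per_second 182 45.595   # eval rate
```

With the sample values this gives 10.53 and 3.99, matching the "prompt eval rate" and "eval rate" lines.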
+== Usage via web API ==
+Other software can use the web API (by default at http://localhost:11434) to query Ollama. This works well, for example, in IntelliJ-based IDEs with the "ProxyAI" and "Ollama Commit Summarizer" plugins.
+Alternatively, after enabling "open-webui", a web portal is available at http://localhost:8080/:
+ services.open-webui.enable = true;
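The web API can also be queried directly from a shell. The sketch below assumes the default address given above, that `curl` is installed, and that the model (here "mistral") has already been downloaded. The helper names `json_escape`, `build_payload` and `ollama_generate` are hypothetical. The escaping is deliberately simple and does not handle newlines or control characters.

```shell
# Base URL of the Ollama web API (default address; override via environment).
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for the /api/generate endpoint.
# $1 = model name, $2 = prompt. "stream": false asks for a single JSON reply.
build_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the prompt and print the raw JSON reply from the server.
ollama_generate() {
  curl --silent --show-error "$OLLAMA_URL/api/generate" \
    -d "$(build_payload "$1" "$2")"
}

# Usage (needs a running Ollama instance):
#   ollama_generate mistral "Why is the sky blue?"
```

Keeping the payload construction in its own function makes it possible to inspect the request body without contacting the server.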
+== Troubleshooting ==
+=== AMD GPU with open source driver ===
+In certain cases Ollama might not allow your system to use GPU acceleration if it cannot be sure your GPU/driver is compatible.
+However, you can attempt to force-enable the usage of your GPU by overriding the LLVM target.<ref>https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides</ref>
+You can get the version for your GPU from the logs or like so:
+<syntaxhighlight lang="bash">
+$ nix-shell -p "rocmPackages.rocminfo" --run "rocminfo" | grep "gfx"
+Name:                    gfx1031
+</syntaxhighlight>
+In this example, the LLVM target is "gfx1031", that is, version "10.3.1". You can then override that value for Ollama in the systemd service:
+<syntaxhighlight lang="nix">
+services.ollama = {
+  enable = true;
+  acceleration = "rocm";
+  environmentVariables = {
+    HCC_AMDGPU_TARGET = "gfx1031"; # used to be necessary, but doesn't seem to anymore
+  };
+  # results in environment variable "HSA_OVERRIDE_GFX_VERSION=10.3.1"
+  rocmOverrideGfx = "10.3.1";
+};
+</syntaxhighlight>
+or by setting an environment variable in front of the standalone command:
+<syntaxhighlight lang="bash">
+$ HSA_OVERRIDE_GFX_VERSION=10.3.1 ollama serve
+</syntaxhighlight>
+If there are still errors, you can attempt to set a similar value that is listed [https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides here].
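The step from the LLVM target name to the dotted version ("gfx1031" to "10.3.1") can be sketched as a small shell helper. The name `gfx_to_version` is hypothetical. The sketch assumes that the last two characters of the target are the minor version and the stepping, each read as a hexadecimal digit, and that everything before them is the major version.

```shell
# Convert an LLVM target such as "gfx1031" into "10.3.1".
# Assumption: <major><minor><stepping>, where minor and stepping are
# single hexadecimal digits and major is the remaining prefix.
gfx_to_version() {
  target="${1#gfx}"              # strip the "gfx" prefix
  case "$target" in
    ???*) ;;                     # at least three characters are needed
    *) echo "unexpected target: $1" >&2; return 1 ;;
  esac
  major="${target%??}"           # everything except the last two characters
  last2=${target#"$major"}       # the last two characters
  minor="${last2%?}"
  step="${last2#?}"
  # printf turns a hexadecimal digit into a decimal number
  printf '%s.%d.%d\n' "$major" "0x$minor" "0x$step"
}

# Usage:
#   HSA_OVERRIDE_GFX_VERSION="$(gfx_to_version gfx1031)" ollama serve
```

The result only reproduces your own GPU's version string. When the troubleshooting steps above call for a "similar value", pick it from the linked list instead.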
+[[Category:Server]]
+[[Category:Applications]]
+[[Category:CLI Applications]]