Ollama

Ollama is an open-source framework designed to facilitate the deployment of large language models in local environments. It aims to simplify the complexities involved in running and managing these models, providing a seamless experience for users across different operating systems.

Setup

You can add Ollama to your system configuration in two ways.

As a standalone package:

environment.systemPackages = [ pkgs.ollama ];
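
With the standalone package no background service is set up for you; you start the server manually when needed:

$ ollama serve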

As a systemd service:

services.ollama = {
  enable = true;
  # Optional: preload models, see https://ollama.com/library
  loadModels = [ "llama3.2:3b" "deepseek-r1:1.5b" ];
};
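
The service listens only on the local host by default. A minimal sketch for exposing it on the network, assuming your nixpkgs version provides the host, port and openFirewall options of the services.ollama module:

services.ollama = {
  enable = true;
  # Assumption: these module options exist in your nixpkgs version.
  host = "0.0.0.0";    # listen on all interfaces instead of only localhost
  port = 11434;        # the default Ollama API port
  openFirewall = true; # open the chosen port in the NixOS firewall
};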

Configuration of GPU acceleration

It's possible to use the following values for acceleration:

  • false: disable GPU, only use CPU
  • "rocm": supported by most modern AMD GPUs
  • "cuda": supported by most modern NVIDIA GPUs


Example: Enable GPU acceleration for Nvidia graphics cards

As a standalone package:

environment.systemPackages = [
  (pkgs.ollama.override {
    acceleration = "cuda";
  })
];

As a systemd service:

services.ollama = {
  enable = true;
  acceleration = "cuda";
};
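
The same works for AMD GPUs with the "rocm" value from the list above:

services.ollama = {
  enable = true;
  acceleration = "rocm";
};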

To find out whether a model is running on CPU or GPU, you can either look at the logs of

$ ollama serve

and search for "looking for compatible GPUs" and "new model will fit in available VRAM in single GPU, loading"

or, while a model is answering, run in another terminal:

$ ollama ps
NAME         ID              SIZE      PROCESSOR    UNTIL
gemma3:4b    c0494fe00251    4.7 GB    100% GPU     4 minutes from now

In this example we see "100% GPU".
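
When Ollama runs as the NixOS service rather than in a terminal, the same log lines end up in the systemd journal; assuming the default unit name ollama.service:

$ journalctl -u ollama.service | grep -i gpu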

Usage via CLI

Download a model and run an interactive prompt

Example: Download and run the Mistral LLM as an interactive prompt:

$ ollama run mistral

For other models, see the Ollama library: https://ollama.ai/library

Send a prompt to Ollama

Example: To download and run codellama with 13 billion parameters in the "instruct" variant and send a prompt:

$ ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."
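
The prompt can also be piped in on standard input, which is handy in scripts; this assumes your Ollama version reads the prompt from stdin when no terminal is attached, as recent releases do:

$ echo "Print the numbers 1 to 10." | ollama run codellama:13b-instruct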

See usage and speed statistics

Add "--verbose" to see statistics after each prompt:

$ ollama run codellama:13b-instruct --verbose "Write an extended Python program..."
...
total duration:       50.302071991s
load duration:        50.912267ms
prompt eval count:    49 token(s)
prompt eval duration: 4.654s
prompt eval rate:     10.53 tokens/s <- how fast it processed your input prompt
eval count:           182 token(s)
eval duration:        45.595s
eval rate:            3.99 tokens/s  <- how fast it printed a response

Usage via web API

Other software can use the web API (by default at http://localhost:11434) to query Ollama. This works well, for example, in IntelliJ IDEs with the "ProxyAI" and "Ollama Commit Summarizer" plugins.
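
For a quick manual test, the documented /api/generate endpoint can be queried with curl (this assumes the mistral model has already been pulled):

$ curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Why is the sky blue?"}'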

Alternatively, after enabling "open-webui", a web portal is available at http://localhost:8080/:

services.open-webui.enable = true;
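
A slightly fuller sketch, assuming the services.open-webui module in your nixpkgs version exposes host and port options (8080 matches the URL above):

services.open-webui = {
  enable = true;
  # Assumption: these options exist in your nixpkgs version.
  host = "127.0.0.1";
  port = 8080;
};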

Troubleshooting

AMD GPU with open-source driver

In certain cases Ollama might not allow your system to use GPU acceleration if it cannot be sure your GPU/driver is compatible.

However, you can attempt to force-enable the usage of your GPU by overriding the LLVM target (see https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides).

You can get the version for your GPU from the logs, or like so:

# classic nix-shell
$ nix-shell -p "rocmPackages.rocminfo" --run "rocminfo" | grep "gfx"
Name:                    gfx1031

# flakes (nix run executes the package's rocminfo binary directly)
$ nix run nixpkgs#rocmPackages.rocminfo | grep "gfx"
Name:                    gfx1031

In this example the LLVM target is "gfx1031", i.e. version 10.3.1. You can then override that value for the Ollama systemd service; note that by convention the override uses the supported base target "10.3.0" for such devices:

services.ollama = {
  enable = true;
  acceleration = "rocm";
  environmentVariables = {
    HCC_AMDGPU_TARGET = "gfx1031"; # used to be necessary, but doesn't seem to anymore
  };
  # results in environment variable "HSA_OVERRIDE_GFX_VERSION=10.3.0"
  rocmOverrideGfx = "10.3.0";
};

or, when using the standalone package, via an environment variable in front of the command:

HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
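
Note that this only has an effect if the standalone package was built with ROCm support, which can be requested with the same override pattern shown earlier:

environment.systemPackages = [
  (pkgs.ollama.override {
    acceleration = "rocm"; # value from the acceleration list above
  })
];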

If there are still errors, you can attempt to set a similar value from the list at https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides.