Enable the Ollama service in your NixOS configuration:
<syntaxhighlight lang="nix">
services.ollama = {
  enable = true;
  # Optional: preload models, see https://ollama.com/library
  loadModels = [ "llama3.2:3b" "deepseek-r1:1.5b" ];
};
</syntaxhighlight>
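Models listed in loadModels are pulled automatically once the service has started. You can also pull and list models manually at any time; a minimal sketch (the model name is just an example):
<syntaxhighlight lang="bash">
# download a model from the Ollama library and show what is installed locally
$ ollama pull llama3.2:3b
$ ollama list
</syntaxhighlight>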
To find out whether a model is running on CPU or GPU, you can either
look at the logs of
<syntaxhighlight lang="bash">
$ ollama serve
</syntaxhighlight>
and search for "looking for compatible GPUs" and "new model will fit in available VRAM in single GPU, loading",

or, while a model is answering, run in another terminal:
<syntaxhighlight lang="bash">
$ ollama ps
</syntaxhighlight>
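The PROCESSOR column shows how the loaded model is split between CPU memory and GPU VRAM. Illustrative output, assuming the llama3.2:3b model from above (IDs, sizes and timings will differ on your system):
<syntaxhighlight lang="bash">
$ ollama ps
# example output - values depend on your model and hardware
NAME           ID              SIZE      PROCESSOR    UNTIL
llama3.2:3b    a80c4f17acd5    3.5 GB    100% GPU     4 minutes from now
</syntaxhighlight>
A value such as "100% GPU" means the model fits entirely in VRAM, while a split such as "40%/60% CPU/GPU" means part of the model is offloaded to system memory.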
<syntaxhighlight lang="bash">
$ ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."
</syntaxhighlight>
=== See usage and speed statistics ===
Add "--verbose" to see statistics after each prompt:
<syntaxhighlight lang="bash">
$ ollama run codellama:13b-instruct --verbose "Write an extended Python program..."
...
total duration:       50.302071991s
load duration:        50.912267ms
prompt eval count:    49 token(s)
prompt eval duration: 4.654s
prompt eval rate:     10.53 tokens/s  <- how fast it processed your input prompt
eval count:           182 token(s)
eval duration:        45.595s
eval rate:            3.99 tokens/s   <- how fast it printed a response
</syntaxhighlight>


== Usage via web API ==
Other software can use the web API (default: http://localhost:11434) to query Ollama. This works well, for example, in IntelliJ IDEs with the "ProxyAI" and "Ollama Commit Summarizer" plugins.
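For a quick manual test of the API from the command line, you can query the generate endpoint directly; a minimal sketch (the model name is just an example and must already be pulled):
<syntaxhighlight lang="bash">
# request a single, non-streamed completion from the local Ollama API
$ curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
</syntaxhighlight>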


Alternatively, when "open-webui" is enabled (services.open-webui.enable), a web portal is available at http://localhost:8080/.
=== AMD GPU with open source driver ===

In certain cases Ollama might not allow your system to use GPU acceleration if it cannot be sure your GPU/driver is compatible.

However, you can attempt to force-enable the usage of your GPU by overriding the LLVM target.<ref>https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides</ref>


You can find the LLVM target of your GPU with rocminfo:
<syntaxhighlight lang="bash">
# classical
$ nix-shell -p "rocmPackages.rocminfo" --run "rocminfo" | grep "gfx"
Name:                    gfx1031
# flakes
$ nix run "nixpkgs#rocmPackages.rocminfo" | grep "gfx"
Name:                    gfx1031
</syntaxhighlight>


In this example the LLVM target is "gfx1031", i.e. version "10.3.1". Since only some targets are officially supported, the convention is to override to the nearest supported version, "10.3.0" in this case. You can set this for the Ollama systemd service:
<syntaxhighlight lang="nix">
services.ollama = {
  enable = true;
  acceleration = "rocm";
  environmentVariables = {
    HCC_AMDGPU_TARGET = "gfx1031"; # used to be necessary, but doesn't seem to anymore
  };
  # results in environment variable "HSA_OVERRIDE_GFX_VERSION=10.3.0"
  rocmOverrideGfx = "10.3.0";
};
</syntaxhighlight>
or, when running the standalone package, by setting the environment variable in front of the command:
<syntaxhighlight lang="bash">
$ HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
</syntaxhighlight>
If there are still errors, you can attempt to set a similar value that is listed [https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides here].
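To check whether the override took effect, one option is to restart the service and search the journal for the GPU detection messages mentioned above (assuming Ollama runs as the NixOS service):
<syntaxhighlight lang="bash">
$ sudo systemctl restart ollama.service
$ journalctl -u ollama.service -b | grep -i "compatible gpus"
</syntaxhighlight>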