Ollama
== Setup ==
You can add Ollama to your system configuration in two ways.
As a standalone package:
<syntaxhighlight lang="nix">
environment.systemPackages = [ pkgs.ollama ];
</syntaxhighlight>
As a systemd service:
<syntaxhighlight lang="nix">
services.ollama = {
  enable = true;
  # Optional: preload models, see https://ollama.com/library
  loadModels = [ "llama3.2:3b" "deepseek-r1:1.5b" ];
};
</syntaxhighlight>
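After switching to the new configuration, a quick sanity check that the service came up (assuming the default unit name "ollama" used by the NixOS module):
<syntaxhighlight lang="bash">
$ systemctl status ollama.service
</syntaxhighlight>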
=== Configuration of GPU acceleration ===
GPU acceleration is selected with the acceleration setting; the following values are supported:
* "rocm": supported by most modern AMD GPUs | * "rocm": supported by most modern AMD GPUs | ||
* "cuda": supported by most modern NVIDIA GPUs | * "cuda": supported by most modern NVIDIA GPUs | ||
* "vulkan": supported by most modern GPUs on Linux | |||
Example: Enable GPU acceleration for Nvidia graphics cards
As a standalone package:
<syntaxhighlight lang="nix">
environment.systemPackages = [
  (pkgs.ollama.override {
    acceleration = "cuda";
  })
];
</syntaxhighlight>
As a systemd service:
<syntaxhighlight lang="nix">
services.ollama = {
  enable = true;
  acceleration = "cuda";
};
</syntaxhighlight>
To find out whether a model is running on CPU or GPU, you can either look at the logs of
<syntaxhighlight lang="bash">
$ ollama serve
</syntaxhighlight>
and search for "looking for compatible GPUs" and "new model will fit in available VRAM in single GPU, loading",
or, while a model is answering, run in another terminal:
<syntaxhighlight lang="bash">
$ ollama ps
NAME         ID              SIZE      PROCESSOR    UNTIL
gemma3:4b    c0494fe00251    4.7 GB    100% GPU     4 minutes from now
</syntaxhighlight>
In this example we see "100% GPU", i.e. the model is fully loaded into the GPU's VRAM.
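When Ollama runs as the systemd service rather than a manually started ollama serve, the same log messages can be read from the journal (again assuming the unit name "ollama"):
<syntaxhighlight lang="bash">
$ journalctl -u ollama.service | grep -i "compatible GPUs"
</syntaxhighlight>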
== Usage via CLI ==
=== Download a model and run interactive prompt ===
Example: Download and run the Mistral LLM model as an interactive prompt:
<syntaxhighlight lang="bash">
$ ollama run mistral
</syntaxhighlight>
For other models see the [https://ollama.ai/library Ollama library].
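Downloaded models are stored locally; they can be listed and removed again with the ollama CLI (using the mistral model from above as an example):
<syntaxhighlight lang="bash">
$ ollama list
$ ollama rm mistral
</syntaxhighlight>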
Example: To download and run codellama with 13 billion parameters in the "instruct" variant and send a prompt:
<syntaxhighlight lang="bash">
$ ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."
</syntaxhighlight>
=== See usage and speed statistics ===
Add "--verbose" to see statistics after each prompt:
<syntaxhighlight lang="bash">
$ ollama run codellama:13b-instruct --verbose "Write an extended Python program..."
...
total duration:       50.302071991s
load duration:        50.912267ms
prompt eval count:    49 token(s)
prompt eval duration: 4.654s
prompt eval rate:     10.53 tokens/s  <- how fast it processed your input prompt
eval count:           182 token(s)
eval duration:        45.595s
eval rate:            3.99 tokens/s   <- how fast it printed a response
</syntaxhighlight>
== Usage via web API ==
Other software can use the web API (default at http://localhost:11434) to query Ollama. This works well e.g. in IntelliJ IDEs with the "ProxyAI" and the "Ollama Commit Summarizer" plugins.
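For a quick manual test of the API (a minimal sketch, assuming the mistral model from above has already been pulled):
<syntaxhighlight lang="bash">
$ curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
</syntaxhighlight>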
Alternatively, after enabling "open-webui", a web portal is available at http://localhost:8080/.
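A minimal sketch of such a configuration, assuming the services.open-webui module from nixpkgs:
<syntaxhighlight lang="nix">
services.open-webui.enable = true;
</syntaxhighlight>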
=== AMD GPU with open source driver ===
Use the ollama-rocm Nix package:
<syntaxhighlight lang="nix">
environment.systemPackages = [ pkgs.ollama-rocm ];
</syntaxhighlight>
And make sure the kernel loads the amdgpu driver:
<syntaxhighlight lang="nix">
boot.initrd.kernelModules = [ "amdgpu" ];
</syntaxhighlight>
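Whether the module is actually loaded can be verified at runtime:
<syntaxhighlight lang="bash">
$ lsmod | grep amdgpu
</syntaxhighlight>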
In certain cases Ollama might not allow your system to use GPU acceleration if it cannot be sure your GPU/driver is compatible.
However, you can attempt to force-enable the usage of your GPU by overriding the LLVM target.<ref>https://github.com/ollama/ollama/blob/main/docs/gpu.mdx#overrides-on-linux</ref>
You can get the version for your GPU from the logs or like so:
<syntaxhighlight lang="bash">
# classic Nix
$ nix-shell -p "rocmPackages.rocminfo" --run "rocminfo" | grep "gfx"
Name: gfx1031
# flakes
$ nix run nixpkgs#rocmPackages.rocminfo | grep "gfx"
Name: gfx1031
</syntaxhighlight>
In this example the LLVM target is "gfx1031", that is, version "10.3.1". You can then override that value for Ollama, either for the systemd service:
<syntaxhighlight lang="nix"> | <syntaxhighlight lang="nix"> | ||
services.ollama = { | services.ollama = { | ||
| Line 64: | Line 132: | ||
HCC_AMDGPU_TARGET = "gfx1031"; # used to be necessary, but doesn't seem to anymore | HCC_AMDGPU_TARGET = "gfx1031"; # used to be necessary, but doesn't seem to anymore | ||
}; | }; | ||
rocmOverrideGfx = "10.3. | # results in environment variable "HSA_OVERRIDE_GFX_VERSION=10.3.0" | ||
rocmOverrideGfx = "10.3.0"; | |||
}; | }; | ||
</syntaxhighlight> | </syntaxhighlight> | ||
or via an environment variable when starting the standalone app:
<syntaxhighlight lang="bash">
$ HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
</syntaxhighlight>
If there are still errors, you can attempt to set a similar value that is listed [https://github.com/ollama/ollama/blob/main/docs/gpu.mdx#overrides-on-linux here].