Ollama

From NixOS Wiki

Latest revision as of 04:20, 2 February 2025

Ollama is an open-source framework designed to facilitate the deployment of large language models on local environments. It aims to simplify the complexities involved in running and managing these models, providing a seamless experience for users across different operating systems.

Setup

Add the following to your system configuration:

services.ollama = {
  enable = true;
  # Optional: load models on startup
  loadModels = [ ... ];
};
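For example, to have a model pulled and loaded when the service starts (the model name below is only an illustration; any model from the Ollama library works):

```nix
services.ollama = {
  enable = true;
  # "mistral" is just an example model name
  loadModels = [ "mistral" ];
};
```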

Configuration of GPU acceleration

The acceleration option accepts the following values:

  • false: disable GPU, only use CPU
  • "rocm": supported by most modern AMD GPUs
  • "cuda": supported by most modern NVIDIA GPUs


Example: Enable GPU acceleration for Nvidia graphics cards

services.ollama = {
  enable = true;
  acceleration = "cuda";
};
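Note that CUDA support pulls in unfree packages, so you will typically also need to allow them. A minimal sketch, assuming a standard NixOS module setup:

```nix
# Required for the unfree CUDA packages
nixpkgs.config.allowUnfree = true;
```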

Usage via CLI

Download a model and run an interactive prompt

Example: Download and run Mistral LLM model as an interactive prompt

ollama run mistral

For other models see Ollama library.

Send a prompt to ollama

Example: To download and run codellama with 13 billion parameters in the "instruct" variant and send a prompt:

ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."

Usage via web API

Other software can use the web API (default at: http://localhost:11434 ) to query ollama. This works well, for example, in IntelliJ IDEs with the CodeGPT and "Ollama Commit Summarizer" plugins.

Alternatively, enabling open-webui makes a web portal available at http://localhost:8080/:

services.open-webui.enable = true;
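As a quick sketch of the web API, a prompt can be sent to Ollama's /api/generate endpoint with curl. The model and prompt below are only examples, and the request assumes the service is running and the model has already been pulled:

```shell
# JSON request body for Ollama's /api/generate endpoint
payload='{"model": "mistral", "prompt": "Why is the sky blue?", "stream": false}'
echo "$payload"
# Send it to the running service (requires ollama to be up):
# curl http://localhost:11434/api/generate -d "$payload"
```

Setting "stream": false returns the whole response as a single JSON object instead of a stream of partial responses.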

Troubleshooting

AMD GPU with open source driver

In certain cases ollama might not allow your system to use GPU acceleration if it cannot be sure your GPU/driver is compatible.

However, you can attempt to force-enable GPU usage by overriding the LLVM target. [1]

You can get the version for your GPU from the logs or like so:

$ nix-shell -p "rocmPackages.rocminfo" --run "rocminfo" | grep "gfx"
Name:                    gfx1031

In this example the LLVM target is "gfx1031", i.e. version "10.3.1". You can then override that value for ollama:

services.ollama = {
  enable = true;
  acceleration = "rocm";
  environmentVariables = {
    HCC_AMDGPU_TARGET = "gfx1031"; # used to be necessary, but doesn't seem to anymore
  };
  rocmOverrideGfx = "10.3.1";
};
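The mapping from the LLVM target to the version string follows a simple pattern: the last character of the gfx number is the stepping, the one before it the minor version, and the rest the major version, with the last two read as hexadecimal digits. A small bash sketch of that conversion (the hex reading is an assumption based on targets like gfx90c mapping to 9.0.12):

```shell
# Convert an LLVM gfx target to the "major.minor.step" form
# expected by rocmOverrideGfx (sketch; last two chars read as hex).
gfx="gfx1031"
digits=${gfx#gfx}
major=${digits%??}            # everything but the last two characters
minor=$((16#${digits: -2:1})) # second-to-last character, as hex
step=$((16#${digits: -1}))    # last character, as hex
echo "$major.$minor.$step"    # prints 10.3.1
```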

If there are still errors, you can attempt to set a similar value from the list of overrides in the Ollama GPU documentation (https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides).