Ollama: Difference between revisions

(7 intermediate revisions by 5 users not shown)

@@ Line 2: / Line 2: @@
 == Setup ==
-Add following line to your system configuration<syntaxhighlight lang="nix">
+You can add Ollama to your system configuration in two ways.
-services.ollama {
+As a standalone package:
+<syntaxhighlight lang="nix">
+environment.systemPackages = [ pkgs.ollama ];
+</syntaxhighlight>
+As a systemd service:
+<syntaxhighlight lang="nix">
+services.ollama = {
    enable = true;
    # Optional: load models on startup
@@ Line 17: / Line 25: @@
-Example: Enable GPU acceleration for Nvidia graphic cards<syntaxhighlight lang="nix">
+Example: Enable GPU acceleration for NVIDIA graphics cards
+As a standalone package:
+<syntaxhighlight lang="nix">
+environment.systemPackages = [
+  (pkgs.ollama.override {
+    acceleration = "cuda";
+  })
+];
+</syntaxhighlight>
+As a systemd service:
+<syntaxhighlight lang="nix">
 services.ollama = {
    enable = true;
@@ Line 23: / Line 43: @@
 };
 </syntaxhighlight>
+To find out whether a model is running on the CPU or the GPU, you can either look at the logs of
+<syntaxhighlight lang="bash">
+$ ollama serve
+</syntaxhighlight>
+and search for "looking for compatible GPUs" and "new model will fit in available VRAM in single GPU, loading",
+or, while a model is answering, run the following in another terminal:
+<syntaxhighlight lang="bash">
+$ ollama ps
+NAME         ID              SIZE      PROCESSOR    UNTIL
+gemma3:4b    c0494fe00251    4.7 GB    100% GPU     4 minutes from now
+</syntaxhighlight>
+In this example, the PROCESSOR column shows "100% GPU", so the model is running entirely on the GPU.
 == Usage via CLI ==
 === Download a model and run interactive prompt ===
 Example: Download and run Mistral LLM model as an interactive prompt<syntaxhighlight lang="bash">
-ollama run mistral
+$ ollama run mistral
 </syntaxhighlight>For other models see [https://ollama.ai/library Ollama library].
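The <code>ollama ps</code> check added in this hunk can also be scripted. The following is a minimal sketch, not part of the revision: it assumes the column layout shown in the sample output (NAME, ID, SIZE, PROCESSOR, UNTIL, separated by two or more spaces), which other Ollama versions may not share. The function name <code>processor_per_model</code> is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list which processor each loaded model uses, based on `ollama ps` output.
# Assumption: columns are NAME ID SIZE PROCESSOR UNTIL, separated by 2+ spaces,
# as in the sample output of this hunk.

# Reads `ollama ps` output on stdin and prints "<model> <processor>" per model.
processor_per_model() {
  # Skip the header line; split fields on runs of two or more spaces.
  awk -F'  +' 'NR > 1 && NF >= 4 { print $1, $4 }'
}

# Sample copied from the hunk, so the parsing can be tried without a running server.
sample='NAME         ID              SIZE      PROCESSOR    UNTIL
gemma3:4b    c0494fe00251    4.7 GB    100% GPU     4 minutes from now'

printf '%s\n' "$sample" | processor_per_model
```

With a running server, the same function can be fed directly: <code>ollama ps | processor_per_model</code>.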
@@ Line 33: / Line 68: @@
 Example: To download and run codellama with 13 billion parameters in the "instruct" variant and send a prompt:
 <syntaxhighlight lang="bash">
-ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."
+$ ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."
+</syntaxhighlight>
+=== See usage and speed statistics ===
+Add "--verbose" to see statistics after each prompt:
+<syntaxhighlight lang="bash">
+$ ollama run codellama:13b-instruct --verbose "Write an extended Python program..."
+...
+total duration:       50.302071991s
+load duration:        50.912267ms
+prompt eval count:    49 token(s)
+prompt eval duration: 4.654s
+prompt eval rate:     10.53 tokens/s <- how fast it processed your input prompt
+eval count:           182 token(s)
+eval duration:        45.595s
+eval rate:            3.99 tokens/s  <- how fast it printed a response
 </syntaxhighlight>
 == Usage via web API ==
-Other software can use the web API (default at: http://localhost:11434 ) to query ollama. This works well e.g. in Intellij-IDEs with the CodeGPT and the "Ollama Commit Summarizer" plugins.
+Other software can use the web API (default: http://localhost:11434) to query Ollama. This works well, for example, in IntelliJ-based IDEs with the "ProxyAI" and "Ollama Commit Summarizer" plugins.
+Alternatively, after enabling "open-webui", a web portal is available at http://localhost:8080/:
+ services.open-webui.enable = true;
 == Troubleshooting ==
 === AMD GPU with open source driver ===
-In certain cases ollama might not allow your system to use GPU acceleration if it cannot be sure your GPU/driver is compatible.
+In certain cases Ollama might not allow your system to use GPU acceleration if it cannot be sure your GPU/driver is compatible.
 However you can attempt to force-enable the usage of your GPU by overriding the LLVM target. <ref>https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides</ref>
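For the "Usage via web API" section changed in this hunk, a direct request may help to check that the API is reachable before configuring an IDE plugin. This is a minimal sketch under stated assumptions: the server listens on the default address from the text, the "mistral" model from the CLI example has already been pulled, and the variable <code>OLLAMA_URL</code> and the helper <code>build_payload</code> are hypothetical names. It uses the <code>/api/generate</code> and <code>/api/tags</code> endpoints of the Ollama API.

```shell
#!/usr/bin/env bash
# Sketch: send one prompt to a local Ollama server over its web API.
# Assumptions: default address from the text; the "mistral" model is already pulled.

OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Builds the JSON body for POST /api/generate.
# "stream": false asks for a single JSON reply instead of a stream of chunks.
# Model and prompt are inserted verbatim, so they must not contain quotes or backslashes.
build_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

payload="$(build_payload mistral "Why is the sky blue?")"
echo "$payload"

# Only send the request if curl is installed and the server answers on /api/tags.
if command -v curl >/dev/null 2>&1 &&
   curl -fsS --max-time 2 "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
  curl -fsS "$OLLAMA_URL/api/generate" -d "$payload" || echo "request failed" >&2
fi
```

If no server is reachable, the script only prints the payload, which makes it safe to try on a machine where Ollama is not running yet.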
@@ Line 53: / Line 106: @@
 </syntaxhighlight>
-In this example the LLVM target is "gfx1031", that is, version "10.3.1", you can then override that value for ollama:
+In this example the LLVM target is "gfx1031", that is, version "10.3.1". You can then override that value for Ollama in the systemd service:
 <syntaxhighlight lang="nix">
 services.ollama = {
@@ Line 61: / Line 114: @@
      HCC_AMDGPU_TARGET = "gfx1031"; # used to be necessary, but doesn't seem to anymore
    };
+   # results in the environment variable "HSA_OVERRIDE_GFX_VERSION=10.3.1"
    rocmOverrideGfx = "10.3.1";
 };
 </syntaxhighlight>
+or, for the standalone package, by setting the environment variable in front of the command:
+<syntaxhighlight lang="bash">
+HSA_OVERRIDE_GFX_VERSION=10.3.1 ollama serve
+</syntaxhighlight>
 If there are still errors, you can attempt to set a similar value that is listed [https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides here].
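The mapping from the LLVM target "gfx1031" to the version "10.3.1" used above can be written as a small helper. This is only a sketch: the function name <code>gfx_to_version</code> is hypothetical, and it assumes the target consists of a major number followed by two single digits. Targets with a letter suffix are rejected instead of guessed, and the resulting version may still need to be replaced by a similar listed value as described above.

```shell
#!/usr/bin/env bash
# Sketch: derive the dotted version from an LLVM target name, e.g. gfx1031 -> 10.3.1.
# Assumption: after "gfx" come 3 or 4 digits; the last two are minor and patch.

gfx_to_version() {
  local target="$1"
  local digits="${target#gfx}"        # strip the "gfx" prefix
  if [[ ! "$digits" =~ ^[0-9]{3,4}$ ]]; then
    echo "unsupported target: $target" >&2
    return 1
  fi
  local len=${#digits}
  local major="${digits:0:len-2}"     # everything except the last two digits
  local minor="${digits:len-2:1}"     # second-to-last digit
  local patch="${digits:len-1:1}"     # last digit
  echo "$major.$minor.$patch"
}

gfx_to_version gfx1031   # prints 10.3.1
```

For the standalone package this could be combined with the command above as <code>HSA_OVERRIDE_GFX_VERSION="$(gfx_to_version gfx1031)" ollama serve</code>.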