Ollama is an open-source framework designed to facilitate the deployment of large language models in local environments. It aims to simplify the complexities involved in running and managing these models, providing a seamless experience for users across different operating systems.
== Setup ==
Add the following line to your system configuration:
<syntaxhighlight lang="nix">
services.ollama.enable = true;
</syntaxhighlight>
== Configuration of GPU acceleration ==
The <code>acceleration</code> option accepts the following values:
* <code>false</code>: disable GPU, use the CPU only
* <code>"rocm"</code>: supported by most modern AMD GPUs
* <code>"cuda"</code>: supported by most modern NVIDIA GPUs
Example: enable GPU acceleration for NVIDIA graphics cards:
<syntaxhighlight lang="nix">
services.ollama = {
  enable = true;
  acceleration = "cuda";
};
</syntaxhighlight>
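After rebuilding, one way to check whether the GPU was actually detected is to inspect the service log (a quick sketch, assuming the NixOS module's default systemd unit name <code>ollama</code>):
<syntaxhighlight lang="bash">
# Look for GPU detection messages in the ollama service log
journalctl -u ollama.service | grep -i -e gpu -e cuda -e rocm
</syntaxhighlight>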
== Usage via CLI ==
=== Download a model and run an interactive prompt ===
Example: download and run the Mistral LLM as an interactive prompt:
<syntaxhighlight lang="bash">
ollama run mistral
</syntaxhighlight>
For other models, see the [https://ollama.ai/library Ollama library].
=== Send a prompt to ollama ===
Example: download and run codellama with 13 billion parameters in the "instruct" variant and send it a prompt:
<syntaxhighlight lang="bash">
ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."
</syntaxhighlight>
== Usage via web API ==
Other software can use the web API (by default at http://localhost:11434) to query ollama. This works well, for example, in IntelliJ IDEs with the CodeGPT and "Ollama Commit Summarizer" plugins.
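As a quick sanity check from the command line, you can query the <code>/api/generate</code> endpoint directly (this sketch assumes the <code>mistral</code> model has already been pulled):
<syntaxhighlight lang="bash">
# Send a single non-streaming prompt to the local ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
</syntaxhighlight>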
== Troubleshooting ==
=== AMD GPU with open source driver ===
In certain cases ollama might not allow your system to use GPU acceleration if it cannot be sure your GPU/driver is compatible.
However, you can attempt to force-enable GPU usage by overriding the LLVM target.<ref>https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides</ref>
You can get the LLVM target for your GPU from the logs or like so:
<syntaxhighlight lang="bash">
$ nix-shell -p "rocmPackages.rocminfo" --run "rocminfo" | grep "gfx"
Name: gfx1031
</syntaxhighlight>
In this example the LLVM target is "gfx1031", i.e. version "10.3.1". You can then override that value for ollama:
<syntaxhighlight lang="nix">
services.ollama = {
  enable = true;
  acceleration = "rocm";
  environmentVariables = {
    HCC_AMDGPU_TARGET = "gfx1031"; # used to be necessary, but no longer seems to be
  };
  rocmOverrideGfx = "10.3.1";
};
</syntaxhighlight>
If there are still errors, you can attempt to set a similar value that is listed [https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides here].
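For example (a hedged sketch, not verified for every card): if the exact target "10.3.1" still fails, a closest target from the supported list is gfx1030, i.e. "10.3.0":
<syntaxhighlight lang="nix">
# Fallback: gfx1031 is not in ollama's supported list, so try the
# nearest listed target gfx1030 ("10.3.0") instead of "10.3.1".
services.ollama.rocmOverrideGfx = "10.3.0";
</syntaxhighlight>
[[Category:Server]] [[Category:Applications]]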