Ollama

Ollama is an open-source framework designed to facilitate the deployment of large language models in local environments. It aims to simplify the complexities involved in running and managing these models, providing a seamless experience for users across different operating systems.

Setup

Add the following line to your system configuration:

services.ollama.enable = true;
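
For context, a minimal sketch of how this line fits into a NixOS module such as configuration.nix (the file name and surrounding module structure are assumptions about your particular setup):

{ config, pkgs, ... }:

{
  # Run the Ollama server as a background systemd service
  services.ollama.enable = true;
}

Apply the change by rebuilding the system, for example with nixos-rebuild switch (run as root or with sudo).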

Configuration

Enable GPU acceleration for Nvidia graphics cards:

services.ollama = {
  enable = true;
  acceleration = "cuda";
};
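
For AMD graphics cards, the acceleration option can be set to a ROCm backend instead; the "rocm" value below is assumed to be accepted by the module in your NixOS release, so check the option documentation if it is rejected:

services.ollama = {
  enable = true;
  # Use AMD GPU acceleration via ROCm instead of CUDA (assumed supported value)
  acceleration = "rocm";
};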

Usage

Download and run the Mistral LLM as an interactive prompt:

ollama run mistral
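
Models can also be fetched and inspected without opening an interactive session. The commands below use mistral only as an example and assume the service is running with its default listen address and port (11434):

# Download a model without starting a chat session
ollama pull mistral

# List models that are available locally
ollama list

# Query the local HTTP API directly (assumes the default 127.0.0.1:11434 endpoint)
curl http://127.0.0.1:11434/api/generate -d '{"model": "mistral", "prompt": "Hello"}'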

For further models, see the Ollama library.