Llama-cpp: Difference between revisions

Woile (talk | contribs)
new heading MoE and styling improvements
Woile (talk | contribs)
m added example for llama-server on the terminal
 
(One intermediate revision by the same user not shown)
Line 19: Line 19:
==== in NixOS ====
==== in NixOS ====


After enable Unfree software in NixOS add CUDA to your packages<syntaxhighlight lang="nixos">
After enable Unfree software in NixOS add CUDA to your packages<syntaxhighlight lang="nixos">{
{
   environment.systemPackages = [
   environment.systemPackages = [
     (pkgs.llama-cpp.override { cudaSupport = true; })
     (pkgs.llama-cpp.override { cudaSupport = true; })
   ];
   ];
}
}</syntaxhighlight>And do a switch to the new configuration
</syntaxhighlight>And do a switch to the new configuration
  sudo nixos-rebuild switch
  sudo nixos-rebuild switch


Line 148: Line 146:
Once you've made <code>llama-cpp</code> available in your system. You can use <code>llama-cli</code>, which is a straightforward to use tool.  
Once you've made <code>llama-cpp</code> available in your system. You can use <code>llama-cli</code>, which is a straightforward to use tool.  


In your shell:<syntaxhighlight lang="bash">
In your shell:<syntaxhighlight lang="bash">llama-cli \  
llama-cli \  
   -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \  
   -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \  
   --temp 1.0 --top-p 0.95 --top-k 40 \  
   --temp 1.0 --top-p 0.95 --top-k 40 \  
   -p "briefly explain journalctl in one paragraph"
   -p "briefly explain journalctl in one paragraph"</syntaxhighlight>
</syntaxhighlight>


== llama-server ==
== llama-server ==


<code>llama-server</code> runs a server, and it can run models on demand. It's quite similar to [[Ollama]].
<code>llama-server</code> runs a server, and it can run models on demand. It supports OpenAI API standard. It's quite similar to [[Ollama]].


You can manually start the server from your terminal, it's usage, is not that different from <code>llama-cli</code>, but we are going to see the integration with NixOS as a service.<syntaxhighlight lang="nixos">
You can manually start the server from your terminal, it's usage, is not that different from <code>llama-cli</code>, <syntaxhighlight lang="bash">
llama-server \
    -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \
    --temp 1.0 --top-p 0.95 --top-k 40
</syntaxhighlight>
Or alternatively, you can '''enable the NixOS service''' for llama-cpp, which runs the server.{{Warning|Pay attention, that the service is actually called llama-cpp not llama-server}}<syntaxhighlight lang="nixos">
{
{
   services.llama-cpp = {
   services.llama-cpp = {
     enable = true;
     enable = true;
     package = pkgs.llama-cpp-vulkan;
     package = pkgs.llama-cpp-vulkan;
    # package = (pkgs.llama-cpp.override { cudaSupport = true; })
    # package = pkgs.llama-cpp-rocm;
     # Takes care of downloading if model not present
     # Takes care of downloading if model not present
     modelsPreset = {
     modelsPreset = {
Line 183: Line 187:
sudo nixos-rebuild switch
sudo nixos-rebuild switch
</pre>
</pre>
=== Web UI ===
The llama-cpp service includes a web interface, where you can chat. To access you must navigate to http://localhost:8080 . Or the <code>services.llama-cpp.port</code> configured.


=== Troubleshooting ===
=== Troubleshooting ===