Llama-cpp: Difference between revisions
m added webui segment, and a warning about naming |
m added example for llama-server on the terminal |
||
| Line 146: | Line 146: | ||
Once you've made <code>llama-cpp</code> available in your system. You can use <code>llama-cli</code>, which is a straightforward to use tool. | Once you've made <code>llama-cpp</code> available in your system. You can use <code>llama-cli</code>, which is a straightforward to use tool. | ||
In your shell:<syntaxhighlight lang="bash"> | In your shell:<syntaxhighlight lang="bash">llama-cli \ | ||
llama-cli \ | |||
-hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \ | -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \ | ||
--temp 1.0 --top-p 0.95 --top-k 40 \ | --temp 1.0 --top-p 0.95 --top-k 40 \ | ||
-p "briefly explain journalctl in one paragraph" | -p "briefly explain journalctl in one paragraph"</syntaxhighlight> | ||
</syntaxhighlight> | |||
== llama-server == | == llama-server == | ||
| Line 157: | Line 155: | ||
<code>llama-server</code> runs a server, and it can run models on demand. It supports OpenAI API standard. It's quite similar to [[Ollama]]. | <code>llama-server</code> runs a server, and it can run models on demand. It supports OpenAI API standard. It's quite similar to [[Ollama]]. | ||
You can manually start the server from your terminal, it's usage, is not that different from <code>llama-cli</code>, <syntaxhighlight lang="bash"> | |||
llama-server \ | |||
You can manually start the server from your terminal, it's usage, is not that different from <code>llama-cli</code>, | -hf bartowski/Qwen_Qwen3-Coder-Next-GGUF:Q4_K_M \ | ||
{{Warning|Pay attention, that the service is actually called llama-cpp not llama-server}}<syntaxhighlight lang="nixos"> | --temp 1.0 --top-p 0.95 --top-k 40 | ||
</syntaxhighlight> | |||
Or alternatively, you can '''enable the NixOS service''' for llama-cpp, which runs the server.{{Warning|Pay attention, that the service is actually called llama-cpp not llama-server}}<syntaxhighlight lang="nixos"> | |||
{ | { | ||
services.llama-cpp = { | services.llama-cpp = { | ||