Ollama
To find out whether a model is running on CPU or GPU, there are two options.
You can look at the logs of
<syntaxhighlight lang="bash">
$ ollama serve
</syntaxhighlight>
and search for entries like "looking for compatible GPUs" and "new model will fit in available VRAM in single GPU, loading".
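If Ollama runs as a systemd service (for example when installed as a standalone system package), the same log lines can be filtered directly. A minimal sketch, assuming the service unit is named <code>ollama</code>:
<syntaxhighlight lang="bash">
# Assumes Ollama runs as a systemd service named "ollama";
# adjust the unit name if your setup differs.
$ journalctl -u ollama | grep -i "compatible GPUs"
</syntaxhighlight>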
Alternatively, while a model is answering, run in another terminal
<syntaxhighlight lang="bash">
$ ollama ps
</syntaxhighlight>
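The PROCESSOR column of the output shows how the model is split between CPU and GPU. Illustrative output (the values here are made up; exact columns vary by Ollama version and hardware):
<syntaxhighlight lang="bash">
$ ollama ps
NAME                      ID              SIZE      PROCESSOR    UNTIL
codellama:13b-instruct    9f438cb9cd58    7.4 GB    100% GPU     4 minutes from now
</syntaxhighlight>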
<syntaxhighlight lang="bash">
$ ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."
</syntaxhighlight>
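The prompt can also be supplied on standard input, which is handy for scripting. A small sketch (<code>prompt.txt</code> is a hypothetical file containing your prompt):
<syntaxhighlight lang="bash">
# Non-interactive use: when stdin is not a terminal,
# ollama run reads the prompt from it and exits after answering.
$ ollama run codellama:13b-instruct < prompt.txt
</syntaxhighlight>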
=== See usage and speed statistics ===
Add "--verbose" to see statistics after each prompt: | |||
<syntaxhighlight lang="bash">
$ ollama run codellama:13b-instruct --verbose "Write an extended Python program..."
...
total duration:       50.302071991s
load duration:        50.912267ms
prompt eval count:    49 token(s)
prompt eval duration: 4.654s
prompt eval rate:     10.53 tokens/s   <- how fast it processed your input prompt
eval count:           182 token(s)
eval duration:        45.595s
eval rate:            3.99 tokens/s    <- how fast it printed a response
</syntaxhighlight>
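The same timing numbers are reported by Ollama's HTTP API: with <code>"stream": false</code>, the <code>/api/generate</code> endpoint (served on port 11434 by default) returns a single JSON object whose fields include <code>total_duration</code>, <code>load_duration</code>, <code>prompt_eval_count</code>, <code>eval_count</code> and <code>eval_duration</code>, with durations in nanoseconds. A minimal sketch (the prompt is just an example, and <code>jq</code> is assumed to be installed):
<syntaxhighlight lang="bash">
# Query the local Ollama API; jq only picks the statistics
# fields out of the final JSON response.
$ curl -s http://localhost:11434/api/generate \
    -d '{"model": "codellama:13b-instruct", "prompt": "Print 1 to 10 in Python.", "stream": false}' \
  | jq '{total_duration, load_duration, prompt_eval_count, eval_count, eval_duration}'
</syntaxhighlight>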