To find out whether a model is running on CPU or GPU, you can look at the logs of
<syntaxhighlight lang="bash">
$ ollama serve
</syntaxhighlight>
and search for "looking for compatible GPUs" and "new model will fit in available VRAM in single GPU, loading".
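If you do not want to read through the whole log, the relevant lines can be filtered with grep. A minimal sketch; the journalctl variant assumes Ollama was installed as a systemd service with the unit name "ollama":
<syntaxhighlight lang="bash">
# Run the server in the foreground and filter its log output (written to stderr)
$ ollama serve 2>&1 | grep -iE "looking for compatible GPUs|new model will fit in available VRAM"

# If Ollama is running as a systemd service instead (unit name assumed to be "ollama")
$ journalctl -u ollama --no-pager | grep -iE "looking for compatible GPUs|new model will fit in available VRAM"
</syntaxhighlight>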


Alternatively, while a model is answering, run in another terminal
<syntaxhighlight lang="bash">
$ ollama ps
</syntaxhighlight>
and check the "PROCESSOR" column, which shows whether the model is loaded into GPU or CPU memory (e.g. "100% GPU", "100% CPU", or a CPU/GPU split).
For example, to run a single prompt against a model from the command line:
<syntaxhighlight lang="bash">
$ ollama run codellama:13b-instruct "Write an extended Python program with a typical structure. It should print the numbers 1 to 10 to standard output."
</syntaxhighlight>
=== See usage and speed statistics ===
Add "--verbose" to see statistics after each prompt:
<syntaxhighlight lang="bash">
$ ollama run codellama:13b-instruct --verbose "Write an extended Python program..."
...
total duration:      50.302071991s
load duration:        50.912267ms
prompt eval count:    49 token(s)
prompt eval duration: 4.654s
prompt eval rate:    10.53 tokens/s <- how fast it processed your input prompt
eval count:          182 token(s)
eval duration:        45.595s
eval rate:            3.99 tokens/s  <- how fast it printed a response
</syntaxhighlight>
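The same counters are also available programmatically over the REST API. A minimal sketch, assuming the server is listening on its default address localhost:11434 and that "jq" is installed; durations in the JSON response are given in nanoseconds, so the eval rate is simply eval count divided by eval duration (182 tokens / 45.595 s ≈ 3.99 tokens/s in the example above):
<syntaxhighlight lang="bash">
# Non-streaming one-shot request; the response JSON contains the same counters
# that --verbose prints (prompt_eval_count, eval_count, durations in nanoseconds).
$ curl -s http://localhost:11434/api/generate \
    -d '{"model": "codellama:13b-instruct", "prompt": "Print the numbers 1 to 10.", "stream": false}' \
  | jq '{eval_count, eval_duration, tokens_per_second: (.eval_count / (.eval_duration / 1e9))}'
</syntaxhighlight>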