Supported LLM Providers
Terminal Agent supports multiple LLM providers, giving you flexibility to choose the one that best meets your needs.
Configuring Providers
You can set your preferred provider and model using the config command:
Provider-Specific Setup
Llama.cpp
Setup:
1. Obtain a local GGUF model file.
2. Install llama.cpp shared libraries compatible with your machine.
3. Set the runtime library path by pointing YZMA_LIB at the directory that contains those libraries.
4. Add a model alias to ~/.config/terminal-agent/config.json under llama_models.
Example Linux CPU setup:
go install github.com/hybridgroup/yzma@v1.14.1
mkdir -p ~/.local/share/yzma/lib
~/go/bin/yzma install --lib ~/.local/share/yzma/lib --processor cpu --version b9180
export YZMA_LIB=$HOME/.local/share/yzma/lib
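Before running the agent, it can help to confirm that YZMA_LIB resolves to a directory that actually contains shared libraries. The following is a minimal sketch, not part of Terminal Agent: the helper name check_yzma_lib is hypothetical, and the file patterns (lib*.so*, lib*.dylib) are an assumption about how llama.cpp libraries are named on Linux and macOS.

```shell
# Hypothetical helper: verify that YZMA_LIB points at a directory with shared libraries.
check_yzma_lib() {
  dir="${YZMA_LIB:-}"
  if [ -z "$dir" ]; then
    echo "YZMA_LIB is not set" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "YZMA_LIB is not a directory: $dir" >&2
    return 1
  fi
  # Assumed naming: files such as libllama.so (Linux) or libllama.dylib (macOS).
  for f in "$dir"/lib*.so* "$dir"/lib*.dylib; do
    if [ -f "$f" ]; then
      echo "found shared library: $f"
      return 0
    fi
  done
  echo "no shared libraries found in $dir" >&2
  return 1
}
```

Calling check_yzma_lib after the export prints the first matching library, or an error on stderr with a non-zero exit status.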
There are matching Taskfile helpers:
Configuration:
Example config:
{
  "default_provider": "llama",
  "providers": {
    "llama": "llama3.2"
  },
  "llama_models": {
    "llama3.2": "/absolute/path/to/llama3.2.gguf"
  }
}
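Because the example maps the alias to an absolute path, one option is to generate the JSON from the shell so that a relative model path is resolved first. This is only a sketch under stated assumptions: print_llama_config is a hypothetical helper, it prints to stdout rather than touching your config file, and it does not escape quotes or backslashes in the path.

```shell
# Hypothetical helper: print a config shaped like the example above
# for a given alias and model file.
print_llama_config() {
  alias_name="$1"
  model_file="$2"
  # Resolve the containing directory to an absolute path.
  model_dir=$(cd "$(dirname "$model_file")" && pwd) || return 1
  model_path="$model_dir/$(basename "$model_file")"
  cat <<EOF
{
  "default_provider": "llama",
  "providers": {
    "llama": "$alias_name"
  },
  "llama_models": {
    "$alias_name": "$model_path"
  }
}
EOF
}
```

For example, print_llama_config llama3.2 ./models/llama3.2.gguf prints a config you can review before saving it to ~/.config/terminal-agent/config.json.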
Special Features:
- Direct local inference without a separate HTTP server
- Supports streaming output with the --stream flag
- Uses the model's chat template when available
- Supports runtime device selection with --device auto|cpu|gpu and agent config set device ...
Limitations:
- Requires YZMA_LIB to point at the directory containing local llama.cpp shared libraries
- Requires local GGUF model files and alias configuration
- Currently supports ask and GUI query flows only
- Does not currently support the task command's tool usage capability
- The documented Linux install path uses llama.cpp runtime build b9180
--device and the device config key affect only the direct llama provider. They do not change ollama or remote provider execution.
OpenAI
Setup:
1. Create an account at OpenAI
2. Generate an API key
3. Set the key as an environment variable:
Configuration:
Custom API Endpoints: If you're using an OpenAI-compatible API endpoint (e.g., Azure OpenAI, local LLM servers), you can set a custom base URL through the OPENAI_BASE_URL environment variable.
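As an illustration, pointing the provider at a local OpenAI-compatible server might look like the snippet below. The variable name OPENAI_BASE_URL is the one named in this provider's feature list; the URL itself is a placeholder assumption, not a documented default.

```shell
# Placeholder endpoint: replace with the base URL of your OpenAI-compatible server.
export OPENAI_BASE_URL="http://localhost:8080/v1"
```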
Recommended Models:
- gpt-4o-mini - Good balance of capability and cost
- gpt-4o - Higher capability, higher cost
- gpt-3.5-turbo - Faster, less capable
Special Features:
- Supports streaming output with the --stream flag
- Tool usage capability for the task command
- Compatible with OpenAI-compatible endpoints via OPENAI_BASE_URL
Anthropic
Setup:
1. Create an account at Anthropic
2. Generate an API key from the Console
3. Set the key as an environment variable:
Configuration:
Recommended Models:
- claude-3-5-sonnet-latest - High capability
- claude-3-haiku-20240307 - Faster, more economical
- claude-3-opus-20240229 - Highest capability
Special Features:
- Excellent at complex reasoning tasks
- Tool usage capability for the task command
- Supports streaming output with the --stream flag
Google (Gemini)
Setup:
1. Get access to Google AI Studio
2. Generate an API key
3. Set the key as an environment variable:
Configuration:
Recommended Models:
- gemini-2.0-flash-lite - Fast response times
- gemini-2.0-pro - More capable model
Special Features:
- Tool usage capability for the task command
- Good integration with web search capabilities
Ollama
Setup:
1. Make sure Ollama is installed
2. Make sure you have downloaded the model you'd like to use, e.g. ollama pull llama3.2
Configuration:
If your Ollama server is running at a non-default address, you can set its URL through an environment variable, e.g.
Special Features:
- Supports streaming output with the --stream flag
- Supports tool usage for the task command
Amazon Bedrock
Setup:
1. Set up an AWS account with Bedrock access
2. Configure your AWS credentials as usual:
# Either through AWS CLI:
aws configure
# Or through environment variables:
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=your_region
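If you take the environment-variable route, a quick check for unset variables can save a confusing authentication error later. The helper below is a hypothetical sketch (missing_aws_vars is not part of Terminal Agent or the AWS CLI), and it does not apply if your credentials come from aws configure.

```shell
# Hypothetical helper: list which of the three variables exported above are unset or empty.
missing_aws_vars() {
  missing=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Strip the leading space; empty output means all three are set.
  echo "${missing# }"
}
```

Running missing_aws_vars prints the names that still need a value, separated by spaces.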
Configuration:
Recommended Models:
- anthropic.claude-3-haiku-20240307-v1:0 - Good balance of capability and cost
- anthropic.claude-3-sonnet-20240229-v1:0 - Higher capability
- meta.llama3-8b-instruct-v1:0 - Open source alternative
Special Features:
- Access to multiple model families through one provider
- Tool usage capability for the task command
- Supports streaming output with the --stream flag
Perplexity
Setup:
1. Create an account at Perplexity AI
2. Generate an API key
3. Set the key as an environment variable:
Configuration:
Recommended Models:
- llama-3-8b-instruct - Compact open-source model
- llama-3-70b-instruct - More capable open-source model
- llama-3.1-8b-instruct - Updated version
Limitations:
- Does not support streaming output
- Does not support the task command's tool usage capability
Performance Considerations
Different providers excel at different tasks:
- Complex reasoning: Anthropic Claude models (direct or via Bedrock)
- Speed and cost-efficiency: OpenAI's GPT-3.5, Google's Gemini Flash models
- Creative tasks: OpenAI's GPT-4 series, Anthropic Claude Opus
- Open-source options: Llama models via the local llama provider, Ollama, Perplexity, or Bedrock
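When choosing a provider, the streaming and tool-usage notes from the provider sections matter as much as model quality. They can be condensed into a small lookup, sketched below. The function name provider_capabilities is hypothetical, and the identifiers other than llama and ollama are assumed for illustration; Google is marked "unspecified" for streaming because its section does not mention it.

```shell
# Hypothetical lookup summarizing the Special Features and Limitations lists.
provider_capabilities() {
  case "$1" in
    openai|anthropic|ollama|bedrock) echo "stream=yes tools=yes" ;;
    llama)      echo "stream=yes tools=no" ;;
    google)     echo "stream=unspecified tools=yes" ;;
    perplexity) echo "stream=no tools=no" ;;
    *)          echo "unknown provider: $1" >&2; return 1 ;;
  esac
}
```

For instance, provider_capabilities llama shows that the local provider streams output but cannot be used with the task command's tool usage.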