Documentation Index
Fetch the complete documentation index at: https://docs.honcho.dev/llms.txt
Use this file to discover all available pages before exploring further.
Most users only need the setup from the Self-Hosting Guide. This page is the full reference for customizing providers, tuning features, and hardening your deployment.
Honcho loads configuration in this priority order (highest wins):
- Environment variables (always take precedence)
- .env file
- config.toml file
- Built-in defaults
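For example, all three layers can set the log level; a shell export wins over .env, which wins over config.toml:
export LOG_LEVEL=WARNING    # shell environment variable: highest priority
LOG_LEVEL=DEBUG             # .env: used when no environment variable is set
LOG_LEVEL = "INFO"          # config.toml: used when neither of the above is set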
Use .env for secrets and overrides, config.toml for base settings. Or use environment variables exclusively — whatever fits your deployment. Copy the examples to get started:
cp .env.template .env
cp config.toml.example config.toml
Environment Variable Naming
All config values map to environment variables; a few concrete mappings are shown below:
- {SECTION}_{KEY} for top-level section settings (e.g., DB_CONNECTION_URI → [db].CONNECTION_URI)
- {KEY} for app-level settings (e.g., LOG_LEVEL → [app].LOG_LEVEL)
- Use __ inside {KEY} for nested settings (e.g., DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT, DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL)
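A few concrete mappings side by side (values are illustrative):
DB_CONNECTION_URI=postgresql+psycopg://postgres:postgres@localhost:5432/postgres  # [db].CONNECTION_URI
LOG_LEVEL=DEBUG  # [app].LOG_LEVEL
DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL=http://localhost:8000/v1  # [deriver.model_config.overrides].base_url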
LLM Configuration
The Self-Hosting Guide covers the basic setup: either the built-in OpenAI defaults or one OpenAI-compatible endpoint/model for all features. This section covers recommended model tiers, using multiple providers, and per-feature tuning.
All Honcho agents (deriver, dialectic, dream) require tool calling. Your models must support the OpenAI tool calling format.
Choosing Models
Model choice matters more for tool-use reliability than raw intelligence:
| Tier | Example models | Use case | Notes |
|---|---|---|---|
| Light | Gemini 2.5 Flash, GLM-4.7-Flash | Deriver, summary, dialectic minimal/low | High throughput, cheap, reliable tool use |
| Medium | Claude Haiku 4.5, Grok 4.1 Fast | Dialectic medium/high | Good reasoning + tool use balance |
| Heavy | Claude Sonnet 4, GLM-5 | Dream, dialectic max | Best quality for rare/complex tasks |
You can mix providers freely — for example, use Gemini for the deriver and Claude for dreaming.
Provider Types
| Transport value | What it connects to | API key env var |
|---|---|---|
| openai | OpenAI or any OpenAI-compatible endpoint (OpenRouter, Together, Fireworks, LiteLLM, vLLM, Ollama) | LLM_OPENAI_API_KEY |
| anthropic | Anthropic Claude (direct) | LLM_ANTHROPIC_API_KEY |
| gemini | Google Gemini (direct) | LLM_GEMINI_API_KEY |
For OpenAI-compatible proxies (OpenRouter, vLLM, Ollama, etc.), use transport = "openai" and set MODEL_CONFIG__OVERRIDES__BASE_URL on each feature to point at your endpoint.
Tiered Model Setup
Once you’re past initial setup, you can assign different models per feature for better cost/quality tradeoffs. This example uses OpenRouter with light/medium/heavy tiers:
LLM_OPENAI_API_KEY=sk-or-v1-...
# All features route through OpenRouter by setting
# MODEL_CONFIG__OVERRIDES__BASE_URL on each feature
# Light tier — high throughput, cheap
DERIVER_MODEL_CONFIG__TRANSPORT=openai
DERIVER_MODEL_CONFIG__MODEL=google/gemini-2.5-flash-lite
DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
SUMMARY_MODEL_CONFIG__TRANSPORT=openai
SUMMARY_MODEL_CONFIG__MODEL=google/gemini-2.5-flash
SUMMARY_MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__MODEL=google/gemini-2.5-flash-lite
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
DIALECTIC_LEVELS__low__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__low__MODEL_CONFIG__MODEL=google/gemini-2.5-flash-lite
DIALECTIC_LEVELS__low__MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
# Medium tier — better reasoning
DIALECTIC_LEVELS__medium__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__medium__MODEL_CONFIG__MODEL=anthropic/claude-haiku-4-5
DIALECTIC_LEVELS__medium__MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
DIALECTIC_LEVELS__high__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__high__MODEL_CONFIG__MODEL=anthropic/claude-haiku-4-5
DIALECTIC_LEVELS__high__MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
# Heavy tier — best quality for rare/complex tasks
DIALECTIC_LEVELS__max__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__max__MODEL_CONFIG__MODEL=anthropic/claude-sonnet-4
DIALECTIC_LEVELS__max__MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT=openai
DREAM_DEDUCTION_MODEL_CONFIG__MODEL=anthropic/claude-sonnet-4
DREAM_DEDUCTION_MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
DREAM_INDUCTION_MODEL_CONFIG__TRANSPORT=openai
DREAM_INDUCTION_MODEL_CONFIG__MODEL=anthropic/claude-sonnet-4
DREAM_INDUCTION_MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
Direct Vendor Keys
Instead of an OpenAI-compatible proxy, you can use vendor APIs directly. Each transport picks up its own LLM_{TRANSPORT}_API_KEY.
If you keep the built-in defaults, only LLM_OPENAI_API_KEY is required:
LLM_OPENAI_API_KEY=...
# Built-in model defaults
# - deriver: openai / gpt-5.4-mini
# - dialectic (all levels): openai / gpt-5.4-mini
# - summary: openai / gpt-5.4-mini
# - dream specialists: openai / gpt-5.4-mini
# - embeddings: openai / text-embedding-3-small
To use Gemini or Anthropic directly, override the features you want to move:
LLM_GEMINI_API_KEY=...
DERIVER_MODEL_CONFIG__TRANSPORT=gemini
DERIVER_MODEL_CONFIG__MODEL=gemini-2.5-flash
LLM_ANTHROPIC_API_KEY=...
DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT=anthropic
DREAM_DEDUCTION_MODEL_CONFIG__MODEL=claude-haiku-4-5
Self-Hosted (vLLM / Ollama)
Use transport = "openai" and set MODEL_CONFIG__OVERRIDES__BASE_URL on each feature:
# vLLM
LLM_OPENAI_API_KEY=not-needed
DERIVER_MODEL_CONFIG__TRANSPORT=openai
DERIVER_MODEL_CONFIG__MODEL=your-model-name
DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL=http://localhost:8000/v1
# Ollama
LLM_OPENAI_API_KEY=ollama
DERIVER_MODEL_CONFIG__TRANSPORT=openai
DERIVER_MODEL_CONFIG__MODEL=llama3.3:70b
DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL=http://localhost:11434/v1
Set MODEL_CONFIG__TRANSPORT, MODEL_CONFIG__MODEL, and MODEL_CONFIG__OVERRIDES__BASE_URL for each feature the same way.
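For example, to point the summarizer at the same vLLM endpoint shown above:
SUMMARY_MODEL_CONFIG__TRANSPORT=openai
SUMMARY_MODEL_CONFIG__MODEL=your-model-name
SUMMARY_MODEL_CONFIG__OVERRIDES__BASE_URL=http://localhost:8000/v1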
The same overrides are available in config.toml:
[deriver.model_config]
transport = "openai"
model = "my-local-model"
[deriver.model_config.overrides]
base_url = "http://localhost:8000/v1"
api_key_env = "DERIVER_LOCAL_API_KEY"
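With api_key_env set, the key itself lives in your environment under the name you chose (DERIVER_LOCAL_API_KEY in this sketch):
DERIVER_LOCAL_API_KEY=sk-your-local-key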
Thinking Budget
Built-in defaults do not set MODEL_CONFIG__THINKING_BUDGET_TOKENS or MODEL_CONFIG__THINKING_EFFORT. Add one only when your chosen model supports it.
Use MODEL_CONFIG__THINKING_EFFORT for OpenAI reasoning models:
DERIVER_MODEL_CONFIG__THINKING_EFFORT=minimal
DIALECTIC_LEVELS__max__MODEL_CONFIG__THINKING_EFFORT=medium
Use MODEL_CONFIG__THINKING_BUDGET_TOKENS for Anthropic and Gemini models. Set it to 0 or omit it for providers that don’t support extended thinking:
SUMMARY_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
DREAM_DEDUCTION_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
Provider-Specific Parameters
Each model config supports an overrides.provider_params dict for passing arbitrary parameters to the underlying provider SDK. Use this for vendor-specific features that aren’t part of the standard config:
[deriver.model_config.overrides.provider_params]
# These are passed directly to the provider SDK
verbosity = "low"
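Assuming the double-underscore nesting described above also reaches into provider_params (an extrapolation from the naming rules, not confirmed here), the environment-variable form would look like:
DERIVER_MODEL_CONFIG__OVERRIDES__PROVIDER_PARAMS__VERBOSITY=low  # assumed mapping; verify against your Honcho version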
Changing Transport
When changing a feature's transport, always specify model explicitly as well. A partial override that changes transport without model keeps the previous model name, which may not be valid for the new provider.
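For example, moving the deriver to Anthropic sets both keys together:
DERIVER_MODEL_CONFIG__TRANSPORT=anthropic
DERIVER_MODEL_CONFIG__MODEL=claude-haiku-4-5  # always pair a valid model with the new transport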
General LLM Settings
LLM_DEFAULT_MAX_TOKENS=2500
# Tool output limits (to prevent token explosion)
LLM_MAX_TOOL_OUTPUT_CHARS=10000 # ~2500 tokens at 4 chars/token
LLM_MAX_MESSAGE_CONTENT_CHARS=2000 # Max chars per message in tool results
Embedding Configuration
Embeddings use their own nested model config, separate from the main text-generation LLM settings.
# Embedding vector settings
EMBEDDING_VECTOR_DIMENSIONS=1536
EMBEDDING_MAX_INPUT_TOKENS=8192
EMBEDDING_MAX_TOKENS_PER_REQUEST=300000
# Embedding transport/model selection
EMBEDDING_MODEL_CONFIG__TRANSPORT=openai # openai, gemini
EMBEDDING_MODEL_CONFIG__MODEL=text-embedding-3-small
# Optional endpoint overrides
EMBEDDING_MODEL_CONFIG__OVERRIDES__BASE_URL=http://localhost:8000/v1
EMBEDDING_MODEL_CONFIG__OVERRIDES__API_KEY_ENV=EMBEDDING_CUSTOM_API_KEY
Current constraint: EMBEDDING_VECTOR_DIMENSIONS can be changed for fully migrated external vector stores, but pgvector and dual-write mode still require 1536 until the schema migration lands.
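As a sketch, a fully migrated external store (Turbopuffer here; the 3072 value assumes a matching embedding model) could then use non-default dimensions:
VECTOR_STORE_TYPE=turbopuffer
VECTOR_STORE_MIGRATED=true
EMBEDDING_VECTOR_DIMENSIONS=3072
VECTOR_STORE_DIMENSIONS=3072  # keep in sync with the embedding dimensions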
Feature-Specific Model Configuration
Each feature can use a different provider and model. Below are all the tuning knobs.
Dialectic API:
The Dialectic API provides theory-of-mind informed responses. It uses a tiered reasoning system with five levels:
# Global dialectic settings
DIALECTIC_MAX_OUTPUT_TOKENS=8192
DIALECTIC_MAX_INPUT_TOKENS=100000
DIALECTIC_HISTORY_TOKEN_LIMIT=8192
DIALECTIC_SESSION_HISTORY_MAX_TOKENS=4096
Per-Level Configuration:
Each reasoning level has its own provider, model, and settings:
# config.toml example
[dialectic.levels.minimal]
MAX_TOOL_ITERATIONS = 1
MAX_OUTPUT_TOKENS = 250
TOOL_CHOICE = "any"
[dialectic.levels.minimal.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[dialectic.levels.low]
MAX_TOOL_ITERATIONS = 5
TOOL_CHOICE = "any"
[dialectic.levels.low.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[dialectic.levels.medium]
MAX_TOOL_ITERATIONS = 2
[dialectic.levels.medium.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[dialectic.levels.high]
MAX_TOOL_ITERATIONS = 4
[dialectic.levels.high.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[dialectic.levels.max]
MAX_TOOL_ITERATIONS = 10
[dialectic.levels.max.model_config]
transport = "openai"
model = "gpt-5.4-mini"
Environment variables for nested levels use double underscores:
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__MODEL=gpt-5.4-mini
DIALECTIC_LEVELS__minimal__MAX_TOOL_ITERATIONS=1
DIALECTIC_LEVELS__minimal__MAX_OUTPUT_TOKENS=250
DIALECTIC_LEVELS__minimal__TOOL_CHOICE=any
Deriver (Theory of Mind):
The Deriver extracts facts from messages and builds theory-of-mind representations of peers.
DERIVER_ENABLED=true
# LLM settings
DERIVER_MODEL_CONFIG__TRANSPORT=openai
DERIVER_MODEL_CONFIG__MODEL=gpt-5.4-mini
DERIVER_MAX_INPUT_TOKENS=23000
# DERIVER_MODEL_CONFIG__THINKING_EFFORT=minimal
# DERIVER_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
# DERIVER_MODEL_CONFIG__TEMPERATURE=0.7 # Optional temperature override
# Backup model (optional)
# DERIVER_MODEL_CONFIG__FALLBACK__MODEL=claude-haiku-4-5
# DERIVER_MODEL_CONFIG__FALLBACK__TRANSPORT=anthropic
# Worker settings
DERIVER_WORKERS=1 # Increase for higher throughput
DERIVER_POLLING_SLEEP_INTERVAL_SECONDS=1.0
DERIVER_STALE_SESSION_TIMEOUT_MINUTES=5
# Queue management
DERIVER_QUEUE_ERROR_RETENTION_SECONDS=2592000 # 30 days
# Observation settings
DERIVER_DEDUPLICATE=true
DERIVER_LOG_OBSERVATIONS=false
DERIVER_WORKING_REPRESENTATION_MAX_OBSERVATIONS=100
DERIVER_REPRESENTATION_BATCH_MAX_TOKENS=1024
Peer Card:
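The reference config.toml at the end of this page exposes a single toggle for this feature:
PEER_CARD_ENABLED=true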
Summary Generation:
Session summaries provide compressed context for long conversations — short summaries (frequent) and long summaries (comprehensive).
SUMMARY_ENABLED=true
SUMMARY_MODEL_CONFIG__TRANSPORT=openai
SUMMARY_MODEL_CONFIG__MODEL=gpt-5.4-mini
SUMMARY_MAX_TOKENS_SHORT=1000
SUMMARY_MAX_TOKENS_LONG=4000
# SUMMARY_MODEL_CONFIG__THINKING_EFFORT=minimal
# SUMMARY_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
SUMMARY_MESSAGES_PER_SHORT_SUMMARY=20
SUMMARY_MESSAGES_PER_LONG_SUMMARY=60
Dream Processing:
Dream processing consolidates and refines peer representations during idle periods.
DREAM_ENABLED=true
DREAM_DOCUMENT_THRESHOLD=50
DREAM_IDLE_TIMEOUT_MINUTES=60
DREAM_MIN_HOURS_BETWEEN_DREAMS=8
DREAM_ENABLED_TYPES=["omni"]
DREAM_MAX_TOOL_ITERATIONS=20
DREAM_HISTORY_TOKEN_LIMIT=16384
# Specialist model configs (each is independent)
DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT=openai
DREAM_DEDUCTION_MODEL_CONFIG__MODEL=gpt-5.4-mini
DREAM_INDUCTION_MODEL_CONFIG__TRANSPORT=openai
DREAM_INDUCTION_MODEL_CONFIG__MODEL=gpt-5.4-mini
Surprisal-Based Sampling (Advanced):
Optional subsystem for identifying unusual observations during dreaming:
DREAM_SURPRISAL__ENABLED=false
DREAM_SURPRISAL__TREE_TYPE=kdtree
DREAM_SURPRISAL__TREE_K=5
DREAM_SURPRISAL__SAMPLING_STRATEGY=recent
DREAM_SURPRISAL__SAMPLE_SIZE=200
DREAM_SURPRISAL__TOP_PERCENT_SURPRISAL=0.10
DREAM_SURPRISAL__MIN_HIGH_SURPRISAL_FOR_REPLACE=10
DREAM_SURPRISAL__INCLUDE_LEVELS=["explicit", "deductive"]
Core Configuration
Application Settings
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR, CRITICAL
SESSION_OBSERVERS_LIMIT=10
GET_CONTEXT_MAX_TOKENS=100000
MAX_MESSAGE_SIZE=25000
MAX_FILE_SIZE=5242880 # 5MB
EMBED_MESSAGES=true
NAMESPACE=honcho
Optional Integrations:
LANGFUSE_HOST=https://cloud.langfuse.com
LANGFUSE_PUBLIC_KEY=your-langfuse-public-key
COLLECT_METRICS_LOCAL=false
LOCAL_METRICS_FILE=metrics.jsonl
REASONING_TRACES_FILE=traces.jsonl
Database
# Connection (required)
DB_CONNECTION_URI=postgresql+psycopg://postgres:postgres@localhost:5432/postgres
# Pool settings
DB_SCHEMA=public
DB_POOL_PRE_PING=true
DB_POOL_SIZE=10
DB_MAX_OVERFLOW=20
DB_POOL_TIMEOUT=30
DB_POOL_RECYCLE=300
DB_POOL_USE_LIFO=true
DB_SQL_DEBUG=false
Authentication
AUTH_USE_AUTH=false # Set to true to require JWT tokens
AUTH_JWT_SECRET=your-super-secret-jwt-key # Required when auth is enabled
Generate a secret: python scripts/generate_jwt_secret.py
Cache (Redis)
Redis caching is optional. Honcho works without it but benefits from caching in high-traffic scenarios.
CACHE_ENABLED=false
CACHE_URL=redis://localhost:6379/0?suppress=true
CACHE_NAMESPACE=honcho
CACHE_DEFAULT_TTL_SECONDS=300
CACHE_DEFAULT_LOCK_TTL_SECONDS=5 # Cache stampede prevention
Webhooks
WEBHOOK_SECRET=your-webhook-signing-secret
WEBHOOK_MAX_WORKSPACE_LIMIT=10
Vector Store
VECTOR_STORE_TYPE=pgvector # Options: pgvector, turbopuffer, lancedb
VECTOR_STORE_MIGRATED=false
VECTOR_STORE_NAMESPACE=honcho
VECTOR_STORE_DIMENSIONS=1536
# Turbopuffer-specific
VECTOR_STORE_TURBOPUFFER_API_KEY=your-turbopuffer-api-key
VECTOR_STORE_TURBOPUFFER_REGION=us-east-1
# LanceDB-specific
VECTOR_STORE_LANCEDB_PATH=./lancedb_data
Monitoring
Prometheus Metrics
Honcho exposes /metrics endpoints for scraping:
- API process: Port 8000
- Deriver process: Port 9090
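To verify scraping manually (assuming the default ports and METRICS_ENABLED=true):
curl http://localhost:8000/metrics  # API process
curl http://localhost:9090/metrics  # deriver process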
METRICS_ENABLED=false
METRICS_NAMESPACE=honcho
CloudEvents Telemetry
TELEMETRY_ENABLED=false
TELEMETRY_ENDPOINT=https://telemetry.honcho.dev/v1/events
TELEMETRY_HEADERS='{"Authorization": "Bearer your-token"}'
TELEMETRY_BATCH_SIZE=100
TELEMETRY_FLUSH_INTERVAL_SECONDS=1.0
TELEMETRY_MAX_RETRIES=3
TELEMETRY_MAX_BUFFER_SIZE=10000
Sentry
SENTRY_ENABLED=false
SENTRY_DSN=https://your-sentry-dsn@sentry.io/project-id
SENTRY_ENVIRONMENT=production
SENTRY_TRACES_SAMPLE_RATE=0.1
SENTRY_PROFILES_SAMPLE_RATE=0.1
Reference config.toml
A complete config.toml with all defaults. Copy and modify what you need:
[app]
LOG_LEVEL = "INFO"
SESSION_OBSERVERS_LIMIT = 10
EMBED_MESSAGES = true
NAMESPACE = "honcho"
[db]
CONNECTION_URI = "postgresql+psycopg://postgres:postgres@localhost:5432/postgres"
POOL_SIZE = 10
MAX_OVERFLOW = 20
[auth]
USE_AUTH = false
[cache]
ENABLED = false
URL = "redis://localhost:6379/0?suppress=true"
DEFAULT_TTL_SECONDS = 300
[deriver]
ENABLED = true
WORKERS = 1
[deriver.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[peer_card]
ENABLED = true
[dialectic]
MAX_OUTPUT_TOKENS = 8192
[dialectic.levels.minimal]
MAX_TOOL_ITERATIONS = 1
MAX_OUTPUT_TOKENS = 250
TOOL_CHOICE = "any"
[dialectic.levels.minimal.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[dialectic.levels.low]
MAX_TOOL_ITERATIONS = 5
TOOL_CHOICE = "any"
[dialectic.levels.low.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[dialectic.levels.medium]
MAX_TOOL_ITERATIONS = 2
[dialectic.levels.medium.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[dialectic.levels.high]
MAX_TOOL_ITERATIONS = 4
[dialectic.levels.high.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[dialectic.levels.max]
MAX_TOOL_ITERATIONS = 10
[dialectic.levels.max.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[summary]
ENABLED = true
MAX_TOKENS_SHORT = 1000
MAX_TOKENS_LONG = 4000
[summary.model_config]
transport = "openai"
model = "gpt-5.4-mini"
[dream]
ENABLED = true
[dream.deduction_model_config]
transport = "openai"
model = "gpt-5.4-mini"
[dream.induction_model_config]
transport = "openai"
model = "gpt-5.4-mini"
[webhook]
MAX_WORKSPACE_LIMIT = 10
[metrics]
ENABLED = false
[telemetry]
ENABLED = false
[vector_store]
TYPE = "pgvector"
[sentry]
ENABLED = false
Database Migrations
uv run alembic current # Check status
uv run alembic upgrade head # Upgrade to latest
uv run alembic downgrade <rev> # Downgrade to specific revision
uv run alembic revision --autogenerate -m "Description" # Create new migration
Troubleshooting
- Database connection errors — Ensure DB_CONNECTION_URI uses the postgresql+psycopg:// prefix. Verify the database is running and the pgvector extension is installed.
- Authentication issues — Generate and set AUTH_JWT_SECRET when AUTH_USE_AUTH=true. Use python scripts/generate_jwt_secret.py.
- LLM provider errors — Verify API keys are set. Check that model names match your provider's format. Ensure models support tool calling.
- Deriver not processing — Check logs. Increase DERIVER_WORKERS for throughput. Verify database and LLM connectivity.
- Dialectic level issues — Unset level fields inherit from the built-in defaults. For Anthropic, THINKING_BUDGET_TOKENS must be >= 1024 when enabled. For providers without budgeted thinking, omit it or set it to 0. MAX_OUTPUT_TOKENS must exceed THINKING_BUDGET_TOKENS.
- Vector store issues — For Turbopuffer, set the API key. Check that VECTOR_STORE_DIMENSIONS matches your embedding model.