Most users only need the setup from the Self-Hosting Guide. This page is the full reference for customizing providers, tuning features, and hardening your deployment.
Honcho loads configuration in this priority order (highest wins):
  1. Environment variables (always take precedence)
  2. .env file
  3. config.toml file
  4. Built-in defaults
Use .env for secrets and overrides, config.toml for base settings. Or use environment variables exclusively — whatever fits your deployment. Copy the examples to get started:
cp .env.template .env
cp config.toml.example config.toml
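The precedence above behaves like a layered lookup: the highest layer that defines a key wins. A minimal Python sketch of the idea (illustrative keys and values, not Honcho internals):

```python
from collections import ChainMap

# Layers ordered highest-priority first, mirroring the list above.
env_vars = {"LOG_LEVEL": "DEBUG"}                         # 1. environment variables
dotenv   = {"LOG_LEVEL": "INFO", "DB_POOL_SIZE": "20"}    # 2. .env file
toml_cfg = {"DB_POOL_SIZE": "10", "NAMESPACE": "honcho"}  # 3. config.toml
defaults = {"LOG_LEVEL": "INFO", "DB_POOL_SIZE": "10", "NAMESPACE": "honcho"}  # 4.

config = ChainMap(env_vars, dotenv, toml_cfg, defaults)
print(config["LOG_LEVEL"])     # DEBUG: the environment variable wins
print(config["DB_POOL_SIZE"])  # 20: .env beats config.toml
print(config["NAMESPACE"])     # honcho: falls through to config.toml
```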

Environment Variable Naming

All config values map to environment variables:
  • {SECTION}_{KEY} for top-level section settings (e.g., DB_CONNECTION_URI maps to [db].CONNECTION_URI in config.toml)
  • {KEY} for app-level settings (e.g., LOG_LEVEL maps to [app].LOG_LEVEL)
  • Use __ inside {KEY} for nested settings (e.g., DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT, DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL)
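To illustrate the __ convention, each double underscore descends one level into the config tree. A sketch of the mapping (not Honcho's actual parser):

```python
def nested_path(name: str) -> list[str]:
    # Each "__" descends one level into the nested config structure.
    return name.split("__")

print(nested_path("DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT"))
# ['DIALECTIC_LEVELS', 'minimal', 'MODEL_CONFIG', 'TRANSPORT']
```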

LLM Configuration

The Self-Hosting Guide covers the basic setup: either the built-in OpenAI defaults or one OpenAI-compatible endpoint/model for all features. This section covers recommended model tiers, using multiple providers, and per-feature tuning.
All Honcho agents (deriver, dialectic, dream) require tool calling. Your models must support the OpenAI tool calling format.
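For reference, a tool definition in the OpenAI tool calling format looks like the following. The tool name and schema here are made up for illustration; they are not Honcho tools:

```python
# A minimal tool definition in the OpenAI tool-calling format.
# "get_weather" is a hypothetical example tool.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
```

A model that cannot reliably emit well-formed calls against schemas like this will fail in Honcho's agent loops regardless of its raw quality.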

Choosing Models

Model choice matters more for tool-use reliability than raw intelligence:
Tier   | Example models                  | Use case                                | Notes
Light  | Gemini 2.5 Flash, GLM-4.7-Flash | Deriver, summary, dialectic minimal/low | High throughput, cheap, reliable tool use
Medium | Claude Haiku 4.5, Grok 4.1 Fast | Dialectic medium/high                   | Good reasoning + tool use balance
Heavy  | Claude Sonnet 4, GLM-5          | Dream, dialectic max                    | Best quality for rare/complex tasks
You can mix providers freely — for example, use Gemini for the deriver and Claude for dreaming.

Provider Types

Transport value | What it connects to                                                                                | API key env var
openai          | OpenAI or any OpenAI-compatible endpoint (OpenRouter, Together, Fireworks, LiteLLM, vLLM, Ollama) | LLM_OPENAI_API_KEY
anthropic       | Anthropic Claude (direct)                                                                          | LLM_ANTHROPIC_API_KEY
gemini          | Google Gemini (direct)                                                                             | LLM_GEMINI_API_KEY
For OpenAI-compatible proxies (OpenRouter, vLLM, Ollama, etc.), use transport = "openai" and set MODEL_CONFIG__OVERRIDES__BASE_URL on each feature to point at your endpoint.

Tiered Model Setup

Once you’re past initial setup, you can assign different models per feature for better cost/quality tradeoffs. This example uses OpenRouter with light/medium/heavy tiers:
LLM_OPENAI_API_KEY=sk-or-v1-...

# All features route through OpenRouter via overrides.base_url.
# Set OVERRIDES__BASE_URL on every feature's MODEL_CONFIG the same way
# as shown for the deriver below (omitted for the others for brevity).

# Light tier — high throughput, cheap
DERIVER_MODEL_CONFIG__TRANSPORT=openai
DERIVER_MODEL_CONFIG__MODEL=google/gemini-2.5-flash-lite
DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
SUMMARY_MODEL_CONFIG__TRANSPORT=openai
SUMMARY_MODEL_CONFIG__MODEL=google/gemini-2.5-flash
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__MODEL=google/gemini-2.5-flash-lite
DIALECTIC_LEVELS__low__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__low__MODEL_CONFIG__MODEL=google/gemini-2.5-flash-lite

# Medium tier — better reasoning
DIALECTIC_LEVELS__medium__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__medium__MODEL_CONFIG__MODEL=anthropic/claude-haiku-4-5
DIALECTIC_LEVELS__high__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__high__MODEL_CONFIG__MODEL=anthropic/claude-haiku-4-5
DIALECTIC_LEVELS__max__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__max__MODEL_CONFIG__MODEL=anthropic/claude-haiku-4-5

# Heavy tier — best quality for complex tasks
DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT=openai
DREAM_DEDUCTION_MODEL_CONFIG__MODEL=anthropic/claude-sonnet-4
DREAM_INDUCTION_MODEL_CONFIG__TRANSPORT=openai
DREAM_INDUCTION_MODEL_CONFIG__MODEL=anthropic/claude-sonnet-4

Direct Vendor Keys

Instead of an OpenAI-compatible proxy, you can use vendor APIs directly. Each transport picks up its own LLM_{TRANSPORT}_API_KEY. If you keep the built-in defaults, only LLM_OPENAI_API_KEY is required:
LLM_OPENAI_API_KEY=...

# Built-in model defaults
# - deriver: openai / gpt-5.4-mini
# - dialectic (all levels): openai / gpt-5.4-mini
# - summary: openai / gpt-5.4-mini
# - dream specialists: openai / gpt-5.4-mini
# - embeddings: openai / text-embedding-3-small
To use Gemini or Anthropic directly, override the features you want to move:
LLM_GEMINI_API_KEY=...
DERIVER_MODEL_CONFIG__TRANSPORT=gemini
DERIVER_MODEL_CONFIG__MODEL=gemini-2.5-flash

LLM_ANTHROPIC_API_KEY=...
DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT=anthropic
DREAM_DEDUCTION_MODEL_CONFIG__MODEL=claude-haiku-4-5

Self-Hosted (vLLM / Ollama)

Use transport = "openai" and set MODEL_CONFIG__OVERRIDES__BASE_URL on each feature:
# vLLM
LLM_OPENAI_API_KEY=not-needed
DERIVER_MODEL_CONFIG__TRANSPORT=openai
DERIVER_MODEL_CONFIG__MODEL=your-model-name
DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL=http://localhost:8000/v1

# Ollama
LLM_OPENAI_API_KEY=ollama
DERIVER_MODEL_CONFIG__TRANSPORT=openai
DERIVER_MODEL_CONFIG__MODEL=llama3.3:70b
DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL=http://localhost:11434/v1
Set MODEL_CONFIG__TRANSPORT, MODEL_CONFIG__MODEL, and MODEL_CONFIG__OVERRIDES__BASE_URL for each feature the same way. The same overrides are available in config.toml:
[deriver.model_config]
transport = "openai"
model = "my-local-model"

[deriver.model_config.overrides]
base_url = "http://localhost:8000/v1"
api_key_env = "DERIVER_LOCAL_API_KEY"

Thinking Budget

Built-in defaults do not set MODEL_CONFIG__THINKING_BUDGET_TOKENS or MODEL_CONFIG__THINKING_EFFORT. Add one only when your chosen model supports it. Use MODEL_CONFIG__THINKING_EFFORT for OpenAI reasoning models:
DERIVER_MODEL_CONFIG__THINKING_EFFORT=minimal
DIALECTIC_LEVELS__max__MODEL_CONFIG__THINKING_EFFORT=medium
Use MODEL_CONFIG__THINKING_BUDGET_TOKENS for Anthropic and Gemini models. Set it to 0 or omit it for providers that don’t support extended thinking:
SUMMARY_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
DREAM_DEDUCTION_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024

Provider-Specific Parameters

Each model config supports an overrides.provider_params dict for passing arbitrary parameters to the underlying provider SDK. Use this for vendor-specific features that aren’t part of the standard config:
[deriver.model_config.overrides.provider_params]
# These are passed directly to the provider SDK
verbosity = "low"

Changing Transport

When changing a feature’s transport, always specify model explicitly. Partial overrides that change transport without model will keep the previous model name, which may not be valid for the new provider.
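For example, moving the max dialectic level to Anthropic should set both keys together (the model name shown is the one used elsewhere in this guide):

```
DIALECTIC_LEVELS__max__MODEL_CONFIG__TRANSPORT=anthropic
DIALECTIC_LEVELS__max__MODEL_CONFIG__MODEL=claude-haiku-4-5
```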

General LLM Settings

LLM_DEFAULT_MAX_TOKENS=2500

# Tool output limits (to prevent token explosion)
LLM_MAX_TOOL_OUTPUT_CHARS=10000  # ~2500 tokens at 4 chars/token
LLM_MAX_MESSAGE_CONTENT_CHARS=2000  # Max chars per message in tool results

Embedding Configuration

Embeddings use their own nested model config, separate from the main text-generation LLM settings.
# Embedding vector settings
EMBEDDING_VECTOR_DIMENSIONS=1536
EMBEDDING_MAX_INPUT_TOKENS=8192
EMBEDDING_MAX_TOKENS_PER_REQUEST=300000

# Embedding transport/model selection
EMBEDDING_MODEL_CONFIG__TRANSPORT=openai  # openai, gemini
EMBEDDING_MODEL_CONFIG__MODEL=text-embedding-3-small

# Optional endpoint overrides
EMBEDDING_MODEL_CONFIG__OVERRIDES__BASE_URL=http://localhost:8000/v1
EMBEDDING_MODEL_CONFIG__OVERRIDES__API_KEY_ENV=EMBEDDING_CUSTOM_API_KEY
Current constraint:
  • EMBEDDING_VECTOR_DIMENSIONS can be changed for fully migrated external vector stores, but pgvector and dual-write mode still require 1536 until the schema migration lands.

Feature-Specific Model Configuration

Each feature can use a different provider and model. Below are all the tuning knobs.
Dialectic API: The Dialectic API provides theory-of-mind informed responses. It uses a tiered reasoning system with five levels:
# Global dialectic settings
DIALECTIC_MAX_OUTPUT_TOKENS=8192
DIALECTIC_MAX_INPUT_TOKENS=100000
DIALECTIC_HISTORY_TOKEN_LIMIT=8192
DIALECTIC_SESSION_HISTORY_MAX_TOKENS=4096
Per-Level Configuration: Each reasoning level has its own provider, model, and settings:
# config.toml example
[dialectic.levels.minimal]
MAX_TOOL_ITERATIONS = 1
MAX_OUTPUT_TOKENS = 250
TOOL_CHOICE = "any"

[dialectic.levels.minimal.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[dialectic.levels.low]
MAX_TOOL_ITERATIONS = 5
TOOL_CHOICE = "any"

[dialectic.levels.low.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[dialectic.levels.medium]
MAX_TOOL_ITERATIONS = 2

[dialectic.levels.medium.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[dialectic.levels.high]
MAX_TOOL_ITERATIONS = 4

[dialectic.levels.high.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[dialectic.levels.max]
MAX_TOOL_ITERATIONS = 10

[dialectic.levels.max.model_config]
transport = "openai"
model = "gpt-5.4-mini"
Environment variables for nested levels use double underscores:
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__MODEL=gpt-5.4-mini
DIALECTIC_LEVELS__minimal__MAX_TOOL_ITERATIONS=1
DIALECTIC_LEVELS__minimal__MAX_OUTPUT_TOKENS=250
DIALECTIC_LEVELS__minimal__TOOL_CHOICE=any
Deriver (Theory of Mind): The Deriver extracts facts from messages and builds theory-of-mind representations of peers.
DERIVER_ENABLED=true

# LLM settings
DERIVER_MODEL_CONFIG__TRANSPORT=openai
DERIVER_MODEL_CONFIG__MODEL=gpt-5.4-mini
DERIVER_MAX_INPUT_TOKENS=23000
# DERIVER_MODEL_CONFIG__THINKING_EFFORT=minimal
# DERIVER_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
# DERIVER_MODEL_CONFIG__TEMPERATURE=0.7  # Optional temperature override

# Backup model (optional)
# DERIVER_MODEL_CONFIG__FALLBACK__MODEL=claude-haiku-4-5
# DERIVER_MODEL_CONFIG__FALLBACK__TRANSPORT=anthropic

# Worker settings
DERIVER_WORKERS=1  # Increase for higher throughput
DERIVER_POLLING_SLEEP_INTERVAL_SECONDS=1.0
DERIVER_STALE_SESSION_TIMEOUT_MINUTES=5

# Queue management
DERIVER_QUEUE_ERROR_RETENTION_SECONDS=2592000  # 30 days

# Observation settings
DERIVER_DEDUPLICATE=true
DERIVER_LOG_OBSERVATIONS=false
DERIVER_WORKING_REPRESENTATION_MAX_OBSERVATIONS=100
DERIVER_REPRESENTATION_BATCH_MAX_TOKENS=1024
Peer Card:
PEER_CARD_ENABLED=true
Summary Generation: Session summaries provide compressed context for long conversations — short summaries (frequent) and long summaries (comprehensive).
SUMMARY_ENABLED=true
SUMMARY_MODEL_CONFIG__TRANSPORT=openai
SUMMARY_MODEL_CONFIG__MODEL=gpt-5.4-mini
SUMMARY_MAX_TOKENS_SHORT=1000
SUMMARY_MAX_TOKENS_LONG=4000
# SUMMARY_MODEL_CONFIG__THINKING_EFFORT=minimal
# SUMMARY_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
SUMMARY_MESSAGES_PER_SHORT_SUMMARY=20
SUMMARY_MESSAGES_PER_LONG_SUMMARY=60
Dream Processing: Dream processing consolidates and refines peer representations during idle periods.
DREAM_ENABLED=true
DREAM_DOCUMENT_THRESHOLD=50
DREAM_IDLE_TIMEOUT_MINUTES=60
DREAM_MIN_HOURS_BETWEEN_DREAMS=8
DREAM_ENABLED_TYPES=["omni"]
DREAM_MAX_TOOL_ITERATIONS=20
DREAM_HISTORY_TOKEN_LIMIT=16384

# Specialist model configs (each is independent)
DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT=openai
DREAM_DEDUCTION_MODEL_CONFIG__MODEL=gpt-5.4-mini
DREAM_INDUCTION_MODEL_CONFIG__TRANSPORT=openai
DREAM_INDUCTION_MODEL_CONFIG__MODEL=gpt-5.4-mini
Surprisal-Based Sampling (Advanced): Optional subsystem for identifying unusual observations during dreaming:
DREAM_SURPRISAL__ENABLED=false
DREAM_SURPRISAL__TREE_TYPE=kdtree
DREAM_SURPRISAL__TREE_K=5
DREAM_SURPRISAL__SAMPLING_STRATEGY=recent
DREAM_SURPRISAL__SAMPLE_SIZE=200
DREAM_SURPRISAL__TOP_PERCENT_SURPRISAL=0.10
DREAM_SURPRISAL__MIN_HIGH_SURPRISAL_FOR_REPLACE=10
DREAM_SURPRISAL__INCLUDE_LEVELS=["explicit", "deductive"]

Core Configuration

Application Settings

LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR, CRITICAL
SESSION_OBSERVERS_LIMIT=10
GET_CONTEXT_MAX_TOKENS=100000
MAX_MESSAGE_SIZE=25000
MAX_FILE_SIZE=5242880  # 5MB
EMBED_MESSAGES=true
EMBEDDING_MAX_INPUT_TOKENS=8192
EMBEDDING_MAX_TOKENS_PER_REQUEST=300000
NAMESPACE=honcho
Optional Integrations:
LANGFUSE_HOST=https://cloud.langfuse.com
LANGFUSE_PUBLIC_KEY=your-langfuse-public-key
COLLECT_METRICS_LOCAL=false
LOCAL_METRICS_FILE=metrics.jsonl
REASONING_TRACES_FILE=traces.jsonl

Database

# Connection (required)
DB_CONNECTION_URI=postgresql+psycopg://postgres:postgres@localhost:5432/postgres

# Pool settings
DB_SCHEMA=public
DB_POOL_PRE_PING=true
DB_POOL_SIZE=10
DB_MAX_OVERFLOW=20
DB_POOL_TIMEOUT=30
DB_POOL_RECYCLE=300
DB_POOL_USE_LIFO=true
DB_SQL_DEBUG=false

Authentication

AUTH_USE_AUTH=false  # Set to true to require JWT tokens
AUTH_JWT_SECRET=your-super-secret-jwt-key  # Required when auth is enabled
Generate a secret: python scripts/generate_jwt_secret.py
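If the helper script isn't handy, any sufficiently long random value works as a secret; for example, with Python's standard library:

```python
import secrets

# 32 random bytes, hex-encoded: yields a 64-character secret.
print(secrets.token_hex(32))
```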

Cache (Redis)

Redis caching is optional. Honcho works without it but benefits from caching in high-traffic scenarios.
CACHE_ENABLED=false
CACHE_URL=redis://localhost:6379/0?suppress=true
CACHE_NAMESPACE=honcho
CACHE_DEFAULT_TTL_SECONDS=300
CACHE_DEFAULT_LOCK_TTL_SECONDS=5  # Cache stampede prevention
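The lock TTL exists to prevent cache stampedes: when an entry expires, one caller recomputes it while the others wait, instead of everyone hitting the backend at once. A single-process sketch of the pattern (illustrative only; Honcho coordinates this through Redis, not in-process locks):

```python
import threading
import time

_cache: dict = {}
_key_locks: dict = {}
_guard = threading.Lock()

def get_or_compute(key, compute, ttl=300.0):
    entry = _cache.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]                      # fresh hit, no lock needed
    with _guard:                             # find or create the per-key lock
        lock = _key_locks.setdefault(key, threading.Lock())
    with lock:                               # only one caller recomputes per key
        entry = _cache.get(key)              # re-check: another caller may have filled it
        if entry and entry[1] > time.monotonic():
            return entry[0]
        value = compute()
        _cache[key] = (value, time.monotonic() + ttl)
        return value
```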

Webhooks

WEBHOOK_SECRET=your-webhook-signing-secret
WEBHOOK_MAX_WORKSPACE_LIMIT=10
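WEBHOOK_SECRET signs outgoing payloads so receivers can verify authenticity. The exact signing scheme is Honcho-specific; the sketch below assumes a common HMAC-SHA256-over-body scheme purely for illustration of how a receiver would verify:

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    # Recompute HMAC-SHA256 over the raw request body and compare
    # in constant time to avoid timing attacks.
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```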

Vector Store

VECTOR_STORE_TYPE=pgvector  # Options: pgvector, turbopuffer, lancedb
VECTOR_STORE_MIGRATED=false
VECTOR_STORE_NAMESPACE=honcho
VECTOR_STORE_DIMENSIONS=1536

# Turbopuffer-specific
VECTOR_STORE_TURBOPUFFER_API_KEY=your-turbopuffer-api-key
VECTOR_STORE_TURBOPUFFER_REGION=us-east-1

# LanceDB-specific
VECTOR_STORE_LANCEDB_PATH=./lancedb_data

Monitoring

Prometheus Metrics

Honcho exposes /metrics endpoints for scraping:
  • API process: Port 8000
  • Deriver process: Port 9090
METRICS_ENABLED=false
METRICS_NAMESPACE=honcho

CloudEvents Telemetry

TELEMETRY_ENABLED=false
TELEMETRY_ENDPOINT=https://telemetry.honcho.dev/v1/events
TELEMETRY_HEADERS='{"Authorization": "Bearer your-token"}'
TELEMETRY_BATCH_SIZE=100
TELEMETRY_FLUSH_INTERVAL_SECONDS=1.0
TELEMETRY_MAX_RETRIES=3
TELEMETRY_MAX_BUFFER_SIZE=10000

Sentry

SENTRY_ENABLED=false
SENTRY_DSN=https://your-sentry-dsn@sentry.io/project-id
SENTRY_ENVIRONMENT=production
SENTRY_TRACES_SAMPLE_RATE=0.1
SENTRY_PROFILES_SAMPLE_RATE=0.1

Reference config.toml

A complete config.toml with all defaults. Copy and modify what you need:
[app]
LOG_LEVEL = "INFO"
SESSION_OBSERVERS_LIMIT = 10
EMBED_MESSAGES = true
NAMESPACE = "honcho"

[db]
CONNECTION_URI = "postgresql+psycopg://postgres:postgres@localhost:5432/postgres"
POOL_SIZE = 10
MAX_OVERFLOW = 20

[auth]
USE_AUTH = false

[cache]
ENABLED = false
URL = "redis://localhost:6379/0?suppress=true"
DEFAULT_TTL_SECONDS = 300

[deriver]
ENABLED = true
WORKERS = 1

[deriver.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[peer_card]
ENABLED = true

[dialectic]
MAX_OUTPUT_TOKENS = 8192

[dialectic.levels.minimal]
MAX_TOOL_ITERATIONS = 1
MAX_OUTPUT_TOKENS = 250
TOOL_CHOICE = "any"

[dialectic.levels.minimal.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[dialectic.levels.low]
MAX_TOOL_ITERATIONS = 5
TOOL_CHOICE = "any"

[dialectic.levels.low.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[dialectic.levels.medium]
MAX_TOOL_ITERATIONS = 2

[dialectic.levels.medium.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[dialectic.levels.high]
MAX_TOOL_ITERATIONS = 4

[dialectic.levels.high.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[dialectic.levels.max]
MAX_TOOL_ITERATIONS = 10

[dialectic.levels.max.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[summary]
ENABLED = true
MAX_TOKENS_SHORT = 1000
MAX_TOKENS_LONG = 4000

[summary.model_config]
transport = "openai"
model = "gpt-5.4-mini"

[dream]
ENABLED = true

[dream.deduction_model_config]
transport = "openai"
model = "gpt-5.4-mini"

[dream.induction_model_config]
transport = "openai"
model = "gpt-5.4-mini"

[webhook]
MAX_WORKSPACE_LIMIT = 10

[metrics]
ENABLED = false

[telemetry]
ENABLED = false

[vector_store]
TYPE = "pgvector"

[sentry]
ENABLED = false

Database Migrations

uv run alembic current          # Check status
uv run alembic upgrade head     # Upgrade to latest
uv run alembic downgrade <rev>  # Downgrade to specific revision
uv run alembic revision --autogenerate -m "Description"  # Create new migration

Troubleshooting

  1. Database connection errors — Ensure DB_CONNECTION_URI uses postgresql+psycopg:// prefix. Verify database is running and pgvector extension is installed.
  2. Authentication issues — Generate and set AUTH_JWT_SECRET when AUTH_USE_AUTH=true. Use python scripts/generate_jwt_secret.py.
  3. LLM provider errors — Verify API keys are set. Check model names match your provider’s format. Ensure models support tool calling.
  4. Deriver not processing — Check logs. Increase DERIVER_WORKERS for throughput. Verify database and LLM connectivity.
  5. Dialectic level issues — Unset level fields inherit from the built-in defaults. For Anthropic, THINKING_BUDGET_TOKENS must be >= 1024 when enabled. For providers without budgeted thinking, omit it or set it to 0. MAX_OUTPUT_TOKENS must exceed THINKING_BUDGET_TOKENS.
  6. Vector store issues — For Turbopuffer, set the API key. Check VECTOR_STORE_DIMENSIONS matches your embedding model.