Honcho uses a flexible configuration system that supports both TOML files and environment variables. Configuration values are loaded in the following priority order (highest to lowest):
  1. Environment variables (always take precedence)
  2. .env file (for local development)
  3. config.toml file (base configuration)
  4. Default values

Option 1: Environment Variables Only (Production)

  • Use environment variables for all configuration
  • No config files needed
  • Ideal for containerized deployments (Docker, Kubernetes)
  • Secrets managed by your deployment platform

Option 2: config.toml (Development/Simple Deployments)

  • Use config.toml for base configuration
  • Override sensitive values with environment variables
  • Good for development and simple deployments

Option 3: Hybrid Approach

  • Use config.toml for non-sensitive base settings
  • Use .env file for sensitive values (API keys, secrets)
  • Good for development teams

Option 4: .env Only (Local Development)

  • Use .env file for all configuration
  • Simple for local development
  • Never commit .env files to version control

Configuration Methods

Using config.toml

Copy the example configuration file to get started:
cp config.toml.example config.toml
Then modify the values as needed. The TOML file is organized into sections:
  • [app] - Application-level settings (log level, session limits, embedding settings, Langfuse integration, local metrics collection)
  • [db] - Database connection and pool settings (connection URI, pool size, timeouts, connection recycling)
  • [auth] - Authentication configuration (enable/disable auth, JWT secret)
  • [cache] - Redis cache configuration (enable/disable caching, Redis URL, TTL settings, lock configuration for cache stampede prevention)
  • [llm] - LLM provider API keys (Anthropic, OpenAI, Gemini, Groq, OpenAI-compatible endpoints) and general LLM settings
  • [dialectic] - Dialectic API configuration (provider, model, query generation settings, semantic search parameters, context window size)
  • [deriver] - Background worker settings (worker count, polling intervals, queue management) and theory of mind configuration (model, tokens, observation limits)
  • [peer_card] - Peer card generation settings (provider, model, token limits)
  • [summary] - Session summarization settings (frequency thresholds, provider, model, token limits for short and long summaries)
  • [dream] - Dream processing configuration (enable/disable, thresholds, idle timeouts, dream types, LLM settings)
  • [webhook] - Webhook configuration (webhook secret, workspace limits)
  • [metrics] - Metrics collection settings (enable/disable metrics, namespace)
  • [sentry] - Error tracking and monitoring settings (enable/disable, DSN, environment, sample rates)

Using Environment Variables

All configuration values can be overridden using environment variables. The environment variable names follow this pattern:
  • {SECTION}_{KEY} for nested settings
  • Just {KEY} for app-level settings
Examples:
  • DB_CONNECTION_URI → [db].CONNECTION_URI
  • DB_POOL_SIZE → [db].POOL_SIZE
  • AUTH_JWT_SECRET → [auth].JWT_SECRET
  • DIALECTIC_MODEL → [dialectic].MODEL
  • LOG_LEVEL (no section) → [app].LOG_LEVEL
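As a sketch of this convention (illustrative only; the helper below is hypothetical and not part of Honcho), the mapping from a TOML section and key to an environment variable name looks like this:
def env_var_name(section: str, key: str) -> str:
    # Hypothetical helper illustrating the naming convention:
    # app-level keys have no prefix; all other sections are prefixed.
    if section == "app":
        return key.upper()
    return f"{section.upper()}_{key.upper()}"

print(env_var_name("db", "pool_size"))   # DB_POOL_SIZE
print(env_var_name("app", "log_level"))  # LOG_LEVEL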

Configuration Priority

When a configuration value is set in multiple places, Honcho uses this priority:
  1. Environment variables - Always take precedence
  2. .env file - Loaded for local development
  3. config.toml - Base configuration
  4. Default values - Built-in defaults
This allows you to:
  • Use config.toml for base configuration
  • Override specific values with environment variables in production
  • Use .env files for local development without modifying config.toml

Example

If you have this in config.toml:
[db]
CONNECTION_URI = "postgresql://localhost/honcho_dev"
POOL_SIZE = 10
You can override just the connection URI in production:
export DB_CONNECTION_URI="postgresql://prod-server/honcho_prod"
The application will use the production connection URI while keeping the pool size from config.toml.

Core Configuration

Application Settings

Application-level settings control core behavior of the Honcho server, including logging, session limits, message handling, and optional integrations.

Basic Application Configuration:
# Logging and server settings
LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR, CRITICAL

# Session and context limits
SESSION_OBSERVERS_LIMIT=10  # Maximum number of observers per session
GET_CONTEXT_MAX_TOKENS=100000  # Maximum tokens for context retrieval
MAX_MESSAGE_SIZE=25000  # Maximum message size in characters

# Embedding settings
EMBED_MESSAGES=true  # Enable vector embeddings for messages
MAX_EMBEDDING_TOKENS=8192  # Maximum tokens per embedding
MAX_EMBEDDING_TOKENS_PER_REQUEST=300000  # Batch embedding limit
Optional Integrations:
# Langfuse integration for LLM observability
LANGFUSE_HOST=https://cloud.langfuse.com
LANGFUSE_PUBLIC_KEY=your-langfuse-public-key

# Local metrics collection
COLLECT_METRICS_LOCAL=false
LOCAL_METRICS_FILE=metrics.jsonl

Database Configuration

Required Database Settings:
# PostgreSQL connection string (required)
DB_CONNECTION_URI=postgresql+psycopg://username:password@host:port/database

# Example for local development
DB_CONNECTION_URI=postgresql+psycopg://postgres:postgres@localhost:5432/honcho

# Example for production
DB_CONNECTION_URI=postgresql+psycopg://honcho_user:secure_password@db.example.com:5432/honcho_prod
Database Pool Settings:
# Connection pool configuration
DB_SCHEMA=public
DB_POOL_SIZE=10
DB_MAX_OVERFLOW=20
DB_POOL_TIMEOUT=30
DB_POOL_RECYCLE=300
DB_POOL_PRE_PING=true
DB_SQL_DEBUG=false
DB_TRACING=false
Docker Compose for PostgreSQL:
# docker-compose.yml
version: '3.8'
services:
  database:
    image: pgvector/pgvector:pg15
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: honcho
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

volumes:
  postgres_data:
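To verify the database is reachable and the pgvector extension is installed, you can run a quick check (a sketch assuming the psycopg driver is installed; note that psycopg itself takes a plain postgresql:// URI, without the +psycopg suffix that DB_CONNECTION_URI uses for SQLAlchemy):
import psycopg

# Plain libpq-style URI: drop the "+psycopg" that DB_CONNECTION_URI carries
with psycopg.connect("postgresql://postgres:postgres@localhost:5432/honcho") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT extname FROM pg_extension WHERE extname = 'vector'")
        row = cur.fetchone()
        print("pgvector installed" if row else "pgvector extension missing")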

Authentication Configuration

JWT Authentication:
# Enable/disable authentication
AUTH_USE_AUTH=false  # Set to true for production

# JWT settings (required if AUTH_USE_AUTH is true)
AUTH_JWT_SECRET=your-super-secret-jwt-key
Generate JWT Secret:
# Generate a secure JWT secret
python scripts/generate_jwt_secret.py
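If the helper script is unavailable, any sufficiently long random string works as a secret (an alternative sketch using only Python's standard library; the script above is the documented path):
import secrets

# 32 random bytes, hex-encoded: a 64-character secret suitable for AUTH_JWT_SECRET
print(secrets.token_hex(32))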

Cache Configuration

Honcho supports Redis caching to improve performance by caching frequently accessed data such as peers, sessions, and working representations. The cache also uses locks to prevent cache stampedes.

Redis Cache Settings:
# Enable/disable Redis caching
CACHE_ENABLED=false  # Set to true to enable caching

# Redis connection
CACHE_URL=redis://localhost:6379/0?suppress=false

# Cache namespace and TTL
CACHE_NAMESPACE=honcho  # Prefix for all cache keys
CACHE_DEFAULT_TTL_SECONDS=300  # How long items stay in cache (5 minutes)

# Lock settings for preventing cache stampede
CACHE_DEFAULT_LOCK_TTL_SECONDS=5  # Lock duration when fetching from DB on cache miss
When to Enable Caching:
  • High-traffic production environments
  • Applications with many repeated reads of the same data
  • When you need to reduce database load
Note: Caching requires a Redis instance. You can run Redis locally with Docker:
docker run -d -p 6379:6379 redis:latest
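To confirm the cache is reachable before enabling it, a minimal connectivity check (a sketch assuming the redis-py package is installed; CACHE_URL itself, including any query parameters like suppress, is parsed by Honcho, not by this snippet):
import redis

# Same URL format as CACHE_URL, minus Honcho-specific query parameters
r = redis.Redis.from_url("redis://localhost:6379/0")
print(r.ping())  # True if Redis is reachable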

LLM Provider Configuration

Honcho supports multiple LLM providers for different tasks. API keys are configured in the [llm] section, while specific features use their own configuration sections.

API Keys

All provider API keys use the LLM_ prefix:
# Provider API Keys
LLM_ANTHROPIC_API_KEY=your-anthropic-api-key
LLM_OPENAI_API_KEY=your-openai-api-key
LLM_GEMINI_API_KEY=your-gemini-api-key
LLM_GROQ_API_KEY=your-groq-api-key

# OpenAI-compatible endpoints
LLM_OPENAI_COMPATIBLE_API_KEY=your-api-key
LLM_OPENAI_COMPATIBLE_BASE_URL=https://your-openai-compatible-endpoint.com

General LLM Settings

# Default settings for all LLM calls
LLM_DEFAULT_MAX_TOKENS=2500

# Embedding provider (used when EMBED_MESSAGES=true)
LLM_EMBEDDING_PROVIDER=openai  # Options: openai, gemini

Feature-Specific Model Configuration

Different features can use different providers and models.

Dialectic API

The Dialectic API provides theory-of-mind informed responses by integrating long-term facts with current context.
# Main dialectic model (default: Anthropic)
DIALECTIC_PROVIDER=anthropic
DIALECTIC_MODEL=claude-sonnet-4-20250514
DIALECTIC_MAX_OUTPUT_TOKENS=2500
DIALECTIC_THINKING_BUDGET_TOKENS=1024  # Only used with Anthropic provider
DIALECTIC_CONTEXT_WINDOW_SIZE=100000  # Maximum context window tokens

# Query generation for dialectic searches
DIALECTIC_PERFORM_QUERY_GENERATION=false  # Enable query generation for semantic search
DIALECTIC_QUERY_GENERATION_PROVIDER=groq
DIALECTIC_QUERY_GENERATION_MODEL=llama-3.1-8b-instant

# Semantic search settings
DIALECTIC_SEMANTIC_SEARCH_TOP_K=10  # Number of results to retrieve
DIALECTIC_SEMANTIC_SEARCH_MAX_DISTANCE=0.85  # Maximum distance for relevance
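To make the distance threshold concrete: semantic search drops results whose vector distance from the query exceeds DIALECTIC_SEMANTIC_SEARCH_MAX_DISTANCE. A small sketch of cosine distance (assuming cosine distance is the metric in use, which is typical for pgvector; lower values mean more similar):
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # 1 - cosine similarity: 0.0 for identical directions, up to 2.0 for opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1 - dot / norm

print(cosine_distance([1.0, 0.0], [1.0, 0.1]))  # ~0.005: well under 0.85, so kept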
Deriver (Theory of Mind)

The Deriver is a background processing system that extracts facts from messages and builds theory-of-mind representations of peers.
# LLM settings for deriver
DERIVER_PROVIDER=google
DERIVER_MODEL=gemini-2.5-flash-lite
DERIVER_MAX_OUTPUT_TOKENS=10000
DERIVER_THINKING_BUDGET_TOKENS=1024  # Only used with Anthropic provider
DERIVER_MAX_INPUT_TOKENS=23000  # Maximum input tokens for deriver

# Worker settings
DERIVER_WORKERS=1  # Number of background worker processes
DERIVER_POLLING_SLEEP_INTERVAL_SECONDS=1.0  # Time between queue checks
DERIVER_STALE_SESSION_TIMEOUT_MINUTES=5  # Timeout for stale sessions

# Queue management
DERIVER_QUEUE_ERROR_RETENTION_SECONDS=2592000  # Keep errored items for 30 days

# Working representation settings
DERIVER_WORKING_REPRESENTATION_MAX_OBSERVATIONS=50  # Max observations stored
DERIVER_REPRESENTATION_BATCH_MAX_TOKENS=4096  # Max tokens per batch
Peer Card

Peer cards are short, structured summaries of peer identity and characteristics.
# Enable/disable peer card generation
PEER_CARD_ENABLED=true

# LLM settings for peer card generation
PEER_CARD_PROVIDER=openai
PEER_CARD_MODEL=gpt-5-nano-2025-08-07
PEER_CARD_MAX_OUTPUT_TOKENS=4000  # Includes thinking tokens for GPT-5 models
Summary Generation

Session summaries provide compressed context for long conversations. Honcho creates two types: short summaries (frequent) and long summaries (comprehensive).
# Enable/disable summarization
SUMMARY_ENABLED=true

# LLM settings for summary generation
SUMMARY_PROVIDER=openai
SUMMARY_MODEL=gpt-4o-mini-2024-07-18
SUMMARY_MAX_TOKENS_SHORT=1000  # Max tokens for short summaries
SUMMARY_MAX_TOKENS_LONG=4000  # Max tokens for long summaries
SUMMARY_THINKING_BUDGET_TOKENS=512  # Only used with Anthropic provider

# Summary frequency thresholds
SUMMARY_MESSAGES_PER_SHORT_SUMMARY=20  # Create short summary every N messages
SUMMARY_MESSAGES_PER_LONG_SUMMARY=60  # Create long summary every N messages
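With the default thresholds, a short summary is produced every 20 messages and a long summary every 60, so message 60 triggers both. A sketch of that cadence (illustrative only; the actual trigger logic lives inside Honcho):
def summaries_due(message_count: int) -> tuple[bool, bool]:
    # Hypothetical check assuming summaries fire on exact multiples of each threshold
    return message_count % 20 == 0, message_count % 60 == 0

print(summaries_due(20))  # (True, False): short summary only
print(summaries_due(60))  # (True, True): both summaries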

Default Provider Usage

By default, Honcho uses:
  • Anthropic (Claude) for dialectic API responses
  • Groq for query generation (fast, cost-effective)
  • Google (Gemini) for theory of mind derivation
  • OpenAI (GPT) for peer cards and summarization
  • OpenAI for embeddings (if EMBED_MESSAGES=true)
You only need to set the API keys for the providers you plan to use. All providers are configurable per feature.

Additional Features Configuration

Dream Processing

Dream processing consolidates and refines peer representations during idle periods, similar to how human memory consolidation works during sleep.

Dream Settings:
# Enable/disable dream processing
DREAM_ENABLED=true

# Trigger thresholds
DREAM_DOCUMENT_THRESHOLD=50  # Minimum documents to trigger a dream
DREAM_IDLE_TIMEOUT_MINUTES=60  # Minutes of inactivity before dream can start
DREAM_MIN_HOURS_BETWEEN_DREAMS=8  # Minimum hours between dreams for a peer

# Dream types to enable
DREAM_ENABLED_TYPES=["consolidate"]  # Currently supported: consolidate

# LLM settings for dream processing
DREAM_PROVIDER=openai
DREAM_MODEL=gpt-4o-mini-2024-07-18
DREAM_MAX_OUTPUT_TOKENS=2000
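Reading the thresholds together: with the defaults above, a dream can run for a peer only once at least 50 documents have accumulated, the peer has been idle for 60 minutes, and at least 8 hours have passed since that peer's last dream. A sketch of that gating (illustrative only; combining the thresholds this way is an assumption, and the real scheduling logic lives inside Honcho):
def dream_eligible(document_count: int, idle_minutes: float, hours_since_last_dream: float) -> bool:
    # Hypothetical check combining the three default thresholds above
    return (
        document_count >= 50
        and idle_minutes >= 60
        and hours_since_last_dream >= 8
    )

print(dream_eligible(75, 90, 12))  # True: all three conditions met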

Webhook Configuration

Webhooks allow you to receive real-time notifications when events occur in Honcho (e.g., new messages, session updates).

Webhook Settings:
# Webhook secret for signing payloads (optional but recommended)
WEBHOOK_SECRET=your-webhook-signing-secret

# Limit on webhooks per workspace
WEBHOOK_MAX_WORKSPACE_LIMIT=10
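When WEBHOOK_SECRET is set, Honcho signs outgoing payloads so receivers can verify authenticity. The exact header name and signing scheme are defined by Honcho; the sketch below assumes an HMAC-SHA256 hex digest carried in a hypothetical X-Honcho-Signature header, so adapt it to the actual scheme documented for webhooks:
import hashlib
import hmac

def verify_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
    # Recompute the HMAC over the raw request body and compare in constant time
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)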

Metrics Collection

Enable metrics collection to monitor Honcho performance and usage.

Metrics Settings:
# Enable/disable metrics collection
METRICS_ENABLED=false

# Namespace for metrics (used in metric names)
METRICS_NAMESPACE=honcho

Monitoring Configuration

Sentry Error Tracking

Sentry Settings:
# Enable/disable Sentry error tracking
SENTRY_ENABLED=false

# Sentry configuration
SENTRY_DSN=https://your-sentry-dsn@sentry.io/project-id
SENTRY_RELEASE=2.4.0  # Optional: track which version errors come from
SENTRY_ENVIRONMENT=production  # Environment name (development, staging, production)

# Sampling rates (0.0 to 1.0)
SENTRY_TRACES_SAMPLE_RATE=0.1  # 10% of transactions tracked
SENTRY_PROFILES_SAMPLE_RATE=0.1  # 10% of transactions profiled

Environment-Specific Examples

Development Configuration

config.toml for development:
[app]
LOG_LEVEL = "DEBUG"
SESSION_OBSERVERS_LIMIT = 10
EMBED_MESSAGES = false

[db]
CONNECTION_URI = "postgresql+psycopg://postgres:postgres@localhost:5432/honcho_dev"
POOL_SIZE = 5

[auth]
USE_AUTH = false

[cache]
ENABLED = false

[dialectic]
PROVIDER = "anthropic"
MODEL = "claude-sonnet-4-20250514"
PERFORM_QUERY_GENERATION = false
MAX_OUTPUT_TOKENS = 2500

[deriver]
WORKERS = 1
PROVIDER = "google"
MODEL = "gemini-2.5-flash-lite"

[peer_card]
ENABLED = true
PROVIDER = "openai"
MODEL = "gpt-5-nano-2025-08-07"

[summary]
ENABLED = true
PROVIDER = "openai"
MODEL = "gpt-4o-mini-2024-07-18"
MAX_TOKENS_SHORT = 1000
MAX_TOKENS_LONG = 4000

[dream]
ENABLED = true

[webhook]
MAX_WORKSPACE_LIMIT = 10

[metrics]
ENABLED = false

[sentry]
ENABLED = false
Environment variables for development:
# .env.development
LOG_LEVEL=DEBUG
DB_CONNECTION_URI=postgresql+psycopg://postgres:postgres@localhost:5432/honcho_dev
AUTH_USE_AUTH=false
CACHE_ENABLED=false

# LLM Provider API Keys
LLM_ANTHROPIC_API_KEY=your-dev-anthropic-key
LLM_OPENAI_API_KEY=your-dev-openai-key
LLM_GEMINI_API_KEY=your-dev-gemini-key

Production Configuration

config.toml for production:
[app]
LOG_LEVEL = "WARNING"
SESSION_OBSERVERS_LIMIT = 10
EMBED_MESSAGES = true

[db]
CONNECTION_URI = "postgresql+psycopg://honcho_user:secure_password@prod-db:5432/honcho_prod"
POOL_SIZE = 20
MAX_OVERFLOW = 40

[auth]
USE_AUTH = true

[cache]
ENABLED = true
URL = "redis://redis:6379/0"
DEFAULT_TTL_SECONDS = 300

[dialectic]
PROVIDER = "anthropic"
MODEL = "claude-sonnet-4-20250514"
PERFORM_QUERY_GENERATION = false
MAX_OUTPUT_TOKENS = 2500

[deriver]
WORKERS = 4
PROVIDER = "google"
MODEL = "gemini-2.5-flash-lite"

[peer_card]
ENABLED = true
PROVIDER = "openai"
MODEL = "gpt-5-nano-2025-08-07"

[summary]
ENABLED = true
PROVIDER = "openai"
MODEL = "gpt-4o-mini-2024-07-18"
MAX_TOKENS_SHORT = 1000
MAX_TOKENS_LONG = 4000

[dream]
ENABLED = true
PROVIDER = "openai"
MODEL = "gpt-4o-mini-2024-07-18"

[webhook]
MAX_WORKSPACE_LIMIT = 10

[metrics]
ENABLED = true

[sentry]
ENABLED = true
ENVIRONMENT = "production"
TRACES_SAMPLE_RATE = 0.1
PROFILES_SAMPLE_RATE = 0.1
Environment variables for production:
# .env.production
LOG_LEVEL=WARNING
DB_CONNECTION_URI=postgresql+psycopg://honcho_user:secure_password@prod-db:5432/honcho_prod

# Authentication
AUTH_USE_AUTH=true
AUTH_JWT_SECRET=your-super-secret-jwt-key

# Cache
CACHE_ENABLED=true
CACHE_URL=redis://redis:6379/0

# LLM Provider API Keys
LLM_ANTHROPIC_API_KEY=your-prod-anthropic-key
LLM_OPENAI_API_KEY=your-prod-openai-key
LLM_GEMINI_API_KEY=your-prod-gemini-key
LLM_GROQ_API_KEY=your-prod-groq-key

# Webhooks
WEBHOOK_SECRET=your-webhook-signing-secret

# Monitoring
SENTRY_DSN=https://your-sentry-dsn@sentry.io/project-id
SENTRY_ENVIRONMENT=production

Migration Management

Running Database Migrations:
# Check current migration status
uv run alembic current

# Upgrade to latest
uv run alembic upgrade head

# Downgrade to specific revision
uv run alembic downgrade revision_id

# Create new migration
uv run alembic revision --autogenerate -m "Description of changes"

Troubleshooting

Common Configuration Issues:
  1. Database Connection Errors
    • Ensure DB_CONNECTION_URI uses postgresql+psycopg:// prefix
    • Verify database is running and accessible
    • Check pgvector extension is installed
  2. Authentication Issues
    • Set AUTH_USE_AUTH=true for production
    • Generate and set AUTH_JWT_SECRET if authentication is enabled
    • Use python scripts/generate_jwt_secret.py to create a secure secret
  3. LLM Provider Issues
    • Verify API keys are set correctly
    • Check model names match provider specifications
    • Ensure provider is enabled in configuration
  4. Deriver Issues
    • Increase DERIVER_WORKERS for better performance
    • Check DERIVER_STALE_SESSION_TIMEOUT_MINUTES for session cleanup
    • Monitor background processing logs
This configuration guide covers all the settings available in Honcho. Always use environment-specific configuration files and never commit sensitive values like API keys or JWT secrets to version control.