Let’s get started with Honcho. In this quickstart, you will:
  • Set up a workspace with peers (user and assistant)
  • Ingest messages from across multiple sessions
  • Query the reasoning Honcho produces to get synthesized insights about the user
Running the code below requires an API key. Create an account and get your API key at app.honcho.dev under “API KEYS”. Every new tenant gets $100.00 in free credits on sign-up. The code below costs ~$0.04 to run, so don’t worry: there are still plenty of free credits for iterating.

1. Install the SDK

uv add honcho-ai
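If you’re not using uv, installing with pip should work the same way (this assumes the SDK is published under the same honcho-ai name on PyPI):

pip install honcho-ai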

2. Initialize the Client

The Honcho client is the main entry point for interacting with Honcho’s API. It uses a workspace called default unless specified, so let’s create a first-honcho-test workspace for this quickstart.
import os

from honcho import Honcho

# Initialize client (the key can also be read from the HONCHO_API_KEY environment variable)
honcho = Honcho(workspace_id="first-honcho-test", api_key=os.environ["HONCHO_API_KEY"])
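If you prefer not to export the key in your shell, the full script at the end of this page loads it from a .env file with python-dotenv. A minimal .env looks like this (placeholder value, substitute your real key from app.honcho.dev):

# .env (placeholder value)
HONCHO_API_KEY=your-api-key-here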

3. Create Peers

user = honcho.peer("user")
assistant = honcho.peer("assistant")
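Peer IDs are arbitrary strings, so in a real application you would typically reuse your own user identifiers. The ID scheme below is hypothetical, just to illustrate the pattern:

# Hypothetical: key peers to your application's own user IDs
alice = honcho.peer("user-alice-123")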

4. Add Messages to Sessions

We’ve generated an example conversation dataset with 14 messages across 4 sessions. At a high level, it contains a user chatting with an assistant to get help debugging software infrastructure problems for work and to jam on strategy for a side project they’re working on. Spoiler alert: the user is way more interested in their side project. Create a file called conversation.json and add the content in the accordion below. Then we’ll loop through the sessions and messages in that file and write them to Honcho.
Example Conversation
{
    "sessions": [
      {
        "id": "session_1",
        "messages": [
          {
            "role": "user",
            "content": "containers getting OOM killed in CI after someone on my team updated the base image. quick fix?"
          },
          {
            "role": "assistant",
            "content": "OOM kills after a base image update usually means the new image has higher memory requirements. Quick fix: bump the memory limit in your CI config. If you need to unblock fast, you could also pin back to the previous base image version temporarily."
          }
        ]
      },
      {
        "id": "session_2",
        "messages": [
          {
            "role": "user",
            "content": "hey, been playing with honcho for a side project. do you know if there's a limit on how many users i can have?"
          },
          {
            "role": "assistant",
            "content": "Honcho's free tier is based on context ingested, not number of users - so you can have as many users as you want, you're just limited by how much content you're processing. What are you building?"
          },
          {
            "role": "user",
            "content": "ok so basically it's a personal finance assistant that actually remembers your situation - like it knows you're saving for a house, that you hate subscriptions, that you always regret impulse buys over $50. i've been thinking a lot about the UX, like when should it proactively bring up past context vs wait for you to ask? and how do you balance feeling \"known\" vs feeling surveilled? i have a whole notion doc on this lol"
          },
          {
            "role": "assistant",
            "content": "That's a really thoughtful approach to the personalization problem! The \"known vs surveilled\" tension is exactly right - the best experiences feel like talking to a friend who remembers, not a system that tracks. Proactive context works best when it's in service of the user's stated goals."
          }
        ]
      },
      {
        "id": "session_3",
        "messages": [
          {
            "role": "user",
            "content": "do i need to update docker-compose too or just the CI yaml for that memory thing"
          },
          {
            "role": "assistant",
            "content": "Just the CI yaml for the memory limit - docker-compose is separate and only matters if you're hitting the same OOM issue locally."
          },
          {
            "role": "user",
            "content": "k. oh btw 3 friends are using the finance app now, they keep asking when they can pay me lol. one of them said it's the first budgeting thing that \"gets\" her"
          },
          {
            "role": "assistant",
            "content": "That's awesome - \"when can I pay you\" is the best kind of validation. And that quote about it being the first thing that \"gets\" her is exactly the value prop of personalized memory. Sounds like you're onto something real."
          }
        ]
      },
      {
        "id": "session_4",
        "messages": [
          {
            "role": "user",
            "content": "if i wanted to charge like $5/mo for the finance app, would honcho's pricing still make sense at that price point? trying to figure out unit economics early"
          },
          {
            "role": "assistant",
            "content": "You should be fine at that price point. Unlike other memory frameworks that charge you to access your own data, Honcho only charges for the reasoning layer - ingestion and inference. Storage and retrieval are free. So your unit economics are mostly about how often you're synthesizing insights, not how much history you're keeping."
          },
          {
            "role": "user",
            "content": "ok interesting. wish i had more time to work on this"
          },
          {
            "role": "assistant",
            "content": "Yeah, sounds like you've got something worth making time for. The early traction is real."
          }
        ]
      }
    ]
  }
import json

# Load conversation data
with open("conversation.json", "r") as f:
    data = json.load(f)

# Process each session
for session_data in data["sessions"]:
    session = honcho.session(session_data["id"])
    session.add_peers([user, assistant])

    # Add messages with correct roles
    messages = []
    for msg in session_data["messages"]:
        if msg["role"] == "user":
            messages.append(user.message(msg["content"]))
        elif msg["role"] == "assistant":
            messages.append(assistant.message(msg["content"]))

    session.add_messages(messages)
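As a quick sanity check before moving on, you can count what was just written (plain standard-library code, no extra Honcho calls):

total = sum(len(s["messages"]) for s in data["sessions"])
print(f"Wrote {total} messages across {len(data['sessions'])} sessions")
# Expected: Wrote 14 messages across 4 sessions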

5. Query for Insights

Now ask Honcho what it’s learned—this is where the magic happens:
response = user.chat("What should I know about this user? 3 sentences max")
print(response)
Honcho needs a short amount of time to process messages you write to it, and the SDK includes utilities for checking the status of the processing queue. Honcho also offers numerous ways to query reasoning to fit your latency needs: see the Get Context page.
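For example, here is a minimal wait loop that blocks until the queue drains, reusing the get_deriver_status call from the full script at the end of this page:

import time

# Block until Honcho has finished deriving insights from the ingested messages
status = honcho.get_deriver_status()
while status.pending_work_units > 0 or status.in_progress_work_units > 0:
    time.sleep(1)
    status = honcho.get_deriver_status()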
Once processing finishes, the chat response will look something like this:
User is a personal finance app developer building a personalized finance assistant that’s generating real demand (friends are already asking when they can pay). They’re notably thoughtful about product design, carefully considering the UX balance between making users feel “known” versus “surveilled” when their app proactively surfaces remembered context like savings goals and spending regrets. They’re business-minded and working through unit economics early, exploring a $5/month subscription model with usage-based cost structure focused on insight generation frequency rather than data storage—though they wish they had more time to dedicate to the project.
Honcho synthesizes signal by reasoning about the user to draw conclusions beyond what was explicitly stated. It identifies the user as “notably thoughtful about product design” and “business-minded” from the discussion of unit economics, and it surfaces that they want more time to work on the project. This is rich personal context that domain-specific agents can put to work however they need, as the sketch after this list shows.
  • A life coach agent might see “they wish they had more time to dedicate to the project” and “friends are already asking when they can pay” and ask “have you thought about what it would take to go full-time?”
  • A productivity agent might see the same pattern and say “let’s protect your weekend time for the finance app.”
  • A financial advisor agent might see it and ask “what runway would you need to make the leap?”
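Because the representation is composed at query time, each of those agents simply asks a different question of the same peer. A minimal sketch (the coaching prompt is illustrative, not a special API):

# Same peer, different lens: a domain-specific agent frames its own query
coaching_view = user.chat(
    "As a life coach, what goal should I help this user make progress on?"
)
print(coaching_view)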
Honcho acts almost like a detective: it reasons about new and existing evidence to form conclusions that can be used to make a case. Those conclusions wait to be composed dynamically based on how you, the developer (the judge, to extend the metaphor), query them. This approach is what drives our [Pareto-frontier](TODO: link to evals page here) performance on memory benchmarks, and our custom models let us optimize for speed and cost.

Next Steps

You just saw how Honcho reasons about data to build rich peer representations. In this quickstart, you:
  • Set up a workspace with peers (user and assistant)
  • Ingested messages across multiple sessions
  • Queried the reasoning to get synthesized insights about the user
Here’s the full working code if you want to run it yourself:
# uv sync
# uv run python test.py

import json
import time
import uuid

from honcho import Honcho
from dotenv import load_dotenv

load_dotenv()

# Initialize Honcho client with a unique workspace
workspace_id = f"docs-example-{uuid.uuid4().hex[:8]}"
honcho = Honcho(environment="production", workspace_id=workspace_id)

# Create peers to represent the user and assistant
user = honcho.peer("user")
assistant = honcho.peer("assistant")

# Load conversation data from JSON file
with open("conversation.json", "r") as f:
    conversation_data = json.load(f)

# Import historical conversation sessions
for session_data in conversation_data["sessions"]:
    session = honcho.session(session_data["id"])
    session.add_peers([user, assistant])

    # Convert messages to peer messages with correct attribution
    messages = []
    for msg in session_data["messages"]:
        if msg["role"] == "user":
            messages.append(user.message(msg["content"]))
        elif msg["role"] == "assistant":
            messages.append(assistant.message(msg["content"]))

    session.add_messages(messages)

# Wait for Honcho to process the conversation history
def wait_for_processing():
    status = honcho.get_deriver_status()
    while status.pending_work_units > 0 or status.in_progress_work_units > 0:
        time.sleep(1)
        status = honcho.get_deriver_status()

print("Processing conversation history...")
start_time = time.time()
wait_for_processing()
elapsed = int(time.time() - start_time)
print(f"Done in {elapsed}s! Querying user insights...\n")

# Query insights about the user based on conversation history
response = user.chat("What should I know about this user? 3 sentences max")
print(response)
From here, you can explore how to use Honcho’s features in your own applications.