How Honcho creates summaries of conversations
Summaries are created per `session`. The general strategy for summarization is to combine a list of recent messages, included verbatim, with a compressed, LLM-generated summary of the older messages that are not. Implementing this correctly, so that the resulting context is both exhaustive and bounded in size, is nontrivial, and Honcho handles it for you.
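The recent-verbatim-plus-summary strategy can be sketched as follows. This is a simplified illustration, not Honcho's actual implementation: `count_tokens` and `summarize` are stand-ins for a real tokenizer and a real LLM call.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly one token per word.
    return len(text.split())

def summarize(messages: list[str]) -> str:
    # Stand-in for an LLM-generated summary of the older messages.
    return f"[summary of {len(messages)} older messages]"

def build_context(messages: list[str], token_limit: int) -> list[str]:
    """Keep as many recent messages verbatim as fit the token budget,
    then prepend a compressed summary of everything older."""
    recent: list[str] = []
    used = 0
    # Walk backwards from the newest message, keeping what fits.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > token_limit:
            break
        recent.insert(0, msg)
        used += cost
    older = messages[: len(messages) - len(recent)]
    if older:
        return [summarize(older)] + recent
    return recent
```

The key design point is that the newest messages always survive verbatim; only older turns are traded for a compressed summary.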
Context is retrieved via the session's `get_context` method. This method has two parameters:
- `summary`: A boolean indicating whether to include the summary in the return value. The default is true.
- `tokens`: An integer indicating the maximum number of tokens to use for the context. If not provided, `get_context` will retrieve as many tokens as are required to provide exhaustive conversation coverage.

Note that the context returned by an exhaustive `get_context` call is typically a few times larger than the configured long summary size.
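To make the semantics of the two parameters concrete, here is a toy model of the method's behavior. `FakeSession`, its message store, and the one-word-per-token counting are all invented for illustration; they are not the real Honcho implementation.

```python
class FakeSession:
    """Toy stand-in illustrating get_context's summary/tokens parameters."""

    def __init__(self, messages, summary_text):
        self.messages = messages          # conversation turns, newest last
        self.summary_text = summary_text  # precomputed summary of older turns

    def get_context(self, summary=True, tokens=None):
        msgs = list(self.messages)
        if tokens is not None:
            # Keep only the most recent messages that fit the budget
            # (one word ~ one token in this toy model).
            kept, used = [], 0
            for m in reversed(msgs):
                cost = len(m.split())
                if used + cost > tokens:
                    break
                kept.insert(0, m)
                used += cost
            msgs = kept
        return {
            "summary": self.summary_text if summary else None,
            "messages": msgs,
        }
```

With `summary=False` the summary is simply omitted; with a `tokens` budget, older messages are dropped first.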
If you only want recent messages, with no summary, set `summary` to false and `tokens` to some multiple of your desired message count. Note that context messages are not paginated, so there is a hard limit on the number of messages that can be retrieved (currently 100,000 tokens).
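One way to pick that multiple is to budget tokens from your expected per-message size. The helper below is a hypothetical convenience, and the average of 100 tokens per message is an assumption you should tune to your own traffic:

```python
AVG_TOKENS_PER_MESSAGE = 100  # assumption: tune to your own message sizes

def tokens_for_messages(desired_messages: int, cap: int = 100_000) -> int:
    """Token budget for retrieving roughly `desired_messages` recent
    messages, clamped to the current hard limit of 100,000 tokens."""
    return min(desired_messages * AVG_TOKENS_PER_MESSAGE, cap)
```

The result would then be passed as the `tokens` argument of a summary-free `get_context` call.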
As a final note, remember that summaries are generated asynchronously and therefore may not be available immediately. If you batch-save a large number of messages, assume that summaries will not be available until those messages are processed, which can take seconds to minutes depending on the number of messages and the configured LLM provider. Exhaustive `get_context` calls performed during this time will likely just return the messages in the session.
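If you need a summary right after a batch write, a simple poll-with-backoff loop is one option. `fetch_summary` here is a hypothetical stand-in for whatever call returns the session's current summary (or nothing, if it has not been generated yet):

```python
import time

def wait_for_summary(fetch_summary, attempts: int = 10, delay: float = 1.0):
    """Poll until the asynchronously generated summary appears.

    fetch_summary: zero-argument callable returning the summary string,
    or None while processing is still in flight.
    """
    for _ in range(attempts):
        summary = fetch_summary()
        if summary is not None:
            return summary
        time.sleep(delay)
        delay *= 2  # exponential backoff between polls
    return None  # gave up: caller can fall back to raw messages
```

Since exhaustive context still returns the raw messages in the meantime, falling back to them on timeout is usually a reasonable default.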