This guide shows how to implement a simple user memory system that derives facts about users, stores them, and references them in later responses.

A fully working example can be found on GitHub. It’s set up as a Discord bot, so see our Discord guide for more details on how to set that up.

Initial Setup

First, we’ll write some setup code to initialize our LLM, Honcho client, and prompts.

from honcho import Honcho, NotFoundError
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
  ChatPromptTemplate,
  SystemMessagePromptTemplate,
  HumanMessagePromptTemplate,
)
from langchain.schema import (
  AIMessage,
  HumanMessage,
  SystemMessage
)
from langchain_core.output_parsers import NumberedListOutputParser

llm: ChatOpenAI = ChatOpenAI(model_name="gpt-4")
app_name = "Fact-Memory"

honcho = Honcho(environment="demo")
app = honcho.apps.get_or_create(name=app_name)  # app.id is used in later steps

The three steps to this project are as follows (a sketch of how they fit together comes after the list):

  1. Derive facts about the user.
  2. Store those facts in a per-user vector db (Collection).
  3. Retrieve those facts later on to improve our response.
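
Here’s a rough sketch of how these pieces will fit together once the functions in the steps below are defined (the handler name and the way it is wired into your bot are illustrative, not part of the example):

# Rough end-to-end flow; derive_facts, store_facts, and respond are defined in the steps below
async def on_user_message(app_id, user_id, chat_history, user_input):
  facts = await derive_facts(user_input)          # Step 1: derive facts from the new message
  store_facts(app_id, user_id, facts)             # Step 2: store them in the user's collection
  return await respond(chat_history, user_input)  # Step 3: retrieve facts and craft a reply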

Step 1 - Deriving Facts

For this part we will be leveraging LangChain and GPT-4 to derive facts about the user based on their recent input.

async def derive_facts(user_input):
  # Derive facts about the user

  # `prompt` is the fact-derivation prompt template shown below
  fact_derivation = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate(prompt=prompt)
  ])

  chain = fact_derivation | llm

  response = await chain.ainvoke({
      "user_input": user_input
  })

  output_parser = NumberedListOutputParser()
  facts = output_parser.parse(response.content)

  # Return a list of facts
  return facts

Breaking down this function, we start by using LCEL to build a chain that takes user_input as a variable (the function is async so we can await the chain). The prompt used here is below:

_type: prompt
input_variables:
    ["user_input"]
template: >
    You are tasked with deriving discrete facts about the user based on their input. The goal is to only extract absolute facts from the message, do not make inferences beyond the text provided.

    ```{user_input}```

    Output the facts as a numbered list.
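
If you keep this prompt in a YAML file, it can be loaded with LangChain’s load_prompt helper; a minimal sketch, assuming the file lives at prompts/derive_facts.yaml (the path is an assumption):

from langchain.prompts import load_prompt

# Load the YAML prompt shown above; the file path is assumed for illustration
prompt = load_prompt("prompts/derive_facts.yaml")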

We then invoke the chain and parse the output to get our list of facts. We take advantage of one of LangChain’s built-in output parsers for this.
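
For example, the NumberedListOutputParser turns the model’s numbered output into a plain Python list (the facts here are made-up values):

output_parser = NumberedListOutputParser()

# Example model output and the parsed result (illustrative values)
raw = "1. The user lives in Boston\n2. The user has a dog named Rex"
facts = output_parser.parse(raw)
# facts == ["The user lives in Boston", "The user has a dog named Rex"]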

Step 2 - Storing Facts

This is where Honcho comes into play. With Honcho we can initialize Collections for each user and store facts in them as vector embeddings. You can use multiple collections if you want to segment different types of facts or data, but for now we just need one.

def store_facts(app_id, user_id, facts):
  # Get or create the Honcho user, then their fact collection
  user = honcho.apps.users.get_or_create(name=user_id, app_id=app_id)

  try:
    collection = honcho.apps.users.collections.get_by_name(app_id=app_id, user_id=user.id, name="discord")
  except NotFoundError:
    collection = honcho.apps.users.collections.create(app_id=app_id, user_id=user.id, name="discord")

  for fact in facts: # store each fact as a document in the collection
    honcho.apps.users.collections.documents.create(
      app_id=app_id, user_id=user.id, collection_id=collection.id, content=fact
    )

That’s it. We just create the collection if it doesn’t already exist for the user and add the facts as documents.
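
Wired together with Step 1, usage looks something like this (the user ID and message are made-up values):

# Example usage (illustrative ID and message)
facts = await derive_facts("I just moved to Boston and adopted a dog named Rex")
# e.g. ["The user moved to Boston", "The user adopted a dog named Rex"]
store_facts(app.id, "discord-user-123", facts)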

Step 3 - Retrieving Facts

This is where we actually make use of the facts. To do this, we first need to determine a query for the collection. The query is natural language that is compared against the facts in the collection using cosine similarity.

The trick we use below is to have the LLM determine the query and then use it in a later step.

async def introspect(chat_history, input):
  # `system_introspection` is the introspection prompt template (shown below)
  introspection_prompt = ChatPromptTemplate.from_messages([
    system_introspection
  ])

  # LCEL
  chain = introspection_prompt | llm

  # inference
  response = await chain.ainvoke({
      "chat_history": chat_history,
      "user_input": input
  })

  # parse output into a list of questions
  output_parser = NumberedListOutputParser()
  questions = output_parser.parse(response.content)

  return questions

async def respond(chat_history, input):
  # `system_response` is the response prompt template (shown below)
  response_prompt = ChatPromptTemplate.from_messages([
      system_response,
      chat_history
  ])

  # query the user's collection (from Step 2) for the most relevant facts
  retrieved_facts = collection.query(query=input, top_k=10)

  # LCEL
  chain = response_prompt | llm

  # inference
  response = await chain.ainvoke({
      "facts": retrieved_facts,
  })

  return response.content

In the introspect method, we use the user’s message as input to derive the query. The prompt used is below:

_type: prompt
input_variables:
    ["chat_history", "user_input"]
template: >
    Given the conversation history and user input, use your theory of mind skills to list out questions you'd like to know about the user in order to best respond to them. 
    
    Chat history: ```{chat_history}```
    User input: ```{user_input}```
    
    Output the questions as a numbered list.

Then we use the questions it generates as queries in our collection.query() method (the simplified respond method above just queries with the raw user input; a per-question sketch is below). The top_k parameter allows you to control how many retrieved facts to return, with a maximum of 50.
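
Here is a minimal sketch of that variation; it assumes collection is the user’s collection from Step 2, and the retrieve_facts helper name and the way results are merged are illustrative, not prescribed by the example:

async def retrieve_facts(chat_history, input):
  # Ask the LLM what it wants to know, then query the collection per question
  questions = await introspect(chat_history, input)

  retrieved_facts = []
  for question in questions:
    retrieved_facts.extend(collection.query(query=question, top_k=10))

  return retrieved_facts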

The prompt used in the respond method is below as well:

_type: prompt
input_variables:
    ["facts"]
template: >
    You are a helpful assistant. Craft a useful response based on the context provided in the conversation history and the facts we know about the user:
    ```{facts}```
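
collection.query() returns document objects rather than plain strings, so you may want to flatten them before filling the {facts} variable; a small sketch, assuming each document exposes the content field we wrote in Step 2:

# Flatten the retrieved documents into a single string for the {facts} variable
facts_str = "\n".join(doc.content for doc in retrieved_facts)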

This is a very simple method of using Honcho to hold user context. For further reading on the limits of this approach, read about violation of expectation.