# LiteLLM
Universal LLM memory integration via LiteLLM. Add persistent memory to any LLM application with just a few lines of code.
## Features

- **Universal LLM Support** - Works with 100+ LLM providers via LiteLLM (OpenAI, Anthropic, Groq, Azure, AWS Bedrock, Google Vertex AI, and more)
- **Simple Integration** - Just configure, enable, and use `atulya_litellm.completion()`
- **Automatic Memory Injection** - Relevant memories are injected into prompts before LLM calls
- **Automatic Conversation Storage** - Conversations are stored to Atulya for future recall
- **Two Memory Modes** - Choose between `reflect` (synthesized context) and `recall` (raw memory retrieval)
- **Direct Memory APIs** - Query, synthesize, and store memories manually
- **Native Client Wrappers** - Alternative wrappers for the OpenAI and Anthropic SDKs
## Installation

```bash
pip install atulya-litellm
```
## Quick Start

```python
import atulya_litellm

# Configure and enable memory integration
atulya_litellm.configure(
    atulya_api_url="http://localhost:8888",
    bank_id="my-agent",
)
atulya_litellm.enable()

# Use the convenience wrapper - memory is automatically injected and stored
response = atulya_litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What did we discuss about AI?"}],
)
```
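Assuming the wrapper returns LiteLLM's standard OpenAI-compatible response object (LiteLLM's own `completion()` does), you read the reply the usual way:

```python
print(response.choices[0].message.content)
```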
## How It Works

When you call `completion()`, the following happens automatically:

1. **Memory Retrieval** - Atulya is queried for relevant memories based on the conversation
2. **Prompt Injection** - Memories are injected into the system message
3. **LLM Call** - The enriched prompt is sent to the LLM
4. **Conversation Storage** - The conversation is stored to Atulya for future recall
5. **Response Returned** - You receive the response as normal
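For illustration, after steps 1-2 in the default recall mode, the prompt sent to the LLM might look roughly like this (the exact wording and formatting Atulya uses is an assumption; use `verbose=True` with `get_last_injection_debug()` to inspect the real payload):

```python
# Hypothetical enriched prompt - illustrative only, not the library's exact format
enriched_messages = [
    {
        "role": "system",
        "content": "Relevant memories:\n"
                   "1. [WORLD] User prefers Python\n"
                   "2. [OPINION] User dislikes Java",
    },
    {"role": "user", "content": "What did we discuss about AI?"},
]
```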
## Configuration Options

```python
atulya_litellm.configure(
    # Required
    atulya_api_url="http://localhost:8888",  # Atulya API server URL
    bank_id="my-agent",                      # Memory bank ID
    api_key="your-api-key",                  # Optional API key for authentication

    # Optional - Memory behavior
    store_conversations=True,        # Store conversations after LLM calls
    inject_memories=True,            # Inject relevant memories into prompts
    use_reflect=False,               # Use reflect API (synthesized) vs recall (raw memories)
    reflect_include_facts=False,     # Include source facts with reflect responses
    max_memories=None,               # Maximum memories to inject (None = unlimited)
    max_memory_tokens=4096,          # Maximum tokens for memory context
    recall_budget="mid",             # Recall budget: "low", "mid", or "high"
    fact_types=["world", "agent"],   # Filter fact types to inject

    # Optional - Bank configuration
    bank_name="My Agent",            # Human-readable display name for the memory bank
    background="This agent...",      # Instructions guiding what Atulya should remember

    # Optional - Advanced
    injection_mode="system_message",  # or "prepend_user"
    excluded_models=["gpt-3.5*"],     # Exclude certain models
    verbose=True,                     # Enable verbose logging and debug info
)
```
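For example, `excluded_models` lets certain models bypass memory integration entirely. A sketch, assuming the trailing `*` is a wildcard and that excluded models are passed through to LiteLLM untouched:

```python
import atulya_litellm

atulya_litellm.configure(
    atulya_api_url="http://localhost:8888",
    bank_id="my-agent",
    excluded_models=["gpt-3.5*"],
)
atulya_litellm.enable()

# Matches the exclusion pattern: no memories injected, nothing stored (assumed behavior)
atulya_litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

# Not excluded: memory injection and storage apply as usual
atulya_litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
```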
## Bank Configuration

The `background` and `bank_name` parameters configure the memory bank itself. When provided, `configure()` automatically creates or updates the bank with these settings.

```python
atulya_litellm.configure(
    atulya_api_url="http://localhost:8888",
    bank_id="support-router",
    bank_name="Customer Support Router",
    background="""This agent routes customer support requests to the appropriate team.
Remember which types of issues should go to which teams (billing, technical, sales).
Track customer preferences for communication channels and past issue resolutions.""",
)
```
## Memory Modes: Reflect vs Recall

- **Recall mode** (`use_reflect=False`, the default): Retrieves raw memory facts and injects them as a numbered list. Best when you need precise, individual memories.
- **Reflect mode** (`use_reflect=True`): Synthesizes memories into a coherent context paragraph. Best for natural, conversational memory context.

```python
# Recall mode - raw memories
atulya_litellm.configure(
    bank_id="my-agent",
    use_reflect=False,  # Default
)
# Injects: "1. [WORLD] User prefers Python\n2. [OPINION] User dislikes Java..."

# Reflect mode - synthesized context
atulya_litellm.configure(
    bank_id="my-agent",
    use_reflect=True,
)
# Injects: "Based on previous conversations, the user is a Python developer who..."
```
## Multi-Provider Support

Works with any LiteLLM-supported provider:

```python
import atulya_litellm

atulya_litellm.configure(
    atulya_api_url="http://localhost:8888",
    bank_id="my-agent",
)
atulya_litellm.enable()

# OpenAI
atulya_litellm.completion(model="gpt-4o", messages=[...])

# Anthropic
atulya_litellm.completion(model="claude-3-5-sonnet-20241022", messages=[...])

# Groq
atulya_litellm.completion(model="groq/llama-3.1-70b-versatile", messages=[...])

# Azure OpenAI
atulya_litellm.completion(model="azure/gpt-4", messages=[...])

# AWS Bedrock
atulya_litellm.completion(model="bedrock/anthropic.claude-3", messages=[...])

# Google Vertex AI
atulya_litellm.completion(model="vertex_ai/gemini-pro", messages=[...])
```
## Direct Memory APIs

### Recall - Query raw memories

```python
from atulya_litellm import configure, recall

configure(bank_id="my-agent", atulya_api_url="http://localhost:8888")

memories = recall("what projects am I working on?", budget="mid")
for m in memories:
    print(f"- [{m.fact_type}] {m.text}")
```
### Reflect - Get synthesized context

```python
from atulya_litellm import configure, reflect

configure(bank_id="my-agent", atulya_api_url="http://localhost:8888")

result = reflect("what do you know about the user's preferences?")
print(result.text)
```
### Retain - Store memories

```python
from atulya_litellm import configure, retain

configure(bank_id="my-agent", atulya_api_url="http://localhost:8888")

result = retain(
    content="User mentioned they're working on a machine learning project",
    context="Discussion about current projects",
)
```
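`retain` can also seed a bank with known facts up front; a minimal sketch using only the signature shown above (the facts themselves are illustrative):

```python
from atulya_litellm import configure, retain

configure(bank_id="my-agent", atulya_api_url="http://localhost:8888")

# Seed the bank with a few known facts at startup
seed_facts = [
    "User's preferred language is Python",
    "User works on the data engineering team",
]
for fact in seed_facts:
    retain(content=fact, context="Initial profile import")
```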
## Async APIs

```python
from atulya_litellm import arecall, areflect, aretain

# Async versions of all memory APIs
memories = await arecall("what do you know about me?")
context = await areflect("summarize user preferences")
result = await aretain(content="New information to remember")
```
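`await` is only valid inside a coroutine, so a standalone script would wrap these calls, for example:

```python
import asyncio

from atulya_litellm import arecall, configure

configure(bank_id="my-agent", atulya_api_url="http://localhost:8888")

async def main():
    # Same arguments as the sync recall(), awaited instead of blocking
    memories = await arecall("what do you know about me?")
    for m in memories:
        print(f"- [{m.fact_type}] {m.text}")

asyncio.run(main())
```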
## Native Client Wrappers

An alternative to the LiteLLM callback integration, for use directly with the provider SDKs.
### OpenAI Wrapper

```python
from openai import OpenAI
from atulya_litellm import wrap_openai

client = OpenAI()
wrapped = wrap_openai(
    client,
    bank_id="my-agent",
    atulya_api_url="http://localhost:8888",
)

response = wrapped.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What do you know about me?"}],
)
```
### Anthropic Wrapper

```python
from anthropic import Anthropic
from atulya_litellm import wrap_anthropic

client = Anthropic()
wrapped = wrap_anthropic(
    client,
    bank_id="my-agent",
    atulya_api_url="http://localhost:8888",
)

response = wrapped.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## Debug Mode

When `verbose=True`, you can inspect exactly which memories were injected:

```python
from atulya_litellm import configure, enable, completion, get_last_injection_debug

configure(
    bank_id="my-agent",
    atulya_api_url="http://localhost:8888",
    verbose=True,
)
enable()

response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's my favorite color?"}],
)

# Inspect what was injected
debug = get_last_injection_debug()
if debug:
    print(f"Mode: {debug.mode}")          # "reflect" or "recall"
    print(f"Injected: {debug.injected}")  # True/False
    print(f"Results: {debug.results_count}")
    print(f"Memory context:\n{debug.memory_context}")
```
## Context Manager

```python
from atulya_litellm import atulya_memory
import litellm

with atulya_memory(bank_id="user-123"):
    response = litellm.completion(model="gpt-4", messages=[...])

# Memory integration is automatically disabled once the context exits
```
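This is handy for per-user memory isolation in server code; a sketch assuming `atulya_memory` accepts any `bank_id` string:

```python
import litellm

from atulya_litellm import atulya_memory

def handle_request(user_id: str, messages: list) -> str:
    # Scope memory to this user's bank for the duration of the call
    with atulya_memory(bank_id=f"user-{user_id}"):
        response = litellm.completion(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content
```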
## Disabling and Cleanup

```python
from atulya_litellm import disable, cleanup

# Temporarily disable memory integration
disable()

# Clean up all resources (call when shutting down)
cleanup()
```
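If your application has no single shutdown path, one option (a sketch, not part of the library) is to register `cleanup()` with `atexit`:

```python
import atexit

from atulya_litellm import cleanup

# Release resources even if the program exits without an explicit shutdown step
atexit.register(cleanup)
```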
## API Reference

### Main Functions

| Function | Description |
|---|---|
| `configure(...)` | Configure global Atulya settings |
| `enable()` | Enable memory integration with LiteLLM |
| `disable()` | Disable memory integration |
| `is_enabled()` | Check if memory integration is enabled |
| `cleanup()` | Clean up all resources |
### Configuration Functions

| Function | Description |
|---|---|
| `get_config()` | Get the current configuration |
| `is_configured()` | Check if Atulya is configured |
| `reset_config()` | Reset configuration to defaults |
### Memory Functions

| Function | Description |
|---|---|
| `recall(query, ...)` | Synchronously query raw memories |
| `arecall(query, ...)` | Asynchronously query raw memories |
| `reflect(query, ...)` | Synchronously get synthesized memory context |
| `areflect(query, ...)` | Asynchronously get synthesized memory context |
| `retain(content, ...)` | Synchronously store a memory |
| `aretain(content, ...)` | Asynchronously store a memory |
### Debug Functions

| Function | Description |
|---|---|
| `get_last_injection_debug()` | Get debug info from the last memory injection |
| `clear_injection_debug()` | Clear stored debug info |
### Client Wrappers

| Function | Description |
|---|---|
| `wrap_openai(client, ...)` | Wrap an OpenAI client with memory |
| `wrap_anthropic(client, ...)` | Wrap an Anthropic client with memory |
## Requirements
- Python >= 3.10
- litellm >= 1.40.0
- A running Atulya API server