Models
Atulya uses several machine learning models for different tasks.
Overview
| Model Type | Purpose | Default | Configurable |
|---|---|---|---|
| LLM | Fact extraction, reasoning, generation | Provider-specific | Yes |
| Embedding | Vector representations for semantic search | BAAI/bge-small-en-v1.5 | Yes |
| Cross-Encoder | Reranking search results | cross-encoder/ms-marco-MiniLM-L-6-v2 | Yes |
All local models (embedding, cross-encoder) are automatically downloaded from HuggingFace on first run.
LLM
Used for fact extraction, entity resolution, opinion generation, and answer synthesis.
Supported providers: OpenAI, Anthropic, Gemini, Groq, Ollama, LM Studio, and any OpenAI-compatible API
Atulya works with any provider that exposes an OpenAI-compatible API (e.g., Azure OpenAI). Simply set ATULYA_API_LLM_PROVIDER=openai and configure ATULYA_API_LLM_BASE_URL to point to your provider's endpoint.
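For instance, a self-hosted or third-party OpenAI-compatible gateway could be wired up as follows. The base URL, key, and model name here are placeholders, not real endpoints; substitute your provider's values:

```shell
# Any OpenAI-compatible endpoint (URL and model name are illustrative)
export ATULYA_API_LLM_PROVIDER=openai
export ATULYA_API_LLM_API_KEY=your-api-key
export ATULYA_API_LLM_BASE_URL=https://your-gateway.example.com/v1
export ATULYA_API_LLM_MODEL=your-model-name
```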
See Configuration for setup examples.
Tested Models
The following models have been tested and verified to work correctly with Atulya:
| Provider | Model |
|---|---|
| OpenAI | gpt-5.2 |
| OpenAI | gpt-5 |
| OpenAI | gpt-5-mini |
| OpenAI | gpt-5-nano |
| OpenAI | gpt-4.1-mini |
| OpenAI | gpt-4.1-nano |
| OpenAI | gpt-4o-mini |
| Anthropic | claude-sonnet-4-20250514 |
| Anthropic | claude-3-5-sonnet-20241022 |
| Gemini | gemini-3-pro-preview |
| Gemini | gemini-2.5-flash |
| Gemini | gemini-2.5-flash-lite |
| Groq | openai/gpt-oss-120b |
| Groq | openai/gpt-oss-20b |
Using Other Models
Other models not listed above may also work with Atulya, but they must support at least 65,000 output tokens to ensure reliable fact extraction. If you need a specific model that doesn't meet this requirement, please open an issue to request an exception.
Configuration
# Groq (recommended)
export ATULYA_API_LLM_PROVIDER=groq
export ATULYA_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export ATULYA_API_LLM_MODEL=openai/gpt-oss-20b
# OpenAI
export ATULYA_API_LLM_PROVIDER=openai
export ATULYA_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export ATULYA_API_LLM_MODEL=gpt-4o
# Gemini
export ATULYA_API_LLM_PROVIDER=gemini
export ATULYA_API_LLM_API_KEY=xxxxxxxxxxxx
export ATULYA_API_LLM_MODEL=gemini-2.0-flash
# Anthropic
export ATULYA_API_LLM_PROVIDER=anthropic
export ATULYA_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export ATULYA_API_LLM_MODEL=claude-sonnet-4-20250514
# Ollama (local)
export ATULYA_API_LLM_PROVIDER=ollama
export ATULYA_API_LLM_BASE_URL=http://localhost:11434/v1
export ATULYA_API_LLM_MODEL=llama3
# LM Studio (local)
export ATULYA_API_LLM_PROVIDER=lmstudio
export ATULYA_API_LLM_BASE_URL=http://localhost:1234/v1
export ATULYA_API_LLM_MODEL=your-local-model
Note: The LLM is the primary bottleneck for retain operations. See Performance for optimization strategies.
Embedding Model
Converts text into dense vector representations for semantic similarity search.
Default: BAAI/bge-small-en-v1.5 (384 dimensions, ~130MB)
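To illustrate what the embedding model enables: semantic search compares dense vectors by cosine similarity, so texts with similar meaning score close to 1.0 even when they share few keywords. A minimal sketch with hand-made toy vectors (real vectors come from the embedding model and have 384 dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for real 384-dim embeddings.
query = [0.9, 0.1, 0.0, 0.2]
memory_a = [0.8, 0.2, 0.1, 0.1]   # semantically close to the query
memory_b = [0.0, 0.1, 0.9, 0.0]   # unrelated

print(cosine_similarity(query, memory_a))  # high, near 1.0
print(cosine_similarity(query, memory_b))  # low, near 0.0
```

Higher-dimensional models (e.g., text-embedding-3-large at 3072) can encode finer distinctions, at the cost of larger vectors to store and compare.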
Supported Providers
| Provider | Description | Best For |
|---|---|---|
| local | SentenceTransformers (default) | Development, low latency |
| openai | OpenAI embeddings API | Production, high quality |
| cohere | Cohere embeddings API | Production, multilingual |
| tei | HuggingFace Text Embeddings Inference | Production, self-hosted |
| litellm | LiteLLM proxy (unified gateway) | Multi-provider setups |
Local Models
| Model | Dimensions | Use Case |
|---|---|---|
| BAAI/bge-small-en-v1.5 | 384 | Default, fast, good quality |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual (50+ languages) |
OpenAI Models
| Model | Dimensions | Use Case |
|---|---|---|
| text-embedding-3-small | 1536 | Default OpenAI, cost-effective |
| text-embedding-3-large | 3072 | Higher quality, more expensive |
| text-embedding-ada-002 | 1536 | Legacy model |
Cohere Models
| Model | Dimensions | Use Case |
|---|---|---|
| embed-english-v3.0 | 1024 | English text |
| embed-multilingual-v3.0 | 1024 | 100+ languages |
Atulya automatically detects the embedding dimension at startup and adjusts the database schema. Once memories are stored, you cannot change dimensions without losing data.
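The consequence of this dimension lock-in can be sketched as a startup check. This is a hypothetical illustration, not Atulya's actual code; the function name and error message are invented for the example:

```python
from typing import Optional

def check_embedding_dimension(stored_dim: Optional[int], model_dim: int) -> int:
    """Hypothetical sketch of the startup behavior described above:
    adopt the model's dimension on first run, otherwise require a match."""
    if stored_dim is None:
        # First run: no vectors stored yet, adopt the model's dimension.
        return model_dim
    if stored_dim != model_dim:
        raise ValueError(
            f"Stored vectors have {stored_dim} dimensions but the configured "
            f"model produces {model_dim}; re-ingest memories to switch models."
        )
    return stored_dim

print(check_embedding_dimension(None, 384))  # first run adopts 384
print(check_embedding_dimension(384, 384))   # same model: OK
```

Switching, say, from the 384-dimension local default to a 1536-dimension OpenAI model on an existing database would fail this check, which is why re-ingesting is required.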
Configuration Examples:
# Local provider (default)
export ATULYA_API_EMBEDDINGS_PROVIDER=local
export ATULYA_API_EMBEDDINGS_LOCAL_MODEL=BAAI/bge-small-en-v1.5
# OpenAI
export ATULYA_API_EMBEDDINGS_PROVIDER=openai
export ATULYA_API_EMBEDDINGS_OPENAI_API_KEY=sk-xxxxxxxxxxxx
export ATULYA_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-small
# Cohere
export ATULYA_API_EMBEDDINGS_PROVIDER=cohere
export ATULYA_API_COHERE_API_KEY=your-api-key
export ATULYA_API_EMBEDDINGS_COHERE_MODEL=embed-english-v3.0
# TEI (self-hosted)
export ATULYA_API_EMBEDDINGS_PROVIDER=tei
export ATULYA_API_EMBEDDINGS_TEI_URL=http://localhost:8080
# LiteLLM proxy
export ATULYA_API_EMBEDDINGS_PROVIDER=litellm
export ATULYA_API_LITELLM_API_BASE=http://localhost:4000
export ATULYA_API_EMBEDDINGS_LITELLM_MODEL=text-embedding-3-small
See Configuration for all options including Azure OpenAI and custom endpoints.
Cross-Encoder (Reranker)
Reranks initial search results to improve precision.
Default: cross-encoder/ms-marco-MiniLM-L-6-v2 (~85MB)
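Unlike the embedding model, which encodes query and document independently, a cross-encoder scores each (query, candidate) pair jointly, which is slower but more precise, so it is applied only to the initial result set. The shape of that second stage, with a deliberately toy scoring function standing in for the neural model:

```python
def rerank(query: str, candidates: list[str], score_pair, top_k: int = 3) -> list[str]:
    """Generic rerank step: score each (query, candidate) pair jointly
    and keep the highest-scoring candidates."""
    ranked = sorted(candidates, key=lambda doc: score_pair(query, doc), reverse=True)
    return ranked[:top_k]

def toy_score(query: str, doc: str) -> float:
    # Toy stand-in: word overlap (Jaccard). A real cross-encoder feeds
    # both texts through one transformer and outputs a relevance score.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

hits = ["the cat sat on the mat", "stock prices fell today", "a cat on a mat"]
print(rerank("cat on mat", hits, toy_score, top_k=2))
# → ['a cat on a mat', 'the cat sat on the mat']
```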
Supported Providers
| Provider | Description | Best For |
|---|---|---|
| local | SentenceTransformers CrossEncoder (default) | Development, low latency |
| cohere | Cohere rerank API | Production, high quality |
| tei | HuggingFace Text Embeddings Inference | Production, self-hosted |
| flashrank | FlashRank (lightweight, fast) | Resource-constrained environments |
| litellm | LiteLLM proxy (unified gateway) | Multi-provider setups |
| rrf | RRF-only (no neural reranking) | Testing, minimal resources |
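The rrf provider skips the neural model entirely and merges the ranked lists from the underlying retrievers using Reciprocal Rank Fusion, a standard rank-based formula: score(d) = Σᵢ 1 / (k + rankᵢ(d)), conventionally with k = 60. A self-contained sketch of the fusion step (document names are illustrative):

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank),
    where rank is the document's 1-based position in each list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # order from semantic search
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # order from keyword search
print(rrf_fuse([vector_hits, keyword_hits]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only positions, not scores, it needs no model downloads, which is why it suits testing and minimal-resource setups.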
Local Models
| Model | Use Case |
|---|---|
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Default, fast |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Higher accuracy |
| cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 | Multilingual |
Cohere Models
| Model | Use Case |
|---|---|
| rerank-english-v3.0 | English text |
| rerank-multilingual-v3.0 | 100+ languages |
LiteLLM Supported Providers
LiteLLM supports multiple reranking providers via the /rerank endpoint:
| Provider | Model Example |
|---|---|
| Cohere | cohere/rerank-english-v3.0 |
| Together AI | together_ai/... |
| Voyage AI | voyage/rerank-2 |
| Jina AI | jina_ai/... |
| AWS Bedrock | bedrock/... |
Configuration Examples:
# Local provider (default)
export ATULYA_API_RERANKER_PROVIDER=local
export ATULYA_API_RERANKER_LOCAL_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
# Cohere
export ATULYA_API_RERANKER_PROVIDER=cohere
export ATULYA_API_COHERE_API_KEY=your-api-key
export ATULYA_API_RERANKER_COHERE_MODEL=rerank-english-v3.0
# TEI (self-hosted)
export ATULYA_API_RERANKER_PROVIDER=tei
export ATULYA_API_RERANKER_TEI_URL=http://localhost:8081
# FlashRank (lightweight)
export ATULYA_API_RERANKER_PROVIDER=flashrank
# LiteLLM proxy
export ATULYA_API_RERANKER_PROVIDER=litellm
export ATULYA_API_LITELLM_API_BASE=http://localhost:4000
export ATULYA_API_RERANKER_LITELLM_MODEL=cohere/rerank-english-v3.0
# RRF-only (no neural reranking)
export ATULYA_API_RERANKER_PROVIDER=rrf
See Configuration for all options including Azure-hosted endpoints and batch settings.