
Hindsight

Hindsight is Vectorize's MIT-licensed "agent memory that learns": biomimetic retain/recall/reflect operations over four memory networks on a single PostgreSQL database. It reached 16.2K GitHub stars in seven months and reports 91.4% on LongMemEval, the highest score claimed in the category (vendor-reported).

Key takeaways

  • 16.2K GitHub stars, 920 forks, and 129 contributors in roughly seven months since the October 2025 repo creation, with 60 releases and 1,763 commits — v0.8.1 shipped June 9, 2026
  • Claims state-of-the-art memory benchmarks: 91.4% on LongMemEval and 89.61% on LoCoMo (vs 75.78% for the strongest prior open system) per the project's own arXiv paper — vendor-reported, with reproduction credited to Virginia Tech and The Washington Post
  • Architecture is the differentiator: four typed memory networks (world facts, experiences, opinions, observations) plus reflect-synthesized mental models, all on one PostgreSQL database with pgvector — no graph database or search cluster required
  • MIT open source with a pay-as-you-go cloud: token-metered pricing ($15 per million tokens to retain, $0.75 per million to recall) instead of monthly tiers, plus a BYOC/on-prem enterprise plan

FAQ

What is Hindsight?

Hindsight is an open-source agent memory system from Vectorize that gives AI agents persistent, structured memory through three operations — retain, recall, and reflect — organizing knowledge into world facts, experiences, opinions, and synthesized observations.

How much does Hindsight cost?

Self-hosting is free under the MIT license. Hindsight Cloud is pay-as-you-go by token volume — $15 per million tokens retained, $0.75 per million recalled, $3 per million for reflect — with free starting credits and a custom-priced enterprise tier for BYOC or on-premises deployment.

How does Hindsight work?

An LLM extracts typed facts, entities, and relationships at retain time into four memory networks on PostgreSQL with pgvector; recall runs four parallel searches (semantic, keyword, graph traversal, temporal) fused with reciprocal rank fusion; reflect runs an agentic loop that synthesizes mental models from raw memories.

How is Hindsight different from Mem0?

Mem0 is a framework-agnostic memory layer focused on compressing and persisting conversation memory, while Hindsight models cognition more explicitly — typed fact networks, causal links, and a background consolidation process that synthesizes observations — and runs entirely on a single PostgreSQL database.

Executive Summary

Hindsight is Vectorize's bid to make agent memory biomimetic: instead of stuffing conversation history into a vector store, it runs three operations — retain, recall, and reflect — over memory organized the way cognitive science suggests human memory works, separating world facts, the agent's own experiences, opinions with confidence scores, and synthesized observations, with "mental models" formed by reflecting on the raw memories beneath them.[1][2][3] The pitch is agents that learn, not just remember, and the traction is real: from repo creation in October 2025 to 16.2K GitHub stars, 920 forks, and 129 contributors by June 2026, with 1,763 commits across 60 releases — v0.8.1 landed June 9, 2026.[1]

The headline claim is benchmark dominance. The project's arXiv paper reports 91.4% on LongMemEval — described in press coverage as the first agent memory system to break 90% — and up to 89.61% on LoCoMo versus 75.78% for the strongest prior open system; even with a 20B open-weight backbone it reports 83.6% on LongMemEval, surpassing full-context GPT-4o.[4][5] These numbers are vendor-authored: the paper comes from Vectorize and its Virginia Tech collaborators, the README's comparison chart (as of January 2026) plots competitors' self-reported scores, and the claimed independent reproduction — by Virginia Tech's Sanghani Center and The Washington Post — is itself asserted by the README.[1] Vectorize, the company behind it, is a RAG-pipeline startup that raised a $3.6M seed led by True Ventures in October 2024 and has since repositioned around agent memory.[6]

Attribute | Value
Company | Vectorize, Inc.; CEO Chris Latimer[6][4]
Funding | $3.6M seed led by True Ventures (October 2024)[6]
Project launched | Repo created October 30, 2025; public launch December 2025[1][3]
GitHub Stars | 16.2K (June 2026), 920 forks, 129 contributors[1]
License | MIT[1]
Release cadence | 60 releases, 1,763 commits; v0.8.1 on June 9, 2026[1]

Product Overview

Hindsight presents memory as an API with three verbs. Retain ingests information and uses an LLM to extract typed facts, entities, and relationships into memory banks. Recall retrieves by running four search strategies in parallel — semantic similarity, keyword matching, graph traversal, and temporal filtering — then fusing and ranking the results. Reflect is the learning step: an agentic reasoning loop with a configurable mission that analyzes existing memories, forms new connections, and synthesizes higher-level mental models.[2][1]
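The recall path described above, several strategies run side by side and their rankings merged, can be sketched in Python. This is a minimal illustration under stated assumptions, not Hindsight's code or the hindsight-client API: the four strategy functions are hypothetical stubs returning ranked memory IDs, fusion uses reciprocal rank fusion (the method the capabilities table names) with the commonly used constant k = 60, and the cross-encoder reranking step is omitted.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical strategy stubs: each returns memory IDs ranked best-first.
# A real system would query embeddings, a keyword index, the entity graph,
# and timestamps respectively.
def semantic_search(query: str) -> list[str]:
    return ["m1", "m3", "m2"]

def keyword_search(query: str) -> list[str]:
    return ["m3", "m1", "m4"]

def graph_search(query: str) -> list[str]:
    return ["m2", "m3"]

def temporal_search(query: str) -> list[str]:
    return ["m4", "m3"]

STRATEGIES: list[Callable[[str], list[str]]] = [
    semantic_search, keyword_search, graph_search, temporal_search,
]

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each item scores sum(1 / (k + rank)), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))

def recall(query: str, top_k: int = 3) -> list[str]:
    # Run every strategy concurrently, then fuse their rankings.
    with ThreadPoolExecutor(max_workers=len(STRATEGIES)) as pool:
        rankings = list(pool.map(lambda fn: fn(query), STRATEGIES))
    return reciprocal_rank_fusion(rankings)[:top_k]

print(recall("where does the user work?"))  # ['m3', 'm1', 'm2']
```

Here m3 wins because all four strategies return it, even though only one ranks it first; that consensus effect is why RRF is a popular way to combine retrievers whose raw scores are not directly comparable.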

Underneath, knowledge is split into four logical networks: world facts (objective knowledge), experiences (what happened to the agent), opinions (beliefs carrying confidence scores), and observations — preference-neutral entity summaries generated by a background consolidation process rather than extracted from any single conversation.[4][7] A deduplication pass keeps repeated observations from bloating the store.[2]
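One way to picture the four networks and the observation dedup pass is as typed records in a single bank. The sketch below is hypothetical: the class and field names are invented for illustration, and deduplication is reduced to a case- and whitespace-insensitive text match, whereas Hindsight's actual schema and dedup logic are not specified in this profile.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Network(Enum):
    WORLD = "world"              # objective knowledge
    EXPERIENCE = "experience"    # what happened to the agent
    OPINION = "opinion"          # beliefs carrying a confidence score
    OBSERVATION = "observation"  # synthesized, preference-neutral entity summaries

@dataclass(frozen=True)
class Memory:
    network: Network
    text: str
    entities: tuple[str, ...] = ()
    confidence: Optional[float] = None  # only meaningful for opinions

    def __post_init__(self) -> None:
        if self.network is Network.OPINION and self.confidence is None:
            raise ValueError("opinions need a confidence score")
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

def _dedup_key(text: str) -> str:
    # Hypothetical dedup rule: ignore case and repeated whitespace.
    return " ".join(text.lower().split())

@dataclass
class MemoryBank:
    memories: list[Memory] = field(default_factory=list)

    def add(self, memory: Memory) -> bool:
        """Store a memory; skip an observation that duplicates an existing one."""
        if memory.network is Network.OBSERVATION:
            key = _dedup_key(memory.text)
            if any(m.network is Network.OBSERVATION and _dedup_key(m.text) == key
                   for m in self.memories):
                return False
        self.memories.append(memory)
        return True

    def by_network(self, network: Network) -> list[Memory]:
        return [m for m in self.memories if m.network is network]

bank = MemoryBank()
bank.add(Memory(Network.WORLD, "Postgres supports the pgvector extension", ("Postgres",)))
bank.add(Memory(Network.OBSERVATION, "Alice works on the billing team", ("Alice",)))
print(bank.add(Memory(Network.OBSERVATION, "alice works on the  Billing team", ("Alice",))))  # False
```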

Key Capabilities

Capability | Description
Retain | LLM-driven extraction of typed facts, entities, and relationships into memory networks[2]
Recall | Four parallel strategies (semantic, BM25 keyword, graph traversal with spreading activation, temporal), fused via reciprocal rank fusion and reranked with a cross-encoder[7][2]
Reflect | Agentic loop with mission/directives that searches mental models, then observations, then raw facts before answering[2][7]
Mental models | Synthesized understanding refreshed from underlying memories, with dedicated retrieve/refresh operations[2][8]
Causal links | Facts linked by temporal, semantic, entity, and causal relationships; causal links get boosted retrieval weight[7]
Integrations | 40+ platform integrations plus an MCP server for agent frameworks[2][8]
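Graph traversal with spreading activation and causal-link boosting, both listed in the capabilities table, can be illustrated with a generic spreading-activation pass over typed fact links. Every number here is an assumption: the per-link weights (causal highest), the decay factor, and the hop limit are hypothetical and are not Hindsight's parameters.

```python
from collections import defaultdict

# Hypothetical per-link-type weights; causal links get the largest boost.
LINK_WEIGHTS = {"causal": 1.0, "entity": 0.6, "semantic": 0.5, "temporal": 0.4}

def spread_activation(
    edges: list[tuple[str, str, str]],  # (fact_a, fact_b, link_type)
    seeds: dict[str, float],            # initial activation, e.g. from a query match
    decay: float = 0.5,
    max_hops: int = 2,
) -> dict[str, float]:
    """Propagate activation outward from seed facts along weighted, typed links."""
    graph: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for a, b, kind in edges:
        weight = LINK_WEIGHTS[kind]
        # Links are treated as undirected in this sketch.
        graph[a].append((b, weight))
        graph[b].append((a, weight))

    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier: dict[str, float] = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in graph[node]:
                next_frontier[neighbor] += energy * weight * decay
        # Accumulate this hop's energy into the running totals.
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation

edges = [("f1", "f2", "causal"), ("f1", "f3", "temporal")]
act = spread_activation(edges, {"f1": 1.0})
print(sorted(act, key=act.get, reverse=True))  # ['f1', 'f2', 'f3']
```

Starting from f1, the causally linked fact f2 ends up with more activation than the temporally linked f3, which is the effect a causal boost is meant to have on retrieval order.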

Product Surfaces

Surface | Description | Availability
Self-hosted | Single Docker container (ghcr.io/vectorize-io/hindsight) with embedded PostgreSQL | GA, MIT[1][8]
SDKs | Python (pip install hindsight-client), TypeScript, Go, CLI | GA[1][2]
Hindsight Cloud | Managed service, dashboard, team collaboration, optional 99.9% uptime SLA | GA, token-metered[8]
Enterprise | BYOC/on-premises, SSO/RBAC, custom SLA up to 99.95% | Custom[8]

Technical Architecture

The infrastructure story is deliberate minimalism: everything runs on a single PostgreSQL database with pgvector — no Neo4j for the graph, no Elasticsearch for keyword search, no dedicated vector database — with Oracle AI Database supported as an alternative backend.[7][1] Memories are represented as entities, relationships, and time series alongside sparse and dense vector embeddings, which is what lets one store serve all four recall strategies.[1] Extraction and reflection are LLM-powered, with the provider configurable via environment variable across OpenAI, Anthropic, Gemini, Groq, Ollama, LM Studio, and MiniMax — local-model operation is a first-class path.[1]
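To make the single-store idea concrete, the sketch below uses an in-memory list as a stand-in for one table whose rows carry a dense embedding, sparse term weights, a timestamp, and entity names, so that semantic, keyword, temporal, and entity-graph lookups all read the same records. In Hindsight these live in PostgreSQL with pgvector; the row layout, field names, and scoring functions here are hypothetical plain-Python approximations, not the project's schema or queries.

```python
import math
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MemoryRow:
    id: str
    dense: list[float]        # dense embedding (a pgvector column in Postgres)
    sparse: dict[str, float]  # sparse term weights for keyword matching
    occurred_at: datetime     # timestamp for temporal filtering
    entities: set[str]        # entity names, the entry points for graph lookups

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic(rows: list[MemoryRow], query_vec: list[float]) -> list[MemoryRow]:
    """Rank rows by cosine similarity of their dense embeddings."""
    return sorted(rows, key=lambda r: cosine(r.dense, query_vec), reverse=True)

def keyword(rows: list[MemoryRow], terms: list[str]) -> list[MemoryRow]:
    """Rank matching rows by summed sparse weights (a crude BM25 stand-in)."""
    def score(r: MemoryRow) -> float:
        return sum(r.sparse.get(t, 0.0) for t in terms)
    return sorted((r for r in rows if score(r) > 0), key=score, reverse=True)

def temporal(rows: list[MemoryRow], start: datetime, end: datetime) -> list[MemoryRow]:
    """Rows inside [start, end], newest first."""
    hits = [r for r in rows if start <= r.occurred_at <= end]
    return sorted(hits, key=lambda r: r.occurred_at, reverse=True)

def by_entity(rows: list[MemoryRow], entity: str) -> list[MemoryRow]:
    """Rows that mention an entity."""
    return [r for r in rows if entity in r.entities]

rows = [
    MemoryRow("m1", [1.0, 0.0], {"billing": 0.9}, datetime(2026, 5, 1), {"Alice"}),
    MemoryRow("m2", [0.0, 1.0], {"vacation": 0.7}, datetime(2026, 6, 1), {"Bob"}),
]
print([r.id for r in semantic(rows, [0.9, 0.1])])  # ['m1', 'm2']
```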

Key Technical Details

Aspect | Detail
Deployment | Single Docker container (self-hosted); managed cloud; BYOC/on-prem enterprise[1][8]
Storage | PostgreSQL + pgvector (embedded or external); Oracle AI Database option[7][1]
Models | Pluggable: OpenAI, Anthropic, Gemini, Groq, Ollama, LM Studio, MiniMax[1]
Integrations | Python/TypeScript/Go SDKs, CLI, MCP server, 40+ framework integrations[2]
Open Source | MIT; 16.2K stars, 129 contributors, 60 releases as of June 2026[1]

Strengths

  • The strongest published benchmark numbers in the category: 91.4% on LongMemEval and 89.61% on LoCoMo (vs 75.78% for the strongest prior open system), backed by a public arXiv paper that explains the method rather than a bare marketing chart.[4][5]
  • Conceptually rich retrieval that one independent reviewer singled out: typed fact networks, causal-link boosting, spreading activation, and reciprocal rank fusion led Synix's eight-system teardown to call it "arguably the most conceptually rich retrieval system of the eight."[7]
  • One-database operational footprint — competitors often require a graph database plus a search engine plus a vector store; Hindsight's everything-on-Postgres design makes self-hosting a single Docker command.[7][1]
  • Genuine open-source velocity — 1,763 commits, 60 releases, and 129 contributors in about seven months, under a permissive MIT license with local-LLM support, not an open-core teaser.[1]
  • Academic collaboration unusual for a startup project — the paper is co-authored with Virginia Tech researchers, and the README credits the Sanghani Center and The Washington Post with reproducing the benchmark results.[4][1]

Cautions

  • Benchmark claims are vendor-authored — the paper comes from Vectorize and its collaborators, competitor scores in the comparison chart are self-reported by those vendors, and the "independent reproduction" claim appears only in Vectorize's own README; no third party has published the reproduction.[1][4]
  • Seven months old: the repo dates to October 30, 2025, and the pre-1.0 version number (v0.8.1) signals that the API carries no stability guarantee yet; the production track record is necessarily thin.[1]
  • Vendor-controlled narrative — milestones like "fastest-growing open-source AI memory project" come from Vectorize's own blog, and its HN submissions have drawn near-zero independent engagement.[9]
  • Retain costs scale with ingestion on cloud — $15 per million tokens retained is the priciest operation in the metered model, and LLM-driven extraction means self-hosters pay inference costs on every write as well.[8][2]
  • Small-company risk — Vectorize disclosed only a $3.6M seed (October 2024) and pivoted its public positioning from RAG pipelines to agent memory within a year; no follow-on funding is publicly disclosed.[6]

What Developers Say

Independent community discussion is thin as of June 2026: the December 2025 HN launch drew 4 points and 2 comments — both from a Vectorize co-founder — and the project's 10K-star and "fastest-growing" milestone posts received no comments.[9] The most substantive independent evaluation is Mark Lubin's eight-system teardown at Synix, which built prototypes on each system and traced the open-source implementations end to end.[7][9]

"Hindsight (by Vectorize.io) runs on a single PostgreSQL database with pgvector — no Neo4j, no Elasticsearch, no Milvus." — Mark Lubin, Synix[7]

"[Hindsight] implements arguably the most conceptually rich retrieval system of the eight." — Mark Lubin, Synix[7]

Hindsight has "probably the most production-ready temporal retrieval" of the systems compared. — Mark Lubin, Synix[7]

The absence of war stories cuts both ways: no complaints about data loss or retrieval quality have surfaced, but neither have independent production testimonials at scale — 16.2K stars have not yet translated into visible public deployments.[9][1]


Pricing & Licensing

Tier | Price | Includes
Self-hosted | Free (MIT) | Single Docker deployment, all four memory networks, retain/recall/reflect APIs, MCP server, embedded PostgreSQL, community support[8]
Hindsight Cloud | Pay-as-you-go, no monthly fee | Managed infrastructure, auto-scaling, daily backups, analytics dashboard, team collaboration, optional 99.9% SLA; free starting credits[8]
Enterprise | Custom | BYOC/on-prem, dedicated infrastructure, up to 99.95% SLA, 24×7 support with 30-minute response, SSO/RBAC, onboarding[8]

Cloud metering is per operation, per million tokens: retain $15.00, recall $0.75, reflect $3.00, Iris Extract (structured extraction) $7.50, mental-model retrieve $0.25, mental-model refresh $3.00. All pricing as of June 2026.[8]

Licensing model: MIT for the full system — storage, extraction, retrieval, and MCP server — with the cloud differentiated by managed operations rather than withheld features.[1][8]

Hidden costs: self-hosters pay LLM inference on every retain and reflect call (or run local models with a quality trade-off); on cloud, write-heavy agents accumulate retain charges at 20x the recall rate.[8][1]
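The metered rates above make cost modeling simple arithmetic. This back-of-the-envelope calculator uses the June 2026 per-million-token rates from the pricing section; the workload volumes, the operation key names, and the `cost_usd` helper are hypothetical, and it ignores free starting credits, enterprise pricing, and self-hosted inference costs.

```python
from decimal import Decimal

# Cloud rates in USD per million tokens, from the pricing section (June 2026).
RATES_PER_MILLION = {
    "retain": Decimal("15.00"),
    "recall": Decimal("0.75"),
    "reflect": Decimal("3.00"),
    "iris_extract": Decimal("7.50"),
    "mental_model_retrieve": Decimal("0.25"),
    "mental_model_refresh": Decimal("3.00"),
}

def cost_usd(tokens_by_operation: dict[str, int]) -> dict[str, Decimal]:
    """Return the cost of each operation plus a 'total' for the given token volumes."""
    costs = {
        op: RATES_PER_MILLION[op] * tokens / 1_000_000
        for op, tokens in tokens_by_operation.items()
    }
    costs["total"] = sum(costs.values(), Decimal("0"))
    return costs

# Hypothetical write-heavy agent: 50M tokens retained, 200M recalled, 5M reflected.
usage = {"retain": 50_000_000, "recall": 200_000_000, "reflect": 5_000_000}
for op, cost in cost_usd(usage).items():
    print(f"{op:>8}: ${cost:,.2f}")
```

With these volumes, retain accounts for $750.00 of the $915.00 total even though recall processes four times as many tokens, which is the 20x rate gap at work.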


Competitive Positioning

Direct Competitors

Competitor | Differentiation
Mem0 | The category's mindshare leader (58.4K stars, $24M raised); simpler compress-and-persist memory model with a mature SaaS. Hindsight counters with richer cognitive structure and stronger published benchmark numbers.
Zep | Temporal knowledge graphs via open-source Graphiti (~27.3K stars); enterprise "Context Lake" positioning with annual pricing. Hindsight is fully MIT, including the platform, and runs on plain Postgres instead of a graph database.
Letta | A full stateful agent runtime (MemGPT lineage) with tiered memory, not just a memory API. Choose Letta to build the whole agent; choose Hindsight to add memory to agents you already have.
LangMem | LangChain's memory layer; it requires LangGraph, while Hindsight is framework-agnostic via SDKs and MCP.
Supermemory | Commercial memory API that Hindsight's benchmark chart positions itself against; its scores, like other vendors', are self-reported.[1]

When to Choose Hindsight Over Alternatives

  • Choose Hindsight when: you want the most structurally ambitious open memory system — typed networks, causal links, reflection — self-hostable on infrastructure you already run (Postgres), with local-model support and no open-core feature gating.[1][7]
  • Choose Mem0 when: you want the largest ecosystem, the longest production track record, and a battle-tested managed platform.
  • Choose Zep when: temporal knowledge-graph queries over business data and enterprise governance are the requirement.
  • Choose Letta when: you want the agent runtime and its memory designed as one system.

Ideal Customer Profile

Best fit:

  • Teams building long-lived agents (support, personal assistants, ops copilots) that need to accumulate understanding across sessions, not just retrieve transcripts[2]
  • Self-hosters who want a serious memory system on a single Postgres-backed container, including air-gapped deployments with Ollama or LM Studio[1]
  • Researchers and evaluators who want a paper-documented architecture they can inspect and reproduce[4]

Poor fit:

  • Teams that need a long production track record or post-1.0 API stability guarantees today[1]
  • Write-heavy, low-value-per-token ingestion workloads on the metered cloud, where retain costs dominate[8]
  • Buyers who require independently audited benchmark claims before committing[1]

Viability Assessment

Factor | Assessment
Financial Health | Thin and stale: the $3.6M seed (October 2024) is the only disclosed funding; the OSS traction is an obvious setup for a follow-on round that has not been announced[6]
Market Position | Fast-rising challenger: 16.2K stars in seven months against Mem0's 58.4K and Zep/Graphiti's ~27.3K, with the category's strongest (vendor-reported) benchmark story[1][4]
Innovation Pace | Very high: 60 releases and 1,763 commits in seven months, alongside an arXiv paper and academic collaboration[1][4]
Community/Ecosystem | Stars without voices: 129 contributors and 40+ integrations, but near-zero independent discussion on HN and no public production testimonials[1][9]
Long-term Outlook | Depends on whether benchmark leadership survives independent scrutiny and whether the cloud converts OSS adoption into revenue before better-funded rivals close the architecture gap[4][8]

The star velocity is the standout — 10K stars by April 2026 and 16.2K by June puts Hindsight on one of the fastest growth curves the agent-memory category has seen — but the gap between GitHub enthusiasm and visible community discussion is unusually wide, and the company carrying it has disclosed only a seed round.[1][9][6] The benchmark paper is a real asset; an independent, third-party-published reproduction would convert it from marketing into a moat.[4]


Bottom Line

Hindsight is the most architecturally interesting entrant in agent memory right now: a genuinely MIT-licensed system whose typed memory networks, causal-weighted retrieval, and reflection loop go meaningfully beyond compress-and-retrieve, all while running on a single Postgres database. The benchmark claims are the best published in the category — and they are the vendor's own, with competitor comparisons built on self-reported scores and a reproduction claim that no third party has independently published. Seven months old and pre-1.0, it is a strong evaluate-now candidate rather than a default production choice.

Recommended for: teams adding learning memory to existing agents, self-hosters and air-gapped deployments, and anyone who wants to evaluate the current (claimed) LongMemEval leader from source.

Not recommended for: risk-averse production buyers who need maturity, independent benchmark validation, or a vendor with disclosed funding beyond a $3.6M seed.

Outlook: Watch for a v1.0 with API stability guarantees, a third-party-published benchmark reproduction, follow-on funding for Vectorize, and the first named production customers — each would materially de-risk what is otherwise the category's fastest-moving project.


Research by Ry Walker Research