Updated 2026-03-18

Extraction Reliability

Learn how RetainDB keeps async extraction usable in production and which signals to trust when you are validating a new integration.

Extraction reliability is the part of RetainDB that determines whether a fast write turns into usable memory instead of an opaque background job.

A first-time adopter does not need to understand every internal stage. What matters is knowing which behavior is intentional and which behavior means something is wrong.

The reliability model in plain English

RetainDB does not force every write to wait for full downstream processing.

Instead, the system tries to give you three things at once:

  1. fast write acknowledgment
  2. enough visibility to confirm the write is there
  3. eventual fully processed memory for later retrieval

That tradeoff is why you will sometimes see a memory as pending before you see it as fully processed.

What happens after a write

The path usually looks like this:

  1. your app submits a memory write or a session ingest request
  2. RetainDB validates the request and accepts it
  3. background extraction classifies, structures, and indexes the content
  4. read surfaces merge pending and processed data when the request asks for it with include_pending

The system is working as intended if a fresh write appears through the pending overlay and later settles into the normal processed state.
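
The path above can be pictured with a small in-memory model. This is a toy sketch for intuition only, not RetainDB code: the class ToyMemoryStore, its method names, and the "pending" and "processed" status strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemoryStore:
    """In-memory stand-in for the pending/processed split (hypothetical, not RetainDB)."""
    pending: dict = field(default_factory=dict)
    processed: dict = field(default_factory=dict)

    def write(self, memory_id: str, content: str) -> dict:
        # Steps 1-2: validate the request and acknowledge it right away.
        if not content.strip():
            raise ValueError("content must not be empty")
        self.pending[memory_id] = content
        return {"id": memory_id, "status": "pending"}

    def run_extraction(self) -> None:
        # Step 3: background work moves pending items into the processed set.
        self.processed.update(self.pending)
        self.pending.clear()

    def read(self, include_pending: bool = True) -> dict:
        # Step 4: reads merge both sets only when the caller asks for it.
        result = {mid: "processed" for mid in self.processed}
        if include_pending:
            for mid in self.pending:
                result.setdefault(mid, "pending")
        return result


store = ToyMemoryStore()
ack = store.write("m1", "User prefers dark mode")
before = store.read(include_pending=True)    # the fresh write shows up as pending
hidden = store.read(include_pending=False)   # a processed-only read does not see it yet
store.run_extraction()
after = store.read(include_pending=False)    # the write has settled into processed state
```

In this model the write never disappears. It is visible as pending first and as processed later, which is the convergence the section describes.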

What you should verify first

When you are evaluating reliability, start with behavior that matters to your app:

  • does the write return a usable acknowledgment?
  • can you read it back in the same scope?
  • does the pending result converge to processed memory?
  • can you poll the job if the route is async?

This is more useful than chasing internal pipeline details too early.
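
For the job polling check, a minimal polling loop might look like the sketch below. It assumes the job resource exposes a status field and that "completed" and "failed" are terminal values; both are assumptions, so check the real response shape before relying on them. The fetch_job callable is a placeholder for whatever code performs the job lookup in your client, and it is replaced by a canned iterator here so the example runs without a network.

```python
import time
from typing import Callable

# Assumed terminal status values; confirm against the actual job response.
TERMINAL_STATES = {"completed", "failed"}


def poll_job(fetch_job: Callable[[str], dict], job_id: str,
             interval_s: float = 1.0, max_attempts: int = 30,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_job until the job reaches a terminal state or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"job {job_id} not terminal after {max_attempts} polls")


# Canned responses standing in for successive job lookups (hypothetical shape).
responses = iter([{"status": "pending"}, {"status": "processing"}, {"status": "completed"}])
final = poll_job(lambda job_id: next(responses), "job_123", sleep=lambda s: None)
```

Bounding the number of attempts keeps a stuck job from hanging a test run, and a timeout is a clearer signal than an empty search result.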

The most common false alarms

“Search lost my write”

Usually the scope (project, user_id, or session_id) changed between the write and the read.

“The system is inconsistent”

Usually include_pending was disabled during a first-run test, so you are seeing processed-only behavior and assuming the write vanished.

“Extraction is broken”

Usually extraction is still in progress. Polling the job or reading with the pending overlay enabled would show that.

Signals worth trusting

These are the signals that help most during debugging:

  • response trace ids on write and read calls
  • include_pending behavior on search, profile, and session reads
  • job polling through GET /v1/memory/jobs/:jobId
  • the exact project, user_id, and session_id used on both sides

Info: Reliability debugging is mostly correlation debugging. You are trying to prove that the same content moved through the same scope, not prove that every internal subsystem ran synchronously.
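
One way to make the scope comparison mechanical is to diff the scope fields of the write and read calls before investigating anything else. The helper below is a hypothetical sketch: scope_mismatches is not part of any RetainDB client, and it assumes you log request parameters as plain dictionaries keyed by project, user_id, and session_id.

```python
SCOPE_KEYS = ("project", "user_id", "session_id")


def scope_mismatches(write_params: dict, read_params: dict) -> dict:
    """Return {key: (write_value, read_value)} for every scope key that differs."""
    diffs = {}
    for key in SCOPE_KEYS:
        w, r = write_params.get(key), read_params.get(key)
        if w != r:
            diffs[key] = (w, r)
    return diffs


# Example parameters as they might be captured from logs (made-up values).
write_call = {"project": "demo", "user_id": "u_42", "session_id": "s_1", "content": "likes tea"}
read_call = {"project": "demo", "user_id": "u-42", "session_id": "s_1", "query": "tea"}
mismatch = scope_mismatches(write_call, read_call)
```

Here the only difference is an underscore versus a hyphen in user_id, which is the kind of drift that tends to look like a lost write.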

What a healthy rollout looks like

A healthy integration rollout usually follows this order:

  1. validate one write and one read in a fixed project and user scope
  2. confirm immediate visibility with include_pending=true
  3. confirm processed visibility after background completion
  4. only then add batching, connectors, or more complex retrieval paths
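
The first three steps can be wrapped in a small check that you run before scaling up. The sketch below takes the write, search, and wait operations as injected callables because the real client calls are not shown on this page. Every function name here is hypothetical, and the fake backend exists only so the example runs on its own.

```python
from typing import Callable


def check_single_loop(write: Callable[[dict], str],
                      search: Callable[[dict, bool], list],
                      wait_until_processed: Callable[[str], None],
                      scope: dict, content: str) -> dict:
    """Run steps 1-3 for one memory in a fixed scope and report which ones passed."""
    memory_id = write({**scope, "content": content})
    seen_pending = memory_id in search(scope, True)     # step 2: include_pending=true
    wait_until_processed(memory_id)                     # e.g. poll the job until done
    seen_processed = memory_id in search(scope, False)  # step 3: processed-only read
    return {
        "immediate_visibility": seen_pending,
        "processed_visibility": seen_processed,
        "ready_to_scale": seen_pending and seen_processed,
    }


# Fake backend so the sketch runs without a network; swap in real client calls.
_pending, _processed = set(), set()


def fake_write(payload: dict) -> str:
    _pending.add("m1")
    return "m1"


def fake_search(scope: dict, include_pending: bool) -> list:
    return sorted(_processed | (_pending if include_pending else set()))


def fake_wait(memory_id: str) -> None:
    _pending.discard(memory_id)
    _processed.add(memory_id)


report = check_single_loop(fake_write, fake_search, fake_wait,
                           {"project": "demo", "user_id": "u_42"}, "likes tea")
```

Step 4 is deliberately left out of the function: batching and connectors are only worth adding once ready_to_scale is true for a single write.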

What teams get wrong when scaling up

  • they ingest large volumes before validating a single clean loop
  • they mix user and session semantics
  • they treat pending visibility as an error instead of a feature
  • they debug retrieval quality before debugging scope hygiene

Next step

If you want the concrete API behavior behind this model, see the read-after-write visibility page. If you are ready to test session ingestion directly, continue to the session ingest and extraction page.
