Shadow Lighthouse

shadow-lighthouse is the read side of the Shadow data stack.

It reads raw source workspaces written by shadow-octopus, builds source-local indexes and artifacts, and serves low-latency document lookup and search surfaces for agents, tools, and internal applications.

It does not crawl websites, own source checkpoints, or mutate the raw source of truth.

What It Is For

Use shadow-lighthouse when you need to:

  • list the latest documents across one or more sources
  • look up a document by raw id or AI-facing id
  • search indexed record text, extracted text, transcripts, or OCR artifacts
  • find documents by issuer name, stock code, or provider-specific issuer id
  • serve object metadata and original downloaded files
  • expose a stable HTTP read API over local lakehouse data

Boundary With Shadow Octopus

shadow-octopus owns write-side collection:

  • crawling and API sync
  • source checkpoints
  • raw record append
  • object manifest append
  • object download state

shadow-lighthouse owns read-side projection:

  • catalog indexes
  • FTS indexes
  • issuer lookup indexes
  • derived artifacts
  • AI-facing document bundles
  • HTTP and CLI read surfaces

The boundary is the Octopus raw contract under octopus/<source>/.

Source-Local First

Lighthouse keeps each source isolated under its own workspace:

lighthouse/
  <source_name>/
    jobs/
    canonical/
    indexes/
    artifacts/
  global/
    indexes/

The source-local model is intentional. It keeps rebuilds, failures, and operational costs bounded by source. Global views, such as merged document lists or news search, are projections over those source-local indexes.
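As a rough illustration of a global view being a projection over source-local data, the sketch below merges per-source document lists into one newest-first list. The `DocRef` type, its fields, and `latest_across_sources` are hypothetical names chosen for this example. They are not part of the Lighthouse API, and the real catalog schema may differ.

```python
import heapq
from itertools import islice
from typing import Iterable, List, NamedTuple


class DocRef(NamedTuple):
    """Hypothetical minimal document reference; not the real catalog schema."""
    source: str        # source workspace name
    doc_id: str        # document id within that source
    published_at: str  # ISO 8601 UTC timestamp, which sorts lexicographically


def latest_across_sources(
    per_source: Iterable[Iterable[DocRef]], limit: int
) -> List[DocRef]:
    """Merge per-source lists into one global newest-first list.

    Each input must already be sorted newest-first, as a source-local
    index query would typically return it.
    """
    merged = heapq.merge(
        *per_source, key=lambda doc: doc.published_at, reverse=True
    )
    # The merge is lazy, so only `limit` items are pulled from the inputs.
    return list(islice(merged, limit))
```

Because the merge is lazy, each source contributes only as many rows as the global list actually needs, so a slow or large source does not force a full scan of the others.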

Main Indexes

| Index | Location | Purpose |
| --- | --- | --- |
| Catalog | <source>/indexes/catalog.sqlite | Document metadata, resolved objects, issuer lookup |
| FTS | <source>/indexes/fts.sqlite | Search over record text and extracted text artifacts |
| Table index | <source>/indexes/tables.sqlite | Structured table row and cell search from table_json artifacts |
| News index | global/indexes/news.sqlite | Cross-source canonical news search |
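The locations above can be resolved with a small helper. This is a minimal sketch that assumes the index files are ordinary SQLite databases under a `lighthouse/` workspace root. The names `index_path` and `open_read_only` are hypothetical, and the table schemas inside each file are not shown because the source does not document them.

```python
import sqlite3
from pathlib import Path
from typing import Optional

# File names taken from the index table; the mapping keys are illustrative.
SOURCE_INDEXES = {
    "catalog": "catalog.sqlite",
    "fts": "fts.sqlite",
    "tables": "tables.sqlite",
}
GLOBAL_INDEXES = {"news": "news.sqlite"}


def index_path(root: Path, index: str, source: Optional[str] = None) -> Path:
    """Return the path of an index file under the lighthouse workspace root."""
    if index in GLOBAL_INDEXES:
        return root / "global" / "indexes" / GLOBAL_INDEXES[index]
    if index not in SOURCE_INDEXES:
        raise ValueError(f"unknown index: {index!r}")
    if source is None:
        raise ValueError(f"index {index!r} is source-local; pass a source name")
    return root / source / "indexes" / SOURCE_INDEXES[index]


def open_read_only(path: Path) -> sqlite3.Connection:
    """Open a SQLite file read-only so a reader cannot modify the index."""
    # mode=ro is a standard SQLite URI parameter; writes raise OperationalError.
    return sqlite3.connect(f"{path.resolve().as_uri()}?mode=ro", uri=True)
```

Opening with `mode=ro` is one way to keep a read surface from writing to an index by accident; whether Lighthouse itself opens its indexes this way is an assumption.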