Shadow Lighthouse
shadow-lighthouse is the read-side of the Shadow data stack.
It reads raw source workspaces written by shadow-octopus, builds source-local indexes and artifacts, and serves low-latency document lookup and search surfaces for agents, tools, and internal applications.
It does not crawl websites, own source checkpoints, or mutate the raw source of truth.
What It Is For
Use shadow-lighthouse when you need to:
- list the latest documents across one or more sources
- look up a document by raw id or AI-facing id
- search indexed record text, extracted text, transcripts, or OCR artifacts
- find documents by issuer name, stock code, or provider-specific issuer id
- serve object metadata and original downloaded files
- expose a stable HTTP read API over local lakehouse data
Boundary With Shadow Octopus
shadow-octopus owns write-side collection:
- crawling and API sync
- source checkpoints
- raw record append
- object manifest append
- object download state
shadow-lighthouse owns read-side projection:
- catalog indexes
- FTS indexes
- issuer lookup indexes
- derived artifacts
- AI-facing document bundles
- HTTP and CLI read surfaces
The boundary is the Octopus raw contract under octopus/<source>/.
Source-Local First
Lighthouse keeps each source isolated under its own workspace:
lighthouse/
    <source_name>/
        jobs/
        canonical/
        indexes/
        artifacts/
    global/
        indexes/
The source-local model is intentional. It keeps rebuilds, failures, and operational costs bounded by source. Global views, such as merged document lists or news search, are projections over those source-local indexes.
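As a rough illustration of a global view built as a projection over source-local indexes, the sketch below merges a "latest documents" list from several per-source catalogs. Only the `<source>/indexes/catalog.sqlite` location comes from this page. The `documents` table, its `doc_id` and `published_at` columns, and the function names are hypothetical and stand in for whatever schema the catalog really uses.

```python
import heapq
import sqlite3
from pathlib import Path


def latest_per_source(catalog_path, source, limit):
    """Return the newest `limit` rows from one source-local catalog."""
    # Hypothetical schema: documents(doc_id TEXT, published_at TEXT as ISO-8601).
    conn = sqlite3.connect(str(catalog_path))
    try:
        rows = conn.execute(
            "SELECT doc_id, published_at FROM documents "
            "ORDER BY published_at DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
    # Put the timestamp first so tuples sort by recency.
    return [(published_at, source, doc_id) for doc_id, published_at in rows]


def merged_latest(root, sources, limit=10):
    """Project a global 'latest documents' view over source-local catalogs."""
    merged = []
    for source in sources:
        catalog = Path(root) / source / "indexes" / "catalog.sqlite"
        if not catalog.exists():
            # A missing or unbuilt source does not block the others.
            continue
        merged.extend(latest_per_source(catalog, source, limit))
    # Each source contributes at most `limit` rows, so the merge stays small.
    return heapq.nlargest(limit, merged)
```

Because each source is queried on its own, a rebuild or failure in one catalog only removes that source's rows from the merged list.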
Main Indexes
| Index | Location | Purpose |
|---|---|---|
| Catalog | <source>/indexes/catalog.sqlite | Document metadata, resolved objects, issuer lookup |
| FTS | <source>/indexes/fts.sqlite | Search over record text and extracted text artifacts |
| Table index | <source>/indexes/tables.sqlite | Structured table row and cell search from table_json artifacts |
| News index | global/indexes/news.sqlite | Cross-source canonical news search |
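The locations in the table can be resolved with a small helper like the one sketched below. The file names and the `indexes/` and `global/` directories are taken from the table. The `index_path` function, its parameters, and the assumption that both kinds of path are relative to the `lighthouse/` workspace root are illustrative only and not part of shadow-lighthouse itself.

```python
from pathlib import Path

# File names copied from the index table above.
SOURCE_INDEXES = {
    "catalog": "catalog.sqlite",
    "fts": "fts.sqlite",
    "tables": "tables.sqlite",
}
GLOBAL_INDEXES = {"news": "news.sqlite"}


def index_path(root, name, source=None):
    """Resolve an index name to its file path under the workspace root."""
    if name in GLOBAL_INDEXES:
        # Global indexes live under global/indexes/ and take no source.
        return Path(root) / "global" / "indexes" / GLOBAL_INDEXES[name]
    if name in SOURCE_INDEXES:
        if source is None:
            raise ValueError(f"index {name!r} is source-local; pass a source")
        return Path(root) / source / "indexes" / SOURCE_INDEXES[name]
    raise KeyError(f"unknown index: {name!r}")
```

For example, `index_path("lighthouse", "fts", source="example_source")` yields `lighthouse/example_source/indexes/fts.sqlite`, where `example_source` is a placeholder name.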