Shadow Lighthouse Quick Start

This is the shortest path from an Octopus raw workspace to a queryable Lighthouse read-side index.

Install

cd /path/to/shadow-lighthouse
uv sync

1. Inspect the layout

uv run shadow-lighthouse --config config.example.toml show-layout \
  --source cninfo_announcements

The default config writes read-side data under:

data/lighthouse/

2. Ingest one Octopus source

uv run shadow-lighthouse --config config.example.toml ingest-raw-source \
  --source cninfo_announcements \
  --octopus-source-root ../shadow-octopus/data/octopus/cninfo_announcements \
  --skip-text

Use --skip-text when you only need the catalog, document list, issuer lookup, and object metadata. Omit it when the local objects are present and the machine can handle text extraction.
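If you switch between the two modes often, the choice can be wrapped in a small shell function. The following is a minimal sketch, not part of shadow-lighthouse: the function name ingest_source and the EXTRACT_TEXT variable are hypothetical, and only flags shown in this guide are passed to the CLI.

```shell
# Hypothetical helper: ingest one source, passing --skip-text unless
# EXTRACT_TEXT=1 is set. EXTRACT_TEXT is an illustrative variable,
# not a shadow-lighthouse setting.
ingest_source() {
  src="$1"    # source name, e.g. cninfo_announcements
  root="$2"   # path to the Octopus source root
  if [ "${EXTRACT_TEXT:-0}" = "1" ]; then
    # Local objects are present and text extraction is wanted.
    uv run shadow-lighthouse --config config.example.toml ingest-raw-source \
      --source "$src" \
      --octopus-source-root "$root"
  else
    # Catalog, document list, issuer lookup, and object metadata only.
    uv run shadow-lighthouse --config config.example.toml ingest-raw-source \
      --source "$src" \
      --octopus-source-root "$root" \
      --skip-text
  fi
}
```

For example, `ingest_source cninfo_announcements ../shadow-octopus/data/octopus/cninfo_announcements` would run the command from this step unchanged.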

3. Use incremental ingest for live append-only records

For minute-level news sources, use incremental ingest so Lighthouse consumes only new raw record lines:

uv run shadow-lighthouse --config config.example.toml ingest-raw-source \
  --source sina7x24 \
  --octopus-source-root ../shadow-octopus/data/octopus/sina7x24 \
  --incremental \
  --max-records-per-run 20000 \
  --skip-text

The incremental cursor is stored in:

<source>/indexes/raw-ingest-state.json
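Since the cursor is kept on disk, the incremental command can presumably be repeated on a schedule, with each pass picking up where the last one stopped. The sketch below shows one way to do that from a shell. The function names and INTERVAL_SECONDS are hypothetical knobs for illustration, not shadow-lighthouse options; a cron job or systemd timer would work equally well.

```shell
# Hypothetical polling loop around the incremental ingest shown above.
# INTERVAL_SECONDS is an illustrative variable, not a CLI option.
INTERVAL_SECONDS="${INTERVAL_SECONDS:-60}"

ingest_incremental_once() {
  uv run shadow-lighthouse --config config.example.toml ingest-raw-source \
    --source sina7x24 \
    --octopus-source-root ../shadow-octopus/data/octopus/sina7x24 \
    --incremental \
    --max-records-per-run 20000 \
    --skip-text
}

poll_incremental() {
  runs="$1"   # number of ingest passes to make
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Stop as soon as one pass fails so the error is not buried.
    ingest_incremental_once || return 1
    i=$((i + 1))
    # Sleep between passes, but not after the last one.
    if [ "$i" -lt "$runs" ]; then
      sleep "$INTERVAL_SECONDS"
    fi
  done
}
```

Calling `poll_incremental 5` would make five passes, one per minute with the default interval.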

4. Verify the read-side workspace

uv run shadow-lighthouse --config config.example.toml verify-source \
  --source cninfo_announcements \
  --require-data

Check source status:

uv run shadow-lighthouse --config config.example.toml source-status \
  --source cninfo_announcements
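The two checks can be chained so that the status is only printed for a workspace that passed verification. This is a sketch under one unconfirmed assumption: that verify-source exits with a non-zero status when verification fails. The function name check_source is hypothetical.

```shell
# Hypothetical gate: run verify-source first, then source-status.
# Assumes verify-source exits non-zero on failure (not confirmed by this guide).
check_source() {
  src="$1"   # source name, e.g. cninfo_announcements
  if uv run shadow-lighthouse --config config.example.toml verify-source \
       --source "$src" \
       --require-data; then
    uv run shadow-lighthouse --config config.example.toml source-status \
      --source "$src"
  else
    echo "verify-source failed for $src" >&2
    return 1
  fi
}
```

`check_source cninfo_announcements` would then run both commands from this step in order.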

5. Query from the CLI

List latest documents:

uv run shadow-lighthouse --config config.example.toml list-documents \
  --source cninfo_announcements \
  --limit 20

Look up documents for one issuer:

uv run shadow-lighthouse --config config.example.toml list-issuer-documents \
  --source cninfo_announcements \
  平安银行

Search source-local text:

uv run shadow-lighthouse --config config.example.toml search-text \
  --source cninfo_announcements \
  "年度报告"

6. Serve HTTP

Serve every initialized source workspace:

uv run shadow-lighthouse --config config.example.toml serve \
  --host 127.0.0.1 \
  --port 8766

Smoke test:

curl "http://127.0.0.1:8766/health"
curl "http://127.0.0.1:8766/documents?source=cninfo_announcements&limit=24"
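When the smoke test runs from a script right after starting the server, the first request may arrive before the server is listening. A minimal sketch of a readiness check follows. The function name wait_for_health and the BASE_URL variable are hypothetical, and the check assumes /health answers with a 2xx status once the server is ready.

```shell
# Hypothetical readiness check: poll /health until curl succeeds
# or the attempts run out. BASE_URL is an illustrative variable.
BASE_URL="${BASE_URL:-http://127.0.0.1:8766}"

wait_for_health() {
  attempts="${1:-10}"   # maximum number of tries
  n=0
  while [ "$n" -lt "$attempts" ]; do
    # -f: exit non-zero on HTTP error statuses; -s: no progress output.
    if curl -fs "$BASE_URL/health" >/dev/null; then
      return 0
    fi
    n=$((n + 1))
    sleep 1
  done
  return 1
}
```

A script could then run `wait_for_health 10 && curl -fs "$BASE_URL/documents?source=cninfo_announcements&limit=24"`.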