Shadow Lighthouse Quick Start
This is the shortest path from an Octopus raw workspace to a queryable Lighthouse read-side index.
Install
cd /path/to/shadow-lighthouse
uv sync
1. Inspect the layout
uv run shadow-lighthouse --config config.example.toml show-layout \
--source cninfo_announcements
The default config writes read-side data under:
data/lighthouse/
2. Ingest one Octopus source
uv run shadow-lighthouse --config config.example.toml ingest-raw-source \
--source cninfo_announcements \
--octopus-source-root ../shadow-octopus/data/octopus/cninfo_announcements \
--skip-text
Use --skip-text when you only need the catalog, document list, issuer lookup, and object metadata. Omit it when the local objects are present and the machine can afford to run text extraction.
3. Use incremental ingest for live append-only records
For news sources that append records at minute-level frequency, use incremental ingest so that Lighthouse consumes only the new raw record lines:
uv run shadow-lighthouse --config config.example.toml ingest-raw-source \
--source sina7x24 \
--octopus-source-root ../shadow-octopus/data/octopus/sina7x24 \
--incremental \
--max-records-per-run 20000 \
--skip-text
The incremental cursor is stored in:
<source>/indexes/raw-ingest-state.json
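For a source that keeps growing, the incremental command above can be repeated on a timer. The following is a minimal sketch, not part of shadow-lighthouse: `ingest_loop` and its arguments are hypothetical. It assumes the Octopus source directory is named after the source (as in both examples in this guide) and that the CLI exits non-zero on failure.

```shell
# ingest_loop SOURCE ROUNDS [INTERVAL_SECONDS]
# Hypothetical helper: repeats the incremental ingest ROUNDS times.
# Assumption: shadow-lighthouse exits non-zero when an ingest run fails.
ingest_loop() {
  source_name=$1
  rounds=$2
  interval=${3:-60}
  n=0
  while [ "$n" -lt "$rounds" ]; do
    # Assumption: the Octopus root directory matches the source name.
    uv run shadow-lighthouse --config config.example.toml ingest-raw-source \
      --source "$source_name" \
      --octopus-source-root "../shadow-octopus/data/octopus/$source_name" \
      --incremental \
      --max-records-per-run 20000 \
      --skip-text || return 1
    n=$((n + 1))
    # Sleep between rounds, but not after the last one.
    if [ "$n" -lt "$rounds" ]; then
      sleep "$interval"
    fi
  done
  return 0
}
```

For example, `ingest_loop sina7x24 5 60` would run five incremental passes one minute apart and stop early if any pass fails. A scheduler such as cron is an equally reasonable choice for long-running setups.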
4. Verify the read-side workspace
uv run shadow-lighthouse --config config.example.toml verify-source \
--source cninfo_announcements \
--require-data
Check source status:
uv run shadow-lighthouse --config config.example.toml source-status \
--source cninfo_announcements
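When several sources have been ingested, the two checks above can be wrapped in a small loop. This is a sketch under the assumption that shadow-lighthouse exits non-zero when a check fails. `verify_sources` is a hypothetical helper, not a command the tool provides.

```shell
# verify_sources SOURCE...
# Hypothetical helper: runs verify-source, then source-status, per source.
# Assumption: shadow-lighthouse exits non-zero when a check fails.
verify_sources() {
  for source_name in "$@"; do
    uv run shadow-lighthouse --config config.example.toml verify-source \
      --source "$source_name" \
      --require-data || {
        echo "verification failed: $source_name" >&2
        return 1
      }
    uv run shadow-lighthouse --config config.example.toml source-status \
      --source "$source_name" || return 1
  done
  return 0
}
```

Calling `verify_sources cninfo_announcements sina7x24` stops at the first source that fails, so a broken workspace is not masked by later output.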
5. Query from the CLI
List the latest documents:
uv run shadow-lighthouse --config config.example.toml list-documents \
--source cninfo_announcements \
--limit 20
Look up documents for one issuer:
uv run shadow-lighthouse --config config.example.toml list-issuer-documents \
--source cninfo_announcements \
平安银行
Search source-local text:
uv run shadow-lighthouse --config config.example.toml search-text \
--source cninfo_announcements \
"年度报告"
6. Serve HTTP
Serve every initialized source workspace:
uv run shadow-lighthouse --config config.example.toml serve \
--host 127.0.0.1 \
--port 8766
Smoke test:
curl "http://127.0.0.1:8766/health"
curl "http://127.0.0.1:8766/documents?source=cninfo_announcements&limit=24"
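In a script, the smoke test may run before the server has finished starting. The sketch below polls the health endpoint first. It assumes that `/health` answers with a 2xx status once the server is ready. `wait_for_health` is a hypothetical helper, not something shadow-lighthouse ships.

```shell
# wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls URL until it answers successfully.
# Assumption: the endpoint returns a 2xx status once the server is ready.
wait_for_health() {
  url=$1
  attempts=${2:-10}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -sS stays quiet except for errors.
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "server not healthy after $attempts attempts: $url" >&2
  return 1
}
```

With `serve` running in another terminal, `wait_for_health "http://127.0.0.1:8766/health" && curl "http://127.0.0.1:8766/documents?source=cninfo_announcements&limit=24"` issues the document query only once the server responds.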