Shadow Lighthouse Deployment
Lighthouse is deployed as a single local read-side service. It reads shared lakehouse files and serves local SQLite indexes over HTTP.
Production Layout
Current server convention:
/dev/data1/shadow-lakehouse/
  octopus/
    <source_name>/    # raw contract written by shadow-octopus
  lighthouse/
    <source_name>/    # read-side indexes and artifacts
The production Lighthouse config should point at the read-side root:
data_root = "/dev/data1/shadow-lakehouse/lighthouse"
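As a rough illustration of how the two trees relate, the sketch below derives the raw and read-side directories for one source from data_root. It assumes the octopus root is a sibling of the read-side root, as in the layout above; the helper name source_dirs is hypothetical and not part of shadow-lighthouse.

```python
from pathlib import PurePosixPath


def source_dirs(data_root: str, source_name: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Return (raw_dir, read_side_dir) for one source.

    Assumption: the octopus/ tree sits next to the lighthouse/ tree
    under the same lakehouse root, as in the production layout.
    """
    read_root = PurePosixPath(data_root)
    lakehouse_root = read_root.parent
    raw_dir = lakehouse_root / "octopus" / source_name
    read_side_dir = read_root / source_name
    return raw_dir, read_side_dir


raw_dir, read_side_dir = source_dirs(
    "/dev/data1/shadow-lakehouse/lighthouse", "cninfo_announcements"
)
print(raw_dir)
print(read_side_dir)
```

A check like this can help catch a config that points data_root at the lakehouse root or at the octopus tree by mistake.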
Rebuild Read-Side Indexes
From the shadow-lighthouse repository:
scripts/ingest_lakehouse_raw.sh --skip-text
This reads all source workspaces under:
/dev/data1/shadow-lakehouse/octopus
and writes read-side indexes under:
/dev/data1/shadow-lakehouse/lighthouse
Use a source filter when rebuilding one source:
scripts/ingest_lakehouse_raw.sh --skip-text --source cninfo_announcements
Incremental Ingest For Live Sources
For minute-level append-only sources, prefer incremental source ingest:
/dev/data1/shadow-lighthouse/.venv/bin/shadow-lighthouse \
--config /dev/data1/shadow-lighthouse/config.prod.toml \
ingest-raw-source \
--source sina7x24 \
--octopus-source-root /dev/data1/shadow-lakehouse/octopus/sina7x24 \
--incremental \
--max-records-per-run 20000 \
--skip-text
This keeps CPU and memory bounded: Lighthouse reads only the raw record lines past the byte offset stored by the previous run, and --max-records-per-run caps how many records a single run processes.
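The idea behind offset-based incremental reading can be sketched as follows. This is not Lighthouse's actual implementation: the function name read_new_records, the JSON-lines record format, and the handling of a partially written trailing line are all assumptions made for illustration.

```python
import json
import os
import tempfile


def read_new_records(path, offset, max_records):
    """Read up to max_records complete lines starting at a byte offset.

    Returns (records, new_offset). A trailing line without a newline is
    treated as still being written and is left for the next run.
    Assumption: one JSON object per line.
    """
    records = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        while len(records) < max_records:
            line = fh.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or an incomplete trailing line
            offset += len(line)
            if line.strip():
                records.append(json.loads(line))
    return records, offset


with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "records.jsonl")
    with open(raw_path, "wb") as fh:
        fh.write(b'{"id": 1}\n{"id": 2}\n{"id": 3}\n')

    # First run: the cap stops the read after two records.
    first, offset = read_new_records(raw_path, 0, max_records=2)

    # The writer appends one complete record and one partial line.
    with open(raw_path, "ab") as fh:
        fh.write(b'{"id": 4}\n{"id": 5')

    # Second run: resumes at the stored offset and skips the partial line.
    second, offset = read_new_records(raw_path, offset, max_records=20000)

print([r["id"] for r in first], [r["id"] for r in second], offset)
```

Because each run seeks straight to the stored offset, the work per run scales with the amount of new data rather than with the total file size.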
Serve
Run the service:
/dev/data1/shadow-lighthouse/.venv/bin/shadow-lighthouse \
--config /dev/data1/shadow-lighthouse/config.prod.toml \
serve \
--host 127.0.0.1 \
--port 8766
In production the service runs as a systemd user unit. Check its state with:
systemctl --user status shadow-lighthouse.service --no-pager
Smoke Tests
Check process health:
curl "http://127.0.0.1:8766/health"
Check source discovery:
curl "http://127.0.0.1:8766/sources"
Check CNInfo document serving:
curl "http://127.0.0.1:8766/documents?source=cninfo_announcements&limit=24"
Check source status from the CLI:
/dev/data1/shadow-lighthouse/.venv/bin/shadow-lighthouse \
--config /dev/data1/shadow-lighthouse/config.prod.toml \
source-status \
--source cninfo_announcements
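The HTTP checks above can also be scripted. The sketch below is one possible wrapper using only the standard library. It assumes an HTTP 200 status means the check passed, since the response bodies are not described here; the helper names smoke_urls, http_status, and run_smoke_checks are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8766"


def smoke_urls(base_url=BASE_URL, source="cninfo_announcements", limit=24):
    """Build the three smoke-test URLs used in this section."""
    query = urlencode({"source": source, "limit": limit})
    return [
        f"{base_url}/health",
        f"{base_url}/sources",
        f"{base_url}/documents?{query}",
    ]


def http_status(url, timeout=5.0):
    """Fetch a URL and return its HTTP status code."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def run_smoke_checks(urls, fetch=http_status):
    """Return (url, ok) pairs; any exception counts as a failed check."""
    results = []
    for url in urls:
        try:
            ok = fetch(url) == 200  # assumption: 200 means healthy
        except Exception:
            ok = False
        results.append((url, ok))
    return results


if __name__ == "__main__":
    for url, ok in run_smoke_checks(smoke_urls()):
        print("OK  " if ok else "FAIL", url)
```

The fetch parameter is injectable so the check logic can be exercised without a running service.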
Operational Notes
- Octopus and Lighthouse should remain separate: Octopus writes raw data; Lighthouse builds read-side projections.
- Prefer source-local ingest and indexes before adding global state.
- Use --skip-text for low-pressure catalog updates when PDF or HTML extraction is not needed.
- Use incremental ingest for high-frequency append-only sources.
- Rebuild source-local indexes when raw contract migration changes the historical file layout.