Shadow Octopus Raw Contract
The raw contract is the stable handoff from Octopus to Lighthouse.
Octopus writes it. Lighthouse reads it. Other tools should treat it as append-oriented source data, not as a query database.
Source Workspace
octopus/
  <source_name>/
    state.db
    records/
      month=YYYY-MM/
        detail.jsonl
    manifests/
      objects.jsonl
      objects-resolved.jsonl
      objects-failed.jsonl
    objects/
      sha256/
        <first-two-hex>/
          <next-two-hex>/
            <sha256>.<ext>
Raw Record Envelope
Every raw record line is one JSON object:
{
  "source_name": "cninfo_announcements",
  "record_type": "announcement_detail",
  "source_record_id": "announcement:1225267495",
  "published_at": "2026-05-14",
  "title": "平安银行:2025年年度报告",
  "detail_url": "https://example.com/detail",
  "issuer_names": ["平安银行"],
  "security_codes": ["000001"],
  "payload": {}
}
Contract fields:
| Field | Meaning |
|---|---|
| source_name | Source workspace name |
| record_type | Source-specific record type |
| source_record_id | Stable source-local record id |
| published_at | Source publish date or timestamp when available |
| title | Human-readable title |
| detail_url | Source detail URL when available |
| issuer_names | Issuer names for downstream identity lookup |
| security_codes | Security codes for downstream identity lookup |
| payload | Source-specific raw or normalized payload |
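A reader might check each raw record line against the contract fields above along these lines. This is a minimal sketch, not part of the contract: the helper name `parse_record_line` is hypothetical, and because the contract does not say which fields are mandatory, the sketch only reports absent fields instead of rejecting the record.

```python
import json

# Field names taken from the contract table above.
CONTRACT_FIELDS = (
    "source_name", "record_type", "source_record_id", "published_at",
    "title", "detail_url", "issuer_names", "security_codes", "payload",
)


def parse_record_line(line: str) -> dict:
    """Parse one raw record line and list the contract fields it lacks.

    Hypothetical helper: which fields are mandatory is an assumption left
    to the caller, so missing fields are reported rather than rejected.
    """
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("raw record line must be a JSON object")
    missing = [name for name in CONTRACT_FIELDS if name not in record]
    return {"record": record, "missing": missing}
```

A caller can then decide, for example, to tolerate a missing published_at or detail_url (both described as present only "when available") while treating a missing source_record_id as an error.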
Object Intent Manifest
manifests/objects.jsonl records objects that should be downloaded or tracked.
Pending object rows use placeholder values, with sha256 and storage_rel_path set to "pending" and size_bytes set to 0:
{
  "source_name": "cninfo_announcements",
  "source_record_id": "announcement:1225267495",
  "object_role": "primary_attachment",
  "sha256": "pending",
  "size_bytes": 0,
  "mime_type": "application/pdf",
  "file_ext": "pdf",
  "storage_rel_path": "pending",
  "source_url": "https://static.cninfo.com.cn/finalpage/doc.pdf",
  "filename_hint": "annual-report.pdf",
  "fetched_at": "2026-05-20T02:49:43Z"
}
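Readers that need local object bytes have to tell pending rows apart from resolved ones. The sketch below assumes the literal string "pending" seen in the example row is the only marker; the helper name `is_pending` is hypothetical.

```python
PENDING = "pending"  # placeholder value seen in the pending-row example


def is_pending(row: dict) -> bool:
    """Return True when the row has no content hash or storage path yet.

    Assumption: a row counts as pending if either field still holds the
    placeholder. The contract does not define this check explicitly.
    """
    return row.get("sha256") == PENDING or row.get("storage_rel_path") == PENDING
```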
Resolved Object Manifest
manifests/objects-resolved.jsonl records objects that have been downloaded into the object store:
{
  "source_name": "cninfo_announcements",
  "source_record_id": "announcement:1225267495",
  "object_role": "primary_attachment",
  "sha256": "616f43b56f3638670cd19260190265439d6af3c226112b174e161a162544b13f",
  "size_bytes": 100,
  "mime_type": "application/pdf",
  "file_ext": "pdf",
  "storage_rel_path": "sha256/61/6f/616f43b56f3638670cd19260190265439d6af3c226112b174e161a162544b13f.pdf",
  "source_url": "https://static.cninfo.com.cn/finalpage/doc.pdf",
  "filename_hint": "annual-report.pdf",
  "fetched_at": "2026-05-20T02:49:43Z"
}
Objects are content-addressed by SHA-256 and placed under:
objects/sha256/<first-two-hex>/<next-two-hex>/<sha256>.<ext>
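The layout above can be derived from the digest alone. The following sketch shows one way to compute the relative path and to check stored bytes against a resolved manifest row. Both function names are hypothetical, and the sketch assumes that storage_rel_path is relative to the objects/ directory, as the resolved-row example suggests.

```python
import hashlib
from pathlib import Path


def storage_rel_path(sha256_hex: str, file_ext: str) -> str:
    """Build sha256/<first-two-hex>/<next-two-hex>/<sha256>.<ext>."""
    return f"sha256/{sha256_hex[:2]}/{sha256_hex[2:4]}/{sha256_hex}.{file_ext}"


def verify_object(objects_root: Path, row: dict) -> bool:
    """Check that the stored bytes match the row's size and SHA-256.

    Assumption: objects_root points at the workspace's objects/ directory.
    """
    path = objects_root / row["storage_rel_path"]
    if not path.is_file():
        return False
    data = path.read_bytes()
    return (
        len(data) == row["size_bytes"]
        and hashlib.sha256(data).hexdigest() == row["sha256"]
    )
```

For the resolved row shown earlier, `storage_rel_path` applied to its sha256 and file_ext reproduces the row's storage_rel_path value.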
Failed Object Manifest
manifests/objects-failed.jsonl records download failures and retry/dead state for auditability.
The source-local SQLite queue may mirror this state for efficient execution, but the JSONL log remains the durable audit trail.
Compatibility Rule
Readers should:
- discover records/**/*.jsonl recursively
- ignore pending object rows when they require local object bytes
- treat state.db as operational and rebuildable
- prefer objects-resolved.jsonl for local object lookup
- handle duplicate source_record_id rows by taking the latest projection where appropriate
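A reader following these rules could be sketched as below. The function names are hypothetical, and the sketch makes one notable assumption: "latest" is taken to mean the last row encountered when files are visited in sorted path order and lines in file order. The contract does not define that ordering, so adjust it if Octopus guarantees something different.

```python
import json
from pathlib import Path


def iter_jsonl(path: Path):
    """Yield one parsed JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def latest_records(workspace: Path) -> dict:
    """Map each source_record_id to the last row seen for it.

    Assumption: sorted path order (month=2026-04 before month=2026-05)
    followed by line order approximates "latest".
    """
    latest = {}
    for path in sorted((workspace / "records").rglob("*.jsonl")):
        for row in iter_jsonl(path):
            latest[row["source_record_id"]] = row
    return latest


def resolved_objects(workspace: Path) -> dict:
    """Index resolved object rows by source_record_id, ignoring state.db."""
    manifest = workspace / "manifests" / "objects-resolved.jsonl"
    index = {}
    if manifest.is_file():
        for row in iter_jsonl(manifest):
            index.setdefault(row["source_record_id"], []).append(row)
    return index
```

Here `workspace` stands for one octopus/<source_name>/ directory. Reading only objects-resolved.jsonl for object lookup means pending rows never need filtering on this path.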