
Shadow Octopus Raw Contract

The raw contract is the stable handoff from Octopus to Lighthouse.

Octopus writes it. Lighthouse reads it. Other tools should treat it as append-oriented source data, not as a query database.

Source Workspace

octopus/
  <source_name>/
    state.db
    records/
      month=YYYY-MM/
        detail.jsonl
    manifests/
      objects.jsonl
      objects-resolved.jsonl
      objects-failed.jsonl
    objects/
      sha256/
        <first-two-hex>/
          <next-two-hex>/
            <sha256>.<ext>
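The layout above can be expressed as a few path helpers. This is a minimal Python sketch. The function names are hypothetical, octopus_root stands for the top-level octopus/ directory, and deriving the month=YYYY-MM partition from published_at is an assumption that the contract does not state.

```python
import re
from pathlib import Path

# Hypothetical helpers mirroring the documented workspace layout; names are illustrative.

def source_dir(octopus_root: Path, source_name: str) -> Path:
    return octopus_root / source_name

def state_db(octopus_root: Path, source_name: str) -> Path:
    return source_dir(octopus_root, source_name) / "state.db"

def records_file(octopus_root: Path, source_name: str, published_at: str) -> Path:
    # Assumption: the month=YYYY-MM partition is taken from published_at.
    # Slicing works for both "2026-05-14" and "2026-05-20T02:49:43Z".
    month = published_at[:7]
    if not re.fullmatch(r"\d{4}-\d{2}", month):
        raise ValueError(f"cannot derive YYYY-MM from {published_at!r}")
    return source_dir(octopus_root, source_name) / "records" / f"month={month}" / "detail.jsonl"

def manifest_file(octopus_root: Path, source_name: str, kind: str = "") -> Path:
    # kind "" -> objects.jsonl, "resolved" -> objects-resolved.jsonl, "failed" -> objects-failed.jsonl
    name = f"objects-{kind}.jsonl" if kind else "objects.jsonl"
    return source_dir(octopus_root, source_name) / "manifests" / name
```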

Raw Record Envelope

Each line of a records/month=YYYY-MM/detail.jsonl file is one JSON object with this envelope:

{
  "source_name": "cninfo_announcements",
  "record_type": "announcement_detail",
  "source_record_id": "announcement:1225267495",
  "published_at": "2026-05-14",
  "title": "平安银行:2025年年度报告",
  "detail_url": "https://example.com/detail",
  "issuer_names": ["平安银行"],
  "security_codes": ["000001"],
  "payload": {}
}

Contract fields:

Field             Meaning
source_name       Source workspace name
record_type       Source-specific record type
source_record_id  Stable source-local record ID
published_at      Source publish date or timestamp, when available
title             Human-readable title
detail_url        Source detail URL, when available
issuer_names      Issuer names for downstream identity lookup
security_codes    Security codes for downstream identity lookup
payload           Source-specific raw or normalized payload
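A reader can check each line against these fields before using it. The following is a minimal Python sketch of such a check. The expected JSON types are read off the example envelope, and treating every field except published_at and detail_url as always present is an assumption; record_errors and parse_record_line are hypothetical names.

```python
import json

# Field -> expected JSON type, inferred from the example envelope.
# Assumption: all of these are always present; published_at and detail_url
# may be absent or null because the contract says "when available".
REQUIRED = {
    "source_name": str,
    "record_type": str,
    "source_record_id": str,
    "title": str,
    "issuer_names": list,
    "security_codes": list,
    "payload": dict,
}
OPTIONAL_STRINGS = ("published_at", "detail_url")

def record_errors(record: dict) -> list:
    """Return contract violations; an empty list means the envelope looks valid."""
    errors = []
    for field, expected in REQUIRED.items():
        if not isinstance(record.get(field), expected):
            errors.append(f"{field}: expected {expected.__name__}")
    for field in ("issuer_names", "security_codes"):
        values = record.get(field)
        if isinstance(values, list) and not all(isinstance(v, str) for v in values):
            errors.append(f"{field}: expected a list of strings")
    for field in OPTIONAL_STRINGS:
        value = record.get(field)
        if value is not None and not isinstance(value, str):
            errors.append(f"{field}: expected string or null")
    return errors

def parse_record_line(line: str) -> dict:
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("a record line must be a JSON object")
    errors = record_errors(record)
    if errors:
        raise ValueError("; ".join(errors))
    return record
```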

Object Intent Manifest

manifests/objects.jsonl records objects that should be downloaded or tracked.

Pending object rows use the placeholder value "pending" for sha256 and storage_rel_path, and 0 for size_bytes:

{
  "source_name": "cninfo_announcements",
  "source_record_id": "announcement:1225267495",
  "object_role": "primary_attachment",
  "sha256": "pending",
  "size_bytes": 0,
  "mime_type": "application/pdf",
  "file_ext": "pdf",
  "storage_rel_path": "pending",
  "source_url": "https://static.cninfo.com.cn/finalpage/doc.pdf",
  "filename_hint": "annual-report.pdf",
  "fetched_at": "2026-05-20T02:49:43Z"
}
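A downloader can derive its work list by taking pending intent rows that have no resolved counterpart yet. This is a minimal Python sketch. The contract does not name a key for an object intent, so identifying one by (source_name, source_record_id, object_role, source_url) is an assumption, and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator

PENDING = "pending"

def intent_key(row: dict) -> tuple:
    # Assumption: these four fields identify one object intent.
    return (row["source_name"], row["source_record_id"], row["object_role"], row["source_url"])

def is_pending(row: dict) -> bool:
    return row.get("sha256") == PENDING or row.get("storage_rel_path") == PENDING

def download_queue(intent_lines: Iterable[str], resolved_lines: Iterable[str]) -> Iterator[dict]:
    """Yield each pending intent once, skipping intents already in the resolved manifest."""
    resolved = {intent_key(json.loads(line)) for line in resolved_lines if line.strip()}
    seen = set()
    for line in intent_lines:
        if not line.strip():
            continue
        row = json.loads(line)
        key = intent_key(row)
        if is_pending(row) and key not in resolved and key not in seen:
            seen.add(key)  # the append-oriented log may repeat an intent
            yield row
```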

Resolved Object Manifest

manifests/objects-resolved.jsonl records objects that have been downloaded into the object store:

{
  "source_name": "cninfo_announcements",
  "source_record_id": "announcement:1225267495",
  "object_role": "primary_attachment",
  "sha256": "616f43b56f3638670cd19260190265439d6af3c226112b174e161a162544b13f",
  "size_bytes": 100,
  "mime_type": "application/pdf",
  "file_ext": "pdf",
  "storage_rel_path": "sha256/61/6f/616f43b56f3638670cd19260190265439d6af3c226112b174e161a162544b13f.pdf",
  "source_url": "https://static.cninfo.com.cn/finalpage/doc.pdf",
  "filename_hint": "annual-report.pdf",
  "fetched_at": "2026-05-20T02:49:43Z"
}

Objects are content-addressed by their SHA-256 digest (lowercase hex) and placed under the following path; storage_rel_path is the part after objects/:

objects/sha256/<first-two-hex>/<next-two-hex>/<sha256>.<ext>
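The path scheme can be computed directly from the object bytes. This minimal Python sketch hashes the bytes, builds storage_rel_path, and writes the file through a temporary name followed by Path.replace, which is an atomic rename on POSIX filesystems. The write strategy and the helper names are assumptions, not part of the contract.

```python
import hashlib
from pathlib import Path

def storage_rel_path(digest: str, file_ext: str) -> str:
    # Fan out on the first two and the next two hex characters of the digest.
    return f"sha256/{digest[:2]}/{digest[2:4]}/{digest}.{file_ext}"

def store_object(objects_root: Path, data: bytes, file_ext: str) -> dict:
    """Write bytes into the content-addressed store and return resolved-row fields."""
    digest = hashlib.sha256(data).hexdigest()  # 64 lowercase hex characters
    rel = storage_rel_path(digest, file_ext)
    target = objects_root / rel
    if not target.exists():  # identical content maps to the same path, so reuse it
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.parent / (target.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(target)  # readers never observe a partially written object
    return {"sha256": digest, "size_bytes": len(data),
            "file_ext": file_ext, "storage_rel_path": rel}
```

Because the path depends only on the content, storing the same bytes twice is a no-op and yields the same manifest fields.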

Failed Object Manifest

manifests/objects-failed.jsonl records download failures, including whether each object is still scheduled for retry or marked dead, for auditability.

The source-local SQLite queue (state.db) may mirror this state for efficient execution, but the JSONL log remains the durable audit trail.
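Since the JSONL log is authoritative, the SQLite mirror can be rebuilt by replaying it. This minimal Python sketch does that with the standard sqlite3 module. The failed-row schema is not defined in this contract, so the "status" ("retry" or "dead") and "error" fields, the object_failures table, and the function name are all hypothetical. The upsert syntax requires SQLite 3.24 or newer.

```python
import json
import sqlite3
from typing import Iterable

# Hypothetical failed-row fields: "status" ("retry" or "dead") and "error".
# Adapt to the real failed-manifest schema.

def rebuild_failure_queue(conn: sqlite3.Connection, failed_lines: Iterable[str]) -> None:
    """Drop and replay the mirror table from the append-only failure log."""
    conn.execute("DROP TABLE IF EXISTS object_failures")
    conn.execute("""
        CREATE TABLE object_failures (
            source_record_id TEXT, object_role TEXT, source_url TEXT,
            status TEXT, attempts INTEGER, last_error TEXT,
            PRIMARY KEY (source_record_id, object_role, source_url))""")
    for line in failed_lines:
        if not line.strip():
            continue
        row = json.loads(line)
        # Later log lines win for status and error; attempts counts log lines per object.
        conn.execute("""
            INSERT INTO object_failures VALUES (?, ?, ?, ?, 1, ?)
            ON CONFLICT (source_record_id, object_role, source_url) DO UPDATE SET
                status = excluded.status,
                attempts = attempts + 1,
                last_error = excluded.last_error""",
            (row["source_record_id"], row["object_role"], row["source_url"],
             row.get("status", "retry"), row.get("error")))
    conn.commit()
```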

Compatibility Rule

Readers should:

  • discover records/**/*.jsonl recursively, since new month=YYYY-MM partitions appear over time
  • ignore pending object rows (sha256 = "pending") whenever local object bytes are required
  • treat state.db as operational state that can be rebuilt from the JSONL files
  • prefer objects-resolved.jsonl for local object lookup
  • handle duplicate source_record_id rows by taking the latest projection where appropriate, because the files are append-oriented
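The rules above can be combined into a small reader. This is a minimal Python sketch under stated assumptions: "latest projection" is taken to mean the last line seen when files are read in sorted path order (so month partitions are read chronologically), files are assumed to be UTF-8, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Tuple

def load_latest_records(workspace: Path) -> Dict[str, dict]:
    """Read every records/**/*.jsonl file; a later line replaces an earlier one."""
    latest: Dict[str, dict] = {}
    # Assumption: sorted month=YYYY-MM paths give chronological order.
    for path in sorted((workspace / "records").rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    record = json.loads(line)
                    latest[record["source_record_id"]] = record
    return latest

def local_objects(workspace: Path) -> Dict[Tuple[str, str], Path]:
    """Map (source_record_id, object_role) to object files that exist locally."""
    found: Dict[Tuple[str, str], Path] = {}
    manifest = workspace / "manifests" / "objects-resolved.jsonl"
    if not manifest.exists():
        return found
    with manifest.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            row = json.loads(line)
            if row.get("sha256") == "pending":  # defensive: never needs bytes from pending rows
                continue
            path = workspace / "objects" / row["storage_rel_path"]
            if path.exists():
                found[(row["source_record_id"], row["object_role"])] = path
    return found
```

Nothing here reads state.db, which keeps the reader correct even if that operational database is deleted and rebuilt.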