
Shadow Octopus Supported Sources

Source definitions are TOML files under sources/.

Production sources live directly in the sources/ directory and are enabled. Example configurations for the reusable source kinds live under sources/examples/ and are disabled by default.

Production Sources

| Source | source_kind | Remote source | Current scheduler role | Output |
|---|---|---|---|---|
| cninfo_announcements | cninfo | CNInfo company announcements and historical announcement search | Historical backfill plus object download | Announcement records, pending/resolved PDF manifests, PDF objects |
| cls_telegraph | cls_telegraph | CLS (财联社, Cailian Press) telegraph roll list | Latest news sync | Telegraph news raw records |
| sina7x24 | sina7x24 | Sina Finance (新浪财经) 7x24 feed | Latest news sync | 7x24 news raw records |

CNInfo Announcements

Configured file:

sources/cninfo_announcements.toml

Current acquisition paths:

| Path | Command | Purpose |
|---|---|---|
| Company page sync | sync-cninfo | Newest-first company announcement pages |
| Historical search backfill | backfill-cninfo-history | Reverse date/page backfill from CNInfo historical search |
| Object download | download-objects | Download pending PDF objects into objects/sha256/... |
| Queue rebuild | rebuild-object-download-queue | Rebuild the source-local object download queue from manifests |
| Legacy import | import-shadow-pdf | Import old shadow-pdf CNInfo metadata and cached PDFs |
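Because the download queue is rebuilt from manifests, it can be treated as derived state. A minimal sketch of that idea: the queue is every pending entry that has no resolved counterpart, deduplicated in first-seen order. The manifest format here (JSON Lines with a `"url"` field) and the `rebuild_queue` helper are hypothetical; the real manifest schema is not described on this page.

```python
import json
from typing import Iterable


def rebuild_queue(pending_lines: Iterable[str], resolved_lines: Iterable[str]) -> list[str]:
    """Return pending URLs that have no resolved entry, in first-seen order.

    Hypothetical manifest format: one JSON object per line with a "url" field.
    """
    resolved = {json.loads(line)["url"] for line in resolved_lines if line.strip()}
    queue: list[str] = []
    seen: set[str] = set()
    for line in pending_lines:
        if not line.strip():
            continue  # skip blank lines
        url = json.loads(line)["url"]
        if url in resolved or url in seen:
            continue  # already downloaded, or already queued once
        seen.add(url)
        queue.append(url)
    return queue
```

Deriving the queue this way means a lost or corrupted queue file costs nothing beyond a rebuild, since the manifests remain the source of truth.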

Current scheduler settings:

| Setting | Value |
|---|---|
| enabled | true |
| mode | history |
| streams | ["a_share_all"] |
| start_date | 2000-01-01 |
| max_pages | 10 |
| history_interval_seconds | 600 |
| download_limit | 50 |
| download_interval_seconds | 600 |
| request_delay | 1.0 second |
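One way to read "reverse date/page backfill" together with start_date and max_pages is sketched below. This is an illustration under stated assumptions, not the adapter's actual logic: it assumes history is queried one day at a time, that a hypothetical `pages_for_day` callable reports how many result pages a day has, and that `max_pages` caps the total page requests in one scheduler run.

```python
from datetime import date, timedelta
from typing import Callable, Iterator


def reverse_backfill_plan(
    resume_from: date,
    start_date: date,
    pages_for_day: Callable[[date], int],
    max_pages: int,
) -> Iterator[tuple[date, int]]:
    """Yield (day, page) requests newest-first, never going earlier than start_date.

    Hypothetical model: one query per day, `pages_for_day` pages per day,
    and `max_pages` page requests in total per run.
    """
    budget = max_pages
    day = resume_from
    while day >= start_date and budget > 0:
        for page in range(1, pages_for_day(day) + 1):
            if budget == 0:
                return  # per-run page budget exhausted
            yield day, page
            budget -= 1
        day -= timedelta(days=1)  # step one day further into the past
```

Under this model, with start_date = 2000-01-01 and max_pages = 10, each run issues at most ten page requests, and the next run resumes from wherever the previous one stopped.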

CNInfo latest sync (sync-cninfo) and historical backfill (backfill-cninfo-history) both append to the same raw contract. They are deliberately kept as separate commands so that historical backfill and latest refresh can be scheduled independently.
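A minimal sketch of what "scheduled independently" buys: each job keeps its own interval and last-run time, so a late or failed run of one job never shifts the other. The `Job` class and `run_due` function are hypothetical illustrations; the intervals in the example below reuse history_interval_seconds and download_interval_seconds from the settings above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    interval_seconds: float
    last_run: Optional[float] = None  # None means the job has never run

    def is_due(self, now: float) -> bool:
        return self.last_run is None or now - self.last_run >= self.interval_seconds


def run_due(jobs: list[Job], now: float) -> list[str]:
    """Mark every due job as run at `now` and return the names of the jobs that ran.

    Each job is checked against its own clock only.
    """
    ran = []
    for job in jobs:
        if job.is_due(now):
            job.last_run = now
            ran.append(job.name)
    return ran
```

For example, with `Job("backfill-cninfo-history", 600)` and `Job("download-objects", 600)`, a download run that happens late only delays the next download, not the next history backfill.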

CLS Telegraph

Configured file:

sources/cls_telegraph.toml

Current scheduler settings:

| Setting | Value |
|---|---|
| enabled | true |
| interval_seconds | 300 |
| count | 20 |
| pages | 1 |
| request_delay | 1.0 second |

Command:

uv run shadow-octopus --config config.example.toml sync-cls-telegraph \
  --source cls_telegraph \
  --count 20 \
  --pages 1
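The count, pages, and request_delay settings combine as sketched below: at most `pages` requests of `count` items each, with a pause between consecutive requests. `fetch_pages` is a hypothetical illustration, and its `fetch_page(page, count)` argument stands in for the real CLS request, which this page does not describe. Whether the real command also pauses before its first request is not stated, so this sketch simply skips that pause.

```python
import time
from typing import Callable


def fetch_pages(
    fetch_page: Callable[[int, int], list[dict]],
    count: int = 20,
    pages: int = 1,
    request_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Request up to `pages` pages of `count` items, pausing `request_delay` seconds between requests.

    `fetch_page(page, count)` is a hypothetical stand-in for the real HTTP call.
    """
    items: list[dict] = []
    for page in range(1, pages + 1):
        if page > 1:
            sleep(request_delay)  # pace requests; no pause before the first one
        batch = fetch_page(page, count)
        if not batch:
            break  # the feed has nothing more to return
        items.extend(batch)
    return items
```

With the configured count = 20 and pages = 1, a run makes a single request, and the 300-second interval bounds how often that happens.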

Sina 7x24

Configured file:

sources/sina7x24.toml

Current scheduler settings:

| Setting | Value |
|---|---|
| enabled | true |
| interval_seconds | 300 |
| count | 100 |
| pages | 1 |
| request_delay | 1.0 second |

Commands:

# Latest sync: one page of up to 100 items
uv run shadow-octopus --config config.example.toml sync-sina7x24 \
  --source sina7x24 \
  --count 100 \
  --pages 1

# Backward backfill: up to 100 pages of older items
uv run shadow-octopus --config config.example.toml backfill-sina7x24 \
  --source sina7x24 \
  --pages 100

The backfill command pages backward from the current minimum Sina item id when no explicit cursor is provided.
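A minimal sketch of that cursor rule. It assumes "current minimum Sina item id" means the smallest id already stored locally, and that the feed can be queried for items older than a given id. `backfill_backward` and its `fetch_before(cursor)` argument are hypothetical, not the real Sina client.

```python
from typing import Callable, Optional


def backfill_backward(
    fetch_before: Callable[[int], list[int]],
    known_ids: list[int],
    pages: int,
    cursor: Optional[int] = None,
) -> list[int]:
    """Page backward through a feed ordered by numeric item id.

    With no explicit cursor, start from the smallest id already stored.
    `fetch_before(cursor)` is a hypothetical call returning ids older than `cursor`.
    """
    if cursor is None:
        if not known_ids:
            raise ValueError("no stored items and no explicit cursor")
        cursor = min(known_ids)
    collected: list[int] = []
    for _ in range(pages):
        batch = fetch_before(cursor)
        if not batch:
            break  # reached the oldest item the feed will return
        collected.extend(batch)
        cursor = min(batch)  # the next page continues below the oldest id just seen
    return collected
```

Starting from the stored minimum means repeated backfill runs extend history without re-fetching items already on disk; an explicit cursor lets an operator start elsewhere.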

Reusable Source Kinds

These source kinds are supported by code and examples, but the example configs are disabled by default:

| Example source | source_kind | Config file | Use case |
|---|---|---|---|
| example_news_feed | feed | sources/examples/news_feed.toml | RSS/Atom feeds |
| example_json_news | json_api | sources/examples/json_news.toml | Simple JSON list APIs with configured paths and pagination |
| example_html_news | html_page | sources/examples/html_news.toml | Simple HTML listing pages parsed by configured patterns |

Generic source kinds are for simple sources. Prefer a dedicated adapter when a website needs signing, anti-abuse handling, unusual pagination, or source-specific normalization.
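To illustrate what "configured paths" means for a json_api source, here is a minimal sketch that pulls a list of items out of a JSON response and maps each item to a flat record. The dotted-path syntax (with numeric segments indexing into lists) and the `get_path` and `extract_items` helpers are assumptions for illustration, not necessarily what sources/examples/json_news.toml accepts.

```python
from typing import Any


def get_path(data: Any, path: str) -> Any:
    """Follow a dotted path such as "data.list" through nested dicts and lists.

    Numeric segments index into lists (hypothetical syntax).
    """
    current = data
    for key in path.split("."):
        if isinstance(current, list):
            current = current[int(key)]
        else:
            current = current[key]
    return current


def extract_items(payload: dict, items_path: str, field_paths: dict[str, str]) -> list[dict]:
    """Map each item found at `items_path` to a flat record using per-field paths."""
    return [
        {name: get_path(item, path) for name, path in field_paths.items()}
        for item in get_path(payload, items_path)
    ]
```

Path lookups like these cover simple list APIs. Once a site needs request signing or its own pagination scheme, the logic no longer fits into configuration, which is where a dedicated adapter is the better choice.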

Manual Acquisition Paths

Octopus also supports manual or semi-manual ingestion:

| Command | Purpose |
|---|---|
| import-object | Import a local PDF, MP3, MP4, XLSX, or similar file into a source workspace |
| capture-url | Fetch one URL directly into the source-local object store |
| capture-url-list | Fetch a list of URLs from text or JSONL input at a slow, paced rate |
| discover-links | Discover supported object links on an HTML page and record them in pending manifests |

Use these for one-off datasets, manual media, external reports, and migration tasks that still need to land in the same Octopus raw contract.
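A minimal sketch of the link-discovery step using Python's standard `html.parser`. It assumes "supported" is decided by file extension; the extension set below mirrors the file types named for import-object and is not necessarily Octopus's real list. The real command writes results into pending manifests, while this hypothetical `discover_links` helper only returns the absolute URLs.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Assumption: support is judged by extension; this set is illustrative only.
SUPPORTED_SUFFIXES = {".pdf", ".mp3", ".mp4", ".xlsx"}


class LinkCollector(HTMLParser):
    """Collect absolute <a href> targets whose path ends in a supported suffix."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # anchors without a usable href
        absolute = urljoin(self.base_url, href)  # resolve relative links
        suffix = PurePosixPath(urlparse(absolute).path).suffix.lower()
        if suffix in SUPPORTED_SUFFIXES and absolute not in self.links:
            self.links.append(absolute)  # keep first-seen order, skip duplicates


def discover_links(html: str, base_url: str) -> list[str]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Checking the URL path rather than the raw href keeps query strings such as `?x=1` from hiding the file extension.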