
Shadow Ingest

shadow-ingest is a dataframe-first, read-only access layer over Parquet datasets.

For most Python users, the intended experience is simple:

  • import shadow_ingest as si
  • use the small stable SDK surface
  • receive either a final polars.DataFrame or a small discovery list
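The two result shapes can be sketched in plain Python. The real dataframe-returning APIs yield a polars.DataFrame; a dict of columns stands in for it here so the sketch needs no dependencies, and the example values ("000001.SZ", 10.42) are invented placeholders.

```python
# Hypothetical sketch of the two result shapes the SDK returns.
discovery_result = ["open", "close", "total_turnover"]  # shape of a list_fields(...) result
frame_result = {                                        # shape of a gather_daily_price(...) result
    "stock_code": ["000001.SZ"],                        # placeholder code
    "close": [10.42],                                   # placeholder price
}

print(type(discovery_result).__name__)  # list
print(sorted(frame_result))             # ['close', 'stock_code']
```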

Stable Public APIs

The public SDK surface is intentionally small and currently split into three groups.

Core data APIs:

  • gather_daily_price(...)
  • gather_daily_snapshot(...)
  • gather_financial_snapshot(...)

Discoverability APIs:

  • list_fields(...)
  • list_market_calendar(...)
  • list_universe(...)

Industry APIs:

  • get_industry_standards()
  • get_industry_mapping(...)
  • get_industry_members(...)
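As a sketch of how the three industry APIs relate: standards names taxonomies, mapping goes stock-to-industry, and members goes industry-to-stocks. The stubs below stand in for the real calls; the signatures, the industry label "Banks", and the member codes are invented placeholders, while "sws" is the documented default taxonomy.

```python
# Hypothetical stubs standing in for the real industry APIs; all return
# values are invented placeholders, not real data.

def get_industry_standards():
    # Lists available taxonomy names; "sws" is the documented default.
    return ["sws"]

def get_industry_mapping(stock_code, standard="sws"):
    # Maps one stock to its industry under the chosen taxonomy.
    return {"stock_code": stock_code, "standard": standard, "industry": "Banks"}

def get_industry_members(industry, standard="sws"):
    # The reverse lookup: all member stocks of one industry.
    return ["000001.SZ"]

standard = get_industry_standards()[0]
mapping = get_industry_mapping("000001.SZ", standard=standard)
members = get_industry_members(mapping["industry"], standard=standard)
print(mapping["industry"], members)
```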

All dataframe-returning APIs return a polars.DataFrame.

Recent Updates

Recent shadow-ingest changes that are now reflected in this documentation:

  • list_fields(...) is the stable public field-discovery helper for gather_daily_price(...)
  • total_turnover is documented as traded value, not share volume
  • list_universe(...) now requires an explicit date; the SDK accepts YYYYMMDD inputs and normalizes the backing order_book_id column to public stock_code values in the results
  • industry lookup APIs were added for standards, stock-to-industry mapping, and industry-to-members queries
  • serve-fastapi is now the maintained HTTP entrypoint; the legacy serve HTTP entrypoint has been removed
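The YYYYMMDD acceptance mentioned for list_universe(...) can be illustrated with a small stand-alone helper. The function name normalize_yyyymmdd is hypothetical, not part of the SDK; it only mirrors the kind of normalization described above.

```python
from datetime import date, datetime

def normalize_yyyymmdd(value):
    """Parse a YYYYMMDD string or int into a datetime.date.

    Hypothetical helper mirroring the normalization applied to the
    explicit date that list_universe(...) now requires.
    """
    text = str(value)
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {value!r}")
    return datetime.strptime(text, "%Y%m%d").date()

print(normalize_yyyymmdd("20240105"))  # 2024-01-05
print(normalize_yyyymmdd(20240105) == date(2024, 1, 5))  # True
```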

Who This Is For

Use shadow-ingest if you need:

  • a simple Python data pull interface
  • a maintained remote service mode
  • efficient transport for larger dataframe-shaped responses
  • a stable query interface that hides transport and batching details

If you are new to the SDK, the easiest mental model is:

  1. use list_market_calendar(...) to pick valid trading dates
  2. use list_universe(...) to pick valid stock codes for a date
  3. use list_fields(...) when you are calling gather_daily_price(...)
  4. use the default standard="sws" when that taxonomy is acceptable, or call get_industry_standards() when you need to inspect available taxonomy names
  5. call one of the dataframe-returning APIs to fetch the final table
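The steps above can be strung together as a sketch. The stubs below stand in for the real SDK calls (in practice these go through `import shadow_ingest as si`); their return values are invented placeholders, and a dict of columns stands in for the polars.DataFrame the real dataframe-returning APIs produce.

```python
# Hypothetical stubs for the recommended flow; return values are
# invented placeholders, not real market data.

def list_market_calendar(start, end):
    # Step 1: valid trading dates in the range (stubbed).
    return ["20240102", "20240103"]

def list_universe(trade_date):
    # Step 2: valid stock codes for that date (stubbed).
    return ["000001.SZ", "600000.SH"]

def list_fields():
    # Step 3: fields supported by gather_daily_price(...) (stubbed).
    return ["open", "close", "total_turnover"]

def gather_daily_price(stock_codes, dates, fields):
    # Step 5: one row per (stock, date); a dict of columns stands in
    # for the polars.DataFrame the real API returns.
    return {
        "stock_code": [c for c in stock_codes for _ in dates],
        "date": [d for _ in stock_codes for d in dates],
    }

dates = list_market_calendar("20240101", "20240105")
codes = list_universe(dates[0])
table = gather_daily_price(codes, dates, list_fields())
print(len(table["stock_code"]))  # 4
```

Step 4 (choosing standard="sws" or inspecting get_industry_standards()) applies only when the industry APIs are in play, so it is omitted from this price-pull sketch.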