Data as a Factor

One of the most important ideas in shadow-factor is that source data should already be usable as factor material.

What This Means

In many research stacks, users first export raw data, then hand-write separate transformation code, and only then obtain a factor.

shadow-factor shortens that path by treating base fields as inputs to a factor system from the beginning.

Examples of useful base fields include:

These fields can be used directly or transformed through DSL operators.

A field already behaves like a factor-shaped series once it is evaluated over a set of trade dates and a stock universe.

Examples:

"net_profit_mrq_0"
"operating_revenue_mrq_0"
"total_assets_mrq_0"

This is useful when you want the latest visible reported value rather than a fully derived formula.
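To make "latest visible reported value" concrete, here is a minimal sketch in plain Python. It is not shadow-factor's implementation: the `REPORTS` data, the stock codes, and the helpers `latest_visible` and `evaluate_field` are all hypothetical, and it assumes that "visible" means "announced on or before the trade date".

```python
from bisect import bisect_right
from datetime import date

# Hypothetical disclosure records for one base field, per stock:
# (announcement_date, reported_value), sorted by announcement date.
REPORTS = {
    "AAA": [(date(2024, 4, 25), 120.0), (date(2024, 8, 20), 150.0)],
    "BBB": [(date(2024, 4, 30), 80.0)],
}


def latest_visible(records, trade_date):
    """Return the value announced most recently on or before trade_date, else None."""
    dates = [announced for announced, _ in records]
    i = bisect_right(dates, trade_date)
    return records[i - 1][1] if i > 0 else None


def evaluate_field(reports, trade_dates, universe):
    """Evaluate a base field into a {trade_date: {stock: value}} panel."""
    return {
        td: {stock: latest_visible(reports.get(stock, []), td) for stock in universe}
        for td in trade_dates
    }


panel = evaluate_field(
    REPORTS,
    [date(2024, 4, 26), date(2024, 9, 2)],
    ["AAA", "BBB"],
)
```

On the first trade date, stock `BBB` has not announced anything yet, so its value is missing. The result is indexed by trade date and stock, which is the shape a factor has.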

The next step is to treat those fields as ingredients for better research features.

Examples:

"TTM(net_profit)"
"YoY(TTM(operating_revenue))"
"SafeDiv(TTM(net_profit), total_assets)"
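The sketch below shows one plausible reading of these operators on a single stock's quarterly series. The semantics are assumptions, not taken from shadow-factor: `ttm` sums the last four single-quarter values, `yoy` compares against the value four quarters earlier, and `safe_div` yields a missing value instead of dividing by zero. The function names and sample numbers are illustrative only.

```python
def ttm(values):
    """Assumed TTM: sum of the current and previous three quarterly values."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - 3): i + 1]
        if len(window) < 4 or any(v is None for v in window):
            out.append(None)  # not enough history, or a gap in the window
        else:
            out.append(sum(window))
    return out


def yoy(values, lag=4):
    """Assumed YoY: relative change versus the value `lag` quarters earlier."""
    out = []
    for i, v in enumerate(values):
        prev = values[i - lag] if i >= lag else None
        if v is None or prev is None or prev == 0:
            out.append(None)
        else:
            out.append((v - prev) / abs(prev))
    return out


def safe_div(num, den):
    """Assumed SafeDiv: element-wise division, None where undefined."""
    return [
        None if n is None or d is None or d == 0 else n / d
        for n, d in zip(num, den)
    ]


# Illustrative single-quarter figures for one stock (made-up numbers).
net_profit = [10.0, 10.0, 10.0, 10.0, 12.0, 12.0, 12.0, 12.0]
total_assets = [400.0, 400.0, 400.0, 400.0, 0.0, 420.0, 460.0, 480.0]

ttm_profit = ttm(net_profit)            # "TTM(net_profit)"
growth = yoy(ttm_profit)                # "YoY(TTM(net_profit))"
roa = safe_div(ttm_profit, total_assets)  # "SafeDiv(TTM(net_profit), total_assets)"
```

Each operator takes a series and returns a series of the same length, which is why the expressions nest freely. Missing history and zero denominators propagate as `None` rather than raising an error.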

This lets you move smoothly from:

This approach is useful because it:

"SafeDiv(TTM(net_profit), total_assets)"

Interpretation: use base accounting fields to create an ROA-like factor.

"YoY(TTM(net_profit))"

Interpretation: growth is not stored as a separate raw column; it is generated from the same underlying data.

"SafeDiv(TTM(operating_profit), TTM(operating_revenue))"

Interpretation: margins are factor outputs created from reusable underlying statements.

In shadow-factor, the question is often not:

It is: