Data as a Factor¶
One of the most important ideas in shadow-factor is that source data should already be usable as factor material.
What This Means¶
In many research stacks, users first export raw data, then hand-write separate transformation code, and only then obtain a factor.
shadow-factor shortens that path.
It treats base fields as inputs to a factor system from the beginning.
Examples of useful base fields include:
- net_profit
- operating_revenue
- total_assets
- total_equity
- market_cap
These fields can be used directly or transformed through DSL operators.
Direct Field Access¶
Once a field is evaluated across trade dates and a stock universe, it already behaves like a factor-shaped series.
Examples:
"net_profit_mrq_0"
"operating_revenue_mrq_0"
"total_assets_mrq_0"
This is useful when you want the latest visible reported value rather than a fully derived formula.
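To make "latest visible reported value" concrete, here is a minimal pandas sketch of a point-in-time lookup. It does not use shadow-factor itself. The table layout, the `announce_date` column, and the sample figures are hypothetical, and it assumes that "visible" means "announced on or before the trade date".

```python
import pandas as pd

# Hypothetical quarterly reports for one stock. `announce_date` is the day the
# figure became public, which is later than the period it describes.
reports = pd.DataFrame(
    {
        "announce_date": pd.to_datetime(["2023-04-28", "2023-08-30", "2023-10-27"]),
        "net_profit": [120.0, 135.0, 150.0],
    }
)

# Trade dates on which the field is evaluated.
trade_dates = pd.DataFrame(
    {
        "trade_date": pd.to_datetime(
            ["2023-04-27", "2023-05-02", "2023-09-01", "2023-11-01"]
        )
    }
)

# For each trade date, pick the most recent report announced on or before it.
# Both frames must be sorted by their key columns for merge_asof.
visible = pd.merge_asof(
    trade_dates,
    reports,
    left_on="trade_date",
    right_on="announce_date",
    direction="backward",
)

# One value per trade date; NaN where nothing had been announced yet.
latest_net_profit = visible.set_index("trade_date")["net_profit"]
print(latest_net_profit)
```

The first trade date precedes every announcement, so its value is missing. Each later date carries the newest figure known at that time, which is what keeps the series free of look-ahead.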
From Data to Factor¶
The next step is to treat those fields as ingredients for better research features.
Examples:
"TTM(net_profit)"
"YoY(TTM(operating_revenue))"
"SafeDiv(TTM(net_profit), total_assets)"
This lets you move smoothly through three stages:
- raw field
- time-aware transformed field
- fully defined business factor
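The three stages can be sketched in plain pandas. The helpers below are illustrative stand-ins, not shadow-factor's operator implementations. They assume that each row is a single-quarter (not year-to-date) value, that `TTM` means a four-quarter sum, that `YoY` compares against the value four quarters earlier, and that `SafeDiv` returns a missing value when the denominator is zero. The sample numbers are made up.

```python
import pandas as pd


def ttm(quarterly: pd.Series) -> pd.Series:
    """Trailing-twelve-month sum over four single-quarter values (assumed semantics)."""
    return quarterly.rolling(window=4, min_periods=4).sum()


def yoy(series: pd.Series) -> pd.Series:
    """Relative change versus four rows (one year of quarters) earlier."""
    prev = series.shift(4)
    # A zero base is masked to NaN so the result never becomes infinite.
    return (series - prev) / prev.abs().where(prev != 0)


def safe_div(num: pd.Series, den: pd.Series) -> pd.Series:
    """Division that yields NaN instead of inf where the denominator is zero."""
    return num / den.where(den != 0)


quarters = ["2022Q1", "2022Q2", "2022Q3", "2022Q4",
            "2023Q1", "2023Q2", "2023Q3", "2023Q4"]

# Stage 1: raw fields (hypothetical figures for one stock).
net_profit = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0], index=quarters)
total_assets = pd.Series([400.0, 410.0, 420.0, 430.0, 440.0, 450.0, 460.0, 480.0],
                         index=quarters)

# Stage 2: time-aware transformed field.
ttm_profit = ttm(net_profit)

# Stage 3: business factors built from the transformed field.
roa_like = safe_div(ttm_profit, total_assets)
profit_growth = yoy(ttm_profit)

print(pd.DataFrame({"ttm_profit": ttm_profit,
                    "roa_like": roa_like,
                    "profit_growth": profit_growth}))
```

The first three quarters have no trailing sum because fewer than four values are available, and the growth series needs a further four quarters of history before it produces a value.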
Why This Matters¶
This approach is useful because it:
- reduces manual preprocessing code
- keeps the research path shorter and easier to audit
- makes factor definitions easier to compare across strategies
- keeps data semantics closer to the final factor logic
Good Examples¶
Profitability¶
"SafeDiv(TTM(net_profit), total_assets)"
Interpretation: use base accounting fields to create an ROA-like factor.
Growth¶
"YoY(TTM(net_profit))"
Interpretation: growth is not stored as a separate raw column; it is generated from the same underlying data.
Efficiency¶
"SafeDiv(TTM(operating_profit), TTM(operating_revenue))"
Interpretation: margins are factor outputs created from reusable underlying statements.
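The three expressions above share one call-style shape, so the same small set of operators can serve all of them. The sketch below shows one way such strings could be evaluated against a table of base fields using Python's `ast` module. It is not shadow-factor's parser. The `evaluate` function, the `OPERATORS` table, the operator semantics (single-quarter inputs, four-quarter windows), and the sample data are all assumptions made for illustration.

```python
import ast

import pandas as pd


def ttm(s: pd.Series) -> pd.Series:
    # Assumed semantics: sum of the last four single-quarter values.
    return s.rolling(window=4, min_periods=4).sum()


def yoy(s: pd.Series) -> pd.Series:
    # Assumed semantics: relative change versus four quarters earlier.
    prev = s.shift(4)
    return (s - prev) / prev.abs().where(prev != 0)


def safe_div(num: pd.Series, den: pd.Series) -> pd.Series:
    # Assumed semantics: NaN where the denominator is zero.
    return num / den.where(den != 0)


# Hypothetical mapping from operator names to implementations.
OPERATORS = {"TTM": ttm, "YoY": yoy, "SafeDiv": safe_div}


def evaluate(expr: str, fields: pd.DataFrame) -> pd.Series:
    """Evaluate a call-style expression; bare names resolve to columns of `fields`."""

    def visit(node: ast.AST) -> pd.Series:
        if isinstance(node, ast.Name):
            return fields[node.id]
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.keywords:
                raise ValueError("keyword arguments are not supported")
            func = OPERATORS[node.func.id]
            return func(*(visit(arg) for arg in node.args))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return visit(ast.parse(expr, mode="eval").body)


# Made-up quarterly figures for one stock, oldest first.
fields = pd.DataFrame(
    {
        "net_profit": [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0],
        "operating_profit": [14.0, 16.0, 15.0, 17.0, 16.0, 18.0, 17.0, 19.0],
        "operating_revenue": [100.0, 110.0, 105.0, 115.0, 110.0, 120.0, 115.0, 125.0],
        "total_assets": [400.0, 410.0, 420.0, 430.0, 440.0, 450.0, 460.0, 480.0],
    }
)

profitability = evaluate("SafeDiv(TTM(net_profit), total_assets)", fields)
growth = evaluate("YoY(TTM(net_profit))", fields)
efficiency = evaluate("SafeDiv(TTM(operating_profit), TTM(operating_revenue))", fields)

print(pd.DataFrame({"profitability": profitability,
                    "growth": growth,
                    "efficiency": efficiency}))
```

Walking the syntax tree rather than calling `eval` means only field names and registered operators are accepted, so a factor definition stays a plain string that can be compared and audited without executing arbitrary code.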
Practical Takeaway¶
In shadow-factor, the question is often not:
- “where do I export the data first?”
It is:
- “which field or expression should become the factor?”