Welcome to Factor Pricing Model Universe documentation!

Factor Pricing Model Universe

A package for building universes for factor pricing models. For further details, please refer to the documentation.

Installation

Install this via pip (or your favourite package manager):

pip install factor-pricing-model-universe

Usage

The library contains the pipelines used to build the universe. You can run the pipelines interactively, for example in a Jupyter notebook.

from fpm_universe import pipeline

Alternatively, for scheduled runs, you can create a configuration and run the command line entry point to create the universe.

Configuration

The configuration is in YAML format and contains the following inputs:

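+------------------------+-------------------------------------------------------+
| Name                   | Description                                           |
+------------------------+-------------------------------------------------------+
| output_filename        | Output filename                                       |
+------------------------+-------------------------------------------------------+
| intermediate_directory | Intermediate directory to export the pipeline outputs |
+------------------------+-------------------------------------------------------+
| start_datetime         | Start datetime of the universe                        |
+------------------------+-------------------------------------------------------+
| last_datetime          | Last datetime of the universe                         |
+------------------------+-------------------------------------------------------+
| frequency              | Frequency of the universe. For further details, see   |
|                        | “Offset aliases” in the pandas documentation          |
+------------------------+-------------------------------------------------------+
| pipeline               | List of pipelines to filter the universe              |
+------------------------+-------------------------------------------------------+
| data                   | Data used by the pipelines, or referenced by the YAML |
|                        | tag !data                                             |
+------------------------+-------------------------------------------------------+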
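To make the start_datetime, last_datetime and frequency inputs concrete, here is a minimal sketch that builds a date index with pandas directly. It assumes the library interprets these three inputs the way pandas.date_range interprets start, end and freq; the source does not confirm this, and the dates below are chosen only for illustration.

```python
import pandas as pd

# Assumption: start_datetime / last_datetime / frequency behave like the
# start / end / freq arguments of pandas.date_range.
# "B" is the pandas offset alias for business days (Monday to Friday).
index = pd.date_range(start="2022-10-17", end="2022-10-24", freq="B")

# The weekend of 2022-10-22 and 2022-10-23 is skipped.
dates = list(index.strftime("%Y-%m-%d"))
print(dates)
```

Other offset aliases (for example "W" or "M") can be tried the same way to see which dates a given frequency would produce.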

Each pipeline returns a pandas dataframe indicating whether each instrument is included in the universe on a given date / time. For example, a pipeline may return the following dataframe:

+------------+--------+-------+
|    date    |  AAPL  | GOOGL |
+------------+--------+-------+
| 2022-11-17 |  True  | False |
+------------+--------+-------+
| 2022-11-18 |  True  |  True |
+------------+--------+-------+

This indicates that AAPL is included in the universe on both 2022-11-17 and 2022-11-18, while GOOGL is included only on 2022-11-18.
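The dataframe above can be reproduced in pandas to see how such a result might be queried. This is only an illustrative sketch: the helper members is hypothetical and not part of fpm_universe.

```python
import pandas as pd

# The example dataframe from the text: one row per date and one boolean
# column per instrument.
universe = pd.DataFrame(
    {"AAPL": [True, True], "GOOGL": [False, True]},
    index=pd.to_datetime(["2022-11-17", "2022-11-18"]).rename("date"),
)


def members(frame, date):
    """Hypothetical helper: instruments flagged True on the given date."""
    row = frame.loc[pd.Timestamp(date)]
    return sorted(row[row].index.tolist())


print(members(universe, "2022-11-17"))
print(members(universe, "2022-11-18"))
```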

By default, the pipeline functions are imported from the module fpm_universe.pipeline.

Each data entry defines either a method to retrieve data from a source or an operation on the source data. The return type of a data entry is unconstrained: it can be a JSON-like dict, a list, a pandas series, or a pandas dataframe.

In the configuration, each data entry can be referenced with the YAML tag !data. It is loaded lazily, only when it is referenced by another data entry or by a pipeline.
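The lazy loading described above can be sketched in plain Python. This is not the library's implementation: DataRef stands in for a !data reference, and LazyData, load_symbols, keep_listed and load_unused are hypothetical names used only for illustration. For brevity the sketch resolves references only at the top level of parameters.

```python
from typing import Any, Callable, Dict


class DataRef:
    """Hypothetical stand-in for a `!data <name>` reference."""

    def __init__(self, name: str) -> None:
        self.name = name


class LazyData:
    """Resolve a data entry on first use and cache the result."""

    def __init__(self, definitions: Dict[str, Dict[str, Any]],
                 functions: Dict[str, Callable[..., Any]]) -> None:
        self._definitions = definitions
        self._functions = functions
        self._cache: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        if name not in self._cache:
            spec = self._definitions[name]
            # Resolve referenced entries first, then call the function.
            params = {
                key: self.get(value.name) if isinstance(value, DataRef) else value
                for key, value in spec.get("parameters", {}).items()
            }
            self._cache[name] = self._functions[spec["function"]](**params)
        return self._cache[name]


calls = []  # records which entries were actually loaded


def load_symbols():
    calls.append("symbols")
    return ["AAPL", "GOOGL"]


def keep_listed(symbol):
    calls.append("initial_validity")
    return {s: True for s in symbol}


def load_unused():
    calls.append("unused")
    return None


data = LazyData(
    definitions={
        "symbols": {"function": "load_symbols"},
        "initial_validity": {
            "function": "keep_listed",
            "parameters": {"symbol": DataRef("symbols")},
        },
        "unused": {"function": "load_unused"},
    },
    functions={
        "load_symbols": load_symbols,
        "keep_listed": keep_listed,
        "load_unused": load_unused,
    },
)

result = data.get("initial_validity")
print(calls)  # "unused" is never loaded because nothing refers to it
```

Caching the result means an entry referenced by several pipelines is retrieved from its source only once.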

Command

The entry point factor-pricing-model-universe generates the universe from the given configuration and writes it to the destination. Parameters passed on the command line are substituted into the configuration before it is used.

The arguments of the entry point are

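+----------------------+--------------------------------------------------+
| Argument             | Description                                      |
+----------------------+--------------------------------------------------+
| -c, --config TEXT    | Required. Configuration file path.               |
+----------------------+--------------------------------------------------+
| -p, --parameter TEXT | Parameters to be formatted in the configuration. |
+----------------------+--------------------------------------------------+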

For example, given the configuration as follows,

output_filename: "{output_directory}/{date}.parquet"
intermediate_directory: "{output_directory}/{date}"
start_datetime: "2015-01-01"
last_datetime: "{date}"
frequency: "B"
pipeline:
  - name: range_validity
    function: range_validity
    parameters:
      values: !data initial_validity
data:
  symbols:
    function: jq_compile
    parameters:
      json_filename: "{data_directory}/index/sp500/default/{date}.json"
      pattern: "[.[] | .tickers[]] | sort | unique | .[]"
  initial_validity:
    function: jq_compile
    parameters:
      json_filename: "{data_directory}/listings/{date}.json"
      pattern: ".[] | {{ symbol: .symbol, valid_start_datetime: .ipoDate, valid_last_datetime: .delistingDate }}"
      includes:
        symbol: !data symbols

and the following command,

factor-pricing-model-universe \
  --config <path> \
  --parameter output_directory=$HOME/output \
  --parameter data_directory=$HOME/data \
  --parameter date=2022-10-20

the universe dataframe is written to $HOME/output/2022-10-20.parquet (output_filename formatted with the parameters output_directory and date).
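The substitution can be illustrated with Python's built-in str.format. This is a sketch under an assumption: the source does not state which mechanism the library uses, although the doubled braces in the jq pattern of the example configuration are consistent with str.format escaping. The helper parse_parameters and the /home/user paths are hypothetical.

```python
def parse_parameters(pairs):
    """Hypothetical helper: turn "key=value" strings into a dict."""
    return dict(pair.split("=", 1) for pair in pairs)


parameters = parse_parameters([
    "output_directory=/home/user/output",
    "data_directory=/home/user/data",
    "date=2022-10-20",
])

# Single braces are placeholders filled from the parameters.
output_filename = "{output_directory}/{date}.parquet".format(**parameters)

# Doubled braces are escapes: they come out as literal braces, which is
# what a jq object constructor needs.
pattern = ".[] | {{ symbol: .symbol }}".format(**parameters)

print(output_filename)  # /home/user/output/2022-10-20.parquet
print(pattern)          # .[] | { symbol: .symbol }
```

Under this assumption, any literal brace in the configuration has to be doubled, while every single-braced name must be supplied with --parameter.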
