Pipelines

Introduction

The package contains pipelines that generate indicators of universe inclusion.

Each pipeline returns a pandas DataFrame indicating whether each instrument is included in the universe at each date/time. For example, a pipeline may return the following DataFrame:

+------------+--------+-------+
|    date    |  AAPL  | GOOGL |
+------------+--------+-------+
| 2022-11-17 |  True  | False |
+------------+--------+-------+
| 2022-11-18 |  True  |  True |
+------------+--------+-------+

This indicates that AAPL is included in the universe on both 2022-11-17 and 2022-11-18, while GOOGL is included only on 2022-11-18.

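The final universe inclusion is the intersection of the instruments selected by all pipelines: an instrument is in the final universe at a given date/time only if every pipeline includes it.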
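The example table and the intersection rule can be reproduced with plain pandas boolean DataFrames. The second pipeline's output below is invented for illustration, and element-wise `&` is one way to express the intersection, assuming all pipelines share the same index and columns.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2022-11-17", "2022-11-18"], name="date")

# Output of one pipeline, matching the example table above.
pipeline_a = pd.DataFrame({"AAPL": [True, True], "GOOGL": [False, True]}, index=dates)

# Hypothetical output of a second pipeline over the same dates and instruments.
pipeline_b = pd.DataFrame({"AAPL": [True, False], "GOOGL": [True, True]}, index=dates)

# Intersection: an instrument is included only where every pipeline says True.
universe = pipeline_a & pipeline_b
```

Here AAPL drops out on 2022-11-18 because the second pipeline excludes it, while GOOGL stays included on that date.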

Pipeline functions

fpm_universe.pipeline.combine_validity(*args: List[DataFrame]) → DataFrame

Combine the validity DataFrames produced by several pipelines into a single DataFrame.

Parameters:

args (List[pd.DataFrame]) – Validity DataFrames, each produced by a single pipeline.
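A minimal sketch of how such a combination could work, assuming the result is the intersection described in the introduction and that any instrument or timestamp missing from a pipeline's output counts as not included. `combine_validity_sketch` is a hypothetical name used for illustration, not the package's implementation.

```python
from functools import reduce

import pandas as pd


def combine_validity_sketch(*frames: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical intersection of boolean validity DataFrames."""
    # Union of all timestamps and instruments seen by any pipeline.
    index = reduce(lambda a, b: a.union(b), [f.index for f in frames])
    columns = reduce(lambda a, b: a.union(b), [f.columns for f in frames])
    # Anything a pipeline did not report is treated as "not included".
    aligned = [
        f.reindex(index=index, columns=columns, fill_value=False).astype(bool)
        for f in frames
    ]
    # Element-wise AND: included only where every pipeline says True.
    return reduce(lambda a, b: a & b, aligned)
```

Filling gaps with `False` is a conservative choice: an instrument that a pipeline never evaluated cannot enter the final universe.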

fpm_universe.pipeline.range_validity(values: List[Dict[str, str]], start_datetime: Union[str, datetime, Timestamp], last_datetime: Union[str, datetime, Timestamp], frequency: str) → DataFrame

Include each instrument in the universe within its datetime range of validity.

Parameters:

values (list[dict[str, str]]) – The list of instruments, each given as a dictionary containing the symbol, the valid start datetime and the valid last datetime.
start_datetime (str, or any type convertible to pandas.Timestamp) – The universe start datetime.
last_datetime (str, or any type convertible to pandas.Timestamp) – The universe last datetime.
frequency (str) – A frequency string supported by pandas, e.g. "D" for daily. For details, see the pandas documentation on [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).

Returns:

A boolean DataFrame, with datetimes as the index and instruments as the columns, indicating whether each instrument is included in the universe.

Return type:

pd.DataFrame.
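The documentation does not name the dictionary keys in `values`, so the sketch below assumes hypothetical keys `"symbol"`, `"start_datetime"` and `"last_datetime"`, and treats both ends of each validity range as inclusive. `range_validity_sketch` only illustrates the idea on a grid built with `pd.date_range`; it is not the package's implementation.

```python
import pandas as pd


def range_validity_sketch(values, start_datetime, last_datetime, frequency):
    """Hypothetical range-based validity on a regular datetime grid."""
    index = pd.date_range(start_datetime, last_datetime, freq=frequency)
    # Hypothetical keys: "symbol", "start_datetime", "last_datetime".
    symbols = [item["symbol"] for item in values]
    result = pd.DataFrame(False, index=index, columns=symbols)
    for item in values:
        valid_start = pd.Timestamp(item["start_datetime"])
        valid_last = pd.Timestamp(item["last_datetime"])
        # True for every grid timestamp inside the validity range (inclusive).
        result[item["symbol"]] = (index >= valid_start) & (index <= valid_last)
    return result
```

With AAPL valid throughout November 2022 and GOOGL valid from 2022-11-18 onwards, calling it over 2022-11-17 to 2022-11-18 with frequency `"D"` reproduces the example table in the introduction.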

fpm_universe.pipeline.ranking(values: DataFrame, threshold_pct: float, tolerance_timeframes: int, start_datetime: Union[str, datetime, Timestamp], last_datetime: Union[str, datetime, Timestamp], frequency: str) → DataFrame

Include instruments in the universe by cross-sectional ranking.

An instrument is selected when its rank falls within a threshold percentage, and it is then allowed to stay in the universe for a specified number of tolerance timeframes.

For example, if threshold_pct is 0.4 and tolerance_timeframes is 21, an instrument is selected into the universe only if it ranks in the top 40% of the values, and it stays in the universe for the next 21 timeframes even if the condition is no longer fulfilled.

Parameters:

values (pandas.DataFrame) – The values to rank cross-sectionally. The columns are instruments and the index is a datetime index.
threshold_pct (float) – The threshold percentage, between 0 and 1.
tolerance_timeframes (int) – The number of timeframes an instrument may stay in the universe after its value falls outside the threshold percentage. Default is 21.
start_datetime (str, or any type convertible to pandas.Timestamp) – The universe start datetime.
last_datetime (str, or any type convertible to pandas.Timestamp) – The universe last datetime.
frequency (str) – A frequency string supported by pandas, e.g. "D" for daily. For details, see the pandas documentation on [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).

Return type:

pd.DataFrame.
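The selection rule and the tolerance window can be sketched in pandas as follows. This is a hypothetical re-implementation (`ranking_sketch` is not part of the package) that assumes larger values rank higher, that `values` is reindexed onto a grid built with `pd.date_range(start_datetime, last_datetime, freq=frequency)`, and that the tolerance is applied as a rolling maximum over `tolerance_timeframes + 1` rows. The package may handle ties, missing data and alignment differently.

```python
import pandas as pd


def ranking_sketch(
    values: pd.DataFrame,
    threshold_pct: float,
    tolerance_timeframes: int,
    start_datetime,
    last_datetime,
    frequency: str,
) -> pd.DataFrame:
    """Hypothetical ranking-based validity, for illustration only."""
    # Align the values to the universe datetime grid; missing rows become NaN.
    index = pd.date_range(start_datetime, last_datetime, freq=frequency)
    aligned = values.reindex(index)

    # Cross-sectional percentile rank per row. Assumption: larger values rank
    # higher, so the largest value gets the smallest percentile.
    pct_rank = aligned.rank(axis=1, ascending=False, pct=True)

    # Selected when inside the top threshold_pct; NaN ranks compare as False.
    selected = pct_rank <= threshold_pct

    # Keep an instrument for tolerance_timeframes rows after its last selection
    # by taking a rolling maximum over the current and previous rows.
    held = (
        selected.astype(int)
        .rolling(window=tolerance_timeframes + 1, min_periods=1)
        .max()
        .astype(bool)
    )
    return held
```

For instance, with three instruments over five daily rows, `threshold_pct=0.4` and `tolerance_timeframes=2`, an instrument that tops the ranking only on the first day remains included for the two following days and then drops out.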

fpm_universe.pipeline.rolling_correlation_rank_validity(values: DataFrame, rankings: DataFrame, rolling_window: int, threshold: float, start_datetime: Union[str, datetime, Timestamp], last_datetime: Union[str, datetime, Timestamp], frequency: str) → DataFrame

Exclude instruments whose correlations with other instruments are too high; when instruments are highly correlated, only the higher-ranked ones are selected.

Parameters:

values (pandas.DataFrame) – The values whose correlations are compared, e.g. returns. The columns are instruments and the index is a datetime index.
rankings (pandas.DataFrame) – The cross-sectional rankings of the instruments. The columns are instruments and the index is a datetime index.
rolling_window (int) – The number of past timeframes in the rolling window over which the correlations are computed. Must be non-negative.
threshold (float) – The correlation threshold, between 0 and 1.
start_datetime (str, or any type convertible to pandas.Timestamp) – The universe start datetime.
last_datetime (str, or any type convertible to pandas.Timestamp) – The universe last datetime.
frequency (str) – A frequency string supported by pandas, e.g. "D" for daily. For details, see the pandas documentation on [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).

Returns:

A boolean DataFrame, with datetimes as the index and instruments as the columns, indicating whether each instrument is included in the universe.

Return type:

pd.DataFrame.
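One plausible reading of this rule is a greedy pass: at each timestamp, compute the correlation matrix over the trailing `rolling_window` rows of `values`, walk the instruments from highest to lowest rank, and keep an instrument only if its absolute correlation with every instrument already kept is below `threshold`. The sketch below implements that reading. The greedy procedure, the use of absolute Pearson correlation, the assumption that a larger ranking value means a higher rank, and the name `rolling_correlation_rank_sketch` are all assumptions rather than the package's documented behavior; it also assumes `values` and `rankings` share the same columns.

```python
import pandas as pd


def rolling_correlation_rank_sketch(
    values, rankings, rolling_window, threshold,
    start_datetime, last_datetime, frequency,
):
    """Hypothetical greedy correlation filter, for illustration only."""
    index = pd.date_range(start_datetime, last_datetime, freq=frequency)
    values = values.reindex(index)
    rankings = rankings.reindex(index)
    result = pd.DataFrame(False, index=index, columns=values.columns)

    # Rows without a full trailing window stay excluded.
    for i in range(rolling_window - 1, len(index)):
        window = values.iloc[i - rolling_window + 1 : i + 1]
        corr = window.corr()  # pairwise Pearson correlation over the window

        # Assumption: a larger ranking value means a higher rank.
        order = rankings.iloc[i].dropna().sort_values(ascending=False).index

        # Greedy pass: keep an instrument only if it is not too correlated
        # with any higher-ranked instrument that was already kept.
        kept = []
        for inst in order:
            # A NaN correlation makes the comparison False, which excludes it.
            if all(abs(corr.loc[inst, other]) < threshold for other in kept):
                kept.append(inst)
        result.loc[index[i]] = values.columns.isin(kept)
    return result
```

Walking in rank order guarantees that, of two highly correlated instruments, the higher-ranked one is the one retained.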

fpm_universe.pipeline.rolling_validity(values: DataFrame, threshold_pct: float, rolling_window: int, tolerance_timeframes: int, start_datetime: Union[str, datetime, Timestamp], last_datetime: Union[str, datetime, Timestamp], frequency: str) → DataFrame

Include instruments in the universe by rolling validity.

An instrument is included in the universe if, over a rolling window of a specified number of timeframes, the proportion of valid (existing) values reaches a threshold percentage.

For example, if threshold_pct is 0.8 and rolling_window is 21, the instrument is included in the universe only if at least 80% of the values in the previous 21 timeframes are valid, evaluated on a rolling basis. Since 0.8 × 21 = 16.8, at least 17 of the 21 values must be valid.

Parameters:

values (pandas.DataFrame) – The values whose existence is checked over the rolling window. The columns are instruments and the index is a datetime index.
threshold_pct (float) – The threshold percentage of valid values, between 0 and 1.
rolling_window (int) – The number of past timeframes in the rolling window over which the values must exist. Must be non-negative.
tolerance_timeframes (int) – The number of timeframes an instrument may stay in the universe after its proportion of valid values falls below the threshold percentage. Default is 21.
start_datetime (str, or any type convertible to pandas.Timestamp) – The universe start datetime.
last_datetime (str, or any type convertible to pandas.Timestamp) – The universe last datetime.
frequency (str) – A frequency string supported by pandas, e.g. "D" for daily. For details, see the pandas documentation on [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).

Returns:

A boolean DataFrame, with datetimes as the index and instruments as the columns, indicating whether each instrument is included in the universe.

Return type:

pd.DataFrame.
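A minimal sketch of the rolling-validity rule, assuming that "valid" means non-NaN, that the window includes the current timeframe, and that `tolerance_timeframes` keeps an instrument included for that many further timeframes, implemented here as a rolling maximum over `tolerance_timeframes + 1` rows. `rolling_validity_sketch` is a hypothetical helper, not the package's implementation.

```python
import pandas as pd


def rolling_validity_sketch(
    values, threshold_pct, rolling_window, tolerance_timeframes,
    start_datetime, last_datetime, frequency,
):
    """Hypothetical rolling-validity filter, for illustration only."""
    index = pd.date_range(start_datetime, last_datetime, freq=frequency)
    # 1.0 where a value exists, 0.0 where it is missing (NaN or absent row).
    present = values.reindex(index).notna().astype(float)
    # Proportion of existing values over the trailing window, current row
    # included; rows without a full window yield NaN.
    coverage = present.rolling(window=rolling_window, min_periods=rolling_window).mean()
    valid = coverage >= threshold_pct  # NaN compares as False
    # Tolerance: stay included for tolerance_timeframes rows after the last valid row.
    held = (
        valid.astype(int)
        .rolling(window=tolerance_timeframes + 1, min_periods=1)
        .max()
        .astype(bool)
    )
    return held
```

Requiring a full window (`min_periods=rolling_window`) means no instrument can enter the universe during the first `rolling_window - 1` timeframes.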