Pipelines
Introduction
The package contains pipelines that generate universe inclusion indicators.
Each pipeline returns a pandas dataframe indicating whether each instrument is included in the universe at each date/time. For example, a pipeline may return the following dataframe:
+------------+------+-------+
| date       | AAPL | GOOGL |
+------------+------+-------+
| 2022-11-17 | True | False |
+------------+------+-------+
| 2022-11-18 | True | True  |
+------------+------+-------+
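This indicates that AAPL is included in the universe on both 2022-11-17 and 2022-11-18, while GOOGL is included only on 2022-11-18.
The final universe inclusion is the intersection of the inclusions across all the pipelines: an instrument is included on a given date only if every pipeline includes it on that date.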
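To illustrate the intersection rule, the sketch below rebuilds the example table as a pandas dataframe and intersects it with a second, hypothetical pipeline output using an element-wise logical AND. It assumes both dataframes share the same dates and instruments; it is an illustration of the idea, not the package's own combining code.

```python
import pandas as pd

# Validity from a first (hypothetical) pipeline, as in the table above.
dates = pd.Index(pd.to_datetime(["2022-11-17", "2022-11-18"]), name="date")
pipeline_a = pd.DataFrame({"AAPL": [True, True], "GOOGL": [False, True]}, index=dates)

# Validity from a second (hypothetical) pipeline with the same dates and instruments.
pipeline_b = pd.DataFrame({"AAPL": [True, False], "GOOGL": [True, True]}, index=dates)

# Element-wise AND: an instrument is in the final universe on a date
# only if every pipeline includes it on that date.
final_universe = pipeline_a & pipeline_b
print(final_universe)
```

Here AAPL drops out on 2022-11-18 because the second pipeline excludes it, and GOOGL drops out on 2022-11-17 because the first pipeline excludes it.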
Pipeline functions
- fpm_universe.pipeline.combine_validity(*args: List[DataFrame]) → DataFrame
Combine the validity dataframes produced by several pipelines into a single validity dataframe.
- Parameters:
args (List[pd.DataFrame]) – The validity dataframes, each produced by a single pipeline.
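A minimal sketch of how several validity dataframes could be combined, assuming the combination is the intersection described in the introduction. The function name `combine_validity_sketch` is hypothetical, and treating a date or instrument missing from any input as not valid (`False`) is an assumption, not documented behaviour of `combine_validity`.

```python
from functools import reduce

import pandas as pd


def combine_validity_sketch(*frames: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical intersection of boolean validity dataframes.

    Assumption: frames are aligned on the union of their dates and
    instruments, and any cell missing from a frame counts as False.
    """

    def intersect(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
        # Outer-align both axes, filling the gaps with False (not valid).
        left, right = left.align(right, join="outer", fill_value=False)
        return left.astype(bool) & right.astype(bool)

    return reduce(intersect, frames)


dates = pd.date_range("2022-11-17", periods=2, freq="D")
a = pd.DataFrame({"AAPL": [True, True], "GOOGL": [False, True]}, index=dates)
b = pd.DataFrame({"AAPL": [True, False]}, index=dates)  # no GOOGL column

combined = combine_validity_sketch(a, b)
print(combined)
```

Under this assumption GOOGL is excluded on every date because the second frame does not cover it; an implementation could instead ignore instruments a pipeline does not mention, which would change the result.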
- fpm_universe.pipeline.range_validity(values: List[Dict[str, str]], start_datetime: Union[str, datetime, Timestamp], last_datetime: Union[str, datetime, Timestamp], frequency: str) → DataFrame
Include each instrument in the universe within its range of valid datetimes.
- Parameters:
values (list[dict[str, str]]) – The list of instruments; each entry contains the symbol, the valid start datetime and the valid last datetime.
start_datetime (str, or any type convertible to a pandas Timestamp) – The universe start datetime.
last_datetime (str, or any type convertible to a pandas Timestamp) – The universe last datetime.
frequency (str) – A frequency string supported by pandas. For details, see [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
- Returns:
A dataframe indicating whether the instrument is included in the universe.
- Return type:
pd.DataFrame.
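The following is a minimal sketch of range-based inclusion, not the package's implementation. The dict keys `"symbol"`, `"start"` and `"last"` are hypothetical, since the documentation does not name them, and both range ends are assumed to be inclusive.

```python
import pandas as pd


def range_validity_sketch(values, start_datetime, last_datetime, frequency):
    """Hypothetical sketch: True while a timestamp lies in the instrument's range.

    Assumes each dict has the (hypothetical) keys "symbol", "start", "last"
    and that both ends of the validity range are inclusive.
    """
    index = pd.date_range(start_datetime, last_datetime, freq=frequency)
    result = pd.DataFrame(False, index=index, columns=[v["symbol"] for v in values])
    for v in values:
        start = pd.Timestamp(v["start"])
        last = pd.Timestamp(v["last"])
        result[v["symbol"]] = (index >= start) & (index <= last)
    return result


universe = range_validity_sketch(
    [
        {"symbol": "AAPL", "start": "2022-11-17", "last": "2022-11-18"},
        {"symbol": "GOOGL", "start": "2022-11-18", "last": "2022-11-30"},
    ],
    "2022-11-16",
    "2022-11-18",
    "D",
)
print(universe)
```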
- fpm_universe.pipeline.ranking(values: DataFrame, threshold_pct: float, tolerance_timeframes: int, start_datetime: Union[str, datetime, Timestamp], last_datetime: Union[str, datetime, Timestamp], frequency: str) → DataFrame
Include instruments in the universe by ranking.
An instrument is selected when it ranks within a threshold percentage, and it is allowed to stay in the universe for a specified number of tolerance timeframes.
For example, if threshold_pct is 0.4 and tolerance_timeframes is 21, an instrument is selected into the universe only if it ranks in the top 40% of the values, and it stays in the universe for the next 21 timeframes even if the condition is no longer fulfilled.
- Parameters:
values (pandas.DataFrame) – The values to rank cross-sectionally. The columns are instruments and the index is datetime.
threshold_pct (float) – The threshold percentage, between 0 and 1.
tolerance_timeframes (int) – The number of timeframes an instrument may stay in the universe after its value falls outside the threshold percentage. Default is 21.
start_datetime (str, or any type convertible to a pandas Timestamp) – The universe start datetime.
last_datetime (str, or any type convertible to a pandas Timestamp) – The universe last datetime.
frequency (str) – A frequency string supported by pandas. For details, see [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
- Returns:
A dataframe indicating whether the instrument is included in the universe.
- Return type:
pd.DataFrame.
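To make the threshold and tolerance rules concrete, here is a minimal sketch of the described behaviour, not the package's implementation. It assumes that a higher value means a better rank and that "top 40%" means a descending percentile rank of at most `threshold_pct`; the tolerance is modelled as a rolling maximum over the current timeframe plus the previous `tolerance_timeframes`. The name `ranking_sketch` is hypothetical.

```python
import pandas as pd


def ranking_sketch(values, threshold_pct, tolerance_timeframes,
                   start_datetime, last_datetime, frequency):
    """Hypothetical sketch of ranking-based inclusion with tolerance."""
    index = pd.date_range(start_datetime, last_datetime, freq=frequency)
    values = values.reindex(index)
    # Cross-sectional percentile rank per row (assumption: higher value is
    # better), so the best instrument gets 1/n and the worst gets 1.0.
    pct_rank = values.rank(axis=1, ascending=False, pct=True)
    selected = pct_rank <= threshold_pct  # NaN ranks compare as False
    # Keep an instrument for `tolerance_timeframes` rows after its last selection.
    held = selected.astype(int).rolling(tolerance_timeframes + 1, min_periods=1).max()
    return held.astype(bool)


dates = pd.date_range("2022-11-14", periods=4, freq="D")
values = pd.DataFrame(
    {"A": [5, 1, 1, 1], "B": [4, 4, 4, 4], "C": [3, 3, 3, 3],
     "D": [2, 2, 2, 2], "E": [1, 5, 5, 5]},
    index=dates,
)
result = ranking_sketch(values, 0.4, 1, "2022-11-14", "2022-11-17", "D")
print(result)
```

In this example A is in the top 40% only on the first day, so with a tolerance of one timeframe it stays for one more day and then leaves.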
- fpm_universe.pipeline.rolling_correlation_rank_validity(values: DataFrame, rankings: DataFrame, rolling_window: int, threshold: float, start_datetime: Union[str, datetime, Timestamp], last_datetime: Union[str, datetime, Timestamp], frequency: str) → DataFrame
Exclude instruments whose correlations are too high; among highly correlated instruments, only the higher-ranked ones are selected.
- Parameters:
values (pandas.DataFrame) – The values whose correlations are compared, e.g. returns. The columns are instruments and the index is datetime.
rankings (pandas.DataFrame) – The values to rank cross-sectionally. The columns are instruments and the index is datetime.
rolling_window (int) – The number of past timeframes in the rolling window over which the values are evaluated. It must be non-negative.
threshold (float) – The threshold correlation, between 0 and 1.
start_datetime (str, or any type convertible to a pandas Timestamp) – The universe start datetime.
last_datetime (str, or any type convertible to a pandas Timestamp) – The universe last datetime.
frequency (str) – A frequency string supported by pandas. For details, see [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
- Returns:
A dataframe indicating whether the instrument is included in the universe.
- Return type:
pd.DataFrame.
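One way to realise "exclude highly correlated instruments, keep the higher-ranked ones" is a greedy pass in rank order, sketched below. This greedy selection is an assumed technique, not necessarily what the package does. Further assumptions: a higher ranking value is better, the correlation window is the last `rolling_window` rows up to and including each date, `rankings` uses the same instrument columns as `values`, and an undefined (NaN) correlation does not cause exclusion. The name `rolling_correlation_rank_sketch` is hypothetical.

```python
import pandas as pd


def rolling_correlation_rank_sketch(values, rankings, rolling_window, threshold,
                                    start_datetime, last_datetime, frequency):
    """Hypothetical greedy sketch of correlation-based exclusion."""
    index = pd.date_range(start_datetime, last_datetime, freq=frequency)
    result = pd.DataFrame(False, index=index, columns=values.columns)
    for ts in index:
        if ts not in rankings.index:
            continue
        # Pairwise correlations over the trailing window ending at `ts`.
        corr = values.loc[:ts].tail(rolling_window).corr()
        # Visit instruments from best to worst rank (assumption: higher is better).
        order = rankings.loc[ts].dropna().sort_values(ascending=False).index
        kept = []
        for name in order:
            # NaN > threshold is False, so undefined correlations never exclude.
            too_correlated = any(corr.loc[name, other] > threshold for other in kept)
            if not too_correlated:
                kept.append(name)
        result.loc[ts, kept] = True
    return result


dates = pd.date_range("2022-11-14", periods=4, freq="D")
values = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0], "C": [1.0, -1.0, 1.0, -1.0]},
    index=dates,
)
rankings = pd.DataFrame({"A": [1, 1, 1, 1], "B": [3, 3, 3, 3], "C": [2, 2, 2, 2]}, index=dates)
out = rolling_correlation_rank_sketch(values, rankings, 4, 0.9, "2022-11-17", "2022-11-17", "D")
print(out)
```

A and B move in lockstep (correlation 1.0), so only the higher-ranked B survives, while C is weakly correlated with B and is kept.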
- fpm_universe.pipeline.rolling_validity(values: DataFrame, threshold_pct: float, rolling_window: int, tolerance_timeframes: int, start_datetime: Union[str, datetime, Timestamp], last_datetime: Union[str, datetime, Timestamp], frequency: str) → DataFrame
Include instruments in the universe by rolling validity.
An instrument is included in the universe if, over a rolling window of the specified number of timeframes, at least the threshold percentage of its values exist.
For example, if threshold_pct is 0.8 and rolling_window is 21, an instrument is included in the universe only if at least 80% of its values over the previous 21 days are valid, evaluated on a rolling basis.
- Parameters:
values (pandas.DataFrame) – The values whose availability is checked. The columns are instruments and the index is datetime.
threshold_pct (float) – The threshold percentage of valid values, between 0 and 1.
rolling_window (int) – The number of past timeframes in the rolling window over which the values are checked. It must be non-negative.
tolerance_timeframes (int) – The number of timeframes an instrument may stay in the universe after its percentage of valid values falls below the threshold percentage. Default is 21.
start_datetime (str, or any type convertible to a pandas Timestamp) – The universe start datetime.
last_datetime (str, or any type convertible to a pandas Timestamp) – The universe last datetime.
frequency (str) – A frequency string supported by pandas. For details, see [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
- Returns:
A dataframe indicating whether the instrument is included in the universe.
- Return type:
pd.DataFrame.
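A minimal sketch of the rolling validity rule, under stated assumptions rather than as the package's implementation: a value is "valid" when it is not NaN, the rolling window includes the current timeframe, the window must be full before an instrument can qualify, and the tolerance keeps an instrument for `tolerance_timeframes` further rows after it last qualified. The name `rolling_validity_sketch` is hypothetical.

```python
import pandas as pd


def rolling_validity_sketch(values, threshold_pct, rolling_window, tolerance_timeframes,
                            start_datetime, last_datetime, frequency):
    """Hypothetical sketch of rolling-validity inclusion with tolerance."""
    index = pd.date_range(start_datetime, last_datetime, freq=frequency)
    # Fraction of non-missing values in the trailing window (current row included).
    present = values.notna().astype(float)
    valid_pct = present.rolling(rolling_window, min_periods=rolling_window).mean()
    passed = (valid_pct >= threshold_pct).astype(int)  # NaN (incomplete window) -> 0
    # Keep an instrument for `tolerance_timeframes` rows after it last qualified.
    held = passed.rolling(tolerance_timeframes + 1, min_periods=1).max()
    return held.astype(bool).reindex(index, fill_value=False)


dates = pd.date_range("2022-11-14", periods=5, freq="D")
values = pd.DataFrame(
    {"A": [1.0, 2.0, None, None, 5.0], "B": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=dates,
)
out = rolling_validity_sketch(values, 1.0, 2, 1, "2022-11-14", "2022-11-18", "D")
print(out)
```

A qualifies only on 2022-11-15 (two valid values in a row), stays one extra timeframe because of the tolerance, and then drops out while its values are missing.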