etna.pipeline.HierarchicalPipeline
- class HierarchicalPipeline(reconciliator: BaseReconciliator, model: NonPredictionIntervalContextIgnorantAbstractModel | NonPredictionIntervalContextRequiredAbstractModel | PredictionIntervalContextIgnorantAbstractModel | PredictionIntervalContextRequiredAbstractModel, transforms: Sequence[Transform] = (), horizon: int = 1)
Bases: Pipeline
Pipeline of transforms with a final estimator for hierarchical time series data.
Notes
Aggregation of target quantiles and components is performed along with the target itself. It uses a provided hierarchical structure and a reconciliation method.
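As a minimal sketch of bottom-up aggregation, assuming a simple two-level hierarchy, the code below sums every column of the source-level segments (target and quantile columns alike) into their parent segment. The segment names, the column names and the `aggregate_to_parent` helper are hypothetical illustrations; this is not etna's reconciliator code.

```python
from typing import Dict, List

# Hypothetical hierarchy: two stores roll up into one region.
parent_of: Dict[str, str] = {"store_a": "region_1", "store_b": "region_1"}


def aggregate_to_parent(
    values: Dict[str, Dict[str, List[float]]],
    parent_of: Dict[str, str],
) -> Dict[str, Dict[str, List[float]]]:
    """Hypothetical helper: sum every column of child segments into their parent."""
    result: Dict[str, Dict[str, List[float]]] = {}
    for child, columns in values.items():
        parent_columns = result.setdefault(parent_of[child], {})
        for column, series in columns.items():
            # Start each parent column at zero, then add the child's values pointwise.
            acc = parent_columns.setdefault(column, [0.0] * len(series))
            for i, value in enumerate(series):
                acc[i] += value
    return result


# Illustrative column names for the target and two quantiles.
source_level = {
    "store_a": {"target": [1.0, 2.0], "target_0.025": [0.5, 1.5], "target_0.975": [1.5, 2.5]},
    "store_b": {"target": [3.0, 4.0], "target_0.025": [2.5, 3.5], "target_0.975": [3.5, 4.5]},
}
target_level = aggregate_to_parent(source_level, parent_of)
```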
Create an instance of HierarchicalPipeline with the given parameters.
- Parameters:
reconciliator (BaseReconciliator) – Instance of a reconciliation method
model (NonPredictionIntervalContextIgnorantAbstractModel | NonPredictionIntervalContextRequiredAbstractModel | PredictionIntervalContextIgnorantAbstractModel | PredictionIntervalContextRequiredAbstractModel) – Instance of the etna Model
transforms (Sequence[Transform]) – Sequence of the transforms
horizon (int) – Number of timestamps in the future for forecasting
Warning
Estimation of forecast intervals with the forecast(prediction_interval=True) method and BottomUpReconciliator may not be reliable.
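The warning is consistent with a general statistical fact: adding up per-segment quantiles does not, in general, give the quantile of the summed series. The self-contained sketch below checks this exactly for two independent segments, each uniform on {0, ..., 9}, using a 10%/90% interval. It illustrates the fact only and makes no claim about how etna estimates intervals internally.

```python
from fractions import Fraction
from itertools import product
from typing import Dict

Pmf = Dict[int, Fraction]


def quantile(pmf: Pmf, q: Fraction) -> int:
    """Lower quantile: smallest value whose cumulative probability reaches q."""
    cumulative = Fraction(0)
    for value in sorted(pmf):
        cumulative += pmf[value]
        if cumulative >= q:
            return value
    raise ValueError("q must lie in (0, 1]")


def pmf_of_sum(pmf_a: Pmf, pmf_b: Pmf) -> Pmf:
    """Exact distribution of A + B for independent discrete A and B."""
    result: Pmf = {}
    for (a, p_a), (b, p_b) in product(pmf_a.items(), pmf_b.items()):
        result[a + b] = result.get(a + b, Fraction(0)) + p_a * p_b
    return result


# Two independent "segments", each uniform on {0, ..., 9}.
segment_a = {v: Fraction(1, 10) for v in range(10)}
segment_b = {v: Fraction(1, 10) for v in range(10)}
q_low, q_high = Fraction(1, 10), Fraction(9, 10)

# Bottom-up style: add the per-segment bounds.
summed_quantiles = (
    quantile(segment_a, q_low) + quantile(segment_b, q_low),
    quantile(segment_a, q_high) + quantile(segment_b, q_high),
)
# Exact bounds of the aggregated series.
total = pmf_of_sum(segment_a, segment_b)
quantiles_of_sum = (quantile(total, q_low), quantile(total, q_high))
print(summed_quantiles, quantiles_of_sum)
```

Running it prints (0, 16) (3, 14): the summed bounds give a wider interval than the exact 80% interval of the total.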
Methods
backtest(ts, metrics[, n_folds, mode, ...]) – Run backtest with the pipeline.
fit(ts[, save_ts]) – Fit the HierarchicalPipeline.
forecast([ts, prediction_interval, ...]) – Make a forecast of the next points of a dataset at the target level.
get_historical_forecasts(ts[, n_folds, ...]) – Estimate forecast for each fold on the historical dataset.
load(path[, ts]) – Load an object.
params_to_tune() – Get hyperparameter grid to tune.
predict([ts, start_timestamp, ...]) – Make in-sample predictions on dataset at the target level in a given range.
raw_forecast(ts[, prediction_interval, ...]) – Make a forecast of the next points of a dataset at the source level.
raw_predict(ts[, start_timestamp, ...]) – Make in-sample predictions on dataset at the source level in a given range.
save(path) – Save the object.
set_params(**params) – Return new object instance with modified parameters.
to_dict() – Collect all information about etna object in dict.
Attributes
This class stores its __init__ parameters as attributes.
- backtest(ts: TSDataset, metrics: List[Metric], n_folds: int | List[FoldMask] = 5, mode: str | None = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: bool | int = True, stride: int | None = None, joblib_params: Dict[str, Any] | None = None, forecast_params: Dict[str, Any] | None = None) → Tuple[DataFrame, DataFrame, DataFrame]
Run backtest with the pipeline.
If refit != True and some component of the pipeline doesn’t support forecasting with a gap, this component will raise an exception.
- Parameters:
ts (TSDataset) – Dataset to fit models in backtest
metrics (List[Metric]) – List of metrics to compute for each fold
n_folds (int | List[FoldMask]) – Number of folds or the list of fold masks
mode (str | None) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. By default, it is set to ‘expand’.
aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise, return raw metrics
n_jobs (int) – Number of jobs to run in parallel
refit (bool | int) – Determines how often the pipeline should be retrained during iteration over folds.
If True: the pipeline is retrained on each fold.
If False: the pipeline is trained only on the first fold.
If an int: the pipeline is trained every refit folds, starting from the first.
stride (int | None) – Number of points between folds. Works only if n_folds is an integer. By default, it is set to horizon.
joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
forecast_params (Dict[str, Any] | None) – Additional parameters for forecast()
- Returns:
Metrics dataframe, forecast dataframe and dataframe with information about folds (metrics_df, forecast_df, fold_info_df)
- Return type:
Tuple[DataFrame, DataFrame, DataFrame]
- Raises:
ValueError – If mode is set when n_folds is a List[FoldMask].
ValueError – If stride is set when n_folds is a List[FoldMask].
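The refit schedule described above can be sketched as a small helper that lists the folds on which training happens. `folds_to_refit` is a hypothetical name used only for illustration; etna's own scheduling logic may differ in details such as how invalid values are handled.

```python
from typing import List, Union


def folds_to_refit(n_folds: int, refit: Union[bool, int]) -> List[int]:
    """Hypothetical helper: 0-based indices of folds on which the pipeline is trained.

    Follows the documented semantics: True -> every fold,
    False -> only the first fold, int k -> every k-th fold from the first.
    """
    # bool must be checked before int: in Python, True and False are ints.
    if isinstance(refit, bool):
        return list(range(n_folds)) if refit else [0]
    if refit < 1:
        raise ValueError("refit must be a positive integer")
    return list(range(0, n_folds, refit))
```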
- fit(ts: TSDataset, save_ts: bool = True) → HierarchicalPipeline
Fit the HierarchicalPipeline.
Fit and apply the given transforms to the data, then fit the model on the transformed data. The provided hierarchical dataset is aggregated to the source level before the pipeline is fitted.
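- Parameters:
ts (TSDataset) – Hierarchical dataset to fit the pipeline on
save_ts (bool) – Whether to save ts in the pipeline during fit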
- Returns:
Fitted HierarchicalPipeline instance
- Return type:
HierarchicalPipeline
- forecast(ts: TSDataset | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) → TSDataset
Make a forecast of the next points of a dataset at the target level.
The result of forecasting starts from the last point of ts, not including it.
The method makes a prediction for the target at the source level of the hierarchy and then reconciles it to the target level.
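To make "starts from the last point of ts, not including it" concrete, the sketch below lists the timestamps a forecast with a given horizon would cover for a regular series. It assumes a fixed daily step and uses plain datetime arithmetic; `future_timestamps` is a hypothetical helper, not part of etna.

```python
from datetime import datetime, timedelta
from typing import List


def future_timestamps(last: datetime, step: timedelta, horizon: int) -> List[datetime]:
    """Hypothetical helper: timestamps strictly after `last`, one per forecast step."""
    return [last + step * (i + 1) for i in range(horizon)]


ts_end = datetime(2024, 1, 31)  # assumed last timestamp of ts
forecast_index = future_timestamps(ts_end, timedelta(days=1), horizon=3)
```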
- Parameters:
ts (TSDataset | None) – Dataset to forecast. If not given, the dataset given during fit is used.
prediction_interval (bool) – If True, returns prediction interval for forecast
quantiles (Sequence[float]) – Levels of prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval
n_folds (int) – Number of folds to use in the backtest for prediction interval estimation
return_components (bool) – If True, additionally returns forecast components
- Returns:
Dataset with predictions at the target level of hierarchy.
- Return type:
TSDataset
- get_historical_forecasts(ts: TSDataset, n_folds: int | List[FoldMask] = 5, mode: str | None = None, n_jobs: int = 1, refit: bool | int = True, stride: int | None = None, joblib_params: Dict[str, Any] | None = None, forecast_params: Dict[str, Any] | None = None) → DataFrame
Estimate forecast for each fold on the historical dataset.
If refit != True and some component of the pipeline doesn’t support forecasting with a gap, this component will raise an exception.
- Parameters:
ts (TSDataset) – Dataset to fit models in backtest
n_folds (int | List[FoldMask]) – Number of folds or the list of fold masks
mode (str | None) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. By default, it is set to ‘expand’.
n_jobs (int) – Number of jobs to run in parallel
refit (bool | int) – Determines how often the pipeline should be retrained during iteration over folds.
If True: the pipeline is retrained on each fold.
If False: the pipeline is trained only on the first fold.
If an int: the pipeline is trained every refit folds, starting from the first.
stride (int | None) – Number of points between folds. Works only if n_folds is an integer. By default, it is set to horizon.
joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
forecast_params (Dict[str, Any] | None) – Additional parameters for forecast()
- Returns:
Forecast dataframe
- Raises:
ValueError – If mode is set when n_folds is a List[FoldMask].
ValueError – If stride is set when n_folds is a List[FoldMask].
- Return type:
DataFrame
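The interplay of mode and stride can be sketched with integer positions. The helper below assumes that the test window of the last fold ends at the last point of the series, that ‘expand’ keeps the training start fixed at the beginning, and that ‘constant’ slides a fixed-length training window forward by stride. These are assumptions about the policy names, not a transcription of etna's fold generator; `make_folds` is hypothetical and ranges are half-open [start, end).

```python
from typing import List, Optional, Tuple

# (train_start, train_end, test_start, test_end), all half-open positions.
Fold = Tuple[int, int, int, int]


def make_folds(
    n_points: int,
    horizon: int,
    n_folds: int,
    mode: str = "expand",
    stride: Optional[int] = None,
) -> List[Fold]:
    """Hypothetical fold layout over positions 0..n_points-1 (illustrative assumptions)."""
    stride = horizon if stride is None else stride  # documented default: stride = horizon
    # Assumption: the last fold's test window ends at the last point of the series.
    first_test_start = n_points - horizon - stride * (n_folds - 1)
    if first_test_start <= 0:
        raise ValueError("not enough points for the requested folds")
    folds: List[Fold] = []
    for i in range(n_folds):
        test_start = first_test_start + i * stride
        if mode == "expand":
            train_start = 0  # training window grows with every fold
        elif mode == "constant":
            train_start = i * stride  # training window keeps its length
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        folds.append((train_start, test_start, test_start, test_start + horizon))
    return folds
```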
- classmethod load(path: Path, ts: TSDataset | None = None) → Self
Load an object.
Warning
This method uses the dill module, which is not secure. It is possible to construct malicious data that will execute arbitrary code during loading. Never load data that could have come from an untrusted source or that could have been tampered with.
- params_to_tune() → Dict[str, BaseDistribution]
Get hyperparameter grid to tune.
Parameters for the model have the prefix “model.”, e.g. “model.alpha”.
Parameters for transforms have the prefix “transforms.idx.”, e.g. “transforms.0.mode”.
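A minimal sketch of how keys following this naming scheme map back to components: the hypothetical `group_by_component` helper splits each key into the component it targets (the model, or a transform by index) and the parameter name. Plain lists stand in for the BaseDistribution values an actual grid would contain.

```python
from typing import Any, Dict, Tuple, Union

Component = Union[str, Tuple[str, int]]


def group_by_component(grid: Dict[str, Any]) -> Dict[Component, Dict[str, Any]]:
    """Hypothetical helper: route prefixed keys to the component they configure."""
    grouped: Dict[Component, Dict[str, Any]] = {}
    for key, value in grid.items():
        if key.startswith("model."):
            component: Component = "model"
            name = key[len("model."):]
        elif key.startswith("transforms."):
            # "transforms.<idx>.<name>": the index selects a transform in the sequence.
            _, idx, name = key.split(".", 2)
            component = ("transforms", int(idx))
        else:
            raise ValueError(f"unexpected prefix in {key!r}")
        grouped.setdefault(component, {})[name] = value
    return grouped


# Placeholder values stand in for distributions.
grid = {"model.alpha": [0.1, 1.0], "transforms.0.mode": ["a", "b"]}
grouped = group_by_component(grid)
```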
- Returns:
Grid with parameters from model and transforms.
- Return type:
Dict[str, BaseDistribution]
- predict(ts: TSDataset | None = None, start_timestamp: Timestamp | None = None, end_timestamp: Timestamp | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → TSDataset
Make in-sample predictions on dataset at the target level in a given range.
The method makes a prediction for the target at the source level of the hierarchy and then reconciles it to the target level.
Currently, when segments start at different timestamps, correct behavior is only guaranteed for start_timestamp >= the beginning of all segments.
- Parameters:
ts (TSDataset | None) – Dataset to make predictions on. If not given, the dataset given during fit is used.
start_timestamp (Timestamp | None) – First timestamp of the prediction range to return; should be >= the first timestamp in ts, and the beginning of each segment is expected to be <= start_timestamp. If not set, the first timestamp at which every segment has begun is taken.
end_timestamp (Timestamp | None) – Last timestamp of the prediction range to return; if not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.
prediction_interval (bool) – If True, returns prediction interval for forecast.
quantiles (Sequence[float]) – Levels of prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval.
return_components (bool) – If True, additionally returns forecast components.
- Returns:
Dataset with predictions at the target level in the [start_timestamp, end_timestamp] range.
- Return type:
TSDataset
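The defaults and expectations for the prediction range can be sketched as follows. The hypothetical `prediction_range` helper resolves start_timestamp to the latest segment start when it is not given and end_timestamp to the last timestamp of ts. Raising ValueError on out-of-range values is an assumption of this sketch; the documentation only states what is expected or guaranteed.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple


def prediction_range(
    segment_starts: Dict[str, datetime],
    ts_end: datetime,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Hypothetical helper: resolve [start, end] following the documented defaults."""
    latest_segment_start = max(segment_starts.values())
    if start is None:
        # First timestamp at which every segment has already begun.
        start = latest_segment_start
    elif start < latest_segment_start:
        raise ValueError("start_timestamp is before the beginning of some segment")
    if end is None:
        end = ts_end
    elif end > ts_end:
        raise ValueError("end_timestamp is after the last timestamp in ts")
    return start, end


segment_starts = {"a": datetime(2024, 1, 1), "b": datetime(2024, 1, 5)}
default_range = prediction_range(segment_starts, ts_end=datetime(2024, 1, 31))
```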
- raw_forecast(ts: TSDataset, prediction_interval: bool = False, quantiles: Sequence[float] = (0.25, 0.75), n_folds: int = 3, return_components: bool = False) → TSDataset
Make a forecast of the next points of a dataset at the source level.
The result of forecasting starts from the last point of ts, not including it.
- Parameters:
ts (TSDataset) – Dataset to forecast
prediction_interval (bool) – If True, returns prediction interval for forecast
quantiles (Sequence[float]) – Levels of prediction distribution. By default, 25% and 75% are taken to form a 50% prediction interval
n_folds (int) – Number of folds to use in the backtest for prediction interval estimation
return_components (bool) – If True, additionally returns forecast components
- Returns:
Dataset with predictions at the source level
- Return type:
TSDataset
- raw_predict(ts: TSDataset, start_timestamp: Timestamp | None = None, end_timestamp: Timestamp | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → TSDataset
Make in-sample predictions on dataset at the source level in a given range.
- Parameters:
ts (TSDataset) – Dataset to make predictions on.
start_timestamp (Timestamp | None) – First timestamp of the prediction range to return; should be >= the first timestamp in ts, and the beginning of each segment is expected to be <= start_timestamp. If not set, the first timestamp at which every segment has begun is taken.
end_timestamp (Timestamp | None) – Last timestamp of the prediction range to return; if not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.
prediction_interval (bool) – If True, returns prediction interval for forecast.
quantiles (Sequence[float]) – Levels of prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval.
return_components (bool) – If True, additionally returns forecast components.
- Returns:
Dataset with predictions at the source level in the [start_timestamp, end_timestamp] range.
- Return type:
TSDataset
- set_params(**params: dict) → Self
Return new object instance with modified parameters.
The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.
Nested parameters are expected to be in the <component_1>.<...>.<parameter> form, where components are separated by a dot.
- Parameters:
**params (dict) – Estimator parameters
- Returns:
New instance with changed parameters
- Return type:
Self
Examples
>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
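To show the dotted-key convention without etna objects, the sketch below applies the same overrides as the example above to a plain nested dict and returns a modified copy, leaving the original untouched. `with_params` and the dict layout are hypothetical stand-ins; this does not reproduce etna's set_params implementation.

```python
import copy
from typing import Any, Dict


def with_params(config: Dict[str, Any], **params: Any) -> Dict[str, Any]:
    """Hypothetical helper: return a modified deep copy using <component>.<...>.<parameter> keys."""
    new_config = copy.deepcopy(config)  # the original config is never mutated
    for dotted_key, value in params.items():
        *path, leaf = dotted_key.split(".")
        node: Any = new_config
        for part in path:
            # Numeric parts index into lists, e.g. "transforms.0".
            node = node[int(part)] if isinstance(node, list) else node[part]
        if isinstance(node, list):
            node[int(leaf)] = value
        else:
            node[leaf] = value
    return new_config


config = {"model": {"lag": 1}, "transforms": [{"in_column": "target", "value": 1}], "horizon": 3}
updated = with_params(config, **{"model.lag": 3, "transforms.0.value": 2})
```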