etna.pipeline.Pipeline

class Pipeline(model: NonPredictionIntervalContextIgnorantAbstractModel | NonPredictionIntervalContextRequiredAbstractModel | PredictionIntervalContextIgnorantAbstractModel | PredictionIntervalContextRequiredAbstractModel, transforms: Sequence[Transform] = (), horizon: int = 1)

Bases: ModelPipelinePredictMixin, ModelPipelineParamsToTuneMixin, SaveModelPipelineMixin, BasePipeline

Pipeline of transforms with a final estimator.

Makes a forecast in a single iteration: applies the transforms, then calls the model's forecast method once for the whole horizon.
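
The single-iteration flow can be illustrated with a minimal sketch in plain Python. The classes `AddConst` and `LastValueModel` and the function `one_shot_forecast` are hypothetical stand-ins, not etna classes, and undoing the transforms on the prediction is an assumption of this sketch rather than something stated on this page.

```python
from typing import List, Sequence


class AddConst:
    """Hypothetical transform: shifts values by a constant and can undo the shift."""

    def __init__(self, value: float) -> None:
        self.value = value

    def transform(self, series: List[float]) -> List[float]:
        return [x + self.value for x in series]

    def inverse_transform(self, series: List[float]) -> List[float]:
        return [x - self.value for x in series]


class LastValueModel:
    """Hypothetical model: repeats the last observed value."""

    def fit(self, series: List[float]) -> "LastValueModel":
        self.last_ = series[-1]
        return self

    def forecast(self, horizon: int) -> List[float]:
        return [self.last_] * horizon


def one_shot_forecast(
    series: Sequence[float],
    transforms: Sequence[AddConst],
    model: LastValueModel,
    horizon: int,
) -> List[float]:
    # Apply every transform, in order, to the history.
    transformed = list(series)
    for transform in transforms:
        transformed = transform.transform(transformed)
    # Fit once and request the whole horizon in a single call.
    prediction = model.fit(transformed).forecast(horizon)
    # Assumption of this sketch: undo the transforms in reverse order.
    for transform in reversed(transforms):
        prediction = transform.inverse_transform(prediction)
    return prediction


result = one_shot_forecast([1.0, 2.0, 3.0], [AddConst(10.0)], LastValueModel(), horizon=2)
```

An autoregressive pipeline would instead call the model repeatedly, feeding earlier predictions back in as history.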

See also

etna.pipeline.AutoRegressivePipeline: Makes forecast in several iterations.
etna.ensembles.DirectEnsemble: Makes forecast by merging the forecasts of base pipelines.

Create an instance of Pipeline with the given parameters.

Parameters:

model (NonPredictionIntervalContextIgnorantAbstractModel | NonPredictionIntervalContextRequiredAbstractModel | PredictionIntervalContextIgnorantAbstractModel | PredictionIntervalContextRequiredAbstractModel) – Instance of the etna Model
transforms (Sequence[Transform]) – Sequence of the transforms
horizon (int) – Number of timestamps in the future for forecasting

Methods

`backtest`(ts, metrics[, n_folds, mode, ...])	Run backtest with the pipeline.
`fit`(ts[, save_ts])	Fit the Pipeline.
`forecast`([ts, prediction_interval, ...])	Make a forecast of the next points of a dataset.
`get_historical_forecasts`(ts[, n_folds, ...])	Estimate forecast for each fold on the historical dataset.
`load`(path[, ts])	Load an object.
`params_to_tune`()	Get hyperparameter grid to tune.
`predict`(ts[, start_timestamp, ...])	Make in-sample predictions on dataset in a given range.
`save`(path)	Save the object.
`set_params`(**params)	Return new object instance with modified parameters.
`to_dict`()	Collect all information about etna object in dict.

Attributes

This class stores its __init__ parameters as attributes.

backtest(ts: TSDataset, metrics: List[Metric], n_folds: int | List[FoldMask] = 5, mode: str | None = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: bool | int = True, stride: int | None = None, joblib_params: Dict[str, Any] | None = None, forecast_params: Dict[str, Any] | None = None) → Tuple[DataFrame, DataFrame, DataFrame]

Run backtest with the pipeline.

If refit is not True and some component of the pipeline does not support forecasting with a gap, that component raises an exception.

Parameters:

ts (TSDataset) – Dataset to fit models in backtest
metrics (List[Metric]) – List of metrics to compute for each fold
n_folds (int | List[FoldMask]) – Number of folds or the list of fold masks
mode (str | None) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. Defaults to ‘expand’.
aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise return raw per-fold metrics
n_jobs (int) – Number of jobs to run in parallel
refit (bool | int) –
Determines how often the pipeline is retrained while iterating over folds.
- If True: the pipeline is retrained on each fold.
- If False: the pipeline is trained only on the first fold.
- If an int value: the pipeline is retrained every value folds, starting from the first.
stride (int | None) – Number of points between folds. Works only if n_folds is an integer. Defaults to horizon.
joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
forecast_params (Dict[str, Any] | None) – Additional parameters for forecast()
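
How mode and stride shape the folds can be sketched in plain Python. The function `fold_bounds` is hypothetical and works on integer positions; it assumes the last fold's test window ends at the final point and earlier folds are shifted back by stride, which is one plausible layout and not necessarily the library's exact implementation.

```python
from typing import List, Optional, Tuple


def fold_bounds(
    n_points: int,
    horizon: int,
    n_folds: int,
    mode: str = "expand",
    stride: Optional[int] = None,
) -> List[Tuple[int, int, int, int]]:
    """Return (train_start, train_end, test_start, test_end) positions, all inclusive."""
    if mode not in ("expand", "constant"):
        raise ValueError("mode must be 'expand' or 'constant'")
    # By default the folds are spaced one horizon apart.
    stride = horizon if stride is None else stride
    folds = []
    for k in range(n_folds):
        # Assumption: the last fold ends at the final point, earlier folds sit `stride` points back.
        test_end = n_points - 1 - (n_folds - 1 - k) * stride
        test_start = test_end - horizon + 1
        train_end = test_start - 1
        # 'expand' keeps the train start fixed; 'constant' slides it so the train length stays the same.
        train_start = 0 if mode == "expand" else k * stride
        if train_end < train_start:
            raise ValueError("not enough points for the requested folds")
        folds.append((train_start, train_end, test_start, test_end))
    return folds


expand_folds = fold_bounds(n_points=10, horizon=2, n_folds=3, mode="expand")
constant_folds = fold_bounds(n_points=10, horizon=2, n_folds=3, mode="constant")
```

With ‘expand’ the train window grows from fold to fold, while with ‘constant’ it keeps the same length and slides forward.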

Returns:

Metrics dataframe, forecast dataframe and dataframe with information about folds

Return type:

metrics_df, forecast_df, fold_info_df

Raises:

ValueError: – If mode is set when n_folds is a List[FoldMask].
ValueError: – If stride is set when n_folds is a List[FoldMask].

fit(ts: TSDataset, save_ts: bool = True) → Pipeline

Fit the Pipeline.

Fit and apply given transforms to the data, then fit the model on the transformed data.

Parameters:

ts (TSDataset) – Dataset with timeseries data.
save_ts (bool) – Whether to save ts in the pipeline during fit.

Returns:

Fitted Pipeline instance

Return type:

Pipeline

forecast(ts: TSDataset | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) → TSDataset

Make a forecast of the next points of a dataset.

The result of forecasting starts from the last point of ts, not including it.
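
To make the boundary concrete, here is a small sketch using only the standard library. The helper `future_timestamps` is hypothetical, and a fixed `timedelta` step is an assumption that stands in for the dataset's frequency.

```python
from datetime import datetime, timedelta
from typing import List


def future_timestamps(last_timestamp: datetime, step: timedelta, horizon: int) -> List[datetime]:
    # Start one step after the last observed point, so the last point itself is excluded.
    return [last_timestamp + step * i for i in range(1, horizon + 1)]


stamps = future_timestamps(datetime(2024, 1, 31), timedelta(days=1), horizon=3)
```

Here the history ends on 2024-01-31, so the three forecast timestamps are 2024-02-01 through 2024-02-03.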

Parameters:

ts (TSDataset | None) – Dataset to forecast. If not given, dataset given during fit() is used.
prediction_interval (bool) – If True returns prediction interval for forecast
quantiles (Sequence[float]) – Levels of prediction distribution. By default 2.5% and 97.5% taken to form a 95% prediction interval
n_folds (int) – Number of folds to use in the backtest for prediction interval estimation
return_components (bool) – If True additionally returns forecast components
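
The relation between quantiles and interval coverage can be sketched as follows. The function `interval_borders` is hypothetical and assumes normally distributed residuals with a spread estimated from past errors; this page does not state how the library actually estimates the interval from the backtest folds.

```python
from statistics import NormalDist, stdev
from typing import Dict, List, Sequence


def interval_borders(
    point_forecast: Sequence[float],
    residuals: Sequence[float],
    quantiles: Sequence[float] = (0.025, 0.975),
) -> Dict[float, List[float]]:
    # Assumption: past forecast errors are roughly normal with a common spread.
    scale = stdev(residuals)
    standard_normal = NormalDist()
    # Each quantile level shifts the point forecast by its z-score times the spread.
    return {
        q: [y + standard_normal.inv_cdf(q) * scale for y in point_forecast]
        for q in quantiles
    }


borders = interval_borders([10.0, 12.0], residuals=[-1.0, 0.5, 1.0, -0.5])
# The default levels leave 2.5% in each tail, so the interval covers 95%.
coverage = 0.975 - 0.025
```

Passing other levels, for example (0.1, 0.9), would give a narrower 80% interval under the same assumption.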

Returns:

Dataset with predictions

Return type:

TSDataset

get_historical_forecasts(ts[, n_folds, ...]) → DataFrame

Estimate forecast for each fold on the historical dataset.

If refit is not True and some component of the pipeline does not support forecasting with a gap, that component raises an exception.

Parameters:

ts (TSDataset) – Dataset to fit models in backtest
n_folds (int | List[FoldMask]) – Number of folds or the list of fold masks
mode (str | None) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. Defaults to ‘expand’.
n_jobs (int) – Number of jobs to run in parallel
refit (bool | int) –
Determines how often the pipeline is retrained while iterating over folds.
- If True: the pipeline is retrained on each fold.
- If False: the pipeline is trained only on the first fold.
- If an int value: the pipeline is retrained every value folds, starting from the first.
stride (int | None) – Number of points between folds. Works only if n_folds is an integer. Defaults to horizon.
joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
forecast_params (Dict[str, Any] | None) – Additional parameters for forecast()
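
The three refit options can be read as a schedule of folds on which training happens. The helper `folds_to_refit` below is a hypothetical sketch of that reading, with folds numbered from 0.

```python
from typing import List, Union


def folds_to_refit(n_folds: int, refit: Union[bool, int]) -> List[int]:
    """Return the fold numbers (0-based) on which the pipeline would be trained."""
    # Check bool first: bool is a subclass of int in Python.
    if refit is True:
        return list(range(n_folds))
    if refit is False:
        return [0]
    if refit < 1:
        raise ValueError("integer refit must be positive")
    # Train on the first fold, then on every `refit`-th fold after it.
    return list(range(0, n_folds, refit))


every_fold = folds_to_refit(5, True)
first_only = folds_to_refit(5, False)
every_second = folds_to_refit(5, 2)
```

On folds that are not in the schedule, the most recently trained pipeline is reused, which is why components must support forecasting with a gap.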

Returns:

Forecast dataframe

Raises:

ValueError: – If mode is set when n_folds is a List[FoldMask].
ValueError: – If stride is set when n_folds is a List[FoldMask].

Return type:

DataFrame

classmethod load(path: Path, ts: TSDataset | None = None) → Self

Load an object.

Warning

This method uses dill module which is not secure. It is possible to construct malicious data which will execute arbitrary code during loading. Never load data that could have come from an untrusted source, or that could have been tampered with.
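
One possible safeguard against tampering is to compare the file's digest with a value recorded when the file was saved. This is a sketch with hypothetical helpers `file_sha256` and `is_untampered`; it only detects modification and does not make loading data from an untrusted source safe.

```python
import hashlib
import hmac
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_untampered(path: Path, expected_sha256: str) -> bool:
    # The expected digest must come from a trusted place, e.g. recorded right after saving.
    return hmac.compare_digest(file_sha256(path), expected_sha256)
```

A caller would run `is_untampered` on the saved file and proceed to load only when it returns True.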

Parameters:

path (Path) – Path to load object from.
ts (TSDataset | None) – TSDataset to set into loaded pipeline.

Returns:

Loaded object.

Return type:

Self

params_to_tune() → Dict[str, BaseDistribution]

Get hyperparameter grid to tune.

Parameters of the model have the prefix “model.”, e.g. “model.alpha”.

Parameters of the transforms have the prefix “transforms.idx.”, where idx is the position of the transform, e.g. “transforms.0.mode”.

Returns:

Grid with parameters from model and transforms.

Return type:

Dict[str, BaseDistribution]
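
The key naming can be sketched in plain Python. The function `prefixed_grid` is hypothetical, and the tuple and list values are placeholders standing in for BaseDistribution objects.

```python
from typing import Any, Dict, Sequence


def prefixed_grid(
    model_grid: Dict[str, Any],
    transform_grids: Sequence[Dict[str, Any]],
) -> Dict[str, Any]:
    # Model parameters get the "model." prefix.
    grid = {f"model.{name}": dist for name, dist in model_grid.items()}
    # Transform parameters get "transforms.<idx>.", where idx is the transform's position.
    for idx, transform_grid in enumerate(transform_grids):
        for name, dist in transform_grid.items():
            grid[f"transforms.{idx}.{name}"] = dist
    return grid


grid = prefixed_grid(
    model_grid={"alpha": (0.0, 1.0)},  # placeholder for a distribution
    transform_grids=[{"mode": ["per-segment", "macro"]}],  # placeholder for a distribution
)
```

The resulting keys use the same dotted form that set_params accepts for nested parameters.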

predict(ts: TSDataset, start_timestamp: Timestamp | int | str | None = None, end_timestamp: Timestamp | int | str | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → TSDataset

Make in-sample predictions on dataset in a given range.

Currently, when segments start at different timestamps, the method is only guaranteed to work if start_timestamp is not earlier than the beginning of every segment.

Parameters start_timestamp and end_timestamp of type str are converted into pd.Timestamp.

Parameters:

ts (TSDataset) – Dataset to make predictions on.
start_timestamp (Timestamp | int | str | None) – First timestamp of the prediction range to return. It should be >= the first timestamp in ts, and the beginning of each segment is expected to be <= start_timestamp. If not set, the first timestamp by which every segment has begun is taken.
end_timestamp (Timestamp | int | str | None) – Last timestamp of the prediction range to return. If not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.
prediction_interval (bool) – If True returns prediction interval for forecast.
quantiles (Sequence[float]) – Levels of prediction distribution. By default 2.5% and 97.5% taken to form a 95% prediction interval.
return_components (bool) – If True additionally returns forecast components

Returns:

Dataset with predictions in [start_timestamp, end_timestamp] range.

Raises:

ValueError: – Incorrect type of start_timestamp or end_timestamp is used according to ts.freq
ValueError: – Value of end_timestamp is less than start_timestamp.
ValueError: – Value of start_timestamp goes before point where each segment started.
ValueError: – Value of end_timestamp goes after the last timestamp.
NotImplementedError: – Adding target components is not currently implemented

Return type:

TSDataset
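
The range rules listed above can be summarized in a short sketch. The function `resolve_range` is hypothetical and uses integer timestamps for simplicity; reading the default start as the latest of the segment starts is an interpretation of the parameter description, not a confirmed detail of the implementation.

```python
from typing import Dict, Optional, Tuple


def resolve_range(
    segment_starts: Dict[str, int],
    last_timestamp: int,
    start_timestamp: Optional[int] = None,
    end_timestamp: Optional[int] = None,
) -> Tuple[int, int]:
    # Interpretation: the first point at which every segment has data is the latest segment start.
    common_start = max(segment_starts.values())
    start = common_start if start_timestamp is None else start_timestamp
    end = last_timestamp if end_timestamp is None else end_timestamp
    if end < start:
        raise ValueError("end_timestamp is less than start_timestamp")
    if start < common_start:
        raise ValueError("start_timestamp is before the point where each segment started")
    if end > last_timestamp:
        raise ValueError("end_timestamp is after the last timestamp")
    return start, end


default_range = resolve_range({"segment_a": 0, "segment_b": 3}, last_timestamp=10)
explicit_range = resolve_range({"segment_a": 0, "segment_b": 3}, 10, start_timestamp=5, end_timestamp=8)
```

With segments starting at 0 and 3, the default range is [3, 10], and a start_timestamp of 1 would be rejected.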

save(path: Path)

Save the object.

Parameters:

path (Path) – Path to save object to.

set_params(**params: dict) → Self

Return new object instance with modified parameters.

The method also allows changing parameters of nested objects within the current object. For example, it is possible to change the parameters of a model in a Pipeline.

Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.

Parameters:

**params (dict) – Estimator parameters

Returns:

New instance with changed parameters

Return type:

Self

Examples

>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
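
How a dotted key from the example above might be resolved can be sketched on plain dicts and lists. The function `with_params` is hypothetical and is not the library's implementation; it only mirrors two documented properties, the dot-separated path and the return of a new object.

```python
import copy
from typing import Any, Dict


def with_params(config: Dict[str, Any], **params: Any) -> Dict[str, Any]:
    """Return a modified copy of a nested config; the original is left untouched."""
    new_config = copy.deepcopy(config)
    for path, value in params.items():
        # Split "<component_1>.<...>.<parameter>" into the parent components and the final parameter.
        *parents, leaf = path.split(".")
        node = new_config
        for key in parents:
            # Numeric components index into sequences such as the list of transforms.
            node = node[int(key)] if isinstance(node, list) else node[key]
        if isinstance(node, list):
            node[int(leaf)] = value
        else:
            node[leaf] = value
    return new_config


config = {"model": {"lag": 1}, "transforms": [{"value": 1}], "horizon": 3}
updated = with_params(config, **{"model.lag": 3, "transforms.0.value": 2})
```

Because dotted keys are not valid Python identifiers, they have to be passed through dictionary unpacking, as in the example above.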

to_dict()

Collect all information about etna object in dict.