etna.models.SARIMAXModel

class SARIMAXModel(order: Tuple[int, int, int] = (1, 0, 0), seasonal_order: Tuple[int, int, int, int] = (0, 0, 0, 0), trend: str | None = None, measurement_error: bool = False, time_varying_regression: bool = False, mle_regression: bool = True, simple_differencing: bool = False, enforce_stationarity: bool = True, enforce_invertibility: bool = True, hamilton_representation: bool = False, concentrate_scale: bool = False, trend_offset: float = 1, use_exact_diffuse: bool = False, dates: List[datetime] | None = None, freq: str | None = None, missing: str = 'none', validate_specification: bool = True, fit_params: Dict[str, Any] | None = None, **kwargs)

Bases: PerSegmentModelMixin, PredictionIntervalContextIgnorantModelMixin, PredictionIntervalContextIgnorantAbstractModel

Class for holding SARIMAX model.

The predict method can use true target values only on train data; on future data, autoregressive forecasting is performed even if the targets are known.

Notes

We use statsmodels.tsa.statespace.sarimax.SARIMAX. The statsmodels package uses the exog attribute for exogenous regressors, which should be known in the future. In etna, however, exogenous refers to additional features that are not known in the future, while regressors refers to features that are known in the future.

This model supports in-sample and out-of-sample prediction decomposition. Prediction components for SARIMAX model are: exogenous and SARIMA components. Decomposition is obtained directly from fitted model parameters.
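As a rough illustration of the decomposition described above, the sketch below splits one predicted value into per-regressor exogenous parts and a SARIMA remainder. The helper `decompose_prediction`, the feature names, and the coefficient values are hypothetical. The sketch assumes the exogenous contribution is linear in the regressors, and it is not etna's actual implementation.

```python
from typing import Dict


def decompose_prediction(
    prediction: float,
    regressors: Dict[str, float],
    coefficients: Dict[str, float],
) -> Dict[str, float]:
    """Split one predicted value into exogenous parts and a SARIMA remainder."""
    # Assumption: each exogenous component is coefficient * regressor value.
    components = {
        name: coefficients[name] * value for name, value in regressors.items()
    }
    # Whatever the regressors do not explain is attributed to the SARIMA part.
    components["SARIMA"] = prediction - sum(components.values())
    return components


# Hypothetical fitted coefficients and regressor values for a single timestamp.
parts = decompose_prediction(
    prediction=10.0,
    regressors={"price": 2.0, "promo": 1.0},
    coefficients={"price": 1.5, "promo": 4.0},
)
print(parts)
```

By construction the components sum back to the original prediction, which is the property a decomposition of this kind is expected to satisfy.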

Init SARIMAX model with given params.

Parameters:
  • order (Tuple[int, int, int]) – The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and / or MA lags to include. Default is an AR(1) model: (1,0,0).

  • seasonal_order (Tuple[int, int, int, int]) – The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and / or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.

  • trend (str | None) – Parameter controlling the deterministic trend polynomial \(A(t)\). Can be specified as a string where ‘c’ indicates a constant (i.e. a degree zero component of the trend polynomial), ‘t’ indicates a linear trend with time, and ‘ct’ is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes \(a + bt + ct^3\). Default is to not include a trend component.

  • measurement_error (bool) – Whether or not to assume the endogenous observations endog were measured with error. Default is False.

  • time_varying_regression (bool) – Used when explanatory variables, exog, are provided, to select whether or not coefficients on the exogenous regressors are allowed to vary over time. Default is False.

  • mle_regression (bool) – Whether to estimate the regression coefficients for the exogenous variables as part of maximum likelihood estimation (if True) or through the Kalman filter, i.e. recursive least squares (if False). If time_varying_regression is True, this must be set to False. Default is True.

  • simple_differencing (bool) – Whether or not to use partially conditional maximum likelihood estimation. If True, differencing is performed prior to estimation, which discards the first \(s D + d\) initial rows but results in a smaller state-space formulation. See the Notes section for important details about interpreting results when this option is used. If False, the full SARIMAX model is put in state-space form so that all datapoints can be used in estimation. Default is False.

  • enforce_stationarity (bool) – Whether or not to transform the AR parameters to enforce stationarity in the autoregressive component of the model. Default is True.

  • enforce_invertibility (bool) – Whether or not to transform the MA parameters to enforce invertibility in the moving average component of the model. Default is True.

  • hamilton_representation (bool) – Whether or not to use the Hamilton representation of an ARMA process (if True) or the Harvey representation (if False). Default is False.

  • concentrate_scale (bool) – Whether or not to concentrate the scale (variance of the error term) out of the likelihood. This reduces the number of parameters estimated by maximum likelihood by one, but standard errors will then not be available for the scale parameter.

  • trend_offset (float) – The offset at which to start time trend values. Default is 1, so that if trend='t' the trend is equal to 1, 2, …, nobs. Typically only set when the model is created by extending a previous dataset.

  • use_exact_diffuse (bool) – Whether or not to use exact diffuse initialization for non-stationary states. Default is False (in which case approximate diffuse initialization is used).

  • dates (List[datetime] | None) – If no index is given by endog or exog, an array-like object of datetime objects can be provided.

  • freq (str | None) – If no index is given by endog or exog, the frequency of the time-series may be specified here as a Pandas offset or offset string.

  • missing (str) – Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.

  • validate_specification (bool) – If True, validation of hyperparameters is performed.

  • fit_params (Dict[str, Any] | None) – Additional parameters for statsmodels.tsa.statespace.sarimax.SARIMAX.fit. For example, the parameter disp=False disables logging.

  • **kwargs – Additional parameters for statsmodels.tsa.statespace.sarimax.SARIMAX.
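Two of the parameter descriptions above involve small calculations: the iterable form of trend together with trend_offset, and the number of rows discarded when simple_differencing is True. The sketch below works through both in plain Python. The helpers `trend_values` and `rows_lost_to_differencing` and the coefficient values are hypothetical and only mirror the formulas in the text; they are not part of etna or statsmodels, and the offset is taken to be an integer for simplicity.

```python
from typing import List, Sequence, Tuple


def trend_values(
    spec: Sequence[int], coefs: Sequence[float], nobs: int, offset: int = 1
) -> List[float]:
    """Evaluate the trend polynomial at t = offset, ..., offset + nobs - 1."""
    # spec[i] == 1 means the term t**i is included in the polynomial.
    exponents = [power for power, flag in enumerate(spec) if flag]
    if len(exponents) != len(coefs):
        raise ValueError("need exactly one coefficient per included exponent")
    return [
        sum(c * t**p for c, p in zip(coefs, exponents))
        for t in range(offset, offset + nobs)
    ]


def rows_lost_to_differencing(
    order: Tuple[int, int, int], seasonal_order: Tuple[int, int, int, int]
) -> int:
    """Rows discarded when simple_differencing=True: s * D + d."""
    _, d, _ = order
    _, seasonal_d, _, s = seasonal_order
    return s * seasonal_d + d


# [1, 1, 0, 1] selects a + b*t + c*t**3; here a=1, b=2, c=3 (made-up values).
cubic = trend_values([1, 1, 0, 1], [1, 2, 3], nobs=3)
# A pure linear trend with unit coefficient and the default offset of 1.
linear = trend_values([0, 1], [1], nobs=4)
# Monthly data with one regular and one seasonal difference.
lost = rows_lost_to_differencing((1, 1, 1), (1, 1, 1, 12))
print(cubic, linear, lost)
```

With the default offset the linear trend takes the values 1, 2, …, nobs, matching the trend_offset description, and a (1, 1, 1) x (1, 1, 1, 12) specification gives up 13 initial rows under simple differencing.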

Methods

fit(ts)

Fit model.

forecast(ts[, prediction_interval, ...])

Make predictions.

get_model()

Get internal models that are used inside etna class.

load(path)

Load an object.

params_to_tune()

Get default grid for tuning hyperparameters.

predict(ts[, prediction_interval, ...])

Make predictions using true values as autoregression context if possible (teacher forcing).

save(path)

Save the object.

set_params(**params)

Return new object instance with modified parameters.

to_dict()

Collect all information about etna object in dict.

Attributes

This class stores its __init__ parameters as attributes.

context_size

Context size of the model.

fit(ts: TSDataset) → PerSegmentModelMixin

Fit model.

Parameters:

ts (TSDataset) – Dataset with features

Returns:

Model after fit

Return type:

PerSegmentModelMixin

forecast(ts: TSDataset, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → TSDataset

Make predictions.

Parameters:
  • ts (TSDataset) – Dataset with features

  • prediction_interval (bool) – If True returns prediction interval for forecast

  • quantiles (Sequence[float]) – Levels of prediction distribution. By default 2.5% and 97.5% are taken to form a 95% prediction interval

  • return_components (bool) – If True additionally returns forecast components

Returns:

Dataset with predictions

Return type:

TSDataset
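The relation between the quantiles argument and the resulting interval can be checked with a few lines of arithmetic. In the sketch below, `interval_from_quantiles` and `border_columns` are hypothetical helpers, and the `target_<level>` column naming is an assumption made for illustration rather than something stated on this page.

```python
from typing import List, Sequence, Tuple


def interval_from_quantiles(quantiles: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower level, upper level, nominal coverage) for the given levels."""
    lower, upper = min(quantiles), max(quantiles)
    if not 0.0 < lower < upper < 1.0:
        raise ValueError("quantile levels must lie strictly between 0 and 1")
    return lower, upper, upper - lower


def border_columns(quantiles: Sequence[float], target: str = "target") -> List[str]:
    """Build one column name per quantile level (assumed naming scheme)."""
    return [f"{target}_{level:.4g}" for level in sorted(quantiles)]


lower, upper, coverage = interval_from_quantiles((0.025, 0.975))
columns = border_columns((0.025, 0.975))
print(f"{coverage:.0%} interval between levels {lower} and {upper}: {columns}")
```

Passing, for example, (0.1, 0.9) instead would describe an 80% interval by the same arithmetic.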

get_model() → Dict[str, Any]

Get internal models that are used inside etna class.

Internal model is a model that is used inside etna to forecast segments, e.g. catboost.CatBoostRegressor or sklearn.linear_model.Ridge.

Returns:

dictionary where key is segment and value is internal model

Return type:

Dict[str, Any]

classmethod load(path: Path) → Self

Load an object.

Warning

This method uses dill module which is not secure. It is possible to construct malicious data which will execute arbitrary code during loading. Never load data that could have come from an untrusted source, or that could have been tampered with.

Parameters:

path (Path) – Path to load object from.

Returns:

Loaded object.

Return type:

Self
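To make the warning above concrete, the sketch below shows how a serialized payload can choose what runs at load time. It uses the standard-library pickle module as a stand-in, on the assumption that dill, which builds on the pickle protocol, behaves the same way in this respect. The class `Innocent` is hypothetical and the callable it triggers is deliberately harmless.

```python
import pickle


class Innocent:
    """Looks like plain data, but controls what runs when it is unpickled."""

    def __reduce__(self):
        # At load time pickle calls sorted([3, 1, 2]) instead of rebuilding
        # an Innocent instance. A hostile payload could name any callable here.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Innocent())
loaded = pickle.loads(payload)

# The loaded value is the result of the call chosen by the payload.
print(type(loaded).__name__, loaded)
```

The loader has no say in which callable is invoked, which is why files from untrusted or tamperable sources should not be passed to load.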

params_to_tune() → Dict[str, BaseDistribution]

Get default grid for tuning hyperparameters.

This grid tunes parameters: order.0, order.1, order.2, trend. If self.num_periods is greater than zero, then it also tunes parameters: seasonal_order.0, seasonal_order.1, seasonal_order.2. Other parameters are expected to be set by the user.

Returns:

Grid to tune.

Return type:

Dict[str, BaseDistribution]
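The grid keys such as order.0 address a single element of a tuple-valued parameter. The sketch below shows one way such dotted keys could be resolved against a plain dictionary of parameters. The helper `apply_dotted_params` is hypothetical and handles only one level of indexing; it is not how etna applies sampled values internally.

```python
from typing import Any, Dict


def apply_dotted_params(
    params: Dict[str, Any], updates: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of params with keys such as 'order.0' applied."""
    result = dict(params)
    for key, value in updates.items():
        name, _, index = key.partition(".")
        if index:
            # 'order.0' -> replace element 0 of the tuple stored under 'order'.
            items = list(result[name])
            items[int(index)] = value
            result[name] = tuple(items)
        else:
            # Plain keys such as 'trend' replace the whole value.
            result[name] = value
    return result


base = {"order": (1, 0, 0), "seasonal_order": (0, 0, 0, 0), "trend": None}
# Made-up sampled values for three of the tunable keys.
tuned = apply_dotted_params(base, {"order.0": 2, "order.2": 1, "trend": "c"})
print(tuned)
```

The original dictionary is left untouched, mirroring the way set_params returns a new instance instead of mutating the current one.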

predict(ts: TSDataset, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → TSDataset

Make predictions using true values as autoregression context if possible (teacher forcing).

Parameters:
  • ts (TSDataset) – Dataset with features

  • prediction_interval (bool) – If True returns prediction interval for forecast

  • quantiles (Sequence[float]) – Levels of prediction distribution. By default 2.5% and 97.5% are taken to form a 95% prediction interval

  • return_components (bool) – If True additionally returns prediction components

Returns:

Dataset with predictions

Return type:

TSDataset
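The difference between teacher forcing and pure autoregressive forecasting can be seen on a toy zero-mean AR(1) process. The sketch below is a simplified stand-in for SARIMAX; the coefficient `phi`, the sample values, and both helper functions are hypothetical.

```python
from typing import List, Sequence


def one_step_predictions(history: Sequence[float], phi: float) -> List[float]:
    """Teacher forcing: each prediction uses the true previous value."""
    return [phi * previous for previous in history[:-1]]


def recursive_forecast(last_value: float, phi: float, horizon: int) -> List[float]:
    """Autoregression: each prediction feeds on the previous prediction."""
    forecasts = []
    current = last_value
    for _ in range(horizon):
        current = phi * current
        forecasts.append(current)
    return forecasts


history = [8.0, 6.0, 4.0]
phi = 0.5

# In-sample: predictions for the 2nd and 3rd points use the observed 1st and 2nd.
in_sample = one_step_predictions(history, phi)
# Out-of-sample: only the last observed value is used, then predictions chain.
future = recursive_forecast(history[-1], phi, horizon=3)
print(in_sample, future)
```

In the teacher-forced case an error at one step does not propagate, because the next step starts again from an observed value. In the recursive case every step inherits the error of the one before it.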

save(path: Path)

Save the object.

Parameters:

path (Path) – Path to save object to.

set_params(**params: dict) → Self

Return new object instance with modified parameters.

The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.

Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.

Parameters:

**params (dict) – Estimator parameters

Returns:

New instance with changed parameters

Return type:

Self

Examples

>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )

to_dict()

Collect all information about etna object in dict.

property context_size: int

Context size of the model. Determines how many history points must be passed to the model.

Zero for this model.