etna.analysis.get_anomalies_isolation_forest
- get_anomalies_isolation_forest(ts: TSDataset, in_column: str = 'target', features_to_use: Sequence[str] | None = None, features_to_ignore: Sequence[str] | None = None, ignore_missing: bool = False, n_estimators: int = 100, max_samples: int | float | Literal['auto'] = 'auto', contamination: float | Literal['auto'] = 'auto', max_features: int | float = 1.0, bootstrap: bool = False, n_jobs: int | None = None, random_state: int | RandomState | None = None, verbose: int = 0, index_only: bool = True) → Dict[str, List[Timestamp] | List[int] | Series]
Get point outliers in time series using Isolation Forest algorithm.
Documentation for Isolation Forest.
- Parameters:
ts (TSDataset) – TSDataset with timeseries data
in_column (str) – Name of the column in which to search for anomalies
features_to_use (Sequence[str] | None) – List of feature column names to use for anomaly detection
features_to_ignore (Sequence[str] | None) – List of feature column names to exclude from anomaly detection
ignore_missing (bool) – Whether to ignore missing values inside a series
n_estimators (int) – The number of base estimators in the ensemble
max_samples (int | float | Literal['auto']) –
- The number of samples to draw from X to train each base estimator
If int, then draw max_samples samples.
If float, then draw max_samples * X.shape[0] samples.
If “auto”, then max_samples=min(256, n_samples).
If max_samples is larger than the number of samples provided, all samples will be used for all trees (no sampling).
contamination (float | Literal['auto']) –
The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples.
If ‘auto’, the threshold is determined as in the original paper.
If float, the contamination should be in the range (0, 0.5].
max_features (int | float) –
- The number of features to draw from X to train each base estimator.
If int, then draw max_features features.
If float, then draw max(1, int(max_features * n_features_in_)) features.
Note: using a float less than 1.0 or an integer less than the number of features enables feature subsampling and leads to a longer runtime.
bootstrap (bool) –
If True, individual trees are fit on random subsets of the training data sampled with replacement.
If False, sampling without replacement is performed.
n_jobs (int | None) –
- The number of jobs to run in parallel for both fit and predict.
None means 1 unless in a joblib.parallel_backend context.
-1 means using all processors.
random_state (int | RandomState | None) – Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest.
verbose (int) – Controls the verbosity of the tree building process.
index_only (bool) – Whether to return only the indices of the outliers. If False, the outlier series are returned instead
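The rules for max_samples and max_features described above can be sketched in plain Python. This is only an illustration of the documented arithmetic, not the library's code: the helper names `resolve_max_samples` and `resolve_max_features` are hypothetical, and truncating the float case of max_samples to an integer is an assumption of this sketch.

```python
from typing import Union


def resolve_max_samples(max_samples: Union[int, float, str], n_samples: int) -> int:
    """Hypothetical helper: number of rows each tree would be trained on."""
    if max_samples == "auto":
        return min(256, n_samples)
    if isinstance(max_samples, int):
        # Asking for more rows than exist means all rows are used (no sampling).
        return min(max_samples, n_samples)
    # A float is read as a fraction of the rows (truncation is an assumption here).
    return int(max_samples * n_samples)


def resolve_max_features(max_features: Union[int, float], n_features: int) -> int:
    """Hypothetical helper: number of features each tree would draw."""
    if isinstance(max_features, int):
        return max_features
    # A float is a fraction of the features, with at least one feature kept.
    return max(1, int(max_features * n_features))
```

For example, with 1000 rows the default max_samples='auto' resolves to 256 rows per tree, and max_features=0.1 on three features still keeps one feature.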
- Returns:
Dict of outliers in the format {segment: [outliers_timestamps]}. If index_only is False, the values are the outlier series instead
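A result in the {segment: [outliers_timestamps]} format can be post-processed as in the sketch below. The `anomalies` dict is hand-written stand-in data, not real output of the function, and plain `datetime` objects are used in place of pandas timestamps so that the snippet needs only the standard library.

```python
from datetime import datetime
from typing import Dict, List

# Hand-written stand-in for the returned mapping (hypothetical values).
anomalies: Dict[str, List[datetime]] = {
    "segment_a": [datetime(2021, 1, 19), datetime(2021, 1, 5)],
    "segment_b": [datetime(2021, 2, 1)],
}

# Number of flagged points per segment.
counts = {segment: len(timestamps) for segment, timestamps in anomalies.items()}

# Earliest flagged timestamp per segment.
first_outlier = {segment: min(timestamps) for segment, timestamps in anomalies.items()}
```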
- Return type: