etna.transforms.IForestOutlierTransform
- class IForestOutlierTransform(in_column: str, ignore_flag_column: str | None = None, features_to_use: Sequence[str] | None = None, features_to_ignore: Sequence[str] | None = None, ignore_missing: bool = False, n_estimators: int = 100, max_samples: int | float | Literal['auto'] = 'auto', contamination: float | Literal['auto'] = 'auto', max_features: int | float = 1.0, bootstrap: bool = False, n_jobs: int | None = None, random_state: int | RandomState | None = None, verbose: int = 0)
Bases: OutliersTransform
Transform that uses get_anomalies_isolation_forest() to find anomalies in data.
Create instance of IForestOutlierTransform.
- Parameters:
in_column (str) – Name of the column in which anomalies are searched for
ignore_flag_column (str | None) – Name of the column whose flags mark values to skip during the outlier check
features_to_use (Sequence[str] | None) – List of feature column names to use for anomaly detection
features_to_ignore (Sequence[str] | None) – List of feature column names to exclude from anomaly detection
ignore_missing (bool) – Whether to ignore missing values inside a series
n_estimators (int) – The number of base estimators in the ensemble
max_samples (int | float | Literal['auto']) –
The number of samples to draw from X to train each base estimator.
If int, then draw max_samples samples.
If float, then draw max_samples * X.shape[0] samples.
If “auto”, then max_samples=min(256, n_samples).
If max_samples is larger than the number of samples provided, all samples will be used for all trees (no sampling).
contamination (float | Literal['auto']) –
The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples.
If ‘auto’, the threshold is determined as in the original paper.
If float, the contamination should be in the range (0, 0.5].
max_features (int | float) –
The number of features to draw from X to train each base estimator.
If int, then draw max_features features.
If float, then draw max(1, int(max_features * n_features_in_)) features.
Note: using a float less than 1.0 or an integer less than the number of features enables feature subsampling and leads to a longer runtime.
bootstrap (bool) –
If True, individual trees are fit on random subsets of the training data sampled with replacement.
If False, sampling without replacement is performed.
n_jobs (int | None) –
The number of jobs to run in parallel for both fit and predict.
None means 1 unless in a joblib.parallel_backend context.
-1 means using all processors.
random_state (int | RandomState | None) – Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest.
verbose (int) – Controls the verbosity of the tree building process.
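The sampling rules for max_samples, max_features and bootstrap listed above can be sketched in plain Python. This is a minimal illustration of those documented rules, not etna's or scikit-learn's code: the helpers resolve_max_samples, resolve_max_features and draw_tree_sample are hypothetical, and drawing features without replacement is an assumption of the sketch.

```python
# Hypothetical sketch of how one tree's training subsample could be drawn;
# not the library implementation.
import random
from typing import List, Literal, Optional, Tuple, Union


def resolve_max_samples(max_samples: Union[int, float, Literal["auto"]], n_samples: int) -> int:
    """Number of rows drawn per tree, following the documented rules."""
    if max_samples == "auto":
        return min(256, n_samples)
    if isinstance(max_samples, int):
        # Asking for more rows than exist means "use all rows".
        return min(max_samples, n_samples)
    return int(max_samples * n_samples)


def resolve_max_features(max_features: Union[int, float], n_features: int) -> int:
    """Number of feature columns drawn per tree."""
    if isinstance(max_features, int):
        return max_features
    return max(1, int(max_features * n_features))


def draw_tree_sample(
    n_samples: int,
    n_features: int,
    max_samples: Union[int, float, Literal["auto"]] = "auto",
    max_features: Union[int, float] = 1.0,
    bootstrap: bool = False,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Return (row indices, column indices) for a single tree."""
    rng = random.Random(seed)
    k_rows = resolve_max_samples(max_samples, n_samples)
    k_cols = resolve_max_features(max_features, n_features)
    if bootstrap:
        rows = rng.choices(range(n_samples), k=k_rows)  # with replacement
    else:
        rows = rng.sample(range(n_samples), k_rows)  # without replacement
    # Assumption of this sketch: features are always drawn without replacement.
    cols = rng.sample(range(n_features), k_cols)
    return rows, cols


rows, cols = draw_tree_sample(100, 4, max_samples=0.5, max_features=0.5, seed=0)
```

With these settings each tree sees 50 distinct rows and 2 of the 4 feature columns; setting bootstrap=True lets the same row appear more than once.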
Notes
For more details on the parameters, see the documentation of the Isolation Forest algorithm:
Documentation for Isolation Forest.
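The two contamination options can be illustrated with the scoring scheme of the original Isolation Forest paper (Liu, Ting and Zhou, 2008): a point's mean isolation depth E[h] is normalized by c(n), the average path length for n samples, giving the score s = 2^(-E[h]/c(n)). The sketch below assumes, as the parameter descriptions suggest, that the transform delegates to scikit-learn's IsolationForest, which to our understanding applies a fixed cut-off of 0.5 on s for 'auto' and a percentile of the scores for a float. All function names are hypothetical.

```python
# Hypothetical sketch of Isolation Forest scoring and thresholding;
# not the etna or scikit-learn implementation.
import math
from typing import List, Literal, Sequence, Union

EULER_GAMMA = 0.5772156649


def average_path_length(n: int) -> float:
    """c(n): average depth of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates the harmonic number H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_depth: float, n: int) -> float:
    """s = 2 ** (-E[h] / c(n)); close to 1 for points that are isolated quickly."""
    return 2.0 ** (-mean_depth / average_path_length(n))


def percentile_linear(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def flag_outliers(
    scores: Sequence[float], contamination: Union[float, Literal["auto"]] = "auto"
) -> List[bool]:
    """Flag points whose score s lies strictly above the threshold."""
    if contamination == "auto":
        threshold = 0.5  # fixed cut-off on s
    else:
        if not 0.0 < contamination <= 0.5:
            raise ValueError("contamination must be in (0, 0.5]")
        # Place the cut so that roughly a `contamination` share of points lies above it.
        threshold = percentile_linear(scores, 100.0 * (1.0 - contamination))
    return [s > threshold for s in scores]
```

A point whose mean depth equals c(n) gets s = 0.5, the boundary of the 'auto' rule; shallower (easier to isolate) points score higher. With a float contamination the cut-off adapts to the observed scores instead.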
Methods
detect_outliers(ts) – Call get_anomalies_isolation_forest() with the parameters stored in this instance.
fit(ts) – Fit the transform.
fit_transform(ts) – Fit and transform TSDataset.
get_regressors_info() – Return the list with regressors created by the transform.
inverse_transform(ts) – Inverse transform TSDataset.
load(path) – Load an object.
params_to_tune() – Get default grid for tuning hyperparameters.
save(path) – Save the object.
set_params(**params) – Return new object instance with modified parameters.
to_dict() – Collect all information about etna object in dict.
transform(ts) – Transform TSDataset inplace.
Attributes
This class stores its __init__ parameters as attributes.
Backward compatibility property.
Backward compatibility property.
- detect_outliers(ts: TSDataset) → Dict[str, Series]
Call get_anomalies_isolation_forest() with the parameters stored in this instance.
- fit(ts: TSDataset) → OutliersTransform
Fit the transform.
- Parameters:
ts (TSDataset) – Dataset to fit the transform on.
- Returns:
The fitted transform instance.
- Return type:
OutliersTransform
- fit_transform(ts: TSDataset) → TSDataset
Fit and transform TSDataset.
May be reimplemented, but this is not recommended.
- inverse_transform(ts: TSDataset) → TSDataset
Inverse transform TSDataset.
Apply the _inverse_transform method.
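The transform/inverse-transform round trip can be pictured with a plain-Python sketch. It assumes, based on our understanding of etna's OutliersTransform base class rather than on this page, that transform replaces detected outliers in in_column with NaN and that inverse_transform writes the original values back. The helpers mask_outliers and restore_outliers are hypothetical, and a dict keyed by timestamp stands in for one segment of a TSDataset.

```python
# Hypothetical sketch; assumes outliers are replaced with NaN on transform
# and restored on inverse transform. Not etna's implementation.
import math
from typing import Dict, Iterable, Tuple

Series = Dict[str, float]  # timestamp -> value, standing in for one segment's column


def mask_outliers(values: Series, outlier_timestamps: Iterable[str]) -> Tuple[Series, Series]:
    """Return a copy with outliers set to NaN, plus the original values that were removed."""
    masked = dict(values)
    removed: Series = {}
    for ts in outlier_timestamps:
        removed[ts] = masked[ts]
        masked[ts] = math.nan
    return masked, removed


def restore_outliers(masked: Series, removed: Series) -> Series:
    """Undo mask_outliers by writing the stored original values back."""
    restored = dict(masked)
    restored.update(removed)
    return restored


values = {"2024-01-01": 1.0, "2024-01-02": 50.0, "2024-01-03": 1.2}
masked, removed = mask_outliers(values, ["2024-01-02"])
restored = restore_outliers(masked, removed)
```

Keeping the removed values alongside the masked series is what makes the operation reversible; without them, the inverse step could not recover the original observations.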
- classmethod load(path: Path) → Self
Load an object.
Warning
This method uses the dill module, which is not secure. It is possible to construct malicious data which will execute arbitrary code during loading. Never load data that could have come from an untrusted source, or that could have been tampered with.
- Parameters:
path (Path) – Path to load object from.
- Returns:
Loaded object.
- Return type:
Self
- params_to_tune() → Dict[str, BaseDistribution]
Get default grid for tuning hyperparameters.
This grid tunes the parameters n_estimators, max_samples, contamination, max_features and bootstrap. Other parameters are expected to be set by the user.
- Returns:
Grid to tune.
- Return type:
Dict[str, BaseDistribution]
- set_params(**params: dict) → Self
Return new object instance with modified parameters.
This method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.
Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.
- Parameters:
**params (dict) – Estimator parameters
- Returns:
New instance with changed parameters
- Return type:
Self
Examples
>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
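To make the <component_1>.<...>.<parameter> convention concrete, here is a sketch that applies the same dotted keys as the Pipeline example to plain nested dicts and lists. The helper set_nested_params and the dict layout are hypothetical illustrations of the addressing scheme, not etna's implementation; like set_params, the sketch returns a new object and leaves the original untouched.

```python
# Hypothetical sketch of dotted-path parameter updates on nested dicts/lists;
# not etna's set_params implementation.
import copy
from typing import Any, Dict


def set_nested_params(config: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of `config` with dotted-path keys overwritten."""
    new_config = copy.deepcopy(config)  # the original config stays unchanged
    for dotted_key, value in params.items():
        *path, last = dotted_key.split(".")
        node: Any = new_config
        for part in path:
            # Numeric components index into lists, e.g. "transforms.0".
            node = node[int(part)] if isinstance(node, list) else node[part]
        if isinstance(node, list):
            node[int(last)] = value
        else:
            node[last] = value
    return new_config


pipeline_config = {
    "model": {"lag": 1},
    "transforms": [{"in_column": "target", "value": 1}],
    "horizon": 3,
}
updated = set_nested_params(pipeline_config, {"model.lag": 3, "transforms.0.value": 2})
```

Numeric path components address list positions, which is why "transforms.0.value" reaches the first transform's value.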