etna.datasets.TSDataset
- class TSDataset(df: DataFrame, freq: str | None, df_exog: DataFrame | None = None, known_future: Literal['all'] | Sequence = (), hierarchical_structure: HierarchicalStructure | None = None)
Bases: object
TSDataset is the main class for handling time series data.
It prepares the series for exploratory analysis and implements feature generation with transforms and generation of future points.
Notes
TSDataset supports custom indexing and slicing. It can be done through this interface:
TSDataset[timestamp, segment, column]
If the dataset contains NaN values at the start of the period, those timestamps are removed. During creation, segment names are cast to string type.
Examples
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(periods=30, start_time="2021-06-01", n_segments=2, scale=1)
>>> ts = TSDataset(df, "D")
>>> ts["2021-06-01":"2021-06-07", "segment_0", "target"]
timestamp
2021-06-01    1.0
2021-06-02    1.0
2021-06-03    1.0
2021-06-04    1.0
2021-06-05    1.0
2021-06-06    1.0
2021-06-07    1.0
Freq: D, Name: (segment_0, target), dtype: float64

>>> import pandas as pd
>>> from etna.datasets import generate_ar_df
>>> pd.options.display.float_format = '{:,.2f}'.format
>>> df_to_forecast = generate_ar_df(100, start_time="2021-01-01", n_segments=1)
>>> df_regressors = generate_ar_df(120, start_time="2021-01-01", n_segments=5)
>>> df_regressors = df_regressors.pivot(index="timestamp", columns="segment").reset_index()
>>> df_regressors.columns = ["timestamp"] + [f"regressor_{i}" for i in range(5)]
>>> df_regressors["segment"] = "segment_0"
>>> tsdataset = TSDataset(df=df_to_forecast, freq="D", df_exog=df_regressors, known_future="all")
>>> tsdataset.df.head(5)
segment      segment_0
feature    regressor_0 regressor_1 regressor_2 regressor_3 regressor_4 target
timestamp
2021-01-01        1.62       -0.02       -0.50       -0.56        0.52   1.62
2021-01-02        1.01       -0.80       -0.81        0.38       -0.60   1.01
2021-01-03        0.48        0.47       -0.81       -1.56       -1.37   0.48
2021-01-04       -0.59        2.44       -2.21       -1.21       -0.69  -0.59
2021-01-05        0.28        0.58       -3.07       -1.45        0.77   0.28

>>> from etna.datasets import generate_hierarchical_df
>>> pd.options.display.width = 0
>>> df = generate_hierarchical_df(periods=100, n_segments=[2, 4], start_time="2021-01-01")
>>> df, hierarchical_structure = TSDataset.to_hierarchical_dataset(df=df, level_columns=["level_0", "level_1"])
>>> tsdataset = TSDataset(df=df, freq="D", hierarchical_structure=hierarchical_structure)
>>> tsdataset.df.head(5)
segment    l0s0_l1s3 l0s1_l1s0 l0s1_l1s1 l0s1_l1s2
feature       target    target    target    target
timestamp
2021-01-01      2.07      1.62     -0.45     -0.40
2021-01-02      0.59      1.01      0.78      0.42
2021-01-03     -0.24      0.48      1.18     -0.14
2021-01-04     -1.12     -0.59      1.77      1.82
2021-01-05     -1.40      0.28      0.68      0.48
Init TSDataset.
- Parameters:
df (DataFrame) – dataframe with timeseries in a wide or long format: DataFrameFormat; it is expected that df has feature named “target”
freq (str | None) – frequency of timestamp in df, possible values:
pandas offset aliases for datetime timestamp
None for integer timestamp
df_exog (DataFrame | None) – dataframe with exogenous data in a wide or long format: DataFrameFormat
known_future (Literal['all'] | Sequence) – columns in df_exog[known_future] that are regressors; if “all” is given, all columns are treated as regressors
hierarchical_structure (HierarchicalStructure | None) – Structure of the levels in the hierarchy. If None, there is no hierarchical structure in the dataset.
Methods
add_columns_from_pandas(df_update[, ...]) – Update the dataset with the new columns from pandas dataframe.
add_prediction_intervals(prediction_intervals_df) – Add prediction intervals into dataset.
add_target_components(target_components_df) – Add target components into dataset.
create_from_misaligned(df, freq[, df_exog, ...]) – Make TSDataset from misaligned data by realigning it according to inferred alignment in df.
describe([segments]) – Overview of the dataset that returns a DataFrame.
drop_features(features[, drop_from_exog]) – Drop columns with features from the dataset.
Drop prediction intervals from dataset.
Drop target components from dataset.
fit_transform(transforms) – Fit and apply given transforms to the data.
get_level_dataset(target_level) – Generate new TSDataset on target level.
get_prediction_intervals() – Get pandas.DataFrame with prediction intervals.
get_target_components() – Get DataFrame with target components.
Check whether dataset has hierarchical structure.
head([n_rows]) – Return the first n_rows rows.
info([segments]) – Overview of the dataset that prints the result.
inverse_transform(transforms) – Apply inverse transform method of transforms to the data.
isnull() – Return a dataframe of flags that show whether the corresponding value in self.df is null.
Return names of the levels in the hierarchical structure.
make_future(future_steps[, transforms, ...]) – Return new TSDataset with features extended into the future.
plot([n_segments, column, segments, start, ...]) – Plot of random or chosen segments.
size() – Return size of TSDataset.
tail([n_rows]) – Return the last n_rows rows.
to_dataset(df) – Convert pandas dataframe to wide format.
to_flatten(df[, features]) – Return pandas DataFrame in a long format.
to_hierarchical_dataset(df, level_columns[, ...]) – Convert pandas dataframe from long hierarchical to ETNA Dataset format.
to_pandas([flatten, features]) – Return pandas DataFrame.
to_torch_dataset(make_samples[, dropna]) – Convert the TSDataset to a torch.Dataset.
train_test_split([train_start, train_end, ...]) – Split given df with train-test timestamp indices or size of test set.
transform(transforms) – Apply given transform to the data.
tsdataset_idx_slice([start_idx, end_idx]) – Return new TSDataset with integer-location based indexing.
update_columns_from_pandas(df_update) – Update the existing columns in the dataset with the new values from pandas dataframe.
Attributes
columns – Return columns of self.df.
features – Get list of all features across all segments in dataset.
Shortcut for pd.core.indexing.IndexSlice
index – Return TSDataset timestamp index.
loc – Return self.df.loc method.
prediction_intervals_names – Get a tuple with prediction intervals names.
regressors – Get list of all regressors across all segments in dataset.
segments – Get list of all segments in dataset.
Get tuple with target components names.
Get tuple with target quantiles names.
- add_columns_from_pandas(df_update: DataFrame, update_exog: bool = False, regressors: List[str] | None = None)
Update the dataset with the new columns from pandas dataframe.
Before updating columns in df, columns of df_update will be cropped by the last timestamp in df.
- Parameters:
df_update (DataFrame) – Dataframe with the new columns in wide ETNA format.
update_exog (bool) – If True, update columns also in df_exog. If you wish to add new regressors in the dataset it is recommended to turn on this flag.
regressors (List[str] | None) – List of regressors in the passed dataframe.
- add_prediction_intervals(prediction_intervals_df: DataFrame)
Add prediction intervals into dataset.
- Parameters:
prediction_intervals_df (DataFrame) – Dataframe in a wide format with prediction intervals
- Raises:
ValueError: – If dataset already contains prediction intervals
ValueError: – If prediction intervals names differ between segments
- add_target_components(target_components_df: DataFrame)
Add target components into dataset.
- Parameters:
target_components_df (DataFrame) – Dataframe in a wide format with target components
- Raises:
ValueError: – If dataset already contains target components
ValueError: – If target components names differ between segments
ValueError: – If components don’t sum up to target
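The last condition above (components must sum up to the target) can be illustrated with a minimal plain-Python sketch. It does not call ETNA; the function name `components_sum_to_target`, the list-based data layout, and the tolerances are assumptions made only for this illustration.

```python
import math


def components_sum_to_target(target, components, rel_tol=1e-9, abs_tol=1e-9):
    """Check that component values add up to the target at every timestamp.

    `target` is a list of floats and `components` maps a component name to a
    list of floats of the same length (a hypothetical layout for this sketch).
    """
    for i, target_value in enumerate(target):
        total = sum(values[i] for values in components.values())
        # Compare with a tolerance, since floating-point sums are rarely exact.
        if not math.isclose(total, target_value, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True


target = [3.0, 5.0, 4.0]
good = {"target_component_a": [1.0, 2.0, 1.5], "target_component_b": [2.0, 3.0, 2.5]}
bad = {"target_component_a": [1.0, 2.0, 1.5], "target_component_b": [2.0, 3.0, 9.0]}

ok_good = components_sum_to_target(target, good)
ok_bad = components_sum_to_target(target, bad)
```

Running the same kind of check on your own components before calling add_target_components helps to avoid the third ValueError listed above.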
- classmethod create_from_misaligned(df: DataFrame, freq: str | None, df_exog: DataFrame | None = None, known_future: Literal['all'] | Sequence = (), future_steps: int = 1, original_timestamp_name: str = 'external_timestamp') → TSDataset
Make TSDataset from misaligned data by realigning it according to inferred alignment in df.
This method:
- Infers alignment using infer_alignment();
- Realigns df and df_exog with the inferred alignment using apply_alignment();
- Creates exog feature with original timestamp using make_timestamp_df_from_alignment();
- Creates TSDataset from these data.
This method doesn’t work with hierarchical_structure, because it doesn’t make much sense.
- Parameters:
df (DataFrame) – dataframe with timeseries in a long format: DataFrameFormat; it is expected that df has feature named “target”
freq (str | None) – frequency of timestamp in df, possible values:
pandas offset aliases for datetime timestamp
None for integer timestamp
df_exog (DataFrame | None) – dataframe with exogenous data in a long format: DataFrameFormat
known_future (Literal['all'] | Sequence) – columns in df_exog[known_future] that are regressors; if “all” is given, all columns are treated as regressors
future_steps (int) – determines by how many steps the original timestamp should be extended into the future before it is added into df_exog; expected to be positive
original_timestamp_name (str) – name of the original timestamp column to add into df_exog
- Returns:
Created TSDataset.
- Raises:
ValueError: – If future_steps is not positive.
ValueError: – If original_timestamp_name intersects with columns in df_exog.
ValueError: – Parameter df isn’t in a long format.
ValueError: – Parameter df_exog isn’t in a long format if it is set.
- Return type:
TSDataset
- describe(segments: Sequence[str] | None = None) → DataFrame
Overview of the dataset that returns a DataFrame.
The method describes the dataset segment by segment. Description columns:
start_timestamp: beginning of the segment, missing values in the beginning are ignored
end_timestamp: ending of the segment, missing values in the ending are ignored
length: length according to start_timestamp and end_timestamp
num_missing: number of missing values between start_timestamp and end_timestamp
num_segments: total number of segments, common for all segments
num_exogs: number of exogenous features, common for all segments
num_regressors: number of exogenous factors that are regressors, common for all segments
num_known_future: number of regressors that are known since creation, common for all segments
freq: frequency of the series, common for all segments
- Parameters:
segments (Sequence[str] | None) – segments to show in overview; if None, all segments are shown.
- Returns:
result_table – table with results of the overview
- Return type:
pd.DataFrame
Examples
>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> pd.options.display.expand_frame_repr = False
>>> df = generate_const_df(
...     periods=30, start_time="2021-06-01",
...     n_segments=2, scale=1
... )
>>> regressors_timestamp = pd.date_range(start="2021-06-01", periods=50)
>>> df_regressors_1 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 1, "segment": "segment_0"}
... )
>>> df_regressors_2 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 2, "segment": "segment_1"}
... )
>>> df_exog = pd.concat([df_regressors_1, df_regressors_2], ignore_index=True)
>>> ts = TSDataset(df, df_exog=df_exog, freq="D", known_future="all")
>>> ts.describe()
          start_timestamp end_timestamp  length  num_missing  num_segments  num_exogs  num_regressors  num_known_future freq
segments
segment_0      2021-06-01    2021-06-30      30            0             2          1               1                 1    D
segment_1      2021-06-01    2021-06-30      30            0             2          1               1                 1    D
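The per-segment columns of describe (start_timestamp, end_timestamp, length, num_missing) follow simple rules that a plain-Python sketch can make concrete. This is not ETNA code: the helper `describe_segment`, the use of None for a missing value, and the result for a segment with no valid values are assumptions made for illustration.

```python
def describe_segment(timestamps, values):
    """Compute the per-segment statistics for one series (None = missing)."""
    valid = [i for i, value in enumerate(values) if value is not None]
    if not valid:
        # Assumption of this sketch: a fully missing segment has no bounds.
        return {"start_timestamp": None, "end_timestamp": None, "length": 0, "num_missing": 0}
    first, last = valid[0], valid[-1]
    # Leading and trailing missing values are ignored, inner ones are counted.
    window = values[first:last + 1]
    return {
        "start_timestamp": timestamps[first],
        "end_timestamp": timestamps[last],
        "length": len(window),
        "num_missing": sum(value is None for value in window),
    }


timestamps = ["2021-06-01", "2021-06-02", "2021-06-03", "2021-06-04", "2021-06-05", "2021-06-06"]
values = [None, 1.0, None, 1.0, 1.0, None]
summary = describe_segment(timestamps, values)
```

Here the leading and trailing gaps are excluded, so the segment runs from 2021-06-02 to 2021-06-05 with a length of 4 and one missing value inside that range.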
- drop_features(features: List[str], drop_from_exog: bool = False)
Drop columns with features from the dataset.
- Parameters:
- Raises:
ValueError: – If features list contains target or target components
- fit_transform(transforms: Sequence[Transform])
Fit and apply given transforms to the data.
- Parameters:
transforms (Sequence[Transform]) –
- get_prediction_intervals() → DataFrame | None
Get pandas.DataFrame with prediction intervals.
- Returns:
pandas.DataFrame with prediction intervals for target variable.
- Return type:
DataFrame | None
- get_target_components() → DataFrame | None
Get DataFrame with target components.
- Returns:
Dataframe with target components
- Return type:
DataFrame | None
- head(n_rows: int = 5) → DataFrame
Return the first n_rows rows.
Mimics pandas method.
This function returns the first n_rows rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.
For negative values of n_rows, this function returns all rows except the last |n_rows| rows, equivalent to df[:n_rows].
- Parameters:
n_rows (int) – number of rows to select.
- Returns:
the first n_rows rows or 5 by default.
- Return type:
pd.DataFrame
- info(segments: Sequence[str] | None = None) → None
Overview of the dataset that prints the result.
The method describes the dataset segment by segment.
Information about dataset in general:
num_segments: total number of segments
num_exogs: number of exogenous features
num_regressors: number of exogenous factors that are regressors
num_known_future: number of regressors that are known since creation
freq: frequency of the dataset
Information about individual segments:
start_timestamp: beginning of the segment, missing values in the beginning are ignored
end_timestamp: ending of the segment, missing values in the ending are ignored
length: length according to start_timestamp and end_timestamp
num_missing: number of missing values between start_timestamp and end_timestamp
- Parameters:
segments (Sequence[str] | None) – segments to show in overview; if None, all segments are shown.
- Return type:
None
Examples
>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...     periods=30, start_time="2021-06-01",
...     n_segments=2, scale=1
... )
>>> regressors_timestamp = pd.date_range(start="2021-06-01", periods=50)
>>> df_regressors_1 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 1, "segment": "segment_0"}
... )
>>> df_regressors_2 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 2, "segment": "segment_1"}
... )
>>> df_exog = pd.concat([df_regressors_1, df_regressors_2], ignore_index=True)
>>> ts = TSDataset(df, df_exog=df_exog, freq="D", known_future="all")
>>> ts.info()
<class 'etna.datasets.TSDataset'>
num_segments: 2
num_exogs: 1
num_regressors: 1
num_known_future: 1
freq: D
          start_timestamp end_timestamp  length  num_missing
segments
segment_0      2021-06-01    2021-06-30      30            0
segment_1      2021-06-01    2021-06-30      30            0
- inverse_transform(transforms: Sequence[Transform])
Apply inverse transform method of transforms to the data.
Applied in reversed order.
- Parameters:
transforms (Sequence[Transform]) –
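Why the reversed order matters can be seen in a small plain-Python sketch. The classes `AddConstant` and `Scale` are hypothetical stand-ins, not ETNA transforms; they only mimic a transform/inverse_transform pair on a list of numbers.

```python
class AddConstant:
    """Hypothetical transform that shifts every value by a constant."""

    def __init__(self, constant):
        self.constant = constant

    def transform(self, value):
        return value + self.constant

    def inverse_transform(self, value):
        return value - self.constant


class Scale:
    """Hypothetical transform that multiplies every value by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, value):
        return value * self.factor

    def inverse_transform(self, value):
        return value / self.factor


def apply_transforms(values, transforms):
    for step in transforms:
        values = [step.transform(v) for v in values]
    return values


def apply_inverse(values, transforms, reverse=True):
    # Undo the most recently applied transform first.
    ordered = reversed(transforms) if reverse else transforms
    for step in ordered:
        values = [step.inverse_transform(v) for v in values]
    return values


data = [1.0, 2.0, 3.0]
transforms = [AddConstant(2.0), Scale(4.0)]
transformed = apply_transforms(data, transforms)
restored = apply_inverse(transformed, transforms)
wrong_order = apply_inverse(transformed, transforms, reverse=False)
```

Undoing the scaling before the shift recovers the original values, while inverting in the forward order does not.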
- isnull() → DataFrame
Return a dataframe of flags that show whether the corresponding value in self.df is null.
- Returns:
is_null dataframe
- Return type:
pd.DataFrame
- make_future(future_steps: int, transforms: Sequence[Transform] = (), tail_steps: int = 0) → TSDataset
Return new TSDataset with features extended into the future.
The resulting dataset doesn’t contain quantiles and target components.
- Parameters:
- Returns:
dataset with features extended into the future.
- Return type:
TSDataset
>>> import numpy as np
>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...     periods=30, start_time="2021-06-01",
...     n_segments=2, scale=1
... )
>>> df_regressors = pd.DataFrame({
...     "timestamp": list(pd.date_range("2021-06-01", periods=40))*2,
...     "regressor_1": np.arange(80), "regressor_2": np.arange(80) + 5,
...     "segment": ["segment_0"]*40 + ["segment_1"]*40
... })
>>> ts = TSDataset(
...     df, "D", df_exog=df_regressors, known_future="all"
... )
>>> ts.make_future(4)
segment      segment_0                      segment_1
feature    regressor_1 regressor_2 target regressor_1 regressor_2 target
timestamp
2021-07-01          30          35    NaN          70          75    NaN
2021-07-02          31          36    NaN          71          76    NaN
2021-07-03          32          37    NaN          72          77    NaN
2021-07-04          33          38    NaN          73          78    NaN
- plot(n_segments: int = 10, column: str = 'target', segments: Sequence[str] | None = None, start: Timestamp | int | str | None = None, end: Timestamp | int | str | None = None, seed: int = 1, figsize: Tuple[int, int] = (10, 5))
Plot of random or chosen segments.
- Parameters:
n_segments (int) – number of random segments to plot
column (str) – feature to plot
seed (int) – seed for local random state
start (Timestamp | int | str | None) – start plot from this timestamp
end (Timestamp | int | str | None) – end plot at this timestamp
figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches
- Raises:
ValueError: – Incorrect type of start or end is used according to freq
- size() → Tuple[int, int, int | None]
Return size of TSDataset.
The sizes are returned in the order (number of timestamps, number of segments, number of features). The number of features is None if it differs between segments.
- tail(n_rows: int = 5) → DataFrame
Return the last n_rows rows.
Mimics pandas method.
This function returns the last n_rows rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.
For negative values of n_rows, this function returns all rows except the first |n_rows| rows, equivalent to df[|n_rows|:].
- Parameters:
n_rows (int) – number of rows to select.
- Returns:
the last n_rows rows or 5 by default.
- Return type:
pd.DataFrame
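The sign convention for n_rows in head and tail is easy to get wrong, so here is a minimal sketch of the positional rule on a plain Python list. The functions `head` and `tail` below are stand-ins written for this illustration and operate on lists, not on a TSDataset.

```python
def head(rows, n_rows=5):
    """First n_rows rows; a negative n_rows drops the last |n_rows| rows."""
    return rows[:n_rows]


def tail(rows, n_rows=5):
    """Last n_rows rows; a negative n_rows drops the first |n_rows| rows."""
    if n_rows == 0:
        # rows[-0:] would return everything, so zero needs its own branch.
        return []
    return rows[-n_rows:]


rows = list(range(10))
first_three = head(rows, 3)
all_but_last_three = head(rows, -3)
last_three = tail(rows, 3)
all_but_first_three = tail(rows, -3)
```

With ten rows numbered 0 to 9, a negative argument keeps seven rows in both cases: head drops rows 7 to 9 and tail drops rows 0 to 2.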
- static to_dataset(df: DataFrame) → DataFrame
Convert pandas dataframe to wide format.
Columns “timestamp” and “segment” are required.
- Parameters:
df (DataFrame) – DataFrame with columns [“timestamp”, “segment”]. Other columns are considered features. Column “timestamp” is expected to be one of two types: integer or timestamp.
- Return type:
DataFrame
Notes
During conversion, segment is cast to string type.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...     periods=30, start_time="2021-06-01",
...     n_segments=2, scale=1
... )
>>> df.head(5)
   timestamp    segment  target
0 2021-06-01  segment_0    1.00
1 2021-06-02  segment_0    1.00
2 2021-06-03  segment_0    1.00
3 2021-06-04  segment_0    1.00
4 2021-06-05  segment_0    1.00
>>> df_wide = TSDataset.to_dataset(df)
>>> df_wide.head(5)
segment    segment_0 segment_1
feature       target    target
timestamp
2021-06-01      1.00      1.00
2021-06-02      1.00      1.00
2021-06-03      1.00      1.00
2021-06-04      1.00      1.00
2021-06-05      1.00      1.00

>>> df_regressors = pd.DataFrame({
...     "timestamp": pd.date_range("2021-01-01", periods=10),
...     "regressor_1": np.arange(10), "regressor_2": np.arange(10) + 5,
...     "segment": ["segment_0"]*10
... })
>>> TSDataset.to_dataset(df_regressors).head(5)
segment      segment_0
feature    regressor_1 regressor_2
timestamp
2021-01-01           0           5
2021-01-02           1           6
2021-01-03           2           7
2021-01-04           3           8
2021-01-05           4           9
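To make the long-to-wide idea concrete without pandas, the sketch below pivots a list of long-format rows into a mapping keyed by timestamp, with (segment, feature) pairs as column keys. The function `to_wide` and the dictionary layout are assumptions for illustration only; the real to_dataset returns a pandas DataFrame with a two-level column index.

```python
def to_wide(rows):
    """Pivot long rows into {timestamp: {(segment, feature): value}}."""
    wide = {}
    for row in rows:
        # Segment names are cast to strings, as the note above describes.
        segment = str(row["segment"])
        cells = wide.setdefault(row["timestamp"], {})
        for name, value in row.items():
            if name not in ("timestamp", "segment"):
                cells[(segment, name)] = value
    # Sort by timestamp so the result reads like a time-indexed table.
    return dict(sorted(wide.items()))


long_rows = [
    {"timestamp": "2021-06-02", "segment": 0, "target": 2.0},
    {"timestamp": "2021-06-01", "segment": 0, "target": 1.0},
    {"timestamp": "2021-06-01", "segment": 1, "target": 3.0},
    {"timestamp": "2021-06-02", "segment": 1, "target": 4.0},
]
wide = to_wide(long_rows)
```

Each timestamp ends up with one cell per (segment, feature) pair, which is the same shape the wide examples above display.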
- static to_flatten(df: DataFrame, features: Literal['all'] | Sequence[str] = 'all') → DataFrame
Return pandas DataFrame in a long format.
The order of columns is (timestamp, segment, target, features in alphabetical order).
- Parameters:
- Returns:
dataframe with TSDataset data
- Return type:
pd.DataFrame
Examples
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...     periods=30, start_time="2021-06-01",
...     n_segments=2, scale=1
... )
>>> df.head(5)
   timestamp    segment  target
0 2021-06-01  segment_0    1.00
1 2021-06-02  segment_0    1.00
2 2021-06-03  segment_0    1.00
3 2021-06-04  segment_0    1.00
4 2021-06-05  segment_0    1.00
>>> df_wide = TSDataset.to_dataset(df)
>>> TSDataset.to_flatten(df_wide).head(5)
   timestamp    segment  target
0 2021-06-01  segment_0     1.0
1 2021-06-02  segment_0     1.0
2 2021-06-03  segment_0     1.0
3 2021-06-04  segment_0     1.0
4 2021-06-05  segment_0     1.0
- static to_hierarchical_dataset(df: DataFrame, level_columns: List[str], keep_level_columns: bool = False, sep: str = '_', return_hierarchy: bool = True) → Tuple[DataFrame, HierarchicalStructure | None]
Convert pandas dataframe from long hierarchical to ETNA Dataset format.
- Parameters:
df (DataFrame) – Dataframe in long hierarchical format with columns [timestamp, target] + [level_columns] + [other_columns]
level_columns (List[str]) – Columns of the dataframe that define the levels in the hierarchy, in order from top to bottom, i.e. [level_name_1, level_name_2, …]. Names of the columns will be used as names of the levels in the hierarchy.
keep_level_columns (bool) – If true, leave the level columns in the result dataframe. By default, level columns are concatenated into the “segment” column and dropped
sep (str) – String used to concatenate the level names
return_hierarchy (bool) – If true, returns the hierarchical structure
- Returns:
Dataframe in wide format and optionally hierarchical structure
- Raises:
ValueError – If level_columns is empty
- Return type:
Tuple[DataFrame, HierarchicalStructure | None]
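The way level columns are concatenated into a segment name can be sketched in plain Python. The helper `build_segment_names` is hypothetical and covers only the naming step and the empty level_columns check; it does not build the wide dataframe or the hierarchical structure.

```python
def build_segment_names(rows, level_columns, sep="_"):
    """Join the level values of every row, top level first, with `sep`."""
    if not level_columns:
        raise ValueError("level_columns must not be empty")
    return [sep.join(str(row[column]) for column in level_columns) for row in rows]


rows = [
    {"timestamp": "2021-01-01", "level_0": "l0s0", "level_1": "l1s3", "target": 2.07},
    {"timestamp": "2021-01-01", "level_0": "l0s1", "level_1": "l1s0", "target": 1.62},
]
names = build_segment_names(rows, ["level_0", "level_1"])
names_with_dash = build_segment_names(rows, ["level_0", "level_1"], sep="-")
```

With the default separator, this produces names of the same form as the l0s0_l1s3 style segments shown in the class-level example.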
- to_pandas(flatten: bool = False, features: Literal['all'] | Sequence[str] = 'all') → DataFrame
Return pandas DataFrame.
- Parameters:
flatten (bool) –
If False, return dataframe in a wide format
If True, return dataframe in a long format, its order of columns is (timestamp, segment, target, features in alphabetical order).
features (Literal['all'] | Sequence[str]) – List of features to return. If “all”, return all the features in the dataset.
- Returns:
dataframe with TSDataset data
- Return type:
pd.DataFrame
Examples
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...     periods=30, start_time="2021-06-01",
...     n_segments=2, scale=1
... )
>>> df.head(5)
   timestamp    segment  target
0 2021-06-01  segment_0    1.00
1 2021-06-02  segment_0    1.00
2 2021-06-03  segment_0    1.00
3 2021-06-04  segment_0    1.00
4 2021-06-05  segment_0    1.00
>>> ts = TSDataset(df, "D")
>>> ts.to_pandas(True).head(5)
   timestamp    segment  target
0 2021-06-01  segment_0    1.00
1 2021-06-02  segment_0    1.00
2 2021-06-03  segment_0    1.00
3 2021-06-04  segment_0    1.00
4 2021-06-05  segment_0    1.00
>>> ts.to_pandas(False).head(5)
segment    segment_0 segment_1
feature       target    target
timestamp
2021-06-01      1.00      1.00
2021-06-02      1.00      1.00
2021-06-03      1.00      1.00
2021-06-04      1.00      1.00
2021-06-05      1.00      1.00
- to_torch_dataset(make_samples: Callable[[DataFrame], Iterator[dict] | Iterable[dict]], dropna: bool = True) → Dataset
Convert the TSDataset to a torch.Dataset.
- train_test_split(train_start: Timestamp | int | str | None = None, train_end: Timestamp | int | str | None = None, test_start: Timestamp | int | str | None = None, test_end: Timestamp | int | str | None = None, test_size: int | None = None) → Tuple[TSDataset, TSDataset]
Split given df with train-test timestamp indices or size of test set.
In case of inconsistencies between test_size and (test_start, test_end), test_size is ignored.
- Parameters:
train_start (Timestamp | int | str | None) – start timestamp of new train dataset; if None, the first timestamp is used
train_end (Timestamp | int | str | None) – end timestamp of new train dataset; if None, the timestamp previous to test_start is used
test_start (Timestamp | int | str | None) – start timestamp of new test dataset; if None, the timestamp next to train_end is used
test_end (Timestamp | int | str | None) – end timestamp of new test dataset; if None, the last timestamp is used
test_size (int | None) – number of timestamps to use in test set
- Returns:
train, test – generated datasets
- Return type:
Tuple[TSDataset, TSDataset]
- Raises:
ValueError: – Incorrect type of train_start or train_end or test_start or test_end is used according to ts.freq
Examples
>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_ar_df
>>> pd.options.display.float_format = '{:,.2f}'.format
>>> df = generate_ar_df(100, start_time="2021-01-01", n_segments=3)
>>> ts = TSDataset(df, "D")
>>> train_ts, test_ts = ts.train_test_split(
...     train_start="2021-01-01", train_end="2021-02-01",
...     test_start="2021-02-02", test_end="2021-02-07"
... )
>>> train_ts.df.tail(5)
segment    segment_0 segment_1 segment_2
feature       target    target    target
timestamp
2021-01-28     -2.06      2.03      1.51
2021-01-29     -2.33      0.83      0.81
2021-01-30     -1.80      1.69      0.61
2021-01-31     -2.49      1.51      0.85
2021-02-01     -2.89      0.91      1.06
>>> test_ts.df.head(5)
segment    segment_0 segment_1 segment_2
feature       target    target    target
timestamp
2021-02-02     -3.57     -0.32      1.72
2021-02-03     -4.42      0.23      3.51
2021-02-04     -5.09      1.02      3.39
2021-02-05     -5.10      0.40      2.15
2021-02-06     -6.22      0.92      0.97
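How the None defaults of train_test_split fill in the split boundaries can be sketched on a plain list of timestamps. This is a simplified illustration, not ETNA's implementation: the function `resolve_split` is hypothetical, it assumes every given boundary is present in the index, and it uses test_size only when neither train_end nor test_start is given.

```python
def resolve_split(index, train_start=None, train_end=None,
                  test_start=None, test_end=None, test_size=None):
    """Return (train, test) lists of timestamps from a sorted index."""
    if test_end is None:
        test_end = index[-1]  # default: last timestamp
    if test_start is None:
        if train_end is not None:
            # default: the timestamp right after train_end
            test_start = index[index.index(train_end) + 1]
        elif test_size is not None:
            # count test_size timestamps back from test_end (inclusive)
            test_start = index[index.index(test_end) - test_size + 1]
        else:
            raise ValueError("one of train_end, test_start, test_size is required")
    if train_end is None:
        # default: the timestamp right before test_start
        train_end = index[index.index(test_start) - 1]
    if train_start is None:
        train_start = index[0]  # default: first timestamp
    train = [t for t in index if train_start <= t <= train_end]
    test = [t for t in index if test_start <= t <= test_end]
    return train, test


index = [f"2021-01-{day:02d}" for day in range(1, 11)]
train_a, test_a = resolve_split(index, train_end="2021-01-07")
train_b, test_b = resolve_split(index, test_size=2)
```

In the first call only train_end is given, so the test set starts on the next day and runs to the end of the index. In the second call the last two timestamps form the test set and everything before them is the train set.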
- transform(transforms: Sequence[Transform])
Apply given transform to the data.
- Parameters:
transforms (Sequence[Transform]) –
- tsdataset_idx_slice(start_idx: int | None = None, end_idx: int | None = None) → TSDataset
Return new TSDataset with integer-location based indexing.
- update_columns_from_pandas(df_update: DataFrame)
Update the existing columns in the dataset with the new values from pandas dataframe.
Before updating columns in df, columns of df_update will be cropped by the last timestamp in df. Columns in df_exog are not updated. If you wish to update df_exog, create a new instance of TSDataset.
- Parameters:
df_update (DataFrame) – Dataframe with new values in wide ETNA format.
- property columns: MultiIndex
Return columns of self.df.
- Returns:
multiindex of dataframe with target and features.
- Return type:
pd.core.indexes.multi.MultiIndex
- property features: List[str]
Get list of all features across all segments in dataset.
All features include initial exogenous data, generated features, target, target components, prediction intervals. The order of features in returned list isn’t specified.
If different segments have different subset of features, then the union of features is returned.
- Returns:
List of features.
- property index: Index
Return TSDataset timestamp index.
- Returns:
timestamp index of TSDataset
- property loc: _LocIndexer
Return self.df.loc method.
- Returns:
dataframe with self.df.loc[…]
- Return type:
pd.core.indexing._LocIndexer
- property prediction_intervals_names: Tuple[str, ...]
Get a tuple with prediction intervals names. Return an empty tuple in the case of intervals absence.
- property regressors: List[str]
Get list of all regressors across all segments in dataset.
Examples
>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...     periods=30, start_time="2021-06-01",
...     n_segments=2, scale=1
... )
>>> regressors_timestamp = pd.date_range(start="2021-06-01", periods=50)
>>> df_regressors_1 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 1, "segment": "segment_0"}
... )
>>> df_regressors_2 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 2, "segment": "segment_1"}
... )
>>> df_exog = pd.concat([df_regressors_1, df_regressors_2], ignore_index=True)
>>> ts = TSDataset(
...     df, df_exog=df_exog, freq="D", known_future="all"
... )
>>> ts.regressors
['regressor_1']
- property segments: List[str]
Get list of all segments in dataset.
Examples
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...     periods=30, start_time="2021-06-01",
...     n_segments=2, scale=1
... )
>>> ts = TSDataset(df, "D")
>>> ts.segments
['segment_0', 'segment_1']