Dataset API Reference#

Dataset classes are defined in chronocratic.datasets.datatypes and re-exported from the package root. They provide PyTorch Dataset implementations for time series data.

Base Datasets#

Abstract and mixin classes that define the core interface for time series datasets.

Forecasting Datasets#

Concrete dataset implementations for forecasting benchmarks.

Classification Datasets#

Concrete dataset implementations for classification benchmarks.

chronocratic.datasets.datatypes — Dataset type classes.

class chronocratic.datasets.datatypes.ETTDataset(data: np.ndarray, seq_len: int, step: int, forecast_horizon: int, mode: TimeSeriesDatasetMode = TimeSeriesDatasetMode.INPUT_OUTPUT, transformations_sequence: tuple[Callable, ...]=(<function convert_numpy_to_tensor>, ))#

Bases: FlexibleTimeSeriesDatasetSingleFile

PyTorch Dataset for ETT forecasting (ETTh1/ETTh2/ETTm1/ETTm2).

Sliding-window dataset with forecast_horizon as the label target. Labels are derived from the data segment immediately following each input window (via ForecastingStrategySingleFile).

Parameters:
  • data – 2-D numpy array of shape (time, features).

  • seq_len – Input window length.

  • step – Step between consecutive windows.

  • mode – Dataset mode. Defaults to TimeSeriesDatasetMode.INPUT_OUTPUT.

  • forecast_horizon – Number of future steps to predict.

  • transformations_sequence – Post-processing callables.

Raises:

ValueError – If forecast_horizon is not positive.
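The pairing of input windows with forecast targets can be illustrated without the library. The following is a minimal sketch, not the chronocratic implementation: make_forecast_windows is a hypothetical helper, and it assumes that a window is dropped when fewer than forecast_horizon steps follow it.

```python
def make_forecast_windows(series, seq_len, step, forecast_horizon):
    """Pair each input window with the segment that immediately follows it.

    Hypothetical helper for illustration only; assumes windows without a
    complete target segment are discarded.
    """
    if forecast_horizon <= 0:
        raise ValueError("forecast_horizon must be positive")
    pairs = []
    # Last start index that still leaves room for a full target segment.
    last_start = len(series) - seq_len - forecast_horizon
    for start in range(0, last_start + 1, step):
        window = series[start:start + seq_len]
        target = series[start + seq_len:start + seq_len + forecast_horizon]
        pairs.append((window, target))
    return pairs


# Ten time steps with two features each, laid out as (time, features) rows.
series = [[t, 10 * t] for t in range(10)]
pairs = make_forecast_windows(series, seq_len=4, step=2, forecast_horizon=2)
```

With these toy values the windows start at time steps 0, 2 and 4, and each target covers the two steps directly after its window.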

class chronocratic.datasets.datatypes.ElectricityDataset(data: np.ndarray, seq_len: int, step: int, mode: TimeSeriesDatasetMode, forecast_horizon: int, transformations_sequence: tuple[Callable, ...]=(<function convert_numpy_to_tensor>, ))#

Bases: FlexibleTimeSeriesDatasetSingleFileMultipleSeries

PyTorch Dataset for Electricity forecasting.

Sliding-window dataset for the Electricity Load Diagrams benchmark. Handles 3D data of shape (num_clients, T, 1) where each client is an independent power consumption series. 370 clients total.

Raw CSV shape: (27340, 371), i.e. 370 client columns plus one timestamp column (MT_001). Post-transform shape: (370, 27340, 1).

Parameters:
  • data – 3-D numpy array of shape (num_clients, T, 1).

  • seq_len – Input window length.

  • step – Step between consecutive windows.

  • mode – Dataset mode (e.g., TimeSeriesDatasetMode.SAMPLE_ONLY, TimeSeriesDatasetMode.INPUT_OUTPUT).

  • forecast_horizon – Number of future steps to predict.

  • transformations_sequence – Post-processing callables.

Raises:

ValueError – If forecast_horizon is not positive.
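The change from the raw tabular layout to the (num_clients, T, 1) layout can be sketched with plain NumPy. This is an illustration on a toy array, not the library's preprocessing code, and it assumes the timestamp occupies the first column of the raw table.

```python
import numpy as np

# Toy stand-in for the raw table: 6 time steps, 1 timestamp column + 3 clients.
# Assumption: the timestamp is the first column.
num_steps, num_clients = 6, 3
timestamps = np.arange(num_steps, dtype=float).reshape(-1, 1)
loads = np.arange(num_steps * num_clients, dtype=float).reshape(num_steps, num_clients)
raw = np.hstack([timestamps, loads])       # shape (6, 4)

values = raw[:, 1:]                        # drop the timestamp column -> (6, 3)
per_client = values.T[:, :, np.newaxis]    # one series per client -> (3, 6, 1)
```

The same three steps applied to a (27340, 371) table would yield the (370, 27340, 1) array that the constructor expects.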

class chronocratic.datasets.datatypes.FixedTimeSeriesDatasetMultivariate(data: np.ndarray, labels: pd.Series | pd.DataFrame | None, mode: TimeSeriesDatasetMode, expand_dims_axis: int | None, transformations_sequence: list[Callable] | tuple[Callable, ...] | None = None)#

Bases: FixedTimeSeriesDataset, ABC

Multivariate classification dataset (UEA-style).

The data is a 3-D array indexed as (sample, timestep, feature), so each entry is a 2-D array of shape (timesteps, features).

Parameters:
  • data – 3-D numpy array of shape (samples, timesteps, features).

  • labels – Optional labels as a pandas Series or DataFrame.

  • mode – Dataset mode.

  • expand_dims_axis – Axis along which to expand data dimensions, or None to skip expansion.

  • transformations_sequence – Post-processing callables.

class chronocratic.datasets.datatypes.FixedTimeSeriesDatasetUnivariate(data: pd.DataFrame, labels: pd.Series | pd.DataFrame | None, mode: TimeSeriesDatasetMode, expand_dims_axis: int | None, transformations_sequence: list[Callable] | tuple[Callable, ...] | None = None)#

Bases: FixedTimeSeriesDataset, ABC

Univariate classification dataset (UCR-style).

Each row of the DataFrame is one time series.

Parameters:
  • data – DataFrame of shape (samples, timesteps).

  • labels – Optional labels as a pandas Series or DataFrame.

  • mode – Dataset mode.

  • expand_dims_axis – Axis along which to expand data dimensions, or None to skip expansion.

  • transformations_sequence – Post-processing callables.

class chronocratic.datasets.datatypes.FlexibleTimeSeriesDatasetSingleFile(data: np.ndarray, labels: np.ndarray | None, seq_len: int, step: int, mode: TimeSeriesDatasetMode, sequence_handling_strategy: SequenceHandlingStrategySingleFile, expand_dims_axis: int | None = 1, transformations_sequence: list[Callable] | tuple[Callable, ...] | None=(<function convert_numpy_to_tensor>, ))#

Bases: FlexibleTimeSeriesDataset

Sliding-window dataset for a single continuous series.

Parameters:
  • data – 1-D or 2-D numpy array.

  • labels – Optional label array.

  • seq_len – Window length.

  • step – Step between windows.

  • mode – Dataset mode.

  • sequence_handling_strategy – Label extraction strategy.

  • expand_dims_axis – Axis along which to expand data dimensions, or None to skip expansion. Defaults to 1.

  • transformations_sequence – Post-processing callables.

class chronocratic.datasets.datatypes.FlexibleTimeSeriesDatasetSingleFileMultipleSeries(data: np.ndarray, labels: np.ndarray | None, seq_len: int, step: int, mode: TimeSeriesDatasetMode, sequence_handling_strategy: SequenceHandlingStrategySingleFile, expand_dims_axis: int | None = None, transformations_sequence: list[Callable] | tuple[Callable, ...] | None=(<function convert_numpy_to_tensor>, ))#

Bases: FlexibleTimeSeriesDataset

Sliding-window dataset for a single file containing multiple series.

Handles 3D input arrays of shape (num_series, T, features) where each series is an independent time series. Uses bisect + accumulate to map a global index to (series_index, window_index).

Parameters:
  • data – 3-D numpy array of shape (num_series, T, features).

  • labels – Optional label array.

  • seq_len – Window length.

  • step – Step between windows.

  • mode – Dataset mode.

  • sequence_handling_strategy – Single-file label strategy.

  • expand_dims_axis – Axis along which to expand data dimensions, or None (the default) to skip expansion.

  • transformations_sequence – Post-processing callables.
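The bisect + accumulate lookup mentioned in the description can be sketched with the standard library alone. This is a hedged illustration rather than the library's code: windows_per_series and locate are hypothetical names, partial windows are assumed to be dropped, and the series are given different lengths only to make the mapping easier to follow.

```python
from bisect import bisect_right
from itertools import accumulate


def windows_per_series(length, seq_len, step):
    """Number of complete windows in one series (assumed: no partial windows)."""
    if length < seq_len:
        return 0
    return (length - seq_len) // step + 1


def locate(global_index, cumulative_counts):
    """Map a global sample index to (series_index, window_index)."""
    if not 0 <= global_index < cumulative_counts[-1]:
        raise IndexError(global_index)
    # First series whose cumulative window count exceeds the global index.
    series_index = bisect_right(cumulative_counts, global_index)
    previous_total = cumulative_counts[series_index - 1] if series_index else 0
    return series_index, global_index - previous_total


# Three series, windows of length 4 taken every 2 steps.
counts = [windows_per_series(n, seq_len=4, step=2) for n in (10, 6, 8)]
cumulative = list(accumulate(counts))
```

Here counts is [4, 2, 3] and cumulative is [4, 6, 9], so global index 4 resolves to the first window of the second series. Precomputing the running totals keeps each lookup logarithmic in the number of series.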

class chronocratic.datasets.datatypes.TimeSeriesDataset(data: ndarray | list[ndarray] | DataFrame, labels: ndarray | list[ndarray] | Series | DataFrame | None, mode: TimeSeriesDatasetMode, expand_dims_axis: int | None, transformations_sequence: list[Callable] | tuple[Callable, ...] | None = None)#

Bases: Dataset[Any], ABC

Abstract base for all time series datasets.

Supports three modes via mode-specific sample getters:

  • SAMPLE_ONLY (training)

  • SAMPLE_LABEL (evaluation)

  • INPUT_OUTPUT (input/target pairs)

Parameters:
  • data – Raw time series data.

  • labels – Optional label array or Series.

  • mode – Determines the sample signature.

  • expand_dims_axis – Axis along which to expand data dimensions.

  • transformations_sequence – Post-processing callables.
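One way to read "mode-specific sample getters" is that the getter is selected once at construction and then reused by __getitem__. The sketch below illustrates that pattern under stated assumptions: Mode and ToyDataset are hypothetical stand-ins, and the return signature of each mode is a guess, not the documented behaviour of TimeSeriesDataset.

```python
from enum import Enum, auto


class Mode(Enum):
    """Hypothetical stand-in for TimeSeriesDatasetMode."""
    SAMPLE_ONLY = auto()
    SAMPLE_LABEL = auto()
    INPUT_OUTPUT = auto()


class ToyDataset:
    """Selects a sample getter once, at construction time, based on the mode."""

    def __init__(self, data, labels, mode):
        self.data = data
        self.labels = labels
        getters = {
            Mode.SAMPLE_ONLY: self._sample_only,
            Mode.SAMPLE_LABEL: self._sample_label,
            Mode.INPUT_OUTPUT: self._input_output,
        }
        self._getter = getters[mode]

    def _sample_only(self, index):
        return self.data[index]

    def _sample_label(self, index):
        return self.data[index], self.labels[index]

    def _input_output(self, index):
        # Assumed here: the target for an input is stored at the same index.
        return self.data[index], self.labels[index]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self._getter(index)


data = [[1, 2], [3, 4]]
labels = ["a", "b"]
unlabeled = ToyDataset(data, labels, Mode.SAMPLE_ONLY)
labeled = ToyDataset(data, labels, Mode.SAMPLE_LABEL)
```

Resolving the getter up front avoids re-checking the mode on every index access, which matters when a DataLoader calls __getitem__ many times per epoch.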

class chronocratic.datasets.datatypes.UCRClassificationUnivariateDataset(data: pd.DataFrame, labels: pd.Series | pd.DataFrame | None, mode: TimeSeriesDatasetMode, expand_dims_axis: int = 1, transformations_sequence: tuple[Callable, ...]=(<function convert_numpy_to_tensor>, ))#

Bases: FixedTimeSeriesDatasetUnivariate

PyTorch Dataset for UCR univariate classification.

Each row of the input DataFrame represents one time series sample. By default, dimensions are expanded along axis=1, so each sample has shape (1, timesteps), and numpy arrays are converted to tensors.

Parameters:
  • data – DataFrame of shape (samples, timesteps).

  • labels – Optional labels as a pandas Series or DataFrame.

  • mode – Dataset mode (with/without labels).

  • expand_dims_axis – Axis to expand dimensions on.

  • transformations_sequence – Post-processing callables.
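The effect of expanding along axis=1 can be checked with NumPy directly. This sketch assumes the expansion is applied to the whole (samples, timesteps) array, which is consistent with the per-sample shape (1, timesteps) stated above; a plain array stands in for the DataFrame.

```python
import numpy as np

# Toy UCR-style table: 3 samples, 5 timesteps each.
table = np.arange(15, dtype=np.float32).reshape(3, 5)

expanded = np.expand_dims(table, axis=1)      # (3, 5) -> (3, 1, 5)
first_sample = expanded[0]                    # shape (1, 5)

# For contrast: the same axis applied to a single 1-D sample.
single = np.expand_dims(table[0], axis=1)     # (5,) -> (5, 1)
```

The contrast shows why the point of application matters: the same axis value gives (1, timesteps) per sample on the full array but (timesteps, 1) on an individual series.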

class chronocratic.datasets.datatypes.UEAClassificationMultivariateDataset(data: np.ndarray, labels: pd.Series | pd.DataFrame | None, mode: TimeSeriesDatasetMode, expand_dims_axis: int | None = None, transformations_sequence: tuple[Callable, ...]=(<function convert_numpy_to_tensor>, ))#

Bases: FixedTimeSeriesDatasetMultivariate

PyTorch Dataset for UEA multivariate classification.

Each sample is a 2-D array of shape (timesteps, features). No dimension expansion is applied by default (expand_dims_axis=None) since the multivariate shape is already fully specified.

Parameters:
  • data – 3-D numpy array of shape (samples, timesteps, features).

  • labels – Optional labels as a pandas Series or DataFrame.

  • mode – Dataset mode.

  • expand_dims_axis – Axis to expand (None for multivariate).

  • transformations_sequence – Post-processing callables.

class chronocratic.datasets.datatypes.WeatherDataset(data: np.ndarray, seq_len: int, step: int, mode: TimeSeriesDatasetMode, forecast_horizon: int, transformations_sequence: tuple[Callable, ...]=(<function convert_numpy_to_tensor>, ))#

Bases: FlexibleTimeSeriesDatasetSingleFile

PyTorch Dataset for Weather forecasting.

Sliding-window dataset for the Weather benchmark (22 features, hourly granularity). Accepts pre-processed 2D data of shape (T, 22) where T = 52696 hourly steps.

Parameters:
  • data – 2-D numpy array of shape (T, 22).

  • seq_len – Input window length.

  • step – Step between consecutive windows.

  • mode – Dataset mode (e.g., TimeSeriesDatasetMode.SAMPLE_ONLY, TimeSeriesDatasetMode.INPUT_OUTPUT).

  • forecast_horizon – Number of future steps to predict.

  • transformations_sequence – Post-processing callables.

Raises:

ValueError – If forecast_horizon is not positive.
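For a rough sense of dataset size, the number of samples produced by a sliding window can be estimated with a short calculation. This is a sketch under the assumption that only complete (input, target) windows are kept; count_windows is a hypothetical helper, and the seq_len, step and forecast_horizon values are illustrative rather than library defaults.

```python
def count_windows(num_steps, seq_len, step, forecast_horizon):
    """Count complete (input, target) windows; assumes partial windows are dropped."""
    if forecast_horizon <= 0:
        raise ValueError("forecast_horizon must be positive")
    usable = num_steps - seq_len - forecast_horizon
    return 0 if usable < 0 else usable // step + 1


# T = 52696 is taken from the class description; the other values are examples.
n_dense = count_windows(52696, seq_len=96, step=1, forecast_horizon=24)
n_strided = count_windows(52696, seq_len=96, step=24, forecast_horizon=24)
```

Under these assumptions a step of 1 gives 52577 windows, while a step of 24 gives 2191.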