Dataset API Reference
Dataset classes are defined in chronocratic.datasets.datatypes and
re-exported from the package root. They provide PyTorch Dataset implementations
for time series data.
Base Datasets
Abstract and mixin classes that define the core interface for time series datasets.
Forecasting Datasets
Concrete dataset implementations for forecasting benchmarks.
Classification Datasets
Concrete dataset implementations for classification benchmarks.
chronocratic.datasets.datatypes — Dataset type classes.
- class chronocratic.datasets.datatypes.ETTDataset(data: np.ndarray, seq_len: int, step: int, forecast_horizon: int, mode: TimeSeriesDatasetMode = TimeSeriesDatasetMode.INPUT_OUTPUT, transformations_sequence: tuple[Callable, ...]=(<function convert_numpy_to_tensor>, ))
Bases: FlexibleTimeSeriesDatasetSingleFile
PyTorch Dataset for ETT forecasting (ETTh1/ETTh2/ETTm1/ETTm2).
Sliding-window dataset whose label target spans the next forecast_horizon steps. Labels are derived from the data segment immediately following each input window (via ForecastingStrategySingleFile).
- Parameters:
data – 2-D numpy array of shape (time, features).
seq_len – Input window length.
step – Step between consecutive windows.
forecast_horizon – Number of future steps to predict.
mode – Dataset mode (defaults to TimeSeriesDatasetMode.INPUT_OUTPUT).
transformations_sequence – Post-processing callables.
- Raises:
ValueError – If forecast_horizon is not positive.
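The windowing described for ETTDataset can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: it assumes windows start at index 0, advance by `step`, and are kept only while the full input-plus-horizon span fits inside the series. `forecast_windows` is a hypothetical helper.

```python
import numpy as np


def forecast_windows(data: np.ndarray, seq_len: int, step: int, forecast_horizon: int):
    """Hypothetical helper: split a (time, features) array into (input, target) pairs.

    Assumes windows start at index 0, advance by `step`, and the target is the
    `forecast_horizon` rows immediately after each input window.
    """
    if forecast_horizon <= 0:
        raise ValueError("forecast_horizon must be positive")
    span = seq_len + forecast_horizon
    # Keep only start positions where input + target fit inside the series.
    num_windows = max((len(data) - span) // step + 1, 0)
    inputs, targets = [], []
    for i in range(num_windows):
        start = i * step
        inputs.append(data[start : start + seq_len])
        targets.append(data[start + seq_len : start + span])
    return inputs, targets


# Toy series: 20 time steps, 3 features.
data = np.arange(20 * 3, dtype=np.float32).reshape(20, 3)
x, y = forecast_windows(data, seq_len=8, step=4, forecast_horizon=4)
print(len(x), x[0].shape, y[0].shape)  # 3 (8, 3) (4, 3)
```

Whether the real class aligns its first window at index 0 and drops a trailing partial window in exactly this way is an assumption of the sketch.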
- class chronocratic.datasets.datatypes.ElectricityDataset(data: np.ndarray, seq_len: int, step: int, mode: TimeSeriesDatasetMode, forecast_horizon: int, transformations_sequence: tuple[Callable, ...]=(<function convert_numpy_to_tensor>, ))
Bases: FlexibleTimeSeriesDatasetSingleFileMultipleSeries
PyTorch Dataset for Electricity forecasting.
Sliding-window dataset for the Electricity Load Diagrams benchmark. Handles 3-D data of shape (num_clients, T, 1), where each client is an independent power consumption series; there are 370 clients in total.
Raw CSV shape: (27340, 371), i.e. one timestamp column plus one column per client (MT_001 onward). Shape after transformation: (370, 27340, 1).
- Parameters:
data – 3-D numpy array of shape (num_clients, T, 1).
seq_len – Input window length.
step – Step between consecutive windows.
mode – Dataset mode (e.g., TimeSeriesDatasetMode.SAMPLE_ONLY, TimeSeriesDatasetMode.INPUT_OUTPUT).
forecast_horizon – Number of future steps to predict.
transformations_sequence – Post-processing callables.
- Raises:
ValueError – If forecast_horizon is not positive.
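The reshaping from the raw CSV layout to the (num_clients, T, 1) layout can be sketched as below. This assumes, as the stated shapes suggest (371 = 1 + 370), that column 0 holds the timestamp and every remaining column is one client. `csv_to_client_array` is a hypothetical helper, and a small synthetic table stands in for the real file (which would typically be loaded with pandas first).

```python
import numpy as np


def csv_to_client_array(raw: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (T, 1 + num_clients) table -> (num_clients, T, 1) array.

    Assumes column 0 is the timestamp and columns 1.. are per-client loads.
    """
    loads = raw[:, 1:].astype(np.float32)  # drop the timestamp column
    # Transpose to (num_clients, T), then add a trailing feature axis of size 1.
    return loads.T[:, :, np.newaxis]


# Synthetic stand-in: 6 time steps, 1 timestamp column + 4 clients.
timestamps = np.arange(6).reshape(6, 1)
client_loads = np.random.default_rng(0).random((6, 4))
raw = np.hstack([timestamps, client_loads])
data = csv_to_client_array(raw)
print(data.shape)  # (4, 6, 1)
```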
- class chronocratic.datasets.datatypes.FixedTimeSeriesDatasetMultivariate(data: np.ndarray, labels: pd.Series | pd.DataFrame | None, mode: TimeSeriesDatasetMode, expand_dims_axis: int | None, transformations_sequence: list[Callable] | tuple[Callable, ...] | None = None)
Bases: FixedTimeSeriesDataset, ABC
Multivariate classification dataset (UEA-style).
The data is a 3-D array of shape (samples, timesteps, features); each entry is one 2-D sample of shape (timesteps, features).
- Parameters:
data – 3-D numpy array of shape (samples, timesteps, features).
labels – Optional label Series or DataFrame.
mode – Dataset mode.
expand_dims_axis – Axis along which to expand dimensions, or None for no expansion.
transformations_sequence – Post-processing callables.
- class chronocratic.datasets.datatypes.FixedTimeSeriesDatasetUnivariate(data: pd.DataFrame, labels: pd.Series | pd.DataFrame | None, mode: TimeSeriesDatasetMode, expand_dims_axis: int | None, transformations_sequence: list[Callable] | tuple[Callable, ...] | None = None)
Bases: FixedTimeSeriesDataset, ABC
Univariate classification dataset (UCR-style).
Each row of the DataFrame is one time series.
- Parameters:
data – DataFrame of shape (samples, timesteps).
labels – Optional label Series or DataFrame.
mode – Dataset mode.
expand_dims_axis – Axis along which to expand dimensions, or None for no expansion.
transformations_sequence – Post-processing callables.
- class chronocratic.datasets.datatypes.FlexibleTimeSeriesDatasetSingleFile(data: np.ndarray, labels: np.ndarray | None, seq_len: int, step: int, mode: TimeSeriesDatasetMode, sequence_handling_strategy: SequenceHandlingStrategySingleFile, expand_dims_axis: int | None = 1, transformations_sequence: list[Callable] | tuple[Callable, ...] | None=(<function convert_numpy_to_tensor>, ))
Bases: FlexibleTimeSeriesDataset
Sliding-window dataset for a single continuous series.
- Parameters:
data – 1-D or 2-D numpy array.
labels – Optional label array.
seq_len – Window length.
step – Step between windows.
mode – Dataset mode.
sequence_handling_strategy – Label extraction strategy.
expand_dims_axis – Axis along which to expand dimensions (default 1; None for no expansion).
transformations_sequence – Post-processing callables.
- class chronocratic.datasets.datatypes.FlexibleTimeSeriesDatasetSingleFileMultipleSeries(data: np.ndarray, labels: np.ndarray | None, seq_len: int, step: int, mode: TimeSeriesDatasetMode, sequence_handling_strategy: SequenceHandlingStrategySingleFile, expand_dims_axis: int | None = None, transformations_sequence: list[Callable] | tuple[Callable, ...] | None=(<function convert_numpy_to_tensor>, ))
Bases: FlexibleTimeSeriesDataset
Sliding-window dataset for a single file containing multiple series.
Handles 3-D input arrays of shape (num_series, T, features), where each series is an independent time series. Uses bisect and accumulate to map a global index to (series_index, window_index).
- Parameters:
data – 3-D numpy array of shape (num_series, T, features).
labels – Optional label array.
seq_len – Window length.
step – Step between windows.
mode – Dataset mode.
sequence_handling_strategy – Single-file label strategy.
expand_dims_axis – Axis along which to expand dimensions (default None, meaning no expansion).
transformations_sequence – Post-processing callables.
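The bisect and accumulate index mapping mentioned for this class can be sketched as follows. This is a hypothetical illustration, not the library's code: it assumes each series contributes windows by the same seq_len/step rule as a single series, and `build_index` and `locate` are illustrative names. Running totals from `itertools.accumulate` mark where each series' windows end in the global index, and `bisect.bisect_right` finds the series that owns a given global index.

```python
from bisect import bisect_right
from itertools import accumulate

import numpy as np


def build_index(num_windows_per_series: list[int]) -> list[int]:
    """Running totals: entry k is the number of windows in series 0..k."""
    return list(accumulate(num_windows_per_series))


def locate(global_index: int, cumulative: list[int]) -> tuple[int, int]:
    """Map a global window index to (series_index, window_index)."""
    if not 0 <= global_index < cumulative[-1]:
        raise IndexError(global_index)
    # bisect_right finds the first series whose running total exceeds the index;
    # series with zero windows are skipped automatically.
    series_index = bisect_right(cumulative, global_index)
    offset = cumulative[series_index - 1] if series_index > 0 else 0
    return series_index, global_index - offset


# Toy data of shape (num_series, T, features) = (3, 10, 2).
seq_len, step = 4, 3
data = np.arange(3 * 10 * 2).reshape(3, 10, 2)
per_series = [(data.shape[1] - seq_len) // step + 1] * data.shape[0]  # 3 windows each
cumulative = build_index(per_series)  # [3, 6, 9]

series_index, window_index = locate(4, cumulative)  # (1, 1)
start = window_index * step
window = data[series_index, start : start + seq_len]  # shape (4, 2)
```

With equal-length series a plain `divmod` would give the same result; the cumulative-count approach also covers series that yield different numbers of windows.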
- class chronocratic.datasets.datatypes.TimeSeriesDataset(data: ndarray | list[ndarray] | DataFrame, labels: ndarray | list[ndarray] | Series | DataFrame | None, mode: TimeSeriesDatasetMode, expand_dims_axis: int | None, transformations_sequence: list[Callable] | tuple[Callable, ...] | None = None)
Bases: Dataset[Any], ABC
Abstract base for all time series datasets.
Supports three modes via mode-specific sample getters:
- SAMPLE_ONLY (training)
- SAMPLE_LABEL (evaluation)
- INPUT_OUTPUT (input/target pairs)
- Parameters:
data – Raw time series data.
labels – Optional label array or Series.
mode – Determines the sample signature.
expand_dims_axis – Axis along which to expand data dimensions.
transformations_sequence – Post-processing callables.
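The three modes can be pictured as three getters behind one indexing interface, with the post-processing callables applied in order. The sketch below is hypothetical: `Mode` only mirrors the member names of TimeSeriesDatasetMode, `ToyTimeSeriesDataset` is not the library's class and implements `__len__`/`__getitem__` directly to avoid a PyTorch dependency, and its INPUT_OUTPUT getter simply takes the target from `labels` (the forecasting subclasses instead derive targets through a sequence handling strategy, as described for ETTDataset).

```python
from enum import Enum, auto
from typing import Callable, Sequence

import numpy as np


class Mode(Enum):
    """Hypothetical stand-in for TimeSeriesDatasetMode (same member names)."""

    SAMPLE_ONLY = auto()
    SAMPLE_LABEL = auto()
    INPUT_OUTPUT = auto()


class ToyTimeSeriesDataset:
    """Sketch of mode-specific sample getters; not the library's class."""

    def __init__(self, data, labels, mode: Mode, transformations: Sequence[Callable] = ()):
        self.data = data
        self.labels = labels
        self.transformations = tuple(transformations)
        # Resolve the getter once so __getitem__ does not branch on the mode.
        self._getter = {
            Mode.SAMPLE_ONLY: self._sample_only,
            Mode.SAMPLE_LABEL: self._sample_label,
            Mode.INPUT_OUTPUT: self._input_output,
        }[mode]

    def _transform(self, x):
        # Apply the post-processing callables in order.
        for fn in self.transformations:
            x = fn(x)
        return x

    def _sample_only(self, i):
        return self._transform(self.data[i])

    def _sample_label(self, i):
        return self._transform(self.data[i]), self.labels[i]

    def _input_output(self, i):
        # Assumption: the target is stored in `labels` and transformed like the input.
        return self._transform(self.data[i]), self._transform(self.labels[i])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self._getter(i)


data = np.arange(12, dtype=np.float32).reshape(3, 4)
labels = np.array([0, 1, 0])
ds = ToyTimeSeriesDataset(data, labels, Mode.SAMPLE_LABEL, transformations=(lambda a: a * 2,))
x, y = ds[1]  # doubled second row and its label
```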
- class chronocratic.datasets.datatypes.UCRClassificationUnivariateDataset(data: pd.DataFrame, labels: pd.Series | pd.DataFrame | None, mode: TimeSeriesDatasetMode, expand_dims_axis: int = 1, transformations_sequence: tuple[Callable, ...]=(<function convert_numpy_to_tensor>, ))
Bases: FixedTimeSeriesDatasetUnivariate
PyTorch Dataset for UCR univariate classification.
Each row of the input DataFrame represents one time series sample. By default, dimensions are expanded along axis=1, giving each sample shape (1, timesteps), and numpy arrays are converted to tensors.
- Parameters:
data – DataFrame of shape (samples, timesteps).
labels – Optional label Series or DataFrame.
mode – Dataset mode (with or without labels).
expand_dims_axis – Axis along which to expand dimensions (default 1).
transformations_sequence – Post-processing callables.
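The shape effect of the default expand_dims_axis=1 can be checked with NumPy. This sketch assumes the expansion is applied to the whole (samples, timesteps) array before indexing, which matches the per-sample (1, timesteps) shape stated above; a random array stands in for the DataFrame's values (for example, what `DataFrame.to_numpy()` returns).

```python
import numpy as np

# Stand-in for a UCR-style table: 5 samples, each a univariate series of 12 steps.
values = np.random.default_rng(0).standard_normal((5, 12)).astype(np.float32)

# Expanding along axis=1 inserts a channel axis: (samples, 1, timesteps).
expanded = np.expand_dims(values, axis=1)
print(expanded.shape)     # (5, 1, 12)
print(expanded[0].shape)  # (1, 12): one channel per sample
```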
- class chronocratic.datasets.datatypes.UEAClassificationMultivariateDataset(data: np.ndarray, labels: pd.Series | pd.DataFrame | None, mode: TimeSeriesDatasetMode, expand_dims_axis: int | None = None, transformations_sequence: tuple[Callable, ...]=(<function convert_numpy_to_tensor>, ))
Bases: FixedTimeSeriesDatasetMultivariate
PyTorch Dataset for UEA multivariate classification.
Each sample is a 2-D array of shape (timesteps, features). No dimension expansion is applied by default (expand_dims_axis=None), since the multivariate shape is already fully specified.
- Parameters:
data – 3-D numpy array of shape (samples, timesteps, features).
labels – Optional label Series or DataFrame.
mode – Dataset mode.
expand_dims_axis – Axis along which to expand dimensions (default None, meaning no expansion).
transformations_sequence – Post-processing callables.
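The multivariate layout can be checked with NumPy: indexing the 3-D array yields 2-D (timesteps, features) samples, and with expand_dims_axis=None no axis is inserted. `maybe_expand` is a hypothetical helper modelling that parameter, not a library function; a random array stands in for real UEA data.

```python
from typing import Optional

import numpy as np


def maybe_expand(arr: np.ndarray, axis: Optional[int]) -> np.ndarray:
    """Hypothetical helper: insert a new axis at `axis`, or return `arr` unchanged if None."""
    return arr if axis is None else np.expand_dims(arr, axis=axis)


# (samples, timesteps, features) = 4 samples, 30 steps, 3 channels.
data = np.random.default_rng(0).standard_normal((4, 30, 3)).astype(np.float32)
prepared = maybe_expand(data, None)  # UEA default: no expansion
print(prepared.shape)     # (4, 30, 3)
print(prepared[0].shape)  # (30, 3): each sample is 2-D
```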
- class chronocratic.datasets.datatypes.WeatherDataset(data: np.ndarray, seq_len: int, step: int, mode: TimeSeriesDatasetMode, forecast_horizon: int, transformations_sequence: tuple[Callable, ...]=(<function convert_numpy_to_tensor>, ))
Bases: FlexibleTimeSeriesDatasetSingleFile
PyTorch Dataset for Weather forecasting.
Sliding-window dataset for the Weather benchmark (22 features, hourly granularity). Accepts pre-processed 2D data of shape (T, 22) where T = 52696 hourly steps.
- Parameters:
data – 2-D numpy array of shape (T, 22).
seq_len – Input window length.
step – Step between consecutive windows.
mode – Dataset mode (e.g., TimeSeriesDatasetMode.SAMPLE_ONLY, TimeSeriesDatasetMode.INPUT_OUTPUT).
forecast_horizon – Number of future steps to predict.
transformations_sequence – Post-processing callables.
- Raises:
ValueError – If forecast_horizon is not positive.
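Assuming a simple start-at-zero sliding window (the library's exact indexing may differ), the number of (input, target) pairs for the stated Weather shape is (T - seq_len - forecast_horizon) // step + 1. The NumPy sketch below checks that count by building every window as a zero-copy view with `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+). The seq_len, forecast_horizon, and step values are illustrative only, and a zero array stands in for the real data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

T, num_features = 52696, 22  # shape stated for the Weather data
seq_len, forecast_horizon, step = 96, 24, 6  # illustrative values only
data = np.zeros((T, num_features), dtype=np.float32)  # placeholder for real data

span = seq_len + forecast_horizon
# View of every length-`span` window along time: (T - span + 1, features, span),
# then keep every `step`-th window start.
windows = sliding_window_view(data, window_shape=span, axis=0)[::step]
# Put time before features: (num_windows, span, features).
windows = np.moveaxis(windows, -1, 1)
inputs, targets = windows[:, :seq_len], windows[:, seq_len:]

print(windows.shape[0], (T - span) // step + 1)  # 8763 8763
print(inputs.shape, targets.shape)  # (8763, 96, 22) (8763, 24, 22)
```

Because the windows are views, this avoids copying the series once per window; a real pipeline would still need to handle normalization and train/validation/test splits separately.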