Forecasting Modules#

The forecasting module provides data loaders for popular multivariate time series forecasting benchmarks, built on top of PyTorch Lightning’s LightningDataModule. Each module handles data loading, splitting, and caching, and exposes ready-to-use DataLoaders for training, validation, and testing.

For the full API of all forecasting data modules, see the DataModule API Reference.

Weather Module#

The Weather dataset contains 22 meteorological features recorded hourly over 7 years (2012-2017). It uses fixed 60/20/20 temporal splits (train/validation/test).
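
A temporal split keeps timesteps in chronological order rather than shuffling them, so the model is always validated and tested on data that comes after its training data. The following is a minimal sketch of a 60/20/20 split, not the module's actual implementation; the helper name `temporal_split` and its percentage arguments are hypothetical.

```python
def temporal_split(series, train_pct=60, val_pct=20):
    """Split a sequence chronologically into train/validation/test parts.

    Hypothetical helper for illustration; WeatherModule's internals may differ.
    """
    n = len(series)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    # No shuffling: the earliest timesteps train, later ones validate and test.
    return series[:train_end], series[train_end:val_end], series[val_end:]


hours = list(range(1000))  # stand-in for 1,000 hourly timesteps
train, val, test = temporal_split(hours)
print(len(train), len(val), len(test))
```

Because the slices are contiguous, the last training timestep always precedes the first validation timestep, which avoids leaking future values into training.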

from pathlib import Path

from chronocratic.datasets import ForecastingMode, WeatherModule

module = WeatherModule(
    dataset_file_path=Path("data/weather.csv"),
    mode=ForecastingMode.MULTIVARIATE,
    seq_len=24,
    forecast_horizon=168,
    scale_data=True,
    batch_size=32,
)

module.prepare_data()
module.setup()
train_loader = module.train_dataloader()

In univariate mode, only the last column (WetBulbCelsius) is retained as the target variable. In multivariate mode, all 22 features are used.

See the WeatherModule API reference for all constructor parameters.

ETT Data Module#

The ETT (Electricity Transformer Temperature) dataset comes in two sampling frequencies, each with two variants:

  • ETTh1, ETTh2 – Hourly temperature data (17,420 timesteps)

  • ETTm1, ETTm2 – 15-minute temperature data (69,680 timesteps)

The module requires an explicit variant parameter to determine the correct 16-month / 4-month / 4-month train/validation/test split boundaries.
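
To see why the variant matters, the sketch below converts the 16/4/4-month split into timestep boundaries. It assumes a "month" is counted as 30 days, a convention common in ETT benchmark setups but not confirmed for this library; `ett_split_boundaries` and `STEPS_PER_HOUR` are hypothetical names.

```python
# Samples per hour for each variant: hourly files have 1, 15-minute files have 4.
STEPS_PER_HOUR = {"ETTh1": 1, "ETTh2": 1, "ETTm1": 4, "ETTm2": 4}


def ett_split_boundaries(variant, months=(16, 4, 4)):
    """Return (start, end) timestep indices for train/validation/test.

    Assumption: one month equals 30 days. The real module may differ.
    """
    steps_per_month = 30 * 24 * STEPS_PER_HOUR[variant]
    boundaries = []
    start = 0
    for n_months in months:
        end = start + n_months * steps_per_month
        boundaries.append((start, end))
        start = end
    return boundaries


print(ett_split_boundaries("ETTh1"))
print(ett_split_boundaries("ETTm1"))
```

Under this assumption the same calendar split ends at index 17,280 for the hourly variants and 69,120 for the 15-minute variants, so the boundaries cannot be derived from the file contents alone without knowing the sampling frequency.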

from pathlib import Path

from chronocratic.datasets import ETTDataModule, ForecastingMode

module = ETTDataModule(
    dataset_file_path=Path("data/ETTm1.csv"),
    variant="ETTm1",
    mode=ForecastingMode.UNIVARIATE,
    seq_len=96,
    forecast_horizon=96,
    scale_data=True,
    batch_size=32,
)

module.prepare_data()
module.setup()

In univariate mode, only the OT (oil temperature) column is used.

See the ETTDataModule API reference for all constructor parameters.

Electricity Load Module#

The Electricity dataset contains hourly power consumption data for 370 independent customers over 3 years (2012-2014). Each customer is treated as a separate time series (not as a feature), producing shape (370, T, 1) after transformation. It uses fixed 60/20/20 temporal splits.
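
The customers-as-series layout can be illustrated with plain nested lists. This is only a sketch of the shape transformation described above, assuming the source table has one row per timestep and one column per customer; `customers_as_series` is a hypothetical helper, not part of the library.

```python
def customers_as_series(table):
    """Turn rows of shape (T, n_customers) into nested lists of shape (n_customers, T, 1)."""
    # zip(*table) transposes rows into per-customer columns;
    # wrapping each value in a list adds the trailing feature axis of size 1.
    return [[[value] for value in column] for column in zip(*table)]


# 4 hourly rows x 3 customers (the real file would have 370 customer columns)
table = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
]
series = customers_as_series(table)
shape = (len(series), len(series[0]), len(series[0][0]))
print(shape)
```

Each customer ends up as its own single-feature series, which is why the leading dimension is the customer count rather than the number of timesteps.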

from pathlib import Path

from chronocratic.datasets import ElectricityLoadModule, ForecastingMode

module = ElectricityLoadModule(
    dataset_file_path=Path("data/electricity.csv"),
    mode=ForecastingMode.MULTIVARIATE,
    seq_len=96,
    forecast_horizon=24,
    scale_data=True,
    batch_size=32,
)

module.prepare_data()
module.setup()

In univariate mode, only customer MT_001 is retained.

See the ElectricityLoadModule API reference for all constructor parameters.

Dataset Classes#

Under the hood, the data modules are built on PyTorch Dataset classes.

See the Dataset API Reference for full class documentation.

Loader Mode#

Forecasting modules support multiple loader modes via ForecastingLoaderMode:

  • RAW_SERIES – Returns the full raw time series (default)

  • INPUT_TARGET – Returns input and target tensors for supervised learning

  • INPUT_ONLY – Returns only the input tensor without targets

Pass the mode to train_dataloader(), val_dataloader(), or test_dataloader() via the loader_mode keyword argument.

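# Assumption: ForecastingLoaderMode is exported alongside ForecastingMode
from chronocratic.datasets import ForecastingLoaderMode

# Supervised learning format: (input_window, target_window)
train_loader = module.train_dataloader(loader_mode=ForecastingLoaderMode.INPUT_TARGET)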
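
To make the (input_window, target_window) format concrete, here is a minimal sliding-window sketch in plain Python. It shows how seq_len and forecast_horizon could carve one series into supervised pairs; `make_windows` is a hypothetical helper, and the library's actual windowing (stride, padding, batching) may differ.

```python
def make_windows(series, seq_len, forecast_horizon):
    """Yield (input_window, target_window) pairs from a single series.

    Hypothetical illustration with a stride of 1 and no padding.
    """
    last_start = len(series) - seq_len - forecast_horizon
    for start in range(last_start + 1):
        split = start + seq_len
        # The target begins exactly where the input ends.
        yield series[start:split], series[split:split + forecast_horizon]


series = list(range(10))
pairs = list(make_windows(series, seq_len=4, forecast_horizon=2))
first_input, first_target = pairs[0]
print(len(pairs), first_input, first_target)
```

In this picture, an INPUT_ONLY-style loader would keep only the first element of each pair, while RAW_SERIES would skip the windowing step and return the series itself.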

Forecasting Mode#

Control variable selection with ForecastingMode:

  • UNIVARIATE – Use a single target variable per sample

  • MULTIVARIATE – Use all available variables per sample

Set this on the module constructor via the mode keyword argument.
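
The effect of the two modes on the feature axis can be sketched as simple column selection. The `Mode` enum and `select_columns` function below are hypothetical stand-ins for illustration only, and the sample values are made up; they do not reproduce the library's implementation.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for ForecastingMode."""
    UNIVARIATE = "univariate"
    MULTIVARIATE = "multivariate"


def select_columns(rows, header, mode, target):
    """Keep every column in multivariate mode, only the target in univariate mode."""
    if mode is Mode.MULTIVARIATE:
        return [list(row) for row in rows]
    idx = header.index(target)
    return [[row[idx]] for row in rows]


header = ["HUFL", "HULL", "OT"]               # a few ETT-style column names
rows = [[5.8, 2.0, 30.5], [5.7, 2.1, 27.8]]   # illustrative values only

uni = select_columns(rows, header, Mode.UNIVARIATE, target="OT")
multi = select_columns(rows, header, Mode.MULTIVARIATE, target="OT")
print(len(uni[0]), len(multi[0]))
```

Either way the result keeps a feature axis, so downstream code sees a width of 1 in univariate mode instead of a missing dimension.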

Scaling#

Data scaling is configured via ScalingMethod:

  • NONE – No scaling applied

  • MINMAX – Scales to a specified range (default 0-1)

  • STANDARD – Standardizes to zero mean and unit variance
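
The two non-trivial methods can be sketched with the standard library. The example fits the statistics on the training split only and reuses them for later data, which is a common practice but an assumption about this library; `fit_minmax` and `fit_standard` are hypothetical helpers, and the use of the population standard deviation is likewise an assumption.

```python
from statistics import fmean, pstdev


def fit_minmax(train, feature_range=(0.0, 1.0)):
    """Return a function that maps the training min/max onto feature_range."""
    lo, hi = min(train), max(train)
    a, b = feature_range
    span = (hi - lo) or 1.0  # guard against a constant series
    return lambda x: a + (x - lo) * (b - a) / span


def fit_standard(train):
    """Return a function that standardizes with the training mean and std."""
    mean = fmean(train)
    std = pstdev(train) or 1.0  # population std; guard against zero
    return lambda x: (x - mean) / std


train = [2.0, 4.0, 6.0, 8.0]
test = [10.0]

minmax = fit_minmax(train)
standard = fit_standard(train)

scaled_train = [standard(x) for x in train]
print([minmax(x) for x in train], minmax(test[0]))
```

Because the scaler is fitted on the training split, values from later splits can fall outside the nominal range, as the test point does here under min-max scaling.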

Next Steps#