# Forecasting Modules

The forecasting module provides data loaders for popular multivariate time series forecasting benchmarks, built on top of PyTorch Lightning's `LightningDataModule`. Each module handles data loading, splitting, and caching, and provides ready-to-use DataLoaders for training, validation, and testing.

For the full API of all forecasting data modules, see the {doc}`api/modules` reference.

## Weather Module

The Weather dataset contains 22 meteorological features recorded hourly over 7 years (2012-2017). It uses fixed 60/20/20 temporal splits (train/validation/test).

```python
from pathlib import Path

from chronocratic.datasets import ForecastingMode, WeatherModule

module = WeatherModule(
    dataset_file_path=Path("data/weather.csv"),
    mode=ForecastingMode.MULTIVARIATE,
    seq_len=24,
    forecast_horizon=168,
    scale_data=True,
    batch_size=32,
)
module.prepare_data()
module.setup()

train_loader = module.train_dataloader()
```

In **univariate mode**, only the last column (`WetBulbCelsius`) is retained as the target variable. In **multivariate mode**, all 22 features are used.

See the {py:class}`~chronocratic.datasets.modules.WeatherModule` API reference for all constructor parameters.

## ETT Data Module

The ETT (Electricity Transformer Temperature) dataset comes in two frequency variants:

- **ETTh1, ETTh2** -- Hourly temperature data (17,420 timesteps)
- **ETTm1, ETTm2** -- 15-minute temperature data (69,680 timesteps)

The module requires an explicit `variant` parameter to determine the correct 16-month / 4-month / 4-month train/validation/test split boundaries.

```python
from pathlib import Path

from chronocratic.datasets import ETTDataModule, ForecastingMode

module = ETTDataModule(
    dataset_file_path=Path("data/ETTm1.csv"),
    variant="ETTm1",
    mode=ForecastingMode.UNIVARIATE,
    seq_len=96,
    forecast_horizon=96,
    scale_data=True,
    batch_size=32,
)
module.prepare_data()
module.setup()
```

In **univariate mode**, only the `OT` (oil temperature) column is used.

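
To illustrate why the `variant` matters for the split, the sketch below converts a 16/4/4-month split into timestep indices for an hourly and a 15-minute variant. It is an assumption-laden illustration, not the module's actual implementation: the helper `ett_split_boundaries`, the `STEPS_PER_HOUR` table, and the 30-day month length are all hypothetical choices made for this example.

```python
# Hypothetical helper, not part of chronocratic.
# Assumption: a "month" is 30 days; the real module may define it differently.
STEPS_PER_HOUR = {"ETTh1": 1, "ETTh2": 1, "ETTm1": 4, "ETTm2": 4}


def ett_split_boundaries(variant, months=(16, 4, 4)):
    """Return the exclusive end indices of the train, validation, and test splits."""
    steps_per_month = 30 * 24 * STEPS_PER_HOUR[variant]
    train_end = months[0] * steps_per_month
    val_end = train_end + months[1] * steps_per_month
    test_end = val_end + months[2] * steps_per_month
    return train_end, val_end, test_end


hourly = ett_split_boundaries("ETTh1")
quarter_hourly = ett_split_boundaries("ETTm1")
print(hourly)          # (11520, 14400, 17280)
print(quarter_hourly)  # (46080, 57600, 69120)
```

Under these assumptions the same calendar split spans four times as many rows in the 15-minute files, which is why the boundaries cannot be shared across variants.
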
See the {py:class}`~chronocratic.datasets.modules.ETTDataModule` API reference for all constructor parameters.

## Electricity Load Module

The Electricity dataset contains hourly power consumption data for 370 independent customers over 3 years (2012-2014). Each customer is treated as a separate time series (not as a feature), producing shape `(370, T, 1)` after transformation. It uses fixed 60/20/20 temporal splits.

```python
from pathlib import Path

from chronocratic.datasets import ElectricityLoadModule, ForecastingMode

module = ElectricityLoadModule(
    dataset_file_path=Path("data/electricity.csv"),
    mode=ForecastingMode.MULTIVARIATE,
    seq_len=96,
    forecast_horizon=24,
    scale_data=True,
    batch_size=32,
)
module.prepare_data()
module.setup()
```

In **univariate mode**, only customer `MT_001` is retained.

See the {py:class}`~chronocratic.datasets.modules.ElectricityLoadModule` API reference for all constructor parameters.

## Dataset Classes

Under the hood, the data modules use these PyTorch Dataset classes:

- {py:class}`~chronocratic.datasets.datatypes.ETTDataset`
- {py:class}`~chronocratic.datasets.datatypes.WeatherDataset`
- {py:class}`~chronocratic.datasets.datatypes.ElectricityDataset`

See the {doc}`api/datatypes` reference for full class documentation.

## Loader Mode

Forecasting modules support multiple loader modes via {py:class}`~chronocratic.datasets.enums.ForecastingLoaderMode`:

- **RAW_SERIES** -- Returns the full raw time series (default)
- **INPUT_TARGET** -- Returns input and target tensors for supervised learning
- **INPUT_ONLY** -- Returns only the input tensor without targets

Set this on the `train_dataloader()`, `val_dataloader()`, and `test_dataloader()` calls via the `loader_mode` keyword argument.

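
As a rough picture of what an input/target pairing means, the sketch below slices a series into sliding windows: `seq_len` steps of input followed by `forecast_horizon` steps of target. It is only an illustration under stated assumptions. The function `make_windows` is hypothetical, plain Python lists stand in for tensors, and the library's own windowing (stride, padding, tensor layout) may differ.

```python
# Hypothetical helper, not part of chronocratic; lists stand in for tensors.
def make_windows(series, seq_len, forecast_horizon):
    """Pair each input window with the target window that immediately follows it."""
    n_windows = len(series) - seq_len - forecast_horizon + 1
    if n_windows <= 0:
        raise ValueError("series is too short for the requested windows")
    pairs = []
    for start in range(n_windows):
        split = start + seq_len  # first index of the target window
        pairs.append((series[start:split], series[split:split + forecast_horizon]))
    return pairs


series = list(range(10))
pairs = make_windows(series, seq_len=4, forecast_horizon=2)
first_input, first_target = pairs[0]
print(len(pairs))    # 5
print(first_input)   # [0, 1, 2, 3]
print(first_target)  # [4, 5]
```

With the data modules themselves, the mode is passed directly to the dataloader call:
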
```python
from chronocratic.datasets.enums import ForecastingLoaderMode

# Supervised learning format: (input_window, target_window)
train_loader = module.train_dataloader(loader_mode=ForecastingLoaderMode.INPUT_TARGET)
```

## Forecasting Mode

Control variable selection with {py:class}`~chronocratic.datasets.enums.ForecastingMode`:

- **UNIVARIATE** -- Use a single target variable per sample
- **MULTIVARIATE** -- Use all available variables per sample

Set this on the module constructor via the `mode` keyword argument.

## Scaling

Data scaling is configured via {py:class}`~chronocratic.datasets.enums.ScalingMethod`:

- **NONE** -- No scaling applied
- **MINMAX** -- Scales to a specified range (default 0-1)
- **STANDARD** -- Standardizes to zero mean and unit variance

## Next Steps

- See the {doc}`classification` guide for time series classification datasets.
- See the {doc}`api/modules` reference for the full API of forecasting data modules.
- See the {doc}`api/enums` reference for all enum options.