Classification Modules

The classification module provides data loaders for the UCR/UEA Time Series Classification Archive, a standard benchmark collection for time series classification research.

For the full API of all classification data modules, see the DataModule API Reference.

UCR Classification Data Module

The UCR archive contains univariate time series classification datasets, most of them equal-length, stored in ARFF format. Each dataset directory provides {dataset_name}_TRAIN.arff and {dataset_name}_TEST.arff files with feature columns and a target label column.
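The following sketch makes the flat layout concrete by parsing a small UCR-style ARFF string with the standard library only. The helper parse_flat_arff and the toy data are hypothetical illustrations, not part of chronocratic. The sketch assumes numeric feature attributes plus one nominal target column. It does not handle quoted attribute names, sparse rows, or missing values.

```python
def parse_flat_arff(text, target_column_name="class"):
    """Parse a flat (non-relational) ARFF string into feature rows and labels.

    Hypothetical helper: assumes comma-separated numeric features and one
    nominal target column; comment lines (%) and blank lines are skipped.
    """
    attributes = []
    rows = []
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                # "@attribute name type" -> keep only the attribute name
                attributes.append(line.split()[1])
            elif lower.startswith("@data"):
                in_data = True
            continue
        rows.append([value.strip() for value in line.split(",")])

    target_index = attributes.index(target_column_name)
    # Every non-target column is one time step of the series.
    features = [
        [float(v) for i, v in enumerate(row) if i != target_index] for row in rows
    ]
    labels = [row[target_index] for row in rows]
    return features, labels


example = """@relation Toy
@attribute att1 numeric
@attribute att2 numeric
@attribute att3 numeric
@attribute class {a,b}
@data
0.1,0.2,0.3,a
1.0,0.9,0.8,b
"""

features, labels = parse_flat_arff(example)
print(len(features[0]))  # sequence length = number of feature columns: 3
print(labels)            # ['a', 'b']
```

Each row yields one series. Its length equals the number of feature columns, which is how the sequence length is derived.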

from pathlib import Path

from chronocratic.datasets import UCRClassificationDataModule

module = UCRClassificationDataModule(
    dataset_folder_path=Path("data/FogiDataset1"),
    target_column_name="class",
    scale_data=True,
    batch_size=32,
)

module.prepare_data()
module.setup()
train_loader = module.train_dataloader()

Key details:

dataset_folder_path points to the directory containing the .arff files. The module auto-discovers {dataset_name}_TRAIN.arff and {dataset_name}_TEST.arff.
target_column_name specifies the label column name in the ARFF files.
Sequence length is derived from the number of feature columns.
Variable-length series are padded to a common length automatically.
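The following sketch approximates the file discovery and padding described above. Both helpers are hypothetical. find_split_files assumes dataset_name is the folder's own name. pad_series right-pads with NaN by default. The library's actual discovery rule, fill value, and padding side may differ.

```python
import math
from pathlib import Path


def find_split_files(dataset_folder_path):
    """Return the expected TRAIN/TEST ARFF paths for a UCR dataset folder.

    Hypothetical helper: assumes dataset_name equals the folder name,
    following the {dataset_name}_TRAIN.arff / _TEST.arff convention.
    """
    folder = Path(dataset_folder_path)
    name = folder.name
    return folder / f"{name}_TRAIN.arff", folder / f"{name}_TEST.arff"


def pad_series(series_list, fill_value=math.nan):
    """Right-pad every series to the length of the longest one (assumed strategy)."""
    max_len = max(len(s) for s in series_list)
    return [list(s) + [fill_value] * (max_len - len(s)) for s in series_list]


train_path, test_path = find_split_files("data/FogiDataset1")
print(train_path.name)  # FogiDataset1_TRAIN.arff
padded = pad_series([[1.0, 2.0, 3.0], [4.0]], fill_value=0.0)
print(padded)  # [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]
```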

See the UCRClassificationDataModule API reference for all constructor parameters.

UEA Classification Data Module

The UEA archive contains multivariate and/or variable-length time series classification datasets stored in nested ARFF format. These datasets have multiple dimensions per timestep and may have different sequence lengths per sample.
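In the nested layout, each data row holds the whole multivariate series as one quoted relational value. The sketch below splits such a row by hand, assuming the common UEA convention: single quotes around the value, a literal \n between dimensions, and the label after the closing quote. parse_nested_row is a hypothetical helper for illustration only. The module itself reads these files through scipy.io.arff.loadarff.

```python
def parse_nested_row(line):
    """Split one UEA-style data row into per-dimension series and a label.

    Hypothetical helper: assumes the relational value is wrapped in single
    quotes, dimensions are separated by a literal backslash-n, and the label
    follows the closing quote after a final comma. Missing values ("?") and
    labels containing commas are not handled.
    """
    quoted, _, label = line.rpartition(",")
    body = quoted.strip().strip("'")
    dimensions = [
        [float(v) for v in dim.split(",") if v]
        for dim in body.split("\\n")  # "\\n" is the two characters backslash + n
    ]
    return dimensions, label.strip()


row = r"'1.0,2.0,3.0\n4.0,5.0,6.0',Standing"
dims, label = parse_nested_row(row)
print(len(dims), len(dims[0]), label)  # 2 3 Standing
```

Every row carries its own series, so two samples can have different lengths. That is why the sequence length has to be derived at load time.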

from pathlib import Path

from chronocratic.datasets import UEAClassificationDataModule

module = UEAClassificationDataModule(
    dataset_folder_path=Path("data/BasicMotions"),
    target_column_name="class",
    scale_data=True,
    batch_size=32,
)

module.prepare_data()
module.setup()
train_loader = module.train_dataloader()

Key details:

Uses scipy.io.arff.loadarff directly for reading nested ARFF format.
Automatically encodes string labels via sklearn.preprocessing.LabelEncoder.
The data form is NESTED: each sample may have its own length and multiple dimensions.
Sequence length and feature count are derived from the data at load time.
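The label encoding and shape inference listed above can be mirrored in plain Python. encode_labels reproduces LabelEncoder's convention of numbering the sorted unique classes. infer_shape assumes the sequence length is the longest dimension across all samples. Both are hypothetical helpers, and the module's own rule for variable-length data may differ.

```python
def encode_labels(labels):
    """Map labels to integers like LabelEncoder: indices into the sorted unique classes."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[c] for c in labels], classes


def infer_shape(samples):
    """Return (sequence length, feature count) for nested samples.

    Each sample is a list of dimensions (channels), each a list of values.
    Assumes every sample has the same number of dimensions and takes the
    longest dimension as the sequence length (hypothetical rule).
    """
    n_features = len(samples[0])
    seq_len = max(len(dim) for sample in samples for dim in sample)
    return seq_len, n_features


encoded, classes = encode_labels(["walk", "run", "walk", "stand"])
print(classes, encoded)  # ['run', 'stand', 'walk'] [2, 0, 2, 1]

seq_len, n_features = infer_shape(
    [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[1.0, 2.0], [3.0, 4.0]]]
)
print(seq_len, n_features)  # 3 2
```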

See the UEAClassificationDataModule API reference for all constructor parameters.

Dataset Classes

Under the hood, the data modules wrap their data in PyTorch Dataset classes. See the Dataset API Reference for full class documentation.
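The following minimal map-style dataset gives a rough picture of what such a class looks like. It implements __len__ and __getitem__, the protocol torch.utils.data.DataLoader expects from map-style datasets. A real implementation would normally subclass torch.utils.data.Dataset. The name ClassificationSequenceDataset is hypothetical and does not refer to a chronocratic class.

```python
class ClassificationSequenceDataset:
    """Minimal map-style dataset (hypothetical; not a chronocratic class).

    Stores parallel lists of samples and labels and serves them by index.
    """

    def __init__(self, samples, labels):
        if len(samples) != len(labels):
            raise ValueError("samples and labels must have the same length")
        self.samples = samples
        self.labels = labels

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # One item is a (sample, label) pair.
        return self.samples[index], self.labels[index]


dataset = ClassificationSequenceDataset([[1.0, 2.0], [4.0, 5.0]], [0, 1])
print(len(dataset), dataset[1])  # 2 ([4.0, 5.0], 1)
```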

Loader Mode

Classification modules support two loader modes via ClassificationLoaderMode:

SAMPLE_ONLY – Returns only the input sample tensor (no labels)
SAMPLE_LABEL – Returns the input sample tensor and its label (default)

Pass the mode to train_dataloader(), val_dataloader(), or test_dataloader() through the mode keyword argument.

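# Without labels (ClassificationLoaderMode must already be imported;
# see the Enum API Reference for the module that defines it)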
train_loader = module.train_dataloader(mode=ClassificationLoaderMode.SAMPLE_ONLY)
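The sketch below shows how the two modes affect each item. LoaderMode is a hypothetical stand-in for ClassificationLoaderMode and has only the two documented members. Its values and the select_item helper are assumptions for illustration.

```python
from enum import Enum


class LoaderMode(Enum):
    """Hypothetical mirror of ClassificationLoaderMode's two documented members."""

    SAMPLE_ONLY = "sample_only"
    SAMPLE_LABEL = "sample_label"


def select_item(sample, label, mode=LoaderMode.SAMPLE_LABEL):
    """Return what a loader would yield for one sample under the given mode."""
    if mode is LoaderMode.SAMPLE_ONLY:
        return sample  # no label, e.g. for unsupervised use
    if mode is LoaderMode.SAMPLE_LABEL:
        return sample, label
    raise ValueError(f"unsupported mode: {mode!r}")


print(select_item([1.0, 2.0], 0, LoaderMode.SAMPLE_ONLY))  # [1.0, 2.0]
print(select_item([1.0, 2.0], 0))  # ([1.0, 2.0], 0)
```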

Splitting Strategy

Control how the archive’s train/test split is handled via ClassificationSplitMode:

AS_DEFINED – Keep the original train/test split from the archive
MANUAL – Re-split the combined data with a custom test_size fraction

Set this on the module constructor via the splitting_strategy keyword argument.
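A MANUAL re-split can be approximated by pooling both splits, shuffling with a fixed seed, and cutting off a test_size fraction. AS_DEFINED would simply return the archive's lists unchanged. The resplit helper below is hypothetical. It is not stratified by class, and the library may round the test count or stratify differently.

```python
import random


def resplit(train, test, test_size=0.2, seed=0):
    """Pool the archive's train and test items and re-split them.

    Hypothetical sketch of the MANUAL strategy: seeded shuffle, then the
    first round(n * test_size) items become the new test split.
    Not stratified by class.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    combined = list(train) + list(test)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(combined)
    n_test = round(len(combined) * test_size)
    return combined[n_test:], combined[:n_test]


train, test = list(range(8)), [8, 9]
new_train, new_test = resplit(train, test, test_size=0.3)
print(len(new_train), len(new_test))  # 7 3
```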

Scaling

Data scaling is configured via ScalingMethod:

NONE – No scaling applied
MINMAX – Scales to a specified range (default 0-1)
STANDARD – Standardizes to zero mean and unit variance
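The MINMAX and STANDARD options correspond to the usual formulas. The sketch below applies them to a single list of values, with statistics fitted on the training data only. Both of those choices are assumptions of this sketch: fitting on the training split, and scaling one flat list rather than per channel. fit_minmax and fit_standard are hypothetical helpers. The population standard deviation matches what sklearn's StandardScaler uses.

```python
import statistics


def fit_minmax(values, feature_range=(0.0, 1.0)):
    """Fit min-max scaling on `values`; return a function mapping x into feature_range."""
    lo, hi = min(values), max(values)
    out_lo, out_hi = feature_range
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return lambda x: out_lo + (x - lo) / span * (out_hi - out_lo)


def fit_standard(values):
    """Fit standardization on `values`; return a function giving zero mean, unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # population std (ddof=0)
    return lambda x: (x - mean) / std


train = [2.0, 4.0, 6.0]
minmax = fit_minmax(train)
standard = fit_standard(train)
print([minmax(x) for x in train])  # [0.0, 0.5, 1.0]
print(standard(4.0))  # 0.0
```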

Next Steps

See the Forecasting Modules guide for forecasting datasets.
See the DataModule API Reference for the full API of classification data modules.
See the Enum API Reference for all enum options.