Contributing#

Thank you for your interest in contributing to chronocratic-datasets!

Development Setup#

This project uses uv for environment management and package installation.

Prerequisites#

Clone and Install#

git clone https://github.com/chronocratic/datasets.git
cd datasets

# Install with development dependencies
uv sync --all-extras

Code Style#

The project follows these conventions:

  • Type hints: All functions must have type hints for parameters and return types

  • Docstrings: Google-style docstrings for all public functions and classes

  • Naming: snake_case for functions and variables, PascalCase for classes

  • Function calls: Use keyword arguments for all function calls

  • Organization: Functional programming patterns preferred; pure functions where possible
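
As a minimal sketch of how these conventions look together, consider the snippet below. The names `scale_values` and `WindowConfig` are hypothetical and exist only for illustration; they are not part of chronocratic-datasets.

```python
from collections.abc import Sequence


def scale_values(*, values: Sequence[float], factor: float) -> list[float]:
    """Scale each value by a constant factor.

    Hypothetical helper for illustration only; not part of the package.

    Args:
        values: Input values to scale.
        factor: Multiplier applied to every value.

    Returns:
        A new list with each value multiplied by ``factor``.
    """
    # Pure function: builds a new list and leaves the input untouched.
    return [value * factor for value in values]


class WindowConfig:
    """Hypothetical PascalCase class holding a window length."""

    def __init__(self, *, window_length: int) -> None:
        self.window_length = window_length


# Keyword arguments at every call site.
scaled = scale_values(values=[1.0, 2.0, 3.0], factor=2.0)
config = WindowConfig(window_length=4)
```

Declaring parameters as keyword-only (the bare `*` in the signature) is one way to make the keyword-argument convention enforceable rather than advisory; whether the project requires this is an assumption.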

Linting and Formatting#

We use ruff for linting and formatting:

# Check for issues
uv run ruff check src/ tests/

# Format code
uv run ruff format src/ tests/

# Check formatting without modifying
uv run ruff format --check src/ tests/

Testing#

Tests are written using pytest. Run the test suite with:

# Run all tests
uv run pytest tests/

# Run with coverage
uv run pytest tests/ --cov=src/chronocratic/datasets

# Run specific test file
uv run pytest tests/test_public_api_exports.py -v

Writing Tests#

  • Place test files in the tests/ directory and give them a test_ prefix

  • In tests, import from the package root: from chronocratic.datasets import ForecastingMode

  • Keep tests focused on one behavior per test function

  • Use fixtures for common setup
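
The guidelines above might translate into a test file along these lines. This is only a sketch: `clip_value` is a hypothetical stand-in defined inline so the example is self-contained, whereas a real test would import the code under test from chronocratic.datasets.

```python
import pytest


def clip_value(*, value: float, lower: float, upper: float) -> float:
    """Hypothetical function under test; stands in for real package code."""
    return max(lower, min(value, upper))


@pytest.fixture
def bounds() -> dict[str, float]:
    """Common setup shared by the tests below."""
    return {"lower": 0.0, "upper": 3.0}


def test_clip_value_caps_values_above_upper(bounds: dict[str, float]) -> None:
    # One behavior per test: values above the range are capped.
    assert clip_value(value=5.0, **bounds) == 3.0


def test_clip_value_keeps_values_inside_range(bounds: dict[str, float]) -> None:
    # One behavior per test: in-range values pass through unchanged.
    assert clip_value(value=1.5, **bounds) == 1.5
```

Saved as, say, tests/test_clip_value.py (a hypothetical file name), pytest would discover both functions through the test_ prefix and inject the `bounds` fixture by parameter name.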

Documentation#

Documentation is built with Sphinx using MyST Parser for Markdown source files.

# Build documentation
uv run sphinx-build -b html docs/ docs/_build/

Adding Documentation#

  • Write in Markdown (.md files) with MyST directives

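  • Use the autoclass directive for API reference pages (.. autoclass:: is reStructuredText syntax, so in a .md file it typically needs to be wrapped in a MyST {eval-rst} block)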

  • Use {doc} for cross-references between pages

  • Update docs/index.md to add new pages to the TOC

Adding New Datasets#

To add a new dataset:

  1. Create a dataset class in src/chronocratic/datasets/datatypes/

  2. Create a data module in src/chronocratic/datasets/modules/

  3. Register exports in the submodule __init__.py

  4. Update the root src/chronocratic/datasets/__init__.py to re-export

  5. Add tests in tests/

  6. Document in the appropriate guide page

Branching Strategy#

This project uses a two-line branching model with dev and main as the only long-lived branches. Both maintain strictly linear histories.

Philosophy#

A linear history is not cosmetic. It makes every commit independently deployable in theory, trivial to bisect, and easy to reason about during code review. Merge commits obscure causality: did bug X come from branch A, B, or the three-way merge itself? Squash-merge and fast-forward policies eliminate that ambiguity.

dev is the integration branch. It collects feature work, may be unstable, and is the source for all releases. main is the release branch. It tracks published versions only: main moves forward only at release time, so its tip always corresponds to a tagged version published on PyPI.

Branch Rules#

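| Rule                        | dev               | main              |
|-----------------------------|-------------------|-------------------|
| Source for feature branches | Yes               | No                |
| Who can open PRs            | Everyone          | Maintainers only  |
| Merge strategy              | Squash only       | Fast-forward only |
| Rebase allowed              | Yes (before PR)   | No                |
| Force-push allowed          | Own branches only | Never             |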

Contributing Workflow#

All contribution branches must be created from dev. All PRs from contributors target dev.

# 1. Sync with remote dev
git fetch origin
git checkout dev
git pull

# 2. Create feature branch from dev
git checkout -b feat/your-feature

# 3. Commit, push, open PR against dev
git push -u origin feat/your-feature

PRs into dev are squash-merged. This collapses all intermediate commits into a single clean commit on dev. The commit message is rewritten at merge time to follow conventional commits format. Your local branch may have twenty exploratory commits; dev sees one.

Rebase your feature branch onto dev before opening a PR — or immediately after reviewers request changes — so the squash target is clean and CI runs against the latest code.

Release Workflow#

PRs from dev into main are restricted to maintainers and must be fast-forward merged. No squash, no merge commit. Fast-forward means every commit on main was already reviewed and integrated into dev; the act of merging to main is a release assertion, not a code change.

Because dev squash-merges feature work and main fast-forwards from dev, both branches stay linear. git log --oneline main reads as a chronological changelog. git bisect works without navigating merge diamonds.
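
To see why the two policies keep both branches linear, here is a toy model, not a description of how Git works internally. It assumes a history can be represented as a plain list of commit subjects; the function names and the commit subjects are hypothetical.

```python
def squash_merge(
    *, dev: list[str], feature_commits: list[str], pr_title: str
) -> list[str]:
    """Return dev with one new commit, however many commits the branch had."""
    if not feature_commits:
        raise ValueError("nothing to merge")
    # The intermediate commits are collapsed; only the PR title lands on dev.
    return [*dev, pr_title]


def fast_forward(*, main: list[str], dev: list[str]) -> list[str]:
    """Move main to dev's tip, allowed only when main is a prefix of dev."""
    if dev[: len(main)] != main:
        raise ValueError("main has diverged from dev; cannot fast-forward")
    return list(dev)


main = ["chore(repo): initial commit"]
dev = list(main)

# Two feature branches land on dev as exactly one commit each.
dev = squash_merge(
    dev=dev,
    feature_commits=["wip", "fix typo", "try again"],
    pr_title="feat(classification): add dataset loader",
)
dev = squash_merge(
    dev=dev,
    feature_commits=["wip"],
    pr_title="fix(modules): handle empty split",
)

# Releasing moves main forward without creating any new commit.
main = fast_forward(main=main, dev=dev)
```

In this model a release never adds a commit of its own: main simply becomes a copy of dev, which is why no merge diamonds can appear as long as nothing is committed to main directly.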

Commit Messages#

Use Conventional Commits format:

type(scope): summary

[optional body]

Types: feat, fix, docs, ci, refactor, test, chore. The scope is the affected submodule. The summary is written in the imperative mood with no trailing period. Example:

feat(classification): add UCRElectricalGM12 dataset loader

Because contributor PRs are squash-merged, the commit on dev uses the PR title as the subject line. Write PR titles in conventional commits format. Local commits on your feature branch need not follow the format — they are development notes, not release history.
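
A PR title can be checked against this format before opening the PR. The sketch below is illustrative and not a tool shipped with the project: the regular expression and the `is_valid_subject` helper are assumptions, it treats the scope as required (as in the template above), and it cannot verify imperative mood.

```python
import re

# Types listed in this guide; the pattern itself is an illustrative assumption.
ALLOWED_TYPES = ("feat", "fix", "docs", "ci", "refactor", "test", "chore")

SUBJECT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"\((?P<scope>[a-z0-9_\-]+)\): "
    # Summary must start with a non-space and must not end in a period.
    r"(?P<summary>\S.*[^.\s])$"
)


def is_valid_subject(*, subject: str) -> bool:
    """Return True if the subject line matches type(scope): summary."""
    return SUBJECT_PATTERN.match(subject) is not None


ok = is_valid_subject(
    subject="feat(classification): add UCRElectricalGM12 dataset loader"
)
trailing_period = is_valid_subject(subject="feat(classification): add loader.")
unknown_type = is_valid_subject(subject="feature(classification): add loader")
```

Here `ok` is True, while `trailing_period` and `unknown_type` are both False.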

Pull Requests#

  • Write a clear PR title in Conventional Commits format; it becomes the subject of the squashed commit on dev

  • Ensure all tests pass before submitting

  • Update documentation for user-facing changes

  • Reference any related issues in the PR description

License#

By contributing, you agree that your contributions will be licensed under the BSD 3-Clause License. See the LICENSE file for details.