# Contributing

Thank you for your interest in contributing to **chronocratic-datasets**!

## Development Setup

This project uses `uv` for environment management and package installation.

### Prerequisites

- Python 3.12+
- `uv`: see [docs.astral.sh/uv](https://docs.astral.sh/uv/) for installation

### Clone and Install

```bash
git clone https://github.com/chronocratic/datasets.git
cd datasets

# Install with development dependencies
uv sync --all-extras
```

## Code Style

The project follows these conventions:

- **Type hints:** All functions must have type hints for parameters and return types
- **Docstrings:** Google-style docstrings for all public functions and classes
- **Naming:** `snake_case` for functions and variables, `PascalCase` for classes
- **Function calls:** Use keyword arguments for all function calls
- **Organization:** Functional programming patterns preferred; pure functions where possible

### Linting and Formatting

We use `ruff` for linting and formatting:

```bash
# Check for issues
uv run ruff check src/ tests/

# Format code
uv run ruff format src/ tests/

# Check formatting without modifying
uv run ruff format --check src/ tests/
```

## Testing

Tests are written using `pytest`. Run the test suite with:

```bash
# Run all tests
uv run pytest tests/

# Run with coverage
uv run pytest tests/ --cov=src/chronocratic/datasets

# Run specific test file
uv run pytest tests/test_public_api_exports.py -v
```

### Writing Tests

- Place test files in the `tests/` directory with a `test_` prefix
- Test imports from the package root: `from chronocratic.datasets import ForecastingMode`
- Keep tests focused on one behavior per test function
- Use fixtures for common setup

## Documentation

Documentation is built with Sphinx using MyST Parser for Markdown source files.

```bash
# Build documentation
uv run sphinx-build -b html docs/ docs/_build/
```

### Adding Documentation

- Write in Markdown (`.md` files) with MyST directives
- Use `.. autoclass::` for API reference pages; because it is a reStructuredText directive, wrap it in an `{eval-rst}` block when the page is Markdown
- Use `{doc}` for cross-references between pages
- Update `docs/index.md` to add new pages to the TOC

## Adding New Datasets

To add a new dataset:

1. Create a dataset class in `src/chronocratic/datasets/datatypes/`
2. Create a data module in `src/chronocratic/datasets/modules/`
3. Register exports in the submodule `__init__.py`
4. Update the root `src/chronocratic/datasets/__init__.py` to re-export
5. Add tests in `tests/`
6. Document in the appropriate guide page

## Branching Strategy

This project uses a two-line branching model with `dev` and `main` as the only long-lived branches. Both maintain strictly linear histories.

### Philosophy

A linear history is not cosmetic. It makes every commit independently deployable in theory, trivial to bisect, and easy to reason about during code review. Merge commits obscure causality: did bug X come from branch A, branch B, or the three-way merge itself? Squash-merge and fast-forward policies eliminate that ambiguity.

`dev` is the integration branch. It collects feature work, may be unstable, and is the source for all releases.

`main` is the release branch. It tracks published versions only: the tip of `main` always corresponds to a version released on PyPI.

### Branch Rules

| Rule | `dev` | `main` |
| --- | --- | --- |
| Source for feature branches | Yes | No |
| Who can open PRs | Everyone | Maintainers only |
| Merge strategy | Squash only | Fast-forward only |
| Rebase allowed | Yes (before PR) | No |
| Force-push allowed | Own branches only | Never |

### Contributing Workflow

All contribution branches must be created from `dev`. All PRs from contributors target `dev`.

```bash
# 1. Sync with remote dev
git fetch origin
git checkout dev
git pull

# 2. Create feature branch from dev
git checkout -b feat/your-feature

# 3. Commit, push, open PR against dev
git push -u origin feat/your-feature
```

PRs into `dev` are **squash-merged**. This collapses all intermediate commits into a single clean commit on `dev`. The commit message is rewritten at merge time to follow the Conventional Commits format. Your local branch may have twenty exploratory commits; `dev` sees one.

Rebase your feature branch onto `dev` before opening a PR, and again whenever reviewers request changes, so the squash target is clean and CI runs against the latest code.

### Release Workflow

PRs from `dev` into `main` are **restricted to maintainers** and must be **fast-forward merged**. No squash, no merge commit. Fast-forward means every commit on `main` was already reviewed and integrated into `dev`; the act of merging to `main` is a release assertion, not a code change.

Because `dev` squash-merges feature work and `main` fast-forwards from `dev`, both branches stay linear. `git log --oneline main` reads as a chronological changelog. `git bisect` works without navigating merge diamonds.

### Commit Messages

Use [Conventional Commits](https://www.conventionalcommits.org/) format:

```
type(scope): summary

[optional body]
```

Types: `feat`, `fix`, `docs`, `ci`, `refactor`, `test`, `chore`. Scope is the affected submodule. Summary is in the imperative mood, with no trailing period.

Example:

```
feat(classification): add UCRElectricalGM12 dataset loader
```

Because contributor PRs are squash-merged, the commit on `dev` uses the PR title as the subject line. Write PR titles in Conventional Commits format. Local commits on your feature branch need not follow the format; they are development notes, not release history.

## Pull Requests

- Write a clear PR title in Conventional Commits format; it becomes the subject line of the squash commit
- Ensure all tests pass before submitting
- Update documentation for user-facing changes
- Reference any related issues in the PR description

## License

By contributing, you agree that your contributions will be licensed under the BSD 3-Clause License. See the [LICENSE](https://github.com/chronocratic/datasets/blob/main/LICENSE) file for details.
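
## Appendix: Checking a PR Title Locally

The Commit Messages rules above are mechanical enough to check before opening a PR. The following is a minimal sketch, not part of the repository: `check_pr_title` is a hypothetical helper, and the pattern assumes the scope is always present and written in lowercase letters, digits, `_`, or `-`. It cannot verify imperative mood. It follows the project's code style (type hints, Google-style docstring, keyword-only arguments, pure function).

```python
import re

# Types listed in the Commit Messages section.
ALLOWED_TYPES = ("feat", "fix", "docs", "ci", "refactor", "test", "chore")

# Assumption: scope is required and limited to lowercase letters, digits, "_" and "-".
TITLE_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)\((?P<scope>[a-z0-9_-]+)\): (?P<summary>\S.*)$"
)


def check_pr_title(*, title: str) -> list[str]:
    """Return the problems found in a PR title.

    Args:
        title: The pull request title to check.

    Returns:
        A list of human-readable problems; empty if the title looks valid.
    """
    match = TITLE_PATTERN.match(title)
    if match is None:
        return ["title does not match 'type(scope): summary'"]

    problems: list[str] = []
    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match.group('type')}")
    if match.group("summary").rstrip().endswith("."):
        problems.append("summary must not end with a period")
    return problems
```

For instance, `check_pr_title(title="feat(classification): add UCRElectricalGM12 dataset loader")` returns an empty list, while a title such as `feat: add loader` is reported as not matching the `type(scope): summary` shape.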