Contributing
Thank you for your interest in contributing to the ODA Data Package! This guide will help you get started with development.
Getting Started
Prerequisites
- Python 3.11 or higher
- uv package manager
- Git
Fork and Clone
- Fork the repository on GitHub
- Clone your fork locally:
- Add the upstream repository:
Development Setup
Install Dependencies
Install all dependencies including development and test dependencies:
This installs:
- Core package dependencies
- Development tools (pre-commit, ruff, etc.)
- Testing dependencies (pytest, etc.)
Install Pre-commit Hooks
Pre-commit hooks automatically check your code before each commit:
The hooks will:
- Format code with Ruff
- Check for common Python mistakes
- Validate JSON/YAML files
- Detect accidentally committed secrets
- Remove trailing whitespace
- Ensure files end with newlines
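To make the last two checks concrete, the sketch below shows roughly what whitespace and end-of-file fixers do to a file's text. It is an illustration only, not the hooks' actual implementation, and the function name `normalize_text` is made up for this example.

```python
def normalize_text(text: str) -> str:
    """Strip trailing whitespace from each line and end the text with one newline."""
    # Hypothetical helper for illustration; the real hooks are configured in
    # .pre-commit-config.yaml and may treat edge cases differently.
    lines = [line.rstrip() for line in text.splitlines()]
    # Drop blank lines at the end so exactly one newline terminates the text.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    print(repr(normalize_text("x = 1   \ny = 2\t\n\n\n")))  # 'x = 1\ny = 2\n'
```

Because the hooks rewrite files in place, a commit that triggers such a fix typically stops, and the fixed files need to be staged again before retrying.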
Development Workflow
Creating a Branch
Create a new branch for your feature or fix:
Use descriptive branch names:
- feature/add-new-indicator
- fix/caching-bug
- docs/update-readme
Making Changes
- Make your changes in the appropriate files
- Add tests for new functionality
- Update documentation if needed
- Run tests locally (see below)
Committing Changes
When you commit, pre-commit hooks will automatically run:
What Happens During Commit
The pre-commit hooks will:
- Format your code with Ruff (auto-fixes)
- Lint your code with Ruff (auto-fixes when possible)
- Check file quality (trailing whitespace, EOF, line endings)
- Validate JSON/YAML files
- Check for secrets (private keys, etc.)
Tests Run in CI
Tests are NOT run during commit to keep commits fast. Instead, tests run automatically in CI when you open or update a pull request.
If any hook fails:
- Review the error messages
- Make necessary fixes
- Stage the fixes: git add .
- Commit again
Code Quality
Code Style
We use Ruff for both linting and formatting:
# Format code
uv run ruff format .
# Lint and auto-fix
uv run ruff check --fix .
# Lint without auto-fix
uv run ruff check .
Ruff replaces Black, isort, flake8, and other tools with a single fast tool.
Type Hints
While not strictly enforced, type hints are encouraged:
def get_indicators(
    self,
    indicators: str | list[str],
    base_year: int | None = None,
) -> pd.DataFrame:
    ...
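A union hint such as `str | list[str]` usually means the function normalizes its argument before using it. A minimal sketch of that pattern follows; `as_list` is a hypothetical helper written for this guide, not part of the package.

```python
from __future__ import annotations


def as_list(indicators: str | list[str]) -> list[str]:
    """Return indicators as a list, wrapping a single code in a list."""
    # Hypothetical helper for illustration only.
    if isinstance(indicators, str):
        return [indicators]
    return list(indicators)


print(as_list("A"))         # ['A']
print(as_list(["A", "B"]))  # ['A', 'B']
```

Hinting the narrowed type (`list[str]`) on the return value lets type checkers confirm that downstream code never has to handle the bare-string case.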
Docstrings
Use clear docstrings for public functions and classes:
def add_gni_share_column(client: OECDClient, indicator: str) -> pd.DataFrame:
    """Add GNI share percentage column to ODA data.

    Args:
        client: Configured OECDClient instance
        indicator: Indicator code to fetch

    Returns:
        DataFrame with 'gni_share_pct' column added
    """
Running Tests
Run All Tests
uv run pytest
Run Specific Tests
# Run a specific test file
uv run pytest tests/test_api.py
# Run a specific test function
uv run pytest tests/test_api.py::test_oecd_client
# Run tests matching a pattern
uv run pytest -k "test_indicator"
Run Tests by Marker
# Skip slow tests (useful during development)
uv run pytest -m "not slow"
# Run only slow tests
uv run pytest -m "slow"
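For `-m "slow"` to select tests without warnings, pytest needs the marker to be registered. The project may already do this in its configuration. The sketch below shows one common way to do it, assuming a `conftest.py` file in the tests/ directory (an assumption, not something this guide confirms).

```python
# conftest.py (hypothetical location: tests/conftest.py)


def pytest_configure(config):
    """Register the custom 'slow' marker so pytest recognizes it."""
    # addinivalue_line appends one line to the 'markers' ini option.
    config.addinivalue_line(
        "markers", "slow: tests that hit external APIs or download bulk files"
    )
```

With the marker registered, `uv run pytest --markers` lists it alongside the built-in markers.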
Test Coverage
Check test coverage:
Then open htmlcov/index.html in your browser.
Writing Tests
Place tests in the tests/ directory:
import pytest

from oda_data import OECDClient


def test_oecd_client_initialization():
    """Test that OECDClient initializes correctly."""
    client = OECDClient(years=range(2020, 2023))
    assert client.years == range(2020, 2023)


@pytest.mark.slow
def test_bulk_download():
    """Test bulk download functionality (slow)."""
    # Mark slow tests that hit external APIs
    ...
Continuous Integration
Tests run automatically via GitHub Actions when you open or update a pull request. The test suite runs on:
- Operating Systems: Ubuntu, macOS, and Windows
- Python Versions: 3.11, 3.12, and 3.13
This gives a test matrix of 9 combinations (3 operating systems × 3 Python versions), so the package is checked on every supported platform.
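The matrix is the Cartesian product of the two lists above, which a few lines of Python can enumerate. This is only an illustration; the actual job definitions live in the workflow files under .github/workflows/.

```python
from itertools import product

operating_systems = ["Ubuntu", "macOS", "Windows"]
python_versions = ["3.11", "3.12", "3.13"]

# Each CI job corresponds to one (operating system, Python version) pair.
matrix = list(product(operating_systems, python_versions))

for os_name, version in matrix:
    print(f"{os_name} / Python {version}")

print(len(matrix))  # 9
```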
You can view test results in the "Checks" tab of your pull request. All tests must pass before your PR can be merged.
Adding New Features
Adding a New Indicator
- Add indicator definition to the appropriate JSON file:
  - DAC1: src/oda_data/indicators/dac1/dac1_indicators.json
  - DAC2A: src/oda_data/indicators/dac2a/dac2a_indicators.json
  - CRS: src/oda_data/indicators/crs/crs_indicators.json
- If custom processing is needed, add a function:
  - DAC1: src/oda_data/indicators/dac1/dac1_functions.py
  - DAC2A: src/oda_data/indicators/dac2a/dac2a_functions.py
  - CRS: src/oda_data/indicators/crs/crs_functions.py
- Reference the function in the indicator's custom_function field
- Add tests for the new indicator
Example indicator definition:
{
  "code": "DAC1.NEW.INDICATOR",
  "name": "My New Indicator",
  "description": "Description of what this indicator measures",
  "sources": ["DAC1"],
  "filters": {
    "DAC1": {
      "measure": "net_disbursement",
      "flow_type": ["1010"]
    }
  },
  "custom_function": "process_new_indicator"
}
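Before opening a pull request, it can help to sanity-check a new definition. The sketch below assumes that the keys shown in the example are required and that every listed source needs a matching entry under "filters"; this guide states neither rule explicitly. `validate_indicator` and `REQUIRED_KEYS` are hypothetical names, not part of the package.

```python
import json

# Assumption: these keys are required because they appear in the example above.
REQUIRED_KEYS = {"code", "name", "description", "sources", "filters"}


def validate_indicator(definition: dict) -> list[str]:
    """Return a list of problems found in one indicator definition."""
    missing = sorted(REQUIRED_KEYS - set(definition))
    problems = [f"missing key: {key}" for key in missing]
    # Assumption: each listed source has its own entry under "filters".
    for source in definition.get("sources", []):
        if source not in definition.get("filters", {}):
            problems.append(f"no filters for source: {source}")
    return problems


example = json.loads("""
{
  "code": "DAC1.NEW.INDICATOR",
  "name": "My New Indicator",
  "description": "Description of what this indicator measures",
  "sources": ["DAC1"],
  "filters": {"DAC1": {"measure": "net_disbursement", "flow_type": ["1010"]}},
  "custom_function": "process_new_indicator"
}
""")

print(validate_indicator(example))  # []
```

A check like this could also be run from a test that loads each indicator JSON file, so malformed definitions fail in CI rather than at query time.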
Modifying Data Sources
When modifying source classes in src/oda_data/api/sources.py:
- Override methods as needed:
  - __init__() - Define filter parameters
  - _create_bulk_fetcher() - Bulk download logic
  - download() - API-based retrieval
- Use _init_filters() for standard filters (years, providers, recipients)
- Test with both API and bulk downloads
- Update documentation if the API changes
Submitting Changes
Before Submitting a Pull Request
- [ ] All tests pass locally
- [ ] Pre-commit hooks pass
- [ ] New features have tests
- [ ] Documentation is updated
- [ ] CHANGELOG.md is updated (if applicable)
- [ ] Code follows project conventions
Creating a Pull Request
- Push your branch to your fork:
- Create a pull request on GitHub
- Describe your changes:
  - What does this PR do?
  - Why is this change needed?
  - How has it been tested?
  - Are there any breaking changes?
- Link related issues if applicable
PR Review Process
- Maintainers will review your PR
- Address any feedback or requested changes
- Once approved, a maintainer will merge your PR
Project Structure
oda_data_package/
├── src/oda_data/ # Main package code
│ ├── api/ # API clients and data sources
│ │ ├── oecd.py # OECDClient (main entry point)
│ │ └── sources.py # Data source classes
│ ├── indicators/ # Indicator definitions
│ │ ├── dac1/ # DAC1 indicators and functions
│ │ ├── dac2a/ # DAC2A indicators and functions
│ │ └── crs/ # CRS indicators and functions
│ ├── clean_data/ # Data cleaning and schema
│ │ ├── schema.py # Column mappings
│ │ ├── common.py # Cleaning functions
│ │ └── validation.py # Input validation
│ ├── tools/ # Utility functions
│ │ ├── cache.py # Caching system
│ │ ├── groupings.py # Provider/recipient groupings
│ │ ├── gni.py # GNI calculations
│ │ └── names/ # Name mapping utilities
│ └── __init__.py # Package exports
├── tests/ # Test files
├── docs/ # Documentation
├── .github/workflows/ # GitHub Actions
├── .pre-commit-config.yaml # Pre-commit hooks configuration
├── pyproject.toml # Project metadata and dependencies
└── README.md # User documentation
Key Components
- OECDClient: High-level API for users (in api/oecd.py)
- Data Sources: DAC1Data, DAC2AData, CRSData classes (in api/sources.py)
- Indicators: JSON definitions + custom processing functions
- Caching: 3-tier system (memory, query cache, bulk cache)
- Schema: Column name normalization and mappings
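The tiered caching idea can be pictured as a lookup that tries the fastest tier first and falls back to slower ones. The sketch below is a generic illustration of that pattern, with plain dictionaries standing in for each tier. It is not the implementation in tools/cache.py, and every name in it is hypothetical.

```python
from collections.abc import Callable


class TieredCache:
    """Look up values in ordered tiers, fastest first (illustrative only)."""

    def __init__(self, tier_names: list[str]) -> None:
        # One dict per tier; in a real system the slower tiers might live on disk.
        self.tiers: dict[str, dict[str, object]] = {name: {} for name in tier_names}

    def get(self, key: str, load: Callable[[], object]) -> tuple[object, str]:
        """Return (value, where it was found), calling load() on a full miss."""
        for name, store in self.tiers.items():
            if key in store:
                return store[key], name
        value = load()
        # Keep the freshly loaded value in the first (fastest) tier.
        first = next(iter(self.tiers))
        self.tiers[first][key] = value
        return value, "loaded"


cache = TieredCache(["memory", "query", "bulk"])
cache.tiers["bulk"]["dac1"] = "bulk rows"
print(cache.get("dac1", lambda: "fresh"))  # ('bulk rows', 'bulk')
print(cache.get("crs", lambda: "fresh"))   # ('fresh', 'loaded')
print(cache.get("crs", lambda: "fresh"))   # ('fresh', 'memory')
```

When changing code that reads or writes cached data, it is worth testing both a cold start (nothing cached) and a warm one, since the two can follow different paths.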
Getting Help
- Issues: Check existing issues or create a new one
- Discussions: Start a discussion for questions or ideas
- Documentation: See other documentation pages for usage examples
License
By contributing, you agree that your contributions will be licensed under the same MIT license as the project.