Your ML Repo Starter Kit: src Layout, uv venv, GitHub Push
Most ML ideas start life in a notebook. Some even grow up there. But if you’ve ever tried to take a promising experiment from your laptop - or even worse, from another developer's laptop - to another environment, you know the pain: missing dependencies, mystery settings, scattered scripts, and plots living in five different folders. Today, we fix that by giving your project a simple, professional backbone you can build on.
By the end of this step, you’ll have:
- a minimal-but-solid repo structure you can reuse,
- a reproducible environment managed with uv,
- your package importable from scripts/notebooks,
- a first dataset wired in for quick iteration,
- and your initial commit live on GitHub.
Repository Structure
In the following sections, we will create this structure step by step and explain what each component is for:
.
├── artifacts/                  # generated: duckdb, plots, metrics, temp exports
├── config/
│   ├── project.yaml            # paths, switches, seeds
│   ├── features.yaml           # feature toggles/params
│   └── model.yaml              # model + training params
├── data/                       # tiny sample inputs only (ok to commit if small)
├── notebooks/
│   └── explore.ipynb
├── scripts/
│   ├── train.py                # calls into src.<pkg>.* with configs
│   └── predict.py
├── src/
│   └── <your_pkg>/
│       ├── __init__.py
│       ├── config.py           # Pydantic loader for YAML files
│       ├── data.py             # load/split IO; write to DuckDB if used
│       ├── features.py         # transforms / feature sets
│       ├── modeling.py         # fit/persist; MLflow logging hooks
│       └── evaluation.py       # metrics & plots
├── tests/
│   ├── __init__.py
│   └── test_smoke.py
├── pyproject.toml
├── README.md
└── uv.lock
What tools will add for you
Directories that tools might add for you, depending on what you use:
- dist/ → appears when you run uv build (wheel + sdist). No need to pre-create. (Astral Docs)
- .venv/ → created on first uv sync / uv run inside the project. (Astral Docs)
- mlruns/ → MLflow's default local store; created the first time you log a run (see the snippet after this list). (mlflow.org)
- __pycache__/ → Python runtime byproduct → ignore.
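For example, assuming you have added MLflow with uv add mlflow, a first logged run like this minimal sketch is what creates the local mlruns/ store:

# minimal MLflow logging example; the first run creates ./mlruns
import mlflow

with mlflow.start_run():
    mlflow.log_param("model", "baseline")
    mlflow.log_metric("accuracy", 0.93)  # placeholder value for illustration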
Developing locally with your src package (editable)
The mental model
- Your code lives under src/<pkg_name>/….
- Your virtual environment “installs” that package in editable mode (PEP 660).
- You run everything through that environment (via uv run …), so import <pkg_name> just works.
- Because it’s editable, changes to files in src/<pkg_name> are immediately visible to Python without needing to reinstall.
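Once the repo is bootstrapped (next section), you can verify this wiring with a minimal smoke test in tests/test_smoke.py. The sketch below assumes the default ml-blueprint project name (so the package imports as ml_blueprint) and that pytest has been added with uv add --dev pytest:

# tests/test_smoke.py
# Run with `uv run pytest`. The editable install makes the src/ package
# importable without any sys.path manipulation.
import ml_blueprint


def test_package_imports() -> None:
    # If this passes, the src layout and editable install are wired correctly.
    assert ml_blueprint is not None

Because the install is editable, you can change files under src/ml_blueprint/ and rerun the test without reinstalling anything.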
Bootstrap the repository and environment with uv
# 1) Create project as a packaged app (gives you the src layout and basics for building the package later)
uv init --package ml-blueprint
cd ml-blueprint
# 2) Create the venv and install your package into it in editable mode
uv sync
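uv init --package scaffolds the src layout for you, including a placeholder module. The exact contents depend on your uv version, but the generated src/ml_blueprint/__init__.py looks roughly like this:

# src/ml_blueprint/__init__.py (scaffolded by uv; exact contents vary by version)
def main() -> None:
    print("Hello from ml-blueprint!")

uv typically also wires this main up as a console script under [project.scripts] in pyproject.toml, so uv run ml-blueprint executes it.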
Some notes on defaults and changes you might want to make:
- Change the package name
  - Change "ml-blueprint" to anything else you might want the project or package called.
- Specify the Python version
  - If you want a different or specific Python version for your virtual environment, update the version in the pyproject.toml and run uv python pin 3.12 to update the version designated in the .python-version file in your repository before running uv sync.
  - You can also change this later; just rerun uv sync, and it will create a new venv for you.
- Add additional Python libraries/dependencies
  - If you already know which libraries you'll need for your project, you can add them to both the pyproject.toml and the venv at the same time by running uv add <package-name>, for example, uv add scikit-learn.
Setting Up Directories
# 3) Create necessary folders
mkdir -p config data artifacts notebooks scripts tests
Purpose of each directory:
- config: Contains configuration files like project.yaml and model.yaml, which define paths, hyperparameters, and feature transformations; a sketch of the matching loader follows this list.
- data: Stores small data samples that can be shared via version control without overwhelming the repository.
- artifacts: Holds all output files, such as a local DuckDB database, plots, and model files. These are typically not shared via version control due to their size; instead, use services like MLflow for sharing model files and plots when tracking experiments.
- notebooks: Used for experimenting with the existing code base. These notebooks should import from the ml_blueprint source package whenever possible and should not contain any logic intended for long-term use.
- scripts: Contains lightweight entry points for executing workflows, such as a training script that instantiates and calls classes and methods from the src package. Major logic should not live here; these scripts orchestrate workflows and load configurations from the config files.
- tests: A place for unit tests and other testing scripts, if you choose to include them.
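To make the config.py and scripts roles concrete, here is one possible sketch, not a definitive implementation: a Pydantic-based YAML loader plus a thin training entry point. The field names and keys are hypothetical (mirror your actual config/project.yaml), and it assumes pydantic and PyYAML were added with uv add pydantic pyyaml:

# src/ml_blueprint/config.py (sketch)
from pathlib import Path

import yaml
from pydantic import BaseModel


class ProjectConfig(BaseModel):
    # Hypothetical fields; mirror the keys you keep in config/project.yaml.
    data_dir: Path = Path("data")
    artifacts_dir: Path = Path("artifacts")
    seed: int = 42


def load_project_config(path: str | Path = "config/project.yaml") -> ProjectConfig:
    raw = yaml.safe_load(Path(path).read_text()) or {}
    return ProjectConfig(**raw)  # Pydantic validates types and fills defaults


# scripts/train.py (sketch)
from ml_blueprint.config import load_project_config


def main() -> None:
    cfg = load_project_config()
    # Orchestration only: the heavy lifting belongs in src/ml_blueprint/.
    print(f"training with seed={cfg.seed}; artifacts -> {cfg.artifacts_dir}")


if __name__ == "__main__":
    main()

Run it with uv run python scripts/train.py; the editable install resolves the ml_blueprint import.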
Connect to git & adjust .gitignore
Extend your .gitignore so generated files and local artifacts stay out of version control:
# Python-generated files
__pycache__/
*.py[oc]
# Virtual environments
.venv/
.ruff_cache/
.pytest_cache/
# Builds & metadata
build/
dist/
wheels/
*.egg-info
# ML / local runs
mlruns/
artifacts/
*.db
# OS/IDE
.DS_Store
.idea/
.vscode/
You can create the remote GitHub repository via the GitHub CLI (brew install gh if you are using Homebrew on Mac or Linux):
# First time set-up/authentication:
gh auth login
git config --global init.defaultBranch main
# 1) initialize git if needed
git init
# 2) stage & commit
git add .
git commit -m "chore: initial commit"
# 3) create the remote GitHub repo and push in one go
# replace ml-blueprint with your repo name; add --private or --public
gh repo create ml-blueprint --public --source=. --remote=origin --push
# 4) confirm
git remote -v