Add a Pipeline Family

Adding a new pipeline family is the most powerful extension path, but also the most invasive. Use it only when the runtime orchestration itself no longer fits any of the existing families: object-centric, human-centric, or vlm-as-a-judge.

Start from the Closest Existing Family

The current families already give you three useful templates:

object-centric: parser-grounder plus region-aware metrics
human-centric: parser-grounder plus experts plus region-aware metrics
vlm-as-a-judge: metric-only pairwise evaluation without parser-grounder or experts

In most cases, the fastest path is to copy the nearest family and then remove what you do not need.

Add and Register the New Family

Pipeline family implementations live under src/autopipeline/pipelines/.

Minimal pattern:

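from .base_pipeline import BasePipeline
from . import PIPELINE_REGISTRY

# The registry key must exactly match the name field of the pipeline YAML.
@PIPELINE_REGISTRY.register("my-pipeline")
class MyPipeline(BasePipeline):
    # Config keys that must be present; checked by BasePipeline's required-config validation.
    required_configs = ["metric_configs"]

    def __init__(self, **kwargs):
        super().__init__(kwargs)
        ...  # family-specific setup

    def __call__(self, input_dict, **kwargs):
        ...  # family-specific orchestration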

Then import the new module in src/autopipeline/pipelines/__init__.py; otherwise the registration decorator never runs and PipelineLoader will not find the family.
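
A minimal sketch of why the import is needed: a decorator-based registry only records a class when the module that defines it is actually executed. The `Registry` class below is a hypothetical stand-in written for illustration; it is not the implementation behind `PIPELINE_REGISTRY`, whose internals may differ.

```python
class Registry:
    """Hypothetical stand-in for a decorator-based pipeline registry."""

    def __init__(self):
        self._classes = {}

    def register(self, name):
        # Returns a decorator that stores the class under `name`.
        def decorator(cls):
            if name in self._classes:
                raise KeyError(f"duplicate registry key: {name!r}")
            self._classes[name] = cls
            return cls
        return decorator

    def get(self, name):
        try:
            return self._classes[name]
        except KeyError:
            known = ", ".join(sorted(self._classes)) or "<none>"
            raise KeyError(
                f"unknown pipeline {name!r}; registered: {known}"
            ) from None

    def __contains__(self, name):
        return name in self._classes


DEMO_REGISTRY = Registry()

# Nothing is registered until the decorated class definition is executed,
# which for a real module happens only when that module is imported.
registered_before = "my-pipeline" in DEMO_REGISTRY


@DEMO_REGISTRY.register("my-pipeline")
class MyPipeline:
    pass


registered_after = "my-pipeline" in DEMO_REGISTRY
```

In this sketch `registered_before` is `False` and `registered_after` is `True`, which mirrors what happens when a pipeline module is, or is not, imported from the package's `__init__.py`.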

Reuse `BasePipeline` Where Possible

BasePipeline already provides several useful building blocks:

required-config validation
image parsing
parser-grounder loading
expert loading
metric config parsing
pipe caching and smart loading

If your new family can reuse those helpers, do so. If not, make the divergence explicit in your own class rather than fighting hidden assumptions later.
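
To illustrate the first helper in the list, required-config validation can be as simple as comparing a class-level list against the keys that were actually supplied. This is only a sketch under assumptions: `SketchBase`, `SketchPipeline`, and the `config` argument are hypothetical names, and the real `BasePipeline` may validate differently.

```python
class SketchBase:
    # Subclasses list the config keys they cannot run without.
    required_configs = []

    def __init__(self, config):
        # Collect every missing key so the error reports all of them at once.
        missing = [key for key in self.required_configs if key not in config]
        if missing:
            raise ValueError(
                f"{type(self).__name__} is missing required configs: {missing}"
            )
        self.config = config


class SketchPipeline(SketchBase):
    required_configs = ["metric_configs"]


def try_build(config):
    """Return the error message for an invalid config, or None on success."""
    try:
        SketchPipeline(config)
    except ValueError as exc:
        return str(exc)
    return None
```

Calling `try_build({})` returns a message naming `metric_configs`, while `try_build({"metric_configs": {}})` returns `None`. Declaring requirements on the class keeps the check in the base layer, so a new family only states what it needs.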

Add Config Files, Not New CLI Flags

The CLI already accepts a pipeline YAML path. That means most integration happens through config files, not argument parsing.

Typical files to add or update:

configs/pipelines/<family>/...
optionally shared defaults in configs/pipelines/modules_init/pipes_default.yaml
optionally expert defaults in configs/pipelines/modules_init/experts_default.yaml
candidate pool files under configs/datasets/candidate_pools/ if the new family supports new annotation tasks

The YAML name field must exactly match the pipeline registry key.
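
A mismatch between the YAML name and the registry key can be caught before any pipeline is built. The sketch below assumes the YAML has already been loaded into a plain dict and that the registered keys are available as a collection of strings; `check_pipeline_name` is a hypothetical helper and not part of the framework.

```python
import difflib


def check_pipeline_name(config, registered_keys):
    """Return the pipeline name if it is registered, else raise ValueError."""
    name = config.get("name")
    if name is None:
        raise ValueError("pipeline config has no 'name' field")
    if name in registered_keys:
        return name
    # Suggest the closest registered key to make typos easy to spot.
    close = difflib.get_close_matches(name, list(registered_keys), n=1)
    hint = f" Did you mean {close[0]!r}?" if close else ""
    raise ValueError(f"pipeline name {name!r} is not a registered key.{hint}")
```

For example, with registered keys `["object-centric", "my-pipeline"]`, a config whose name is `my_pipeline` (underscore instead of hyphen) raises an error that suggests `my-pipeline`.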

Expect Some Non-Generic Runtime Wiring

This is the part most people underestimate.

The current runner stack still contains explicit assumptions about the existing families, including:

executor choice
annotation output grouping
eval eligibility
worker-side result shaping
some dataset loading paths

So a truly new pipeline family usually requires updates outside src/autopipeline/pipelines/, especially in:

src/cli/autopipeline.py
src/autopipeline/runners/workers.py
dataset-loading or result-formatting code when the new family changes input or output shape
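
One way to locate such assumptions early is to search the runner code for lines that mention the existing family names. The following sketch uses only the standard library. The family names come from this guide, but `find_family_literals` is a hypothetical helper, and the code base may spell the names differently (for example with underscores), so adapt the tuple as needed.

```python
from pathlib import Path

# Family names as written in this guide; adjust if the code spells them differently.
FAMILY_NAMES = ("object-centric", "human-centric", "vlm-as-a-judge")


def find_family_literals(root, names=FAMILY_NAMES):
    """Return (path, line_number, line) for each line that mentions a family name."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(name in line for name in names):
                hits.append((str(path), number, line.strip()))
    return hits


def print_report(root):
    # Print one "path:line: source" entry per hit, similar to grep output.
    for path, number, line in find_family_literals(root):
        print(f"{path}:{number}: {line}")
```

Running `print_report("src")` from the repository root lists each candidate line. Every hit is a place where the new family may need its own branch or a more generic code path.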

Important Caveats

There are a few framework assumptions to check early:

BasePipeline only maps edit_area and unedit_area to mask_mode
expert loading is tied to metric-name prefixes used by the current human-centric design
parser-grounder is treated as a special pre-step rather than a regular metric

If your new family violates those assumptions, plan for code changes in the base layer instead of trying to hide the mismatch in YAML.

Validation Checklist

the new module is imported in pipelines/__init__.py
a pipeline YAML with name: <your-family> loads successfully
one minimal single-sample run succeeds
the runner emits the correct artifact shape
multi-worker execution still behaves correctly
the family does not accidentally depend on hardcoded behavior from another family
