How to Choose a Pipeline
If you remember only one rule, remember this:
- Use `human-centric` or `object-centric` when you want structured scores for a single edited image
- Use `vlm-as-a-judge` when you want to compare two candidate edits directly
Quick selection matrix
| Question | Recommended pipeline | Output |
|---|---|---|
| Did a single edit satisfy an object-level instruction? | object-centric | scope -> metric -> score |
| Did a single human edit preserve identity, appearance, or local consistency? | human-centric | scope -> metric -> score |
| Which of two candidate edits is better? | vlm-as-a-judge | winner + raw_responses |
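The matrix above can be read as a small decision function. The sketch below is illustrative only: `choose_pipeline` and its parameters are hypothetical names, not part of the project's API, and it returns just the pipeline family names used in this guide.

```python
def choose_pipeline(num_candidates: int, human_subject: bool = False) -> str:
    """Return the pipeline family suggested by the selection matrix.

    Hypothetical helper: both parameter names are assumptions for illustration.
    """
    if num_candidates == 1:
        # A single edited image gets structured scores.
        return "human-centric" if human_subject else "object-centric"
    if num_candidates == 2:
        # Two candidate edits are compared directly.
        return "vlm-as-a-judge"
    # The matrix only covers one edited image or a pair of candidates.
    raise ValueError("expected 1 or 2 candidate edits")


print(choose_pipeline(1))                      # single object-level edit
print(choose_pipeline(1, human_subject=True))  # single human edit
print(choose_pipeline(2))                      # pairwise comparison
```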
When to choose object-centric
Typical tasks include:
- `subject_add`
- `subject_remove`
- `subject_replace`
- `color_alter`
- `material_alter`
- `size_adjustment`
- `text_editing`
- `cref/oref`
Key characteristics:
- Scores a single edited image at a time
- Relies primarily on parser-grounder plus general-purpose visual metric pipes
- Starter configs live in `configs/pipelines/object_centric/`
When to choose human-centric
Choose this family for:
- `ps_human`
- `motion_change`
- other tasks that depend on human-local consistency
Key characteristics:
- Loads `face-detector`, `human-segmenter`, and `hair-segmenter`
- Separates `edit_area` from `unedit_area`
- Uses attribute-aware measurement rubrics to decide which metrics actually run
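To see why separating `edit_area` from `unedit_area` matters, consider a toy example that reports one metric per area. This is a sketch under stated assumptions, not the pipeline's implementation: images are flat lists of grayscale values, the mask is a list of booleans, and mean absolute difference stands in for whatever metrics the rubric actually selects.

```python
def mean_abs_diff_by_area(source, edited, edit_mask):
    """Mean absolute pixel difference, reported separately per area.

    Hypothetical helper: the flat-list image format and the metric are
    assumptions chosen to keep the example dependency-free.
    """
    if not (len(source) == len(edited) == len(edit_mask)):
        raise ValueError("source, edited, and edit_mask must have equal length")

    sums = {"edit_area": 0.0, "unedit_area": 0.0}
    counts = {"edit_area": 0, "unedit_area": 0}
    for src, out, in_edit in zip(source, edited, edit_mask):
        area = "edit_area" if in_edit else "unedit_area"
        sums[area] += abs(src - out)
        counts[area] += 1

    # An empty area has no defined score, so report None for it.
    return {
        area: (sums[area] / counts[area] if counts[area] else None)
        for area in sums
    }


scores = mean_abs_diff_by_area(
    source=[10, 10, 10, 10],
    edited=[10, 10, 50, 30],
    edit_mask=[False, False, True, True],
)
print(scores)
```

In this example the change is confined to the masked pixels, so the unedited area scores zero difference while the edited area does not. A single image-wide average would blur those two signals together.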
When to choose vlm-as-a-judge
Choose this family for:
- benchmark pairwise evaluation
- direct A/B comparison of candidate outputs
- direct construction of pairwise training pairs
Key characteristics:
- Expects `instruction + input_image + edited_images`
- Returns `Image A`, `Image B`, `Tie`, or `Failed`
- Starter configs live in `configs/pipelines/vlm_as_a_judge/`
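For benchmark-style use, the four possible verdicts can be tallied with the standard library. The sketch below assumes the verdicts have already been collected as plain strings; `summarize_verdicts` is a hypothetical helper, and computing the win rate over decided comparisons only (ignoring `Tie` and `Failed`) is one possible convention, not something the pipeline prescribes.

```python
from collections import Counter

# The four verdict labels listed above.
VERDICTS = ("Image A", "Image B", "Tie", "Failed")


def summarize_verdicts(verdicts):
    """Count each verdict and compute Image A's win rate over decided pairs."""
    counts = Counter(verdicts)
    unknown = set(counts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")

    # Only comparisons with a clear winner count toward the win rate.
    decided = counts["Image A"] + counts["Image B"]
    return {
        "counts": {label: counts[label] for label in VERDICTS},
        "win_rate_a": counts["Image A"] / decided if decided else None,
    }


summary = summarize_verdicts(["Image A", "Image B", "Image A", "Tie", "Failed"])
print(summary["counts"])
print(summary["win_rate_a"])
```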
A common confusion
Do not confuse "I want train pairs" with "which pipeline should I choose."
- `train-pairs` is a post-processing command
- The earlier choice between `annotation` and `eval` depends on whether you want structured scores or pairwise winners first
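Either starting point can end in a (chosen, rejected) pair, which is why pair construction is a separate, later step. The sketch below illustrates the idea only and does not reproduce what `train-pairs` does: both function names are hypothetical, and it assumes a higher structured score means a better edit.

```python
def pair_from_scores(score_a, score_b):
    """Derive (chosen, rejected) from two structured scores; None on a tie.

    Assumption: higher score is better.
    """
    if score_a == score_b:
        return None
    return ("A", "B") if score_a > score_b else ("B", "A")


def pair_from_winner(winner):
    """Derive (chosen, rejected) from a pairwise verdict; None if undecided."""
    if winner == "Image A":
        return ("A", "B")
    if winner == "Image B":
        return ("B", "A")
    # "Tie" and "Failed" give no usable preference.
    return None


print(pair_from_scores(0.82, 0.64))
print(pair_from_winner("Image B"))
print(pair_from_winner("Tie"))
```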
Continue with: