
GroundingDINOMixin

GroundingDINOMixin is the local zero-shot detection helper implemented in src/autopipeline/components/primitives/grounding_dino.py.

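It is not registry-backed, so it is not looked up by name like registered components; a module gets its methods by inheriting from it. It is still a useful reusable primitive whenever a module needs direct text-conditioned object detection.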


Class Signature

GroundingDINOMixin(**kwargs)

Constructor Parameters

| Key | Required | Default | Meaning |
|---|---|---|---|
| model_path | No | GroundingDINO/GroundingDINO-SwinB | Hugging Face Grounding DINO checkpoint. |
| device | No | auto | Torch device used for inference. |
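The table lists `auto` as the default device, but the source does not say how `auto` is resolved. One common convention, shown here as a hypothetical `resolve_device` helper, is to prefer CUDA when a GPU is visible and otherwise fall back to CPU. This is a sketch under that assumption; the mixin may resolve `auto` differently.

```python
def resolve_device(device: str = "auto") -> str:
    """Hypothetical helper: map "auto" to a concrete Torch device string."""
    if device != "auto":
        # Explicit values such as "cuda", "cuda:1" or "cpu" pass through unchanged.
        return device
    try:
        import torch  # only needed to probe for a GPU
    except ImportError:
        return "cpu"
    # Assumed convention: use the first CUDA device if one is available.
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Keeping explicit values untouched means a config that pins `device: cuda` behaves exactly as written.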

Public Methods

| Method | Purpose |
|---|---|
| object_detection(image, text_labels, box_threshold=0.25, text_threshold=0.25) | Run zero-shot detection and return the first processed result dict. |
| get_bounding_boxes(image, text_labels, box_threshold=0.25, text_threshold=0.25) | Convert raw detection results into a list of simple bbox dictionaries. |

Method Reference

object_detection(...)

object_detection(
    image: Image.Image,
    text_labels: List[str],
    box_threshold=0.25,
    text_threshold=0.25,
)

Input

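| Parameter | Meaning |
|---|---|
| image | PIL image to search. |
| text_labels | Object labels used as text prompts, one phrase per object class. |
| box_threshold | Minimum confidence a predicted box must exceed to be kept. Default 0.25. |
| text_threshold | Minimum per-token score a prompt token must exceed to contribute to a box's label. Default 0.25. |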
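In the standard Hugging Face Grounding DINO post-processing, the two thresholds act at different levels: each predicted box is scored by its highest similarity to any prompt token and kept only if that score exceeds box_threshold, and the label for a kept box is assembled from the prompt tokens whose similarity exceeds text_threshold. The hypothetical `filter_queries` below is a simplified pure-Python model of that step, not the mixin's code. It assumes per-token probabilities that have already been passed through a sigmoid, and it joins whole tokens with spaces instead of decoding token ids with a tokenizer.

```python
from typing import List, Sequence, Tuple


def filter_queries(
    token_probs: Sequence[Sequence[float]],
    tokens: Sequence[str],
    box_threshold: float = 0.25,
    text_threshold: float = 0.25,
) -> List[Tuple[int, float, str]]:
    """Return (query_index, score, phrase) for every box that survives box_threshold."""
    kept = []
    for query_idx, probs in enumerate(token_probs):
        # A box's score is its best match against any prompt token.
        score = max(probs)
        if score <= box_threshold:
            continue  # no token matches this box strongly enough
        # The label only uses tokens that individually clear text_threshold.
        phrase = " ".join(tok for tok, p in zip(tokens, probs) if p > text_threshold)
        kept.append((query_idx, score, phrase))
    return kept


probs = [[0.9, 0.6, 0.1], [0.2, 0.1, 0.05], [0.3, 0.1, 0.2]]
tokens = ["red", "car", "."]
print(filter_queries(probs, tokens))  # [(0, 0.9, 'red car'), (2, 0.3, 'red')]
```

Raising box_threshold removes weak boxes entirely, while raising text_threshold only shortens (or empties) the labels of boxes that are already kept.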

Output

Returns the first processed result dict from the Hugging Face processor's post-processing step, that is, the result for the single input image.
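For orientation, this is the standard Hugging Face Transformers flow that a helper like object_detection typically wraps: load a processor and a zero-shot detection model, run them on the image and prompt, post-process, and keep element 0. It is a minimal sketch, not the file's actual code. The function name `detect_first_result` is hypothetical, the prompt formatting (lowercase phrases each ending in a period, as the Grounding DINO docs recommend) is an assumption about what the mixin does, and the model is reloaded on every call for brevity where a mixin would load it once in its constructor. The box threshold is passed positionally because newer transformers releases renamed the keyword from `box_threshold` to `threshold`.

```python
from typing import Any, Dict, List


def detect_first_result(
    image: Any,  # PIL.Image.Image
    text_labels: List[str],
    model_path: str,
    device: str = "cpu",
    box_threshold: float = 0.25,
    text_threshold: float = 0.25,
) -> Dict[str, Any]:
    # Imported lazily so that defining this helper does not require torch/transformers.
    import torch
    from transformers import AutoModelForZeroShotObjectDetection, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_path)
    model = AutoModelForZeroShotObjectDetection.from_pretrained(model_path).to(device)

    # Assumed prompt format: "a cat. a remote control."
    prompt = " ".join(f"{label.strip().lower()}." for label in text_labels)
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = model(**inputs)

    results = processor.post_process_grounded_object_detection(
        outputs,
        inputs.input_ids,
        box_threshold,  # positional: "box_threshold" in older releases, "threshold" in newer ones
        text_threshold=text_threshold,
        target_sizes=[image.size[::-1]],  # PIL size is (width, height); this expects (height, width)
    )
    # One dict per input image; a single image yields a one-element list.
    return results[0]
```

`model_path` can be the mixin's default (GroundingDINO/GroundingDINO-SwinB) or any other Grounding DINO checkpoint the environment can load.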

get_bounding_boxes(...)

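Takes the same arguments as object_detection and returns a list with one dictionary per detection, each containing:

  • box: the bounding-box coordinates
  • score: the detection confidence
  • label: the matched text label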

Coordinates are rounded to integer pixel values.
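The source does not show the conversion code. A minimal sketch of the conversion, assuming the result dict follows the Hugging Face post-processor shape (`boxes` as absolute `[x0, y0, x1, y1]` pixel coordinates, `scores`, and label strings under `labels` or, in newer transformers releases, `text_labels`); `result_to_bboxes` is a hypothetical name, and whether the mixin rounds or truncates is also an assumption.

```python
from typing import Any, Dict, List


def _as_list(values: Any) -> List[Any]:
    # torch tensors expose .tolist(); plain Python sequences are copied into a list.
    return values.tolist() if hasattr(values, "tolist") else list(values)


def result_to_bboxes(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one post-processed result dict into simple bbox dictionaries."""
    boxes = _as_list(result["boxes"])
    scores = _as_list(result["scores"])
    # Newer transformers releases store the label strings under "text_labels".
    labels = result["text_labels"] if "text_labels" in result else result["labels"]

    bboxes = []
    for box, score, label in zip(boxes, scores, labels):
        bboxes.append(
            {
                # round() returns an int and rounds exact halves to the nearest even value.
                "box": [round(coord) for coord in box],
                "score": float(score),
                "label": label,
            }
        )
    return bboxes


raw = {"boxes": [[10.4, 20.6, 110.2, 220.9]], "scores": [0.8], "labels": ["a cat"]}
print(result_to_bboxes(raw))  # [{'box': [10, 21, 110, 221], 'score': 0.8, 'label': 'a cat'}]
```

An empty result dict (no boxes) yields an empty list, which matches the "empty structures, no sentinel" behavior described under Failure Semantics.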


Minimal Integration Example

This primitive is typically consumed from code rather than directly from YAML, but the equivalent init shape is:

init_config:
  model_path: GroundingDINO/GroundingDINO-SwinB
  device: cuda

Failure Semantics

grounding_dino.py does not add any exception handling of its own.

That means:

  • model-loading failures bubble up
  • inference failures bubble up
  • no detections usually produce empty structures (for example, an empty list from get_bounding_boxes) rather than a custom sentinel value such as None
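Because nothing is caught inside the mixin, a caller that must keep running can still tell "no objects found" apart from "inference failed". A minimal sketch with a hypothetical `try_get_bounding_boxes` wrapper: it passes an empty detection list through unchanged and returns None only when the call raised. Whether swallowing errors is acceptable depends on the pipeline; many callers should simply let the exception propagate.

```python
import logging
from typing import Any, Dict, List, Optional, Sequence

logger = logging.getLogger(__name__)


def try_get_bounding_boxes(
    detector: Any,
    image: Any,
    text_labels: Sequence[str],
    **thresholds: float,
) -> Optional[List[Dict[str, Any]]]:
    """Return [] when nothing is detected and None when inference raised."""
    try:
        return detector.get_bounding_boxes(image, list(text_labels), **thresholds)
    except Exception:
        # The mixin does not catch load or inference errors, so the caller decides here.
        logger.exception("Grounding DINO inference failed")
        return None
```

Checking `result is None` versus `not result` keeps the two outcomes distinct downstream.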

Extension Notes

  • Use this mixin when you need local text-conditioned detection without the full parser-grounder runtime.
  • Keep task-specific grounding orchestration in ParserGrounderPipe, not here.