vLLMOnlineClient

vLLMOnlineClient is AutoPipeline's client for local or internal OpenAI-compatible vLLM servers. It is implemented in src/autopipeline/components/primitives/clients/vllm_client.py and registered under the key vllm.

Client
Overview

Registry Entry

| Field | Value |
| --- | --- |
| Registry key | vllm |
| Class | vLLMOnlineClient |
| Base class | BaseClient |
Signature

Class Signature

vLLMOnlineClient(
model_name,
**kwargs,
)
Constructor

Constructor Parameters

| Key | Required | Default | Meaning |
| --- | --- | --- | --- |
| model_name | Yes | none | Model name served by the vLLM endpoint. |
| ip_address | Yes | none | Host where the vLLM server is running. |
| port | Yes | none | Port where the vLLM server exposes /v1. |
| timeout | No | 600 | Request timeout. |
| max_tokens | No | 2048 | Maximum number of generated tokens. |
| retries | No | 3 | Retry budget. |
| temperature | No | 0.7 | Sampling temperature. |
| extra_body | No | {} | Extra request fields forwarded for specific models. |

The constructor asserts that ip_address and port are provided, then builds an OpenAI client pointed at:

http://<ip_address>:<port>/v1
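A minimal sketch of that URL construction, using the parameter names from the table above (the actual code in vllm_client.py may differ):

```python
def build_base_url(ip_address: str, port: int) -> str:
    """Assemble the OpenAI-compatible endpoint for a local vLLM server."""
    # Mirrors the constructor's assertion that both values are provided.
    assert ip_address and port, "ip_address and port are required"
    return f"http://{ip_address}:{port}/v1"

# build_base_url("10.0.0.5", 8000) → "http://10.0.0.5:8000/v1"
```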
Methods

Public Methods

| Method | Purpose |
| --- | --- |
| call_model(messages) | Execute an OpenAI-style chat completion against a vLLM server. |
Signature

Callable Interface

call_model(messages)

Input contract

messages should already be OpenAI-style chat content, usually built by OpenAIStylePromptAdapter.
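For reference, an OpenAI-style message list is a sequence of role/content dicts. A minimal illustration (the contents here are examples, not what OpenAIStylePromptAdapter actually emits):

```python
# Illustrative OpenAI-style chat messages accepted by call_model.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Describe the attached document."},
]
```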

Special behavior

If "Qwen3-8B" appears in model_name, the client forwards extra_body to the request. Each retry also changes the seed to:

42 + try_count
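Assuming try_count starts at zero, the first three attempts would use seeds 42, 43, and 44:

```python
def retry_seed(try_count: int) -> int:
    # Per-attempt seed for Qwen3-8B requests, as described above.
    return 42 + try_count

seeds = [retry_seed(i) for i in range(3)]  # → [42, 43, 44]
```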

Return value

On success, the client returns:

response.choices[0].message.content

On terminal failure, it returns None.

Config

Minimal Config Example

init_config:
backend: vllm
model_name: Qwen3-VL-8B-Instruct
ip_address: ${client_config.ip_address}
port: ${client_config.vlm_port}
max_tokens: 2048
retries: 3
timeout: 600
temperature: 0.7
Failure Mode

Failure Semantics

The retry loop:

  • catches requests.exceptions.RequestException
  • waits two seconds between attempts
  • returns None after the retry budget is exhausted

Compared with OpenAIAPIClient, the exception handling is narrower and more transport-specific.
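The retry semantics above can be sketched as follows; send_request stands in for the actual chat-completion call and is an assumption of this sketch, not the real method name:

```python
import time

import requests


def call_with_retries(send_request, retries: int = 3):
    """Catch transport errors, wait two seconds between attempts,
    and return None once the retry budget is exhausted."""
    for _ in range(retries):
        try:
            return send_request()
        except requests.exceptions.RequestException:
            time.sleep(2)
    return None
```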

Extension

Extension Notes

  • Use this client when your local service is OpenAI-compatible.
  • Do not overload it with non-compatible protocols.
  • If a new server type is not wire-compatible with OpenAI, add a new client instead of branching this one indefinitely.