vLLMOnlineClient

vLLMOnlineClient is AutoPipeline's client for local or internal OpenAI-compatible vLLM servers. It is implemented in src/autopipeline/components/primitives/clients/vllm_client.py and registered under the key vllm.

Client
Overview

Registry Entry

| Field | Value |
| --- | --- |
| Registry key | vllm |
| Class | vLLMOnlineClient |
| Base class | BaseClient |
Signature

Class Signature

vLLMOnlineClient(
model_name,
**kwargs,
)
Constructor

Constructor Parameters

| Key | Required | Default | Meaning |
| --- | --- | --- | --- |
| model_name | Yes | none | Model name served by the vLLM endpoint. |
| ip_address | Yes | none | Host where the vLLM server is running. |
| port | Yes | none | Port where the vLLM server exposes /v1. |
| timeout | No | 600 | Request timeout. |
| max_tokens | No | 2048 | Maximum number of generated tokens. |
| retries | No | 3 | Retry budget. |
| temperature | No | 0.7 | Sampling temperature. |
| extra_body | No | {} | Extra request fields forwarded for specific models. |

The constructor asserts that ip_address and port are provided, then builds an OpenAI client pointed at:

http://<ip_address>:<port>/v1
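A minimal sketch of that URL construction, using the parameter names from the table above (the actual code in vllm_client.py may differ):

```python
def build_base_url(ip_address: str, port: int) -> str:
    """Assemble the OpenAI-compatible endpoint for a local vLLM server."""
    # Mirrors the constructor's assertion that both values are provided.
    assert ip_address and port, "ip_address and port are required"
    return f"http://{ip_address}:{port}/v1"

# build_base_url("10.0.0.5", 8000) → "http://10.0.0.5:8000/v1"
```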
Methods

Public Methods

| Method | Purpose |
| --- | --- |
| call_model(messages) | Execute an OpenAI-style chat completion against a vLLM server. |
Signature

Callable Interface

call_model(messages)

Input contract

messages should already be OpenAI-style chat content, usually built by OpenAIStylePromptAdapter.
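For reference, an OpenAI-style message list is a sequence of role/content dicts. A minimal illustration (the contents here are examples, not what OpenAIStylePromptAdapter actually emits):

```python
# Illustrative OpenAI-style chat messages accepted by call_model.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Describe the attached document."},
]
```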

Special behavior

If "Qwen3-8B" appears in model_name, the client forwards extra_body to the request. Each retry also changes the seed to:

42 + try_count
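Assuming try_count starts at zero, the first three attempts would use seeds 42, 43, and 44:

```python
def retry_seed(try_count: int) -> int:
    # Per-attempt seed for Qwen3-8B requests, as described above.
    return 42 + try_count

seeds = [retry_seed(i) for i in range(3)]  # → [42, 43, 44]
```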

Return value

On success, the client returns:

response.choices[0].message.content

On terminal failure, it returns None.

Config

Minimal Config Example

init_config:
backend: vllm
model_name: Qwen3-VL-8B-Instruct
ip_address: ${client_config.ip_address}
port: ${client_config.vlm_port}
max_tokens: 2048
retries: 3
timeout: 600
temperature: 0.7
Failure Mode

Failure Semantics

The retry loop:

  • catches requests.exceptions.RequestException
  • waits two seconds between attempts
  • returns None after the retry budget is exhausted

Compared with OpenAIAPIClient, the exception handling is narrower and more transport-specific.
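The retry semantics above can be sketched as follows; send_request stands in for the actual chat-completion call and is an assumption of this sketch, not the real method name:

```python
import time

import requests


def call_with_retries(send_request, retries: int = 3):
    """Catch transport errors, wait two seconds between attempts,
    and return None once the retry budget is exhausted."""
    for _ in range(retries):
        try:
            return send_request()
        except requests.exceptions.RequestException:
            time.sleep(2)
    return None
```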

Extension

Extension Notes

  • Use this client when your local service is OpenAI-compatible.
  • Do not overload it with non-compatible protocols.
  • If a new server type is not wire-compatible with OpenAI, add a new client instead of branching this one indefinitely.