vLLMOnlineClient
vLLMOnlineClient is AutoPipeline's client for local or internal OpenAI-compatible vLLM servers. It is implemented in `src/autopipeline/components/primitives/clients/vllm_client.py` and registered under the backend key `vllm`.
Client
Constructor
Constructor Parameters
| Key | Required | Default | Meaning |
|---|---|---|---|
| `model_name` | Yes | none | Model name served by the vLLM endpoint. |
| `ip_address` | Yes | none | Host where the vLLM server is running. |
| `port` | Yes | none | Port where the vLLM server exposes `/v1`. |
| `timeout` | No | `600` | Request timeout. |
| `max_tokens` | No | `2048` | Maximum number of generated tokens. |
| `retries` | No | `3` | Retry budget. |
| `temperature` | No | `0.7` | Sampling temperature. |
| `extra_body` | No | `{}` | Extra request fields forwarded for specific models. |
The constructor asserts that `ip_address` and `port` are provided, then builds an OpenAI client pointed at:

`http://<ip_address>:<port>/v1`
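The constructor contract above can be sketched as follows. This is a hedged sketch, not the actual implementation: the real class wraps an OpenAI SDK client, while `vLLMOnlineClientSketch` only models the asserted fields, the documented defaults, and the derived base URL.

```python
# Sketch of the documented constructor contract. Field names and defaults
# come from the parameter table above; everything else is an assumption.
class vLLMOnlineClientSketch:
    def __init__(self, model_name, ip_address=None, port=None,
                 timeout=600, max_tokens=2048, retries=3,
                 temperature=0.7, extra_body=None):
        # The documented constructor asserts both endpoint fields exist.
        assert ip_address is not None and port is not None, \
            "ip_address and port are required"
        self.model_name = model_name
        self.timeout = timeout
        self.max_tokens = max_tokens
        self.retries = retries
        self.temperature = temperature
        self.extra_body = extra_body or {}
        # Derived OpenAI-compatible endpoint: http://<ip_address>:<port>/v1
        self.base_url = f"http://{ip_address}:{port}/v1"
```

In the real client, this base URL is handed to the OpenAI SDK client, so every request goes to the vLLM server's `/v1` route.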
Methods
Public Methods
| Method | Purpose |
|---|---|
| `call_model(messages)` | Execute an OpenAI-style chat completion against a vLLM server. |
Signature
Callable Interface
`call_model(messages)`
Input contract
`messages` should already be an OpenAI-style chat message list, usually built by `OpenAIStylePromptAdapter`.
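For reference, an OpenAI-style chat message list has this shape; the roles and contents below are illustrative, not values taken from AutoPipeline.

```python
# Illustrative OpenAI-style chat messages: a list of role/content dicts.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Describe this image in one sentence."},
]
```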
Special behavior
If `"Qwen3-8B"` appears in `model_name`, the client forwards `extra_body` with the request. Each retry attempt also changes the sampling seed to `42 + try_count`.
Return value
On success, the client returns:

`response.choices[0].message.content`

On terminal failure, it returns `None`.
Config
Minimal Config Example
```yaml
init_config:
  backend: vllm
  model_name: Qwen3-VL-8B-Instruct
  ip_address: ${client_config.ip_address}
  port: ${client_config.vlm_port}
  max_tokens: 2048
  retries: 3
  timeout: 600
  temperature: 0.7
```
Failure Mode
Failure Semantics
The retry loop:

- catches `requests.exceptions.RequestException`
- waits two seconds between attempts
- returns `None` after the retry budget is exhausted
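The retry behavior described above can be sketched like this. It is a hedged sketch, not the real code: the 3-attempt budget, the two-second wait, and the `42 + try_count` seed come from this page, while `post_fn` is a hypothetical stand-in for the actual OpenAI SDK call and `ConnectionError` stands in for `requests.exceptions.RequestException` so the sketch has no third-party dependency.

```python
import time

def call_model_sketch(post_fn, messages, retries=3, wait_s=2):
    """Retry loop sketch: re-seed each attempt, wait between failures."""
    for try_count in range(retries):
        try:
            # The real client would return response.choices[0].message.content.
            return post_fn(messages, seed=42 + try_count)
        except ConnectionError:
            # Real code catches requests.exceptions.RequestException.
            time.sleep(wait_s)
    # Terminal failure: retry budget exhausted.
    return None
```

Note that returning `None` rather than raising means callers must check the result before using it.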
Compared with `OpenAIAPIClient`, the exception handling here is narrower and more transport-specific.
Extension
Extension Notes
- Use this client when your local service is OpenAI-compatible.
- Do not overload it with non-compatible protocols.
- If a new server type is not wire-compatible with OpenAI, add a new client instead of branching this one indefinitely.