Platform – Custom Vision Language Models (VLM)
To access the Ximilar VLM Platform, first register at Ximilar App to get your API token. This service is currently in beta and only available to selected users.
Vision Language Models (VLM) by Ximilar let you train custom vision-language models for structured image analysis tasks. Unlike traditional image classification, VLMs can:
- generate structured outputs such as JSON, YAML, XML, or CSV responses with explanations
- analyze multiple images (for example, video frames) at once
- accept metadata to guide the analysis
Instruction Fine-tuning
VLM training uses instruction fine-tuning – a technique where you teach the model to follow specific
instructions by providing example input-output pairs. Each dataset defines a template for the expected
outputs, including prompts and variables that represent the output schema. Training samples contain
images along with the annotated variable values that serve as ground truth. During training, the model
learns to generate structured outputs matching your template based on the visual content of the images.
Key Concepts
The VLM system uses a hierarchical structure:
- Task: Defines the AI model with system and user prompts, connects to multiple datasets
- Model: The result of the training process of a Task. A trained AI model with stored weights and metrics.
- Dataset: Collection of training samples with a result template that defines the output format
- Variable: Schema definition for output variables (type, constraints, validation rules)
- Sample: Individual training example with images and annotated variable values
Use Cases
- Product Description Generation: Generate structured product descriptions from images
- Quality Grading: Analyze items and generate quality grades with explanations
- Image Comparison: Compare multiple images and describe differences
- Structured Data Extraction: Extract specific data points from images in JSON format
- Advanced OCR: Extract structured text from images such as invoices, much like an OCR system
All Endpoints
https://api.ximilar.com/vlm/v2/task/
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/add-dataset/
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/remove-dataset/
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/train/
Task Endpoints
List Tasks
List all VLM tasks in your workspace. Returns paginated results.
Required attributes
- Authorization (string): Unique API token for authentication.
Optional attributes
- workspace (string): Filter by workspace ID.
- search (string): Search tasks by name.
Request
curl -v -XGET \
-H 'Authorization: Token __API_TOKEN__' \
https://api.ximilar.com/vlm/v2/task/
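The same request can be prepared from Python. The sketch below uses only the standard library and builds, without sending, a GET request with the optional `workspace` and `search` filters encoded as query parameters. The helper name `build_list_tasks_request` is hypothetical and not part of any Ximilar client.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.ximilar.com/vlm/v2"


def build_list_tasks_request(token, workspace=None, search=None):
    """Prepare (but do not send) a GET request for the task list."""
    # Only include the optional filters that were actually provided.
    filters = {"workspace": workspace, "search": search}
    params = {key: value for key, value in filters.items() if value}
    url = f"{BASE_URL}/task/"
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Authorization": f"Token {token}"}, method="GET")


request = build_list_tasks_request("__API_TOKEN__", search="grading")
print(request.full_url)
```

Passing the prepared object to `urllib.request.urlopen` would send it; that step is left out here so the example runs without network access.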
Get Task
Get details of a specific VLM task by its ID.
Required attributes
- Authorization (string): Unique API token for authentication.
- task_id (string): UUID of the task.
Returns
- id (string): UUID of the task.
- name (string): Name of the task.
- system_prompt (string): System prompt for the AI model.
- user_prompt (string): User prompt (instruction) for the AI model.
- max_tokens (integer): Maximum number of tokens for model output.
- datasets (array): List of dataset IDs connected to this task.
- production_version (integer): Currently active model version.
Request
curl -v -XGET \
-H 'Authorization: Token __API_TOKEN__' \
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/
Response
{
"id": "dbb498ba-cf24-4400-9897-d5196444a880",
"name": "My Custom Task",
"created": "2025-12-17T14:59:34.529951Z",
"description": "Custom VLM task for structured image analysis",
"auto_deploy": true,
"production_version": 0,
"system_prompt": "You are a helpful assistant...",
"user_prompt": "Analyse the image[s]...",
"max_tokens": 1000,
"datasets": ["8797c273-b1d3-4e6f-82bb-adfb719415fe"],
"dataset_count": 1,
"workspace": "748e50e4-d081-4924-b9e7-f500aac6a71d"
}
Train Task
Start training a VLM model for the specified task. The training process uses all samples from connected datasets to fine-tune the vision-language model.
Required attributes
- Authorization (string): Unique API token for authentication.
- task_id (string): UUID of the task to train.
Request
curl -v -XPOST \
-H 'Authorization: Token __API_TOKEN__' \
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/train/
Add Dataset to Task
Connect a dataset to a VLM task. A task can have multiple datasets for training.
Required attributes
- Authorization (string): Unique API token for authentication.
- task_id (string): UUID of the task.
- dataset_id (string): UUID of the dataset to add.
Request
curl -v -XPOST \
-H 'Authorization: Token __API_TOKEN__' \
-H 'Content-Type: application/json' \
-d '{"dataset_id": "__DATASET_ID__"}' \
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/add-dataset/
Remove Dataset from Task
Remove a dataset from a VLM task.
Required attributes
- Authorization (string): Unique API token for authentication.
- task_id (string): UUID of the task.
- dataset_id (string): UUID of the dataset to remove.
Request
curl -v -XPOST \
-H 'Authorization: Token __API_TOKEN__' \
-H 'Content-Type: application/json' \
-d '{"dataset_id": "__DATASET_ID__"}' \
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/remove-dataset/
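Adding and removing a dataset differ only in the final path segment, so both calls can share one helper. This is a minimal standard-library sketch that prepares the POST request without sending it; `build_dataset_link_request` and its `action` parameter are hypothetical names introduced for illustration.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.ximilar.com/vlm/v2"


def build_dataset_link_request(token, task_id, dataset_id, action="add"):
    """Prepare a POST that connects a dataset to a task or disconnects it."""
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    # The endpoints expect a JSON body carrying the dataset UUID.
    body = json.dumps({"dataset_id": dataset_id}).encode("utf-8")
    return Request(
        f"{BASE_URL}/task/{task_id}/{action}-dataset/",
        data=body,
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_dataset_link_request(
    "__API_TOKEN__", "__TASK_ID__", "__DATASET_ID__", action="remove"
)
print(request.full_url)
```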
Dataset Endpoints
List Datasets
List all VLM datasets in your workspace. Returns paginated results.
Required attributes
- Authorization (string): Unique API token for authentication.
Request
curl -v -XGET \
-H 'Authorization: Token __API_TOKEN__' \
https://api.ximilar.com/vlm/v2/dataset/
Get Dataset
Get details of a specific VLM dataset by its ID.
Required attributes
- Authorization (string): Unique API token for authentication.
- dataset_id (string): UUID of the dataset.
Returns
- id (string): UUID of the dataset.
- name (string): Name of the dataset.
- system_prompt (string): System prompt (can override task's prompt).
- user_prompt (string): User prompt for this dataset.
- result_template (string): Template defining the expected output format with variable placeholders.
- samples_count (integer): Number of samples in this dataset.
- variables_count (integer): Number of variables defined for this dataset.
Request
curl -v -XGET \
-H 'Authorization: Token __API_TOKEN__' \
https://api.ximilar.com/vlm/v2/dataset/__DATASET_ID__/
Response
{
"id": "8797c273-b1d3-4e6f-82bb-adfb719415fe",
"name": "My Training Dataset",
"description": "Training samples for the task",
"version": "2025-12-21",
"system_prompt": "You are a helpful assistant...",
"user_prompt": "Analyse the image[s]...",
"result_template": "{\"grade\": {{grade}}, \"explain\": \"{{explain}}\"}",
"samples_count": 150,
"variables_count": 2,
"workspace": "748e50e4-d081-4924-b9e7-f500aac6a71d"
}
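To make the role of `result_template` concrete, the sketch below fills the `{{grade}}` and `{{explain}}` placeholders from the response above with one sample's annotated values. How the platform performs this substitution internally is not documented here, so plain textual replacement is an assumption, and `render_template` is a hypothetical helper.

```python
import json
import re

# Matches {{name}} placeholders, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template, values):
    """Replace each {{name}} placeholder with the matching annotated value."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value annotated for variable '{name}'")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


template = '{"grade": {{grade}}, "explain": "{{explain}}"}'
rendered = render_template(template, {"grade": 8.5, "explain": "Minor edge wear"})
print(json.loads(rendered))
```

String values that contain quotes or backslashes would need JSON escaping before substitution; the sketch leaves that out for brevity.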
Variable Endpoints
Variables define the schema for your dataset's output format. Each variable has a type and validation constraints.
Supported Variable Types
| Type | Description |
|---|---|
| string | Text values with optional min/max length |
| integer | Whole numbers with optional min/max value |
| float | Decimal numbers with optional min/max value and step size |
| boolean | True/false values |
| array | List of values |
| object | Nested JSON objects |
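It can help to check annotations locally before uploading them. The sketch below validates a value against a variable definition shaped like the API's variable objects (`type`, `min_value`, `max_value`, `step_size`, `min_length`, `max_length`). It is an illustration, not the platform's own validation: `validate_value` is a hypothetical helper, and measuring `step_size` from zero is an assumption.

```python
from decimal import Decimal


def validate_value(variable, value):
    """Return a list of constraint violations; an empty list means valid."""
    errors = []
    vtype = variable["type"]
    if vtype in ("integer", "float"):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return [f"expected a number, got {type(value).__name__}"]
        if vtype == "integer" and not isinstance(value, int):
            errors.append("expected a whole number")
        low, high = variable.get("min_value"), variable.get("max_value")
        if low is not None and value < low:
            errors.append(f"{value} is below min_value {low}")
        if high is not None and value > high:
            errors.append(f"{value} is above max_value {high}")
        step = variable.get("step_size")
        # Decimal avoids binary floating-point surprises in the remainder.
        if step and Decimal(str(value)) % Decimal(str(step)) != 0:
            errors.append(f"{value} is not a multiple of step_size {step}")
    elif vtype in ("string", "array"):
        expected = str if vtype == "string" else list
        if not isinstance(value, expected):
            return [f"expected {vtype}, got {type(value).__name__}"]
        low, high = variable.get("min_length"), variable.get("max_length")
        if low is not None and len(value) < low:
            errors.append(f"length {len(value)} is below min_length {low}")
        if high is not None and len(value) > high:
            errors.append(f"length {len(value)} is above max_length {high}")
    elif vtype == "boolean" and not isinstance(value, bool):
        errors.append(f"expected a boolean, got {type(value).__name__}")
    return errors


grade = {"name": "grade", "type": "float",
         "min_value": 0.0, "max_value": 10.0, "step_size": 0.5}
print(validate_value(grade, 8.5))
print(validate_value(grade, 11.0))
```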
List Variables
List all variables, optionally filtered by dataset.
Required attributes
- Authorization (string): Unique API token for authentication.
Optional attributes
- dataset (string): Filter variables by dataset ID.
- page_size (integer): Number of results per page.
Request
curl -v -XGET \
-H 'Authorization: Token __API_TOKEN__' \
'https://api.ximilar.com/vlm/v2/variable/?dataset=__DATASET_ID__'
Response
{
"count": 3,
"next": null,
"previous": null,
"results": [
{
"id": "f82b01ed-e65d-4730-a458-2966cbf86994",
"dataset": "2e3a2346-fa3b-43ca-8a4c-168941692c58",
"dataset_name": "My Training Dataset",
"name": "grade",
"type": "float",
"required": true,
"min_value": 0.0,
"max_value": 10.0,
"step_size": 0.5
},
{
"id": "7b661f0e-7c00-4b59-b334-aca2a0249e2f",
"dataset": "2e3a2346-fa3b-43ca-8a4c-168941692c58",
"dataset_name": "My Training Dataset",
"name": "explain",
"type": "string",
"required": false
}
]
}
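List responses carry `count`, `next`, `previous`, and `results`, so a client can collect every item by following `next` until it is null. The sketch below assumes `next` holds the address of the following page, which the response shape suggests but this page does not state. `iter_results` and the `fetch_page` callback are hypothetical, and the in-memory `pages` dictionary stands in for real HTTP calls.

```python
def iter_results(first_url, fetch_page):
    """Yield every item of a paginated listing by following `next` links."""
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page["results"]
        # `next` is None (JSON null) on the last page, which ends the loop.
        url = page.get("next")


# Fake two-page listing used in place of network requests.
pages = {
    "page-1": {"count": 3, "next": "page-2", "previous": None,
               "results": [{"name": "grade"}, {"name": "explain"}]},
    "page-2": {"count": 3, "next": None, "previous": "page-1",
               "results": [{"name": "condition"}]},
}

names = [item["name"] for item in iter_results("page-1", pages.__getitem__)]
print(names)
```

In real use, `fetch_page` would issue an authenticated GET for the given URL and return the decoded JSON body.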
Get Variable
Get details of a specific variable by its ID.
Required attributes
- Authorization (string): Unique API token for authentication.
- variable_id (string): UUID of the variable.
Request
curl -v -XGET \
-H 'Authorization: Token __API_TOKEN__' \
https://api.ximilar.com/vlm/v2/variable/__VARIABLE_ID__/
Create Variable
Create a new variable for a dataset.
Required attributes
- Authorization (string): Unique API token for authentication.
- dataset (string): UUID of the dataset this variable belongs to.
- name (string): Variable name (must follow programming naming conventions).
- type (string): Variable type: string, integer, float, boolean, array, or object.
Optional attributes
- required (boolean): Whether this variable must be provided (default: false).
- description (string): Human-readable description.
- choices (string): Comma-separated list of allowed values (for string type).
- min_value (number): Minimum value (for numeric types).
- max_value (number): Maximum value (for numeric types).
- step_size (number): Step size (for float type).
- min_length (integer): Minimum length (for string/array types).
- max_length (integer): Maximum length (for string/array types).
- default_value (any): Default value if not provided.
Request
curl -v -XPOST \
-H 'Authorization: Token __API_TOKEN__' \
-H 'Content-Type: application/json' \
-d '{
"dataset": "__DATASET_ID__",
"name": "condition",
"type": "string",
"required": false,
"description": "Condition of the item",
"choices": "mint,near_mint,excellent,good,poor"
}' \
https://api.ximilar.com/vlm/v2/variable/
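The request body can be assembled and checked in Python before it is sent. In this sketch, `build_variable_payload` is a hypothetical helper, and reading "programming naming conventions" as "a valid identifier that is not a reserved word" is an assumption. The list of types and the comma-separated `choices` format come from the attribute descriptions above.

```python
import keyword

ALLOWED_TYPES = {"string", "integer", "float", "boolean", "array", "object"}


def build_variable_payload(dataset_id, name, var_type, choices=None, **optional):
    """Assemble the JSON body for creating a variable, failing early on bad input."""
    # Assumption: a valid name is an identifier that is not a reserved word.
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"'{name}' is not a valid variable name")
    if var_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported variable type '{var_type}'")
    payload = {"dataset": dataset_id, "name": name, "type": var_type, **optional}
    if choices:
        if var_type != "string":
            raise ValueError("choices only apply to string variables")
        # The API takes choices as a single comma-separated string.
        payload["choices"] = ",".join(choices)
    return payload


payload = build_variable_payload(
    "__DATASET_ID__",
    "condition",
    "string",
    choices=["mint", "near_mint", "excellent", "good", "poor"],
    required=False,
    description="Condition of the item",
)
print(payload)
```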
Sample Endpoints
Samples are individual training examples consisting of one or more images and annotated variable values.
List Samples
List all samples, optionally filtered by dataset.
Required attributes
- Authorization (string): Unique API token for authentication.
Optional attributes
- dataset (string): Filter samples by dataset ID.
- test (boolean): Filter by test/training samples.
- search (string): Search samples by name.
Request
curl -v -XGET \
-H 'Authorization: Token __API_TOKEN__' \
'https://api.ximilar.com/vlm/v2/sample/?dataset=__DATASET_ID__'
Get Sample
Get details of a specific sample including images and variable values.
Required attributes
- Authorization (string): Unique API token for authentication.
- sample_id (string): UUID of the sample.
Request
curl -v -XGET \
-H 'Authorization: Token __API_TOKEN__' \
https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/
Create Sample
Create a new sample in a dataset.
Required attributes
- Authorization (string): Unique API token for authentication.
- dataset (string): UUID of the dataset this sample belongs to.
Optional attributes
- name (string): Optional name for the sample.
- test (boolean): Whether this is a test sample (default: false).
Request
curl -v -XPOST \
-H 'Authorization: Token __API_TOKEN__' \
-H 'Content-Type: application/json' \
-d '{
"dataset": "__DATASET_ID__",
"name": "Sample 1",
"test": false
}' \
https://api.ximilar.com/vlm/v2/sample/
Add Images to Sample
Add images to an existing sample. Images must already be uploaded to your workspace.
Required attributes
- Authorization (string): Unique API token for authentication.
- sample_id (string): UUID of the sample.
- image_ids (array): List of image UUIDs to add.
Request
curl -v -XPOST \
-H 'Authorization: Token __API_TOKEN__' \
-H 'Content-Type: application/json' \
-d '{"image_ids": ["__IMAGE_ID_1__", "__IMAGE_ID_2__"]}' \
https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/add-images/
Set Sample as Test
Mark a sample as a test sample. Test samples are used for model evaluation, not training.
Required attributes
- Authorization (string): Unique API token for authentication.
- sample_id (string): UUID of the sample.
Request
curl -v -XPOST \
-H 'Authorization: Token __API_TOKEN__' \
https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/set-test/
Unset Sample as Test
Remove the test flag from a sample, making it a training sample.
Required attributes
- Authorization (string): Unique API token for authentication.
- sample_id (string): UUID of the sample.
Request
curl -v -XPOST \
-H 'Authorization: Token __API_TOKEN__' \
https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/set-untest/
Set Sample Type
Set the sample type which determines how images are used during training.
Required attributes
- Authorization (string): Unique API token for authentication.
- sample_id (string): UUID of the sample.
- type (string): Sample type. Valid values:
  - single: Take just one image from the list during training
  - multi_random: Default, randomly pick all images during training
  - multi_ordered: Use all images preserving order during training
Request
curl -v -XPATCH \
-H 'Authorization: Token __API_TOKEN__' \
-H 'Content-Type: application/json' \
-d '{"type": "multi_ordered"}' \
https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/
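One way to picture the three sample types is as different ways of choosing images from a sample's list. The sketch below is an interpretation only, not the platform's training code: it assumes `single` picks one image at random and `multi_random` uses every image in shuffled order, neither of which this page spells out. `select_images` is a hypothetical function.

```python
import random


def select_images(image_ids, sample_type="multi_random", rng=None):
    """Illustrate how each sample type might choose images for one training step."""
    rng = rng or random.Random()
    if not image_ids:
        return []
    if sample_type == "single":
        # Assumption: the single image is chosen at random.
        return [rng.choice(image_ids)]
    if sample_type == "multi_random":
        # Assumption: every image is used, in shuffled order.
        shuffled = list(image_ids)
        rng.shuffle(shuffled)
        return shuffled
    if sample_type == "multi_ordered":
        # All images, in the order they were added to the sample.
        return list(image_ids)
    raise ValueError(f"unknown sample type '{sample_type}'")


frames = ["frame-1", "frame-2", "frame-3"]
print(select_images(frames, "multi_ordered"))
```

Under this reading, `multi_ordered` suits sequences such as video frames, where position carries meaning, while `multi_random` suits unordered views of the same item.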
Add Input Metadata to Sample
Add input metadata to a sample. This metadata can be used in the user prompt during training and inference. The metadata is merged with any existing input metadata.
Required attributes
- Authorization (string): Unique API token for authentication.
- sample_id (string): UUID of the sample.
- input_meta_data (object): JSON object containing metadata to add to the sample.
Request
curl -v -XPATCH \
-H 'Authorization: Token __API_TOKEN__' \
-H 'Content-Type: application/json' \
-d '{"input_meta_data": {"style": "elegant", "category": "jacket"}}' \
https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/
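Because the new metadata is merged with what the sample already holds, a PATCH only needs to carry the keys being added or changed. The sketch below shows the expected result under two assumptions that this page does not confirm: the merge is shallow, and a key sent again overwrites the stored value. `merge_input_meta_data` is a hypothetical helper that mimics the server-side behavior locally.

```python
def merge_input_meta_data(existing, update):
    """Shallow merge: keys in `update` overwrite same-named keys in `existing`."""
    merged = dict(existing)  # copy so the stored metadata is not mutated
    merged.update(update)
    return merged


stored = {"category": "jacket", "season": "winter"}
patch = {"style": "elegant", "category": "coat"}
print(merge_input_meta_data(stored, patch))
```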
Add Variable Value to Sample
Add or update a variable value for a sample. This is how you annotate your training data.
Required attributes
- Authorization (string): Unique API token for authentication.
- sample_id (string): UUID of the sample.
- dataset_variable (string): UUID of the variable to set.
- value (any): The value for this variable (type must match variable definition).
Request
curl -v -XPOST \
-H 'Authorization: Token __API_TOKEN__' \
-H 'Content-Type: application/json' \
-d '{
"dataset_variable": "__VARIABLE_ID__",
"value": {"value": 8.5}
}' \
https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/add-variable-value/
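Annotating a sample usually means setting several variables, one request per variable. The sketch below builds those request bodies from a name-to-value mapping by looking up each variable's UUID in a list shaped like the List Variables response. It mirrors the nested `{"value": ...}` wrapper from the request above; `build_annotation_payloads` is a hypothetical helper.

```python
def build_annotation_payloads(variables, values):
    """Pair each annotated value with the UUID of its variable."""
    ids_by_name = {variable["name"]: variable["id"] for variable in variables}
    unknown = sorted(set(values) - set(ids_by_name))
    if unknown:
        raise KeyError(f"no such variables in the dataset: {unknown}")
    # One body per variable, each meant for the add-variable-value endpoint.
    return [
        {"dataset_variable": ids_by_name[name], "value": {"value": value}}
        for name, value in values.items()
    ]


variables = [
    {"id": "f82b01ed-e65d-4730-a458-2966cbf86994", "name": "grade"},
    {"id": "7b661f0e-7c00-4b59-b334-aca2a0249e2f", "name": "explain"},
]
payloads = build_annotation_payloads(
    variables, {"grade": 8.5, "explain": "Minor edge wear"}
)
print(payloads[0])
```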
Using Different Workspace
When making an API request, the default workspace associated with the user's API token is used. To access data or upload to a different workspace, specify the workspace in the URL or JSON payload.
# Get all samples from a specific workspace
https://api.ximilar.com/vlm/v2/sample/?workspace=WORKSPACE_ID
# Create a sample in a specific workspace
curl -XPOST \
-H 'Authorization: Token __API_TOKEN__' \
-H 'Content-Type: application/json' \
-d '{
"dataset": "__DATASET_ID__",
"workspace": "WORKSPACE_ID"
}' \
https://api.ximilar.com/vlm/v2/sample/
from ximilar.client.vlm import VLMClient
# Initialize client with specific workspace
client = VLMClient(
token="__API_TOKEN__",
workspace="WORKSPACE_ID"
)