Platform – Custom Vision Language Models (VLM)

Vision Language Models (VLM) by Ximilar let you train custom vision-language models for structured image analysis tasks. Unlike traditional image classification, VLMs can:

  • generate structured outputs (JSON, YAML, XML, or CSV) with explanations
  • analyze multiple images (such as video frames) at once
  • accept metadata to guide the analysis

Instruction Fine-tuning

VLM training uses instruction fine-tuning – a technique where you teach the model to follow specific instructions by providing example input-output pairs. Each dataset defines a template for the expected outputs, including prompts and variables that represent the output schema. Training samples contain images along with the annotated variable values that serve as ground truth. During training, the model learns to generate structured outputs matching your template based on the visual content of the images.
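
As a rough illustration of how a template and a sample's annotated values combine into a training target, the sketch below fills `{{name}}` placeholders (the syntax used in the `result_template` example on this page) with variable values. The `render_template` helper is hypothetical, and how the platform renders templates internally is an assumption here, not documented behavior.

```python
import json
import re


def render_template(template: str, values: dict) -> str:
    """Replace each {{name}} placeholder with the matching annotated value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(values[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Template and variable names taken from the dataset example on this page.
template = '{"grade": {{grade}}, "explain": "{{explain}}"}'
target = render_template(template, {"grade": 8.5, "explain": "Minor edge wear"})

# The rendered target is valid JSON that the model would learn to produce.
parsed = json.loads(target)
```

Note that this naive substitution does not escape quotes inside string values; a real pipeline would need to handle that.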

Key Concepts

The VLM system uses a hierarchical structure:

  • Task: Defines the AI model through its system and user prompts and connects to one or more datasets
  • Model: The result of training a Task; a trained AI model with stored weights and metrics
  • Dataset: Collection of training samples with a result template that defines the output format
  • Variable: Schema definition for output variables (type, constraints, validation rules)
  • Sample: Individual training example with images and annotated variable values
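
One way to picture this hierarchy is as nested records. The dataclasses below are only a mental model with hypothetical field selections; they are not the API's actual object schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Variable:
    name: str
    type: str  # string, integer, float, boolean, array, or object


@dataclass
class Sample:
    image_ids: List[str]
    values: Dict[str, Any]  # variable name -> annotated ground-truth value


@dataclass
class Dataset:
    name: str
    result_template: str
    variables: List[Variable] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Task:
    name: str
    system_prompt: str
    user_prompt: str
    datasets: List[Dataset] = field(default_factory=list)

    def all_samples(self) -> List[Sample]:
        """Collect the samples of every connected dataset."""
        return [s for d in self.datasets for s in d.samples]


dataset = Dataset(
    name="My Training Dataset",
    result_template='{"grade": {{grade}}}',
    variables=[Variable("grade", "float")],
    samples=[Sample(image_ids=["__IMAGE_ID_1__"], values={"grade": 8.5})],
)
task = Task("My Custom Task", "You are a helpful assistant...", "Analyse the image[s]...", [dataset])
```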

Use Cases

  • Product Description Generation: Generate structured product descriptions from images
  • Quality Grading: Analyze items and generate quality grades with explanations
  • Image Comparison: Compare multiple images and describe differences
  • Structured Data Extraction: Extract specific data points from images in JSON format
  • Advanced OCR: Extract structured text from images such as invoices, similar to an OCR system

All Endpoints

https://api.ximilar.com/vlm/v2/task/
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/add-dataset/
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/remove-dataset/
https://api.ximilar.com/vlm/v2/task/__TASK_ID__/train/

Task Endpoints

List Tasks

List all VLM tasks in your workspace. Returns paginated results.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

Optional attributes

  • Name
    workspace
    Type
    string
    Description

    Filter by workspace ID.

  • Name
    search
    Type
    string
    Description

    Search tasks by name.

Request

GET
/v2/task/
curl -v -XGET \
     -H 'Authorization: Token __API_TOKEN__' \
     https://api.ximilar.com/vlm/v2/task/
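
The same call can be prepared from Python with only the standard library. This is a minimal sketch: `build_list_tasks_request` is a hypothetical helper that builds the request without sending it, and passing `search` and `workspace` as query parameters is assumed from the attribute list above.

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://api.ximilar.com/vlm/v2"


def build_list_tasks_request(
    token: str,
    search: Optional[str] = None,
    workspace: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the task list."""
    # Keep only the optional filters that were actually provided.
    params = {
        key: value
        for key, value in {"search": search, "workspace": workspace}.items()
        if value is not None
    }
    url = f"{BASE_URL}/task/"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url, headers={"Authorization": f"Token {token}"}, method="GET"
    )


request = build_list_tasks_request("__API_TOKEN__", search="grading")
# To send it, pass the request to urllib.request.urlopen(request).
```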

Get Task

Get details of a specific VLM task by its ID.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    task_id
    Type
    string
    Description

    UUID of the task.

Returns

  • Name
    id
    Type
    string
    Description

    UUID of the task.

  • Name
    name
    Type
    string
    Description

    Name of the task.

  • Name
    system_prompt
    Type
    string
    Description

    System prompt for the AI model.

  • Name
    user_prompt
    Type
    string
    Description

    User prompt (instruction) for the AI model.

  • Name
    max_tokens
    Type
    integer
    Description

    Maximum number of tokens for model output.

  • Name
    datasets
    Type
    array
    Description

    List of dataset IDs connected to this task.

  • Name
    production_version
    Type
    integer
    Description

    Currently active model version.

Request

GET
/v2/task/{task_id}/
curl -v -XGET \
     -H 'Authorization: Token __API_TOKEN__' \
     https://api.ximilar.com/vlm/v2/task/__TASK_ID__/

Response

{
  "id": "dbb498ba-cf24-4400-9897-d5196444a880",
  "name": "My Custom Task",
  "created": "2025-12-17T14:59:34.529951Z",
  "description": "Custom VLM task for structured image analysis",
  "auto_deploy": true,
  "production_version": 0,
  "system_prompt": "You are a helpful assistant...",
  "user_prompt": "Analyse the image[s]...",
  "max_tokens": 1000,
  "datasets": ["8797c273-b1d3-4e6f-82bb-adfb719415fe"],
  "dataset_count": 1,
  "workspace": "748e50e4-d081-4924-b9e7-f500aac6a71d"
}

Train Task

Start training a VLM model for the specified task. The training process uses all samples from connected datasets to fine-tune the vision-language model.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    task_id
    Type
    string
    Description

    UUID of the task to train.

Request

POST
/v2/task/{task_id}/train/
curl -v -XPOST \
     -H 'Authorization: Token __API_TOKEN__' \
     https://api.ximilar.com/vlm/v2/task/__TASK_ID__/train/

Add Dataset to Task

Connect a dataset to a VLM task. A task can have multiple datasets for training.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    task_id
    Type
    string
    Description

    UUID of the task.

  • Name
    dataset_id
    Type
    string
    Description

    UUID of the dataset to add.

Request

POST
/v2/task/{task_id}/add-dataset/
curl -v -XPOST \
     -H 'Authorization: Token __API_TOKEN__' \
     -H 'Content-Type: application/json' \
     -d '{"dataset_id": "__DATASET_ID__"}' \
     https://api.ximilar.com/vlm/v2/task/__TASK_ID__/add-dataset/

Remove Dataset from Task

Remove a dataset from a VLM task.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    task_id
    Type
    string
    Description

    UUID of the task.

  • Name
    dataset_id
    Type
    string
    Description

    UUID of the dataset to remove.

Request

POST
/v2/task/{task_id}/remove-dataset/
curl -v -XPOST \
     -H 'Authorization: Token __API_TOKEN__' \
     -H 'Content-Type: application/json' \
     -d '{"dataset_id": "__DATASET_ID__"}' \
     https://api.ximilar.com/vlm/v2/task/__TASK_ID__/remove-dataset/
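
Since add-dataset and remove-dataset take the same JSON body, a single helper can cover both. The sketch below uses a hypothetical `build_dataset_link_request` function and only prepares the request; it does not contact the API.

```python
import json
import urllib.request

BASE_URL = "https://api.ximilar.com/vlm/v2"


def build_dataset_link_request(
    token: str, task_id: str, dataset_id: str, action: str
) -> urllib.request.Request:
    """Build a POST request that connects or disconnects a dataset."""
    if action not in ("add-dataset", "remove-dataset"):
        raise ValueError(f"unsupported action: {action}")
    body = json.dumps({"dataset_id": dataset_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/task/{task_id}/{action}/",
        data=body,
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_dataset_link_request(
    "__API_TOKEN__", "__TASK_ID__", "__DATASET_ID__", "remove-dataset"
)
```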

Dataset Endpoints

List Datasets

List all VLM datasets in your workspace. Returns paginated results.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

Request

GET
/v2/dataset/
curl -v -XGET \
     -H 'Authorization: Token __API_TOKEN__' \
     https://api.ximilar.com/vlm/v2/dataset/

Get Dataset

Get details of a specific VLM dataset by its ID.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    dataset_id
    Type
    string
    Description

    UUID of the dataset.

Returns

  • Name
    id
    Type
    string
    Description

    UUID of the dataset.

  • Name
    name
    Type
    string
    Description

    Name of the dataset.

  • Name
    system_prompt
    Type
    string
    Description

    System prompt (can override task's prompt).

  • Name
    user_prompt
    Type
    string
    Description

    User prompt for this dataset.

  • Name
    result_template
    Type
    string
    Description

    Template defining the expected output format with variable placeholders.

  • Name
    samples_count
    Type
    integer
    Description

    Number of samples in this dataset.

  • Name
    variables_count
    Type
    integer
    Description

    Number of variables defined for this dataset.

Request

GET
/v2/dataset/{dataset_id}/
curl -v -XGET \
     -H 'Authorization: Token __API_TOKEN__' \
     https://api.ximilar.com/vlm/v2/dataset/__DATASET_ID__/

Response

{
  "id": "8797c273-b1d3-4e6f-82bb-adfb719415fe",
  "name": "My Training Dataset",
  "description": "Training samples for the task",
  "version": "2025-12-21",
  "system_prompt": "You are a helpful assistant...",
  "user_prompt": "Analyse the image[s]...",
  "result_template": "{\"grade\": {{grade}}, \"explain\": \"{{explain}}\"}",
  "samples_count": 150,
  "variables_count": 2,
  "workspace": "748e50e4-d081-4924-b9e7-f500aac6a71d"
}

Variable Endpoints

Variables define the schema for your dataset's output format. Each variable has a type and validation constraints.

Supported Variable Types

Type      Description
string    Text values with optional min/max length
integer   Whole numbers with optional min/max value
float     Decimal numbers with optional min/max value and step size
boolean   True/false values
array     List of values
object    Nested JSON objects
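
To catch annotation mistakes before uploading, values can be checked locally against a variable definition. The sketch below is a hypothetical client-side check, not the platform's validator; in particular, measuring `step_size` from `min_value` and treating `choices` as a comma-separated string are assumptions based on the attributes described on this page.

```python
from typing import Any, List

PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "float": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_value(variable: dict, value: Any) -> List[str]:
    """Return a list of problems; an empty list means the value passes."""
    var_type = variable["type"]
    is_numeric = var_type in ("integer", "float")
    # bool is a subclass of int in Python, so reject it for numeric types.
    if not isinstance(value, PYTHON_TYPES[var_type]) or (
        is_numeric and isinstance(value, bool)
    ):
        return [f"expected {var_type}, got {type(value).__name__}"]

    problems = []
    if is_numeric:
        if "min_value" in variable and value < variable["min_value"]:
            problems.append("below min_value")
        if "max_value" in variable and value > variable["max_value"]:
            problems.append("above max_value")
        step = variable.get("step_size")
        if var_type == "float" and step:
            # Work in units of step to sidestep floating-point drift.
            ratio = (value - variable.get("min_value", 0)) / step
            if abs(ratio - round(ratio)) > 1e-9:
                problems.append("not a multiple of step_size")
    if var_type in ("string", "array"):
        if "min_length" in variable and len(value) < variable["min_length"]:
            problems.append("shorter than min_length")
        if "max_length" in variable and len(value) > variable["max_length"]:
            problems.append("longer than max_length")
    if var_type == "string" and variable.get("choices"):
        allowed = [choice.strip() for choice in variable["choices"].split(",")]
        if value not in allowed:
            problems.append("not in choices")
    return problems


grade = {"type": "float", "min_value": 0.0, "max_value": 10.0, "step_size": 0.5}
condition = {"type": "string", "choices": "mint,near_mint,excellent,good,poor"}
```

For example, `validate_value(grade, 8.5)` returns an empty list, while `validate_value(grade, 8.3)` reports a step-size problem.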

List Variables

List all variables, optionally filtered by dataset.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

Optional attributes

  • Name
    dataset
    Type
    string
    Description

    Filter variables by dataset ID.

  • Name
    page_size
    Type
    integer
    Description

    Number of results per page.

Request

GET
/v2/variable/
curl -v -XGET \
     -H 'Authorization: Token __API_TOKEN__' \
     'https://api.ximilar.com/vlm/v2/variable/?dataset=__DATASET_ID__'

Response

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": "f82b01ed-e65d-4730-a458-2966cbf86994",
      "dataset": "2e3a2346-fa3b-43ca-8a4c-168941692c58",
      "dataset_name": "My Training Dataset",
      "name": "grade",
      "type": "float",
      "required": true,
      "min_value": 0.0,
      "max_value": 10.0,
      "step_size": 0.5
    },
    {
      "id": "7b661f0e-7c00-4b59-b334-aca2a0249e2f",
      "dataset": "2e3a2346-fa3b-43ca-8a4c-168941692c58",
      "dataset_name": "My Training Dataset",
      "name": "explain",
      "type": "string",
      "required": false
    }
  ]
}
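
A paginated response like this can be consumed by following `next` until it is null. The sketch below assumes `next` holds the full URL of the following page and that other list endpoints share this envelope; `iter_results` and `variable_ids_by_name` are hypothetical helpers, and the page fetcher is a stand-in so the example runs offline.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def iter_results(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item across pages by following the `next` links."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page["results"]
        url = page.get("next")


def variable_ids_by_name(variables: Iterable[dict]) -> Dict[str, str]:
    """Map variable names to their UUIDs, handy when annotating samples."""
    return {variable["name"]: variable["id"] for variable in variables}


# Stand-in for real HTTP calls; the page URLs here are made up.
FAKE_PAGES = {
    "https://api.ximilar.com/vlm/v2/variable/?page=1": {
        "next": "https://api.ximilar.com/vlm/v2/variable/?page=2",
        "results": [{"id": "f82b01ed-e65d-4730-a458-2966cbf86994", "name": "grade"}],
    },
    "https://api.ximilar.com/vlm/v2/variable/?page=2": {
        "next": None,
        "results": [{"id": "7b661f0e-7c00-4b59-b334-aca2a0249e2f", "name": "explain"}],
    },
}

ids = variable_ids_by_name(
    iter_results("https://api.ximilar.com/vlm/v2/variable/?page=1", FAKE_PAGES.__getitem__)
)
```

In real use, `fetch_page` would perform an authenticated GET and return the decoded JSON body.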

Get Variable

Get details of a specific variable by its ID.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    variable_id
    Type
    string
    Description

    UUID of the variable.

Request

GET
/v2/variable/{variable_id}/
curl -v -XGET \
     -H 'Authorization: Token __API_TOKEN__' \
     https://api.ximilar.com/vlm/v2/variable/__VARIABLE_ID__/

Create Variable

Create a new variable for a dataset.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    dataset
    Type
    string
    Description

    UUID of the dataset this variable belongs to.

  • Name
    name
    Type
    string
    Description

    Variable name (must follow programming naming conventions).

  • Name
    type
    Type
    string
    Description

    Variable type: string, integer, float, boolean, array, or object.

Optional attributes

  • Name
    required
    Type
    boolean
    Description

    Whether this variable must be provided (default: false).

  • Name
    description
    Type
    string
    Description

    Human-readable description.

  • Name
    choices
    Type
    string
    Description

    Comma-separated list of allowed values (for string type).

  • Name
    min_value
    Type
    number
    Description

    Minimum value (for numeric types).

  • Name
    max_value
    Type
    number
    Description

    Maximum value (for numeric types).

  • Name
    step_size
    Type
    number
    Description

    Step size (for float type).

  • Name
    min_length
    Type
    integer
    Description

    Minimum length (for string/array types).

  • Name
    max_length
    Type
    integer
    Description

    Maximum length (for string/array types).

  • Name
    default_value
    Type
    any
    Description

    Default value if not provided.

Request

POST
/v2/variable/
curl -v -XPOST \
     -H 'Authorization: Token __API_TOKEN__' \
     -H 'Content-Type: application/json' \
     -d '{
       "dataset": "__DATASET_ID__",
       "name": "condition",
       "type": "string",
       "required": false,
       "description": "Condition of the item",
       "choices": "mint,near_mint,excellent,good,poor"
     }' \
     https://api.ximilar.com/vlm/v2/variable/
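
When variables are defined in code, the payload can be assembled and sanity-checked before posting. In this sketch `build_variable_payload` is a hypothetical helper, and using `str.isidentifier()` is only one possible reading of "programming naming conventions"; the API's exact rule is not stated here.

```python
from typing import Any, Dict, Optional, Sequence

VARIABLE_TYPES = {"string", "integer", "float", "boolean", "array", "object"}


def build_variable_payload(
    dataset_id: str,
    name: str,
    var_type: str,
    choices: Optional[Sequence[str]] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Assemble the JSON body for POST /v2/variable/."""
    if var_type not in VARIABLE_TYPES:
        raise ValueError(f"unknown variable type: {var_type}")
    # Assumption: a valid Python identifier is an acceptable variable name.
    if not name.isidentifier():
        raise ValueError(f"invalid variable name: {name!r}")
    payload: Dict[str, Any] = {"dataset": dataset_id, "name": name, "type": var_type}
    if choices:
        # The API expects choices as one comma-separated string.
        payload["choices"] = ",".join(choices)
    # Remaining optional attributes (required, min_value, ...) pass through.
    payload.update(options)
    return payload


payload = build_variable_payload(
    "__DATASET_ID__",
    "condition",
    "string",
    choices=["mint", "near_mint", "excellent", "good", "poor"],
    required=False,
    description="Condition of the item",
)
```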

Sample Endpoints

Samples are individual training examples consisting of one or more images and annotated variable values.

List Samples

List all samples, optionally filtered by dataset.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

Optional attributes

  • Name
    dataset
    Type
    string
    Description

    Filter samples by dataset ID.

  • Name
    test
    Type
    boolean
    Description

    Filter by test/training samples.

  • Name
    search
    Type
    string
    Description

    Search samples by name.

Request

GET
/v2/sample/
curl -v -XGET \
     -H 'Authorization: Token __API_TOKEN__' \
     'https://api.ximilar.com/vlm/v2/sample/?dataset=__DATASET_ID__'

Get Sample

Get details of a specific sample including images and variable values.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    sample_id
    Type
    string
    Description

    UUID of the sample.

Request

GET
/v2/sample/{sample_id}/
curl -v -XGET \
     -H 'Authorization: Token __API_TOKEN__' \
     https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/

Create Sample

Create a new sample in a dataset.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    dataset
    Type
    string
    Description

    UUID of the dataset this sample belongs to.

Optional attributes

  • Name
    name
    Type
    string
    Description

    Optional name for the sample.

  • Name
    test
    Type
    boolean
    Description

    Whether this is a test sample (default: false).

Request

POST
/v2/sample/
curl -v -XPOST \
     -H 'Authorization: Token __API_TOKEN__' \
     -H 'Content-Type: application/json' \
     -d '{
       "dataset": "__DATASET_ID__",
       "name": "Sample 1",
       "test": false
     }' \
     https://api.ximilar.com/vlm/v2/sample/

Add Images to Sample

Add images to an existing sample. Images must already be uploaded to your workspace.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    sample_id
    Type
    string
    Description

    UUID of the sample.

  • Name
    image_ids
    Type
    array
    Description

    List of image UUIDs to add.

Request

POST
/v2/sample/{sample_id}/add-images/
curl -v -XPOST \
     -H 'Authorization: Token __API_TOKEN__' \
     -H 'Content-Type: application/json' \
     -d '{"image_ids": ["__IMAGE_ID_1__", "__IMAGE_ID_2__"]}' \
     https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/add-images/

Set Sample as Test

Mark a sample as a test sample. Test samples are used for model evaluation, not training.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    sample_id
    Type
    string
    Description

    UUID of the sample.

Request

POST
/v2/sample/{sample_id}/set-test/
curl -v -XPOST \
     -H 'Authorization: Token __API_TOKEN__' \
     https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/set-test/

Unset Sample as Test

Remove the test flag from a sample, making it a training sample.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    sample_id
    Type
    string
    Description

    UUID of the sample.

Request

POST
/v2/sample/{sample_id}/set-untest/
curl -v -XPOST \
     -H 'Authorization: Token __API_TOKEN__' \
     https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/set-untest/
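
If you want a reproducible train/test split, one option is to derive the flag from the sample ID itself. This is a sketch of that idea, not a platform feature: `is_test_sample` and `split_endpoint` are hypothetical helpers, and the 20% default is an arbitrary choice.

```python
import hashlib


def is_test_sample(sample_id: str, test_fraction: float = 0.2) -> bool:
    """Deterministically assign a sample to the test split from its ID."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < test_fraction * 2**64


def split_endpoint(sample_id: str, test_fraction: float = 0.2) -> str:
    """Return the endpoint path that applies the computed split."""
    action = "set-test" if is_test_sample(sample_id, test_fraction) else "set-untest"
    return f"/v2/sample/{sample_id}/{action}/"
```

Because the decision depends only on the ID, rerunning the script never moves a sample between the two sets.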

Set Sample Type

Set the sample type which determines how images are used during training.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    sample_id
    Type
    string
    Description

    UUID of the sample.

  • Name
    type
    Type
    string
    Description

    Sample type. Valid values:

    • single: Take just one image from the list during training
    • multi_random: Default; use all images in random order during training
    • multi_ordered: Use all images preserving order during training

Request

PATCH
/v2/sample/{sample_id}/
curl -v -XPATCH \
     -H 'Authorization: Token __API_TOKEN__' \
     -H 'Content-Type: application/json' \
     -d '{"type": "multi_ordered"}' \
     https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/
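
To make the three sample types concrete, the sketch below shows one plausible reading of how each could select images for a training step. It is illustrative only: the actual selection logic runs on the platform, and whether `single` picks its image at random is an assumption.

```python
import random
from typing import List, Optional, Sequence


def select_images(
    image_ids: Sequence[str],
    sample_type: str = "multi_random",
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the images a training step would see for a sample type."""
    rng = rng or random.Random()
    if sample_type == "single":
        # Assumption: the single image is chosen at random.
        return [rng.choice(list(image_ids))]
    if sample_type == "multi_random":
        shuffled = list(image_ids)
        rng.shuffle(shuffled)
        return shuffled
    if sample_type == "multi_ordered":
        return list(image_ids)
    raise ValueError(f"unknown sample type: {sample_type}")


frames = ["frame_1", "frame_2", "frame_3"]
ordered = select_images(frames, "multi_ordered")
```

Under this reading, `multi_ordered` suits video frames or before/after pairs, where the position of each image carries meaning.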

Add Input Metadata to Sample

Add input metadata to a sample. This metadata can be used in the user prompt during training and inference. The metadata is merged with any existing input metadata.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    sample_id
    Type
    string
    Description

    UUID of the sample.

  • Name
    input_meta_data
    Type
    object
    Description

    JSON object containing metadata to add to the sample.

Request

PATCH
/v2/sample/{sample_id}/
curl -v -XPATCH \
     -H 'Authorization: Token __API_TOKEN__' \
     -H 'Content-Type: application/json' \
     -d '{"input_meta_data": {"style": "elegant", "category": "jacket"}}' \
     https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/
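
The merge behaviour can be pictured with a plain dictionary update. This sketch assumes a shallow merge in which new keys overwrite existing ones; the text above only says the metadata is merged, so nested objects may behave differently. `merge_input_meta_data` is a hypothetical helper that mimics the result locally.

```python
import json


def merge_input_meta_data(existing: dict, update: dict) -> dict:
    """Shallow merge: keys in `update` overwrite keys in `existing`."""
    return {**existing, **update}


existing = {"category": "jacket", "season": "winter"}
update = {"style": "elegant", "category": "coat"}

# Body of the PATCH request: only the new keys need to be sent.
patch_body = json.dumps({"input_meta_data": update})

# What the sample's metadata would look like afterwards, under this assumption.
merged = merge_input_meta_data(existing, update)
```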

Add Variable Value to Sample

Add or update a variable value for a sample. This is how you annotate your training data.

Required attributes

  • Name
    Authorization
    Type
    string
    Description

    Unique API token for authentication.

  • Name
    sample_id
    Type
    string
    Description

    UUID of the sample.

  • Name
    dataset_variable
    Type
    string
    Description

    UUID of the variable to set.

  • Name
    value
    Type
    any
    Description

    The value for this variable (type must match variable definition).

Request

POST
/v2/sample/{sample_id}/add-variable-value/
curl -v -XPOST \
     -H 'Authorization: Token __API_TOKEN__' \
     -H 'Content-Type: application/json' \
     -d '{
       "dataset_variable": "__VARIABLE_ID__",
       "value": {"value": 8.5}
     }' \
     https://api.ximilar.com/vlm/v2/sample/__SAMPLE_ID__/add-variable-value/
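
Annotating a sample usually takes several calls: one to attach images and one per variable value. The sketch below lists those calls for an existing sample without sending them. `plan_annotation_calls` is a hypothetical helper, and the `{"value": ...}` wrapper simply mirrors the request example above.

```python
from typing import Any, Dict, List, Sequence, Tuple

Call = Tuple[str, str, Dict[str, Any]]  # (HTTP method, path, JSON body)


def plan_annotation_calls(
    sample_id: str,
    image_ids: Sequence[str],
    values_by_name: Dict[str, Any],
    variable_ids: Dict[str, str],
) -> List[Call]:
    """List the requests needed to attach images and annotate a sample."""
    calls: List[Call] = [
        ("POST", f"/v2/sample/{sample_id}/add-images/", {"image_ids": list(image_ids)})
    ]
    for name, value in values_by_name.items():
        calls.append(
            (
                "POST",
                f"/v2/sample/{sample_id}/add-variable-value/",
                # Look up the variable UUID by its name.
                {"dataset_variable": variable_ids[name], "value": {"value": value}},
            )
        )
    return calls


calls = plan_annotation_calls(
    "__SAMPLE_ID__",
    ["__IMAGE_ID_1__", "__IMAGE_ID_2__"],
    {"grade": 8.5},
    {"grade": "__VARIABLE_ID__"},
)
```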

Using Different Workspace

When making an API request, the default workspace associated with the user's API token is used. To access data or upload to a different workspace, specify the workspace in the URL or JSON payload.

# Get all samples from a specific workspace
https://api.ximilar.com/vlm/v2/sample/?workspace=WORKSPACE_ID

# Create a sample in a specific workspace
curl -XPOST \
     -H 'Authorization: Token __API_TOKEN__' \
     -H 'Content-Type: application/json' \
     -d '{
       "dataset": "__DATASET_ID__",
       "workspace": "WORKSPACE_ID"
     }' \
     https://api.ximilar.com/vlm/v2/sample/

# Python client
from ximilar.client.vlm import VLMClient

# Initialize client with specific workspace
client = VLMClient(
    token="__API_TOKEN__",
    workspace="WORKSPACE_ID"
)
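
When calling the REST API directly, the workspace query parameter can be added to any URL with a small helper. `with_workspace` is a hypothetical function for illustration; it replaces an existing `workspace` parameter and keeps other filters intact.

```python
import urllib.parse


def with_workspace(url: str, workspace_id: str) -> str:
    """Return `url` with its workspace query parameter set to `workspace_id`."""
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    # Drop any previous workspace value so the parameter appears only once.
    query = [(key, value) for key, value in query if key != "workspace"]
    query.append(("workspace", workspace_id))
    return urllib.parse.urlunsplit(
        parts._replace(query=urllib.parse.urlencode(query))
    )


samples_url = with_workspace("https://api.ximilar.com/vlm/v2/sample/", "WORKSPACE_ID")
```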
