Platform – Overview & Basics

This page describes the Ximilar Computer Vision Platform for training custom image recognition models. It supports model types like classification, regression, tagging, object detection, and their combinations via Flows. These models leverage visual AI and machine learning architectures such as convolutional neural networks (CNNs) and vision transformers.

The Ximilar App provides a simple interface for setting up your plan and tasks, uploading images, training models, and evaluating the results. Trained models are automatically deployed as API endpoints for easy integration into your application or system.

Unlike large language models, these computer vision models are optimized for speed and cost, and can even run on edge devices.

Terminology

A task is the starting point of your machine learning project: it is the abstract definition from which a recognition model is trained. Each task has a set of labels, and each label can be assigned to multiple training images. Your tasks, data, and images are private and accessible only to you.

A model is the trained version of a task – essentially a neural network trained on your specific images, so it can accurately recognize new images of the same kind. Each model has an accuracy metric measured at the end of training. Models are private to their owner. Each time you retrain a model, a new version is created with an incremented version number, and you can select which model version to deploy in production.

Tasks can be of different types:

  • Image Classification tasks, further divided into:
    • Tagging ('multi_label'): Assigns multiple labels to an image.
    • Categorization ('multi_class'): Assigns a single category to an image.
    • Regression ('regression'): Predicts a numerical value (e.g., an age) from an image.
  • Detection tasks: Identify and locate objects within images.
  • Similarity tasks: Determine how similar images are to each other.

A label represents a feature you want to recognize in your images. Each task has one or several labels. For image classification tasks, labels have specific types depending on the use case:

  • category for categorization tasks (e.g., shoes)
  • tag for tagging tasks (e.g., leather, brown, high heels)
  • value for regression tasks (e.g., age, height, weight)

To train a model, you must upload training images and either assign labels to them or define objects within them (annotate).

A workspace is where all your images, labels, and tasks are stored. By default, each user owns at least one workspace. You can share your workspace and grant access to other users.

Multiple tasks of different types can be combined and called together via the Flows service.

For example, to recognize apples and bananas in images, you can create a categorization task with two labels: 'apple' and 'banana'. For each label, you need to upload at least 20 images. Once the images are uploaded, you can train the task — the training process may take several minutes to a few hours, depending on the dataset and settings. The result of this training is a model, which is deployed as an API. You can then send simple REST requests with images to classify them as either apple or banana.
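
For illustration, here is a minimal sketch of such a request in Python using the requests library. The payload fields (task_id, records with _url) and the Token authorization header follow Ximilar's API conventions, but treat them as assumptions and check the API reference for the authoritative request format.

    import requests

    # Placeholders – replace with your own API token and the task ID shown in the Ximilar App.
    API_TOKEN = "your-api-token"
    TASK_ID = "your-task-id"

    response = requests.post(
        "https://api.ximilar.com/recognition/v2/classify/",
        headers={"Authorization": f"Token {API_TOKEN}"},
        json={
            "task_id": TASK_ID,
            # Each record points to one image; a public URL or a base64-encoded file can be sent.
            "records": [{"_url": "https://example.com/fruit.jpg"}],
        },
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())  # per-record predictions, e.g. whether the image shows an apple or a banana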

Task Types

The Ximilar Computer Vision Platform allows you to train the following types of tasks (via the App or API):

Classification

This task type is ideal for categorizing images into predefined classes. It's particularly well-suited for high-quality images where you need to assign a single category to each image.

Endpoint: https://api.ximilar.com/recognition/v2/classify/

Best for: Building simple image categorization models.

Examples:

  • Sorting images of cats and dogs.
  • Sorting real estate images by room (bedroom, kitchen, bathroom, living room).
  • Sorting images by color of clothing (red, green, blue).
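
As a quick illustration of how a classification result might be consumed, here is a minimal sketch that reads the single predicted category from a parsed classify response. The field names used here (records, best_label, name, prob) are assumptions based on typical Recognition API responses – verify them against the API reference.

    def print_best_category(result: dict) -> None:
        """Print the top predicted category for each record of a parsed classify response.

        The field names (records, best_label, name, prob) are assumptions based on
        typical Recognition API responses – verify them in the API reference.
        """
        for record in result.get("records", []):
            best = record.get("best_label") or {}
            print(f"{record.get('_url', '<image>')}: {best.get('name')} ({best.get('prob', 0.0):.2f})")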

Tagging

Tagging allows you to assign multiple labels to a single image. This is useful when you need to extract multiple attributes from an image at the same time. In general, a complex tagging model can also be replaced by several simpler classification models connected through Flows.

Endpoint: https://api.ximilar.com/recognition/v2/classify/

Best for: Predicting multiple tags for a single image.

Examples:

  • Extracting attributes from fashion packshots, such as color (red, green, blue), cut (maxi, midi, mini), or sleeves (long, short, sleeveless).
  • Extracting keywords from real estate images (indoor, outdoor, kitchen, wooden, table, chair, etc.).
  • Tagging images of cars (sedan, SUV, hatchback, etc.).
  • Tagging images of cats vs. other animals. All non-cat images would receive a negative tag "not-a-cat". This could also be achieved using categorization, with "cat" and "not-a-cat" as separate categories.
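
Because a tagging (multi_label) task returns several candidate tags per image, a common pattern is to keep only the tags above a probability threshold. A minimal sketch, assuming each record in the response carries a labels list with name and prob fields (an assumption to verify against the API reference):

    def select_tags(record: dict, threshold: float = 0.5) -> list[str]:
        """Return the names of tags whose predicted probability meets the threshold.

        Assumes each record from the classify endpoint carries a 'labels' list with
        'name' and 'prob' fields – an assumption to verify against the API reference.
        """
        return [
            label["name"]
            for label in record.get("labels", [])
            if label.get("prob", 0.0) >= threshold
        ]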

Regression

Image regression (value prediction) tasks predict continuous numerical values from images instead of discrete categories. This is useful for assigning scores, predicting measurements, or estimating other numerical attributes.

Endpoint: https://api.ximilar.com/recognition/v2/classify/

Best for: Predicting continuous numerical values from images.

Examples:

  • Rating real estate photos on a scale of 0 to 100.
  • Predicting the quality or damage of a product or material from an image.
  • Predicting the aesthetics rating of stock photos.
  • Predicting the age of a person from an image.
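
The request for a regression task is the same classify call shown earlier; only the interpretation of the response differs, since it carries a number rather than a set of labels. The helper below is a minimal sketch; 'value' is a hypothetical field name for the predicted number, so check the API reference for the actual response shape.

    from typing import Optional

    def extract_predicted_value(record: dict) -> Optional[float]:
        """Return the predicted numeric value for one record of a regression task.

        'value' is a hypothetical field name for the predicted number – check the
        API reference for the actual response shape.
        """
        value = record.get("value")
        return float(value) if value is not None else None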

Custom Similarity

This task type allows you to train models that can determine how similar images are to each other. It's useful for finding visually similar items, creating image search systems, or grouping similar images together.

Endpoint: https://api.ximilar.com/similarity/training/v2/descriptor

Best for: Training embedding models used for feature extraction in visual search systems.

Examples:

  • Visual search engine for collectible items (cards, coins, stamps, comics, etc.).
  • Visual search engine for fashion apparel.
  • Image-matching systems for product catalog maintenance.
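
The descriptor endpoint returns an embedding vector for each image, and images are then compared by a vector distance. A minimal sketch of cosine similarity between two descriptors (how the vectors are requested from the endpoint is not shown here – see the API reference):

    import math

    def cosine_similarity(a: list[float], b: list[float]) -> float:
        """Cosine similarity between two descriptor vectors; values close to 1.0 mean very similar images."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0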

Object Detection

Object detection tasks identify and locate specific objects within images. This is useful when you need to know not just what objects are in an image, but also where they are located (typically with bounding boxes).

Endpoint: https://api.ximilar.com/detection/v2/detect

Best for: Finding objects and marking their locations in images.

Examples:

  • Assisting medical professionals with detection of objects in X-ray images.
  • Detecting cars in CCTV videos.
  • Detecting objects in product packshots (e.g., fashion apparel).
  • Detecting home decor and furniture items in images.
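
A minimal sketch of a detection request in Python using the requests library. The payload mirrors the classify call above; the response fields used here (_objects with name, prob, and bound_box) are assumptions to verify against the API reference.

    import requests

    API_TOKEN = "your-api-token"          # placeholder
    TASK_ID = "your-detection-task-id"    # placeholder

    response = requests.post(
        "https://api.ximilar.com/detection/v2/detect",
        headers={"Authorization": f"Token {API_TOKEN}"},
        json={"task_id": TASK_ID, "records": [{"_url": "https://example.com/living-room.jpg"}]},
        timeout=30,
    )
    response.raise_for_status()
    for record in response.json().get("records", []):
        # '_objects' and 'bound_box' are assumed field names – verify in the API reference.
        for obj in record.get("_objects", []):
            print(obj.get("name"), obj.get("prob"), obj.get("bound_box"))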


What's next?

First, check out our Quickstart guide to get started with the platform. Then try querying details about your user account, statistics, workspaces, and available solutions – see User Management. After that, you can start uploading images, creating labels, and training your first model – see Image Classification or Object Detection. If you need to combine models, you can use Flows.
