Image Matching

The Image Matching service identifies duplicate or near-duplicate images. It calculates a so-called "visual hash" that should be the same, or nearly the same, for images that differ only slightly, for example through a color shift (such as conversion to black and white), re-compression, a change of resolution, or added noise.

The API follows the general rules of Ximilar API as described in Section First steps.

The API is a set of HTTP REST services accepting JSON-formatted documents using POST and returning JSON documents. The base URL for this service is:

https://api.ximilar.com/image_matching/v2/<method>
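As a minimal sketch of this calling convention, the following Python helper builds (but does not send) a POST request with a JSON body. The helper name build_request is hypothetical, and the token is a placeholder; the header format follows the curl examples used in this document.

```python
import json
import urllib.request

BASE_URL = "https://api.ximilar.com/image_matching/v2/"


def build_request(method, payload, token):
    """Build a POST request for one API method (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + method,
        data=body,
        method="POST",
        headers={
            "Authorization": "Token " + token,
            "Content-Type": "application/json",
        },
    )


payload = {
    "records": [
        {"_url": "https://images.ximilar.com/examples/fashion_products/10073009-HERO.jpeg"}
    ]
}
request = build_request("visual_hash", payload, "__API_TOKEN__")
print(request.get_method(), request.full_url)
# Sending is left out of the sketch; urllib.request.urlopen(request) would do it.
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any network call is made.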

Overview of API Methods

The following methods are "stateless" - they work solely with the images passed in the request:

  • /v2/ping -- test the service and get basic info about it
  • /v2/visual_hash -- get visual hash(es) for given image or images
  • /v2/remove_duplicates -- get a set of images and merge the ones that are duplicates or near-duplicates
  • /v2/rank_images -- get one "query" image and a set of "data" images, and rank the data images by hash-based similarity to the query image

The Image Matching service also provides an option to store information about your image database in a Ximilar collection and then match images with these stored images. The principle and the API are the same as for Photo & Product Similarity:

  • see the Photo & Product Similarity documentation and use the API prefix https://api.ximilar.com/image_matching/v2/

Parameters of API methods

The Ximilar Search API works with data records, each of which represents a single image. A record has the same format in all operations and in the responses. It is a JSON record (map) with the following fields:

  • _url -- URL with a PNG, JPG, or TIFF image file
  • _base64 -- base64-encoded content of a PNG, JPG or TIFF image file
  • attribute -- a JSON representation of any attribute of the record; these attributes are returned by the method and can be used to identify individual records within the answer. The attribute _id is typically used as a unique image ID.

Example of image records in the records field, which is used by all API methods:

{
  "records": 
  [ 
    {
      "_id": "1",
      "_url": "https://yourdomain.com/images/product_image_321.jpg"
    },
    {
      "_id": "2",
      "_base64": "data:image/jpeg;base64,/9j/4A...."
    }
  ]
}
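A payload like the one above could be assembled as in the following sketch. The function names are hypothetical. The data-URI prefix on the _base64 value mirrors the example above; whether the service requires that prefix is not stated here, so treat it as an assumption.

```python
import base64


def url_record(record_id, url):
    """Record that points at an image by URL."""
    return {"_id": record_id, "_url": url}


def base64_record(record_id, image_bytes, mime_type="image/jpeg"):
    """Record that carries the image content inline.

    The "data:<mime>;base64," prefix follows the example in this document
    (assumption: the service accepts this form).
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"_id": record_id, "_base64": f"data:{mime_type};base64,{encoded}"}


# The four bytes below are only the start of a JPEG file, used as a stand-in.
payload = {
    "records": [
        url_record("1", "https://yourdomain.com/images/product_image_321.jpg"),
        base64_record("2", b"\xff\xd8\xff\xe0"),
    ]
}
print(payload["records"][1]["_base64"])
```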

Return Values

All API methods return:

  • an HTTP status code: 2XX if the method succeeded, or another HTTP status code if the method failed
  • a JSON-formatted body with the status, answer, and statistics

Answer fields common for all types of answers:

  • statistics -- a map of various statistics about the processing. The only statistic included every time is
    • processing time -- time of actual processing of the query (in seconds)
  • status -- a JSON map with a status of the method processing. It contains these subfields:
    • code -- a numeric code of the operation status; it follows the concept of HTTP status codes (2XX, 4XX). Specific codes are described for each type of answer (or operation) (see below).
    • text -- a text describing the status code
    • error_description -- if the processing ended with an error (codes 4XX), this field contains a detailed description of the error; this might include Java stack traces.

Generic statuses that can be returned by any operation:

  • "status": {"code": 200, "text": "OK"}
  • "status": {"code": 402, "text": "aborted by error", "error_description": "..."}
  • "status": {"code": 500, "text": "unknown error", "error_description": "..."}
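A client might inspect the status map along the following lines. This is a sketch, not the official client behavior: the function name is hypothetical, and treating only codes of 400 and above as failures is an assumption (it lets partial results such as the 300 MIXED_RESULT status shown in the /v2/rank_images example pass through).

```python
def check_status(response):
    """Return the parsed response unless its status code signals an error.

    Assumption: codes >= 400 are failures; lower codes are passed through.
    """
    status = response.get("status", {})
    code = status.get("code")
    if code is None or code >= 400:
        detail = status.get("error_description") or status.get("text", "")
        raise RuntimeError(f"Image Matching request failed ({code}): {detail}")
    return response


ok = check_status({"status": {"code": 200, "text": "OK"}})
print(ok["status"]["text"])
```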

Detailed Descriptions of API Methods

/v2/ping

Description: returns basic information about the service

Example:

curl --request POST \
  --url https://api.ximilar.com/image_matching/v2/ping \
  --header 'authorization: Token __API_TOKEN__'

Returns:

{
  "status": {
    "code": 200,
    "text": "OK"
  },
  "_service_info": {
    "_name": "Image matching service",
    "_info": "Get visual hashes, find (near-)duplicate images and rank them"
  }
}

/v2/visual_hash

Description: get a visual hash (or several different types of hashes) for given image(s)

Parameters:

  • records: list of photos to get hashes for
    • each record must contain either the _url or the _base64 field - see section image data for details
  • hash_type: the type of visual hash to compute (by default, both bmh1 and phash are computed)

Example:

curl --request POST \
  --url https://api.ximilar.com/image_matching/v2/visual_hash \
  --header 'authorization: Token __API_TOKEN__' \
  --header 'content-type: application/json' \
  --data '{
    "records": [
        {"_url": "https://images.ximilar.com/examples/fashion_products/10073009-HERO.jpeg"}
    ]
}'

Returns:

{
  "records": [
    {
      "_url": "https://images.ximilar.com/examples/fashion_products/10073009-HERO.jpeg",
      "_width": 400,
      "_height": 400,
      "bmh1": "11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001111111111111111111111111111110000111111111111111101111111111110001111111101111111000000001111110011111111001111110000000011111110111111110001111100000000111111111111111100001111000000001111111111111111000001111000000011111111111111110000001110000000111111111111111100000001100000001111111101111111000000001000000011111111001111110000000010000000111111110001111100000000100000001111111100001111000000001100000011111111000001110000000000000000111111110000001100000000000000001100000000000001000000000000000011000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001001100000000000001110000000000011111111000000000111111110000000001",
      "phash": "1111110000001011010000111110110010111001000101100010010110001010"
    }
  ],
  "statistics": {
    "processing time": 0.13515067100524902
  },
  "status": {
    "code": 200,
    "text": "OK"
  }
}
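The hashes are returned as strings of 0 and 1 characters. This document does not state how the service compares them; a common way to compare binary hashes of equal length is the Hamming distance (the number of differing bit positions), which the sketch below uses as an assumption. The function name is hypothetical.

```python
def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length bit strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(hash_a, hash_b))


# phash value taken from the example response above.
phash = "1111110000001011010000111110110010111001000101100010010110001010"
# Flip the last bit to simulate a nearly identical image.
flipped = phash[:-1] + ("0" if phash[-1] == "1" else "1")

print(hamming_distance(phash, phash))
print(hamming_distance(phash, flipped))
```

A distance of zero means the two hashes are identical; small distances suggest near-duplicates under this metric.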

/v2/remove_duplicates

Description: merge the images/records that are matching (based on visual hashes).

Things to note

  • When two or more records match, the later ones are moved into the removed_records field of the first matching record. See the return value below for an example.

Parameters:

  • records: list of photos to merge
    • each record must contain either the _url or the _base64 field - see section image data for details
  • hash_type: the type of visual hash used for comparison (default bmh1, optional phash)
  • range: the minimum threshold value used for clustering when removing duplicates (default 0)

Example:

curl --request POST \
  --url https://api.ximilar.com/image_matching/v2/remove_duplicates \
  --header 'authorization: Token __API_TOKEN__' \
  --header 'content-type: application/json' \
  --data '{
    "records": [
        {"_url": "__URL_PATH_1__", "_id": 1}, {"_url": "__URL_PATH_1__", "_id": 2}, {"_url": "__URL_PATH_2__", "_id": 3}
    ]
}'

Returns:

{
  "records": [
    {
      "_url": "__URL_PATH_1__",
      "_status": {
        "code": 200,
        "text": "OK",
        "request_id": "7a1cf0ee-a2dd-4fa2-9927-69441bc1d3dc"
      },
      "_id": "1",
      "_width": 259,
      "_height": 460,
      "removed_records": [
        {
          "_url": "__URL_PATH_1__",
          "_status": {
            "code": 200,
            "text": "OK",
            "request_id": "7a1cf0ee-a2dd-4fa2-9927-69441bc1d3dc"
          },
          "_id": "2",
          "_width": 259,
          "_height": 460
        }
      ]
    },
    {
      "_url": "__URL_PATH_2__",
      "_status": {
        "code": 200,
        "text": "OK",
        "request_id": "7a1cf0ee-a2dd-4fa2-9927-69441bc1d3dc"
      },
      "_id": "3",
      "_width": 212,
      "_height": 289
    }
  ],
  "status": {
    "code": 200,
    "text": "OK",
    "request_id": "7a1cf0ee-a2dd-4fa2-9927-69441bc1d3dc",
    "proc_id": "d9ac827a-ee8e-4e3b-a5d2-665c10e3fa84"
  },
  "statistics": {
    "processing time": 0.7242739200592041
  }
}
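To work with a response of this shape, a client could map each kept record to the records merged into it. The sketch below assumes every record carries an _id, as in the example; the function name is hypothetical.

```python
def duplicate_groups(response):
    """Map each kept record's _id to the _ids listed in its removed_records."""
    groups = {}
    for record in response.get("records", []):
        removed = record.get("removed_records", [])
        groups[record["_id"]] = [duplicate["_id"] for duplicate in removed]
    return groups


# Trimmed version of the example response above.
sample = {
    "records": [
        {"_id": "1", "removed_records": [{"_id": "2"}]},
        {"_id": "3"},
    ]
}
print(duplicate_groups(sample))
```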

/v2/rank_images

Description: rank the images/records against the query image, based on image hash.

Things to note

  • Be aware that a request to /v2/rank_images contains the records and query_record fields, while the response contains query_records and answer_records. This is because the method imitates the ranking endpoint of the Photo & Product Similarity service.

Parameters:

  • query_record: the record/image that you want to compare against records
  • records: list of images to rank
    • each record must contain either the _url or the _base64 field - see section image data for details
  • hash_type: the type of visual hash used for comparison (default bmh1, optional phash)

Example:

curl --request POST \
  --url https://api.ximilar.com/image_matching/v2/rank_images \
  --header 'authorization: Token __API_TOKEN__' \
  --header 'content-type: application/json' \
  --data '{
    "query_record": {
        "_url": "__URL_PATH_1__"
    },
    "records": [
        {
            "_url": "__URL_PATH_1__"
        },
        {
            "_url": "__URL_PATH_2__"
        },
        {
            "_url": "__URL_PATH_3__"
        },
        {
            "_url": "__URL_PATH_4_NOT_WORKING__"
        }
    ]
}'

Returns:

  • An HTTP status code: 2XX if the method succeeded, or another HTTP status code if the method failed.
  • The body of the response is a JSON object (map) with the following fields:
    • status - a JSON map with a status of the method processing. It contains these subfields:
      • code - a numeric code of the operation status; it follows the concept of HTTP status codes (2XX, 4XX). Specific codes are described for each type of answer (or operation) (see below).
      • text - a text describing the status code
    • statistics - a map of various statistics about the processing. The only statistic included every time is
      • processing time - time of actual processing of the query (in seconds)
    • query_records - the record/image that was compared against records (returned as an array with one record)
    • answer_records - array of records sorted from the best match to the worst
    • answer_distances - array of distance values corresponding to the answer_records array; the lower the value, the closer the record is to the query record
    • skipped_records - records that could not be analyzed (most commonly because of a wrong image URL)

{
  "query_records": [
    {
      "_url": "__URL_PATH_1__",
      "_width": 259,
      "_height": 460
    }
  ],
  "answer_distances": [
    0.0,
    0.1,
    26.0
  ],
  "answer_records": [
    {
      "_url": "__URL_PATH_1__",
      "_id": "e7ee2a82-495f-4df7-adc3-5cdb2b5fadf7",
      "_width": 259,
      "_height": 460
    },
    {
      "_url": "__URL_PATH_2__",
      "_id": "b170de44-75e2-4c4c-a41a-ff1ed9fa84b6",
      "_width": 259,
      "_height": 460
    },
    {
      "_url": "__URL_PATH_3__",
      "test": "insomnia",
      "_id": "ee72be0b-ab93-4696-9689-7f866ea9bb38",
      "_width": 212,
      "_height": 289
    }
  ],
  "skipped_records": [
    {
      "_url": "__URL_PATH_4_NOT_WORKING__",
      "_status": {
        "code": 400,
        "text": "Error Loading Image: Unable to download image from '__URL_PATH_4_NOT_WORKING__', Attempts: 3",
        "request_id": "a798e682-2b89-49ea-bb6b-0c2d09d523c1"
      },
      "_id": "dccb8eab-c4da-4fc6-b24a-1d92fe96f75e"
    }
  ],
  "status": {
    "code": 300,
    "text": "MIXED_RESULT",
    "request_id": "a798e682-2b89-49ea-bb6b-0c2d09d523c1",
    "proc_id": "19ef9e49-0c6b-4dca-a52a-80726da178ed"
  },
  "statistics": {
    "processing time": 0.9733231067657471
  }
}
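Because answer_records and answer_distances are parallel arrays, a client will usually want to pair them up. The sketch below does that and also lists the skipped URLs. The function names and the optional max_distance cutoff are hypothetical; a suitable cutoff depends on the hash type and is not specified in this document.

```python
def ranked_matches(response, max_distance=None):
    """Pair each answer record's URL with its distance, best match first."""
    pairs = zip(
        response.get("answer_records", []),
        response.get("answer_distances", []),
    )
    matches = [(record.get("_url"), distance) for record, distance in pairs]
    if max_distance is not None:
        # Hypothetical cutoff: keep only records at or below this distance.
        matches = [match for match in matches if match[1] <= max_distance]
    return matches


def skipped_urls(response):
    """URLs of records that the service could not analyze."""
    return [record.get("_url") for record in response.get("skipped_records", [])]


# Trimmed version of the example response above.
sample = {
    "answer_distances": [0.0, 0.1, 26.0],
    "answer_records": [
        {"_url": "__URL_PATH_1__"},
        {"_url": "__URL_PATH_2__"},
        {"_url": "__URL_PATH_3__"},
    ],
    "skipped_records": [{"_url": "__URL_PATH_4_NOT_WORKING__"}],
}
print(ranked_matches(sample, max_distance=1.0))
print(skipped_urls(sample))
```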