The Power and Pitfalls of Vector-Based Image Search


of any e-commerce platform. Users expect to see relevant results, and they want them fast. Because of this, e-commerce teams constantly work to improve both performance and perceived search quality to keep users happy and prevent churn.

Please take a look at the following three images.

They look similar from a visual perspective but the items in these images are completely unrelated. This contrast demonstrates both the capabilities and inherent shortcomings of image search.

All images used in this article are from our internal database that we use for experimentation and testing.

One use case of image search is detecting duplicate products. While product titles and descriptions can certainly be used to find duplicates, some listings feature slightly (or even entirely) different titles while using the exact same images. Detecting these listings is also very important.

Because e-commerce platforms typically serve millions of products, we need efficient tools and methods for performing any type of search at scale.

In this article, I will show you how to set up a vector database of image embeddings and search it. I will also explain in detail both the advantages and limitations of vector-based image search.

Here is a rough outline of the article:

  • Convert images into vectors: Transform visual data into searchable embeddings.
  • Create a Milvus collection: Set up a Milvus collection, which is the primary logical unit of data organization in the Milvus vector database.
  • Perform the image search: Search for target images in this collection.
  • Interpret the results: Go over some examples and interpret search results.

Let’s start with getting our vectors.

Convert images to vectors

The first step is to convert images into vectors, which are numerical representations of visual data. Vector size matters, and the best size depends on the application. Common choices are 128, 512, or 768 dimensions. A larger vector can capture more information and may give more accurate results, but it costs more storage and can add search latency.
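To make the storage side of this trade-off concrete, here is a minimal back-of-the-envelope sketch. It assumes each dimension is stored as a 4-byte float32 and a hypothetical catalog of 10 million images; index overhead and other fields are not counted, so real numbers will be higher.

```python
# Rough storage estimate for raw float32 embeddings (index overhead not included).
BYTES_PER_FLOAT32 = 4


def raw_vector_storage_gb(num_vectors: int, dim: int) -> float:
    """Return the size in GB (10**9 bytes) of num_vectors float32 vectors."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / 1e9


# 10 million images is an illustrative assumption, not a figure from this article.
for dim in (128, 512, 768):
    size_gb = raw_vector_storage_gb(num_vectors=10_000_000, dim=dim)
    print(f"{dim:>4} dims: {size_gb:.2f} GB")
```

Storage grows linearly with the number of dimensions, so going from 128 to 768 dimensions multiplies the raw vector size by six.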

We need an embedding model to convert images to vectors. We can train our own model but there are several ready-to-use models available, both free and paid.

For example, the following code block takes a JPEG image and converts it into a 512-dimensional vector using the open-source clip-ViT-B-32 model.

from PIL import Image
from sentence_transformers import SentenceTransformer

sku_image = Image.open("sample_image.jpeg")
model = SentenceTransformer('clip-ViT-B-32')
image_vector = model.encode(sku_image)

type(image_vector), image_vector.shape
(numpy.ndarray, (512,))

The image_vector variable is a NumPy array with 512 elements.

Create a Milvus Collection

Milvus is a vector database and a collection in Milvus is a two-dimensional table with fixed columns and rows. Each column represents a field, and each row represents an entity, which is an image in our case.

We will create a collection with two fields: an id field (such as a product SKU) and a corresponding vector field for that SKU’s image.

There are several ways to create and interact with a collection. I prefer to use Python whenever I can so I’ll go with the pymilvus module.

The first step is to create a client object. This is the way to connect to your vector database in Milvus.

from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://....") # your milvus db uri

Then, we define a schema for our collection:

schema = client.create_schema(auto_id=False, enable_dynamic_field=True)


# Add fields to schema
schema.add_field(field_name="sku_id", datatype=DataType.VARCHAR, max_length=512, is_primary=True)
schema.add_field(field_name="image_vector", datatype=DataType.FLOAT_VECTOR, dim=512)

The schema has two fields: sku_id and image_vector.

Then, we can create the collection using the create_collection method:

collection_name = "test_collection"

# Indexes are defined and created in the next step,
# so only the name and schema are passed here.
client.create_collection(
    collection_name=collection_name,
    schema=schema
)

We can now add an index to our fields. An index is crucial for a Milvus collection, or any vector database. It accelerates search speeds and reduces query latency, especially when working with large vector datasets.

One way to add indexes is to prepare the index parameters and pass them to the create_index method.

index_params = client.prepare_index_params()

# index for the image vector
index_params.add_index(
    field_name="image_vector",
    index_name="image_vector_idx",
    index_type="IVF_FLAT",
    metric_type="COSINE",  # Using COSINE similarity (common for images). Can also be L2 or IP.
)

# index for "sku_id" (primary key)
index_params.add_index(
    field_name="sku_id",
    index_name="sku_id_idx",
    index_type="INVERTED"
)

collection_name = "test_collection"

client.create_index(
    collection_name=collection_name,
    index_params=index_params
)
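The metric_type comment above mentions COSINE, L2, and IP. The sketch below is plain Python, independent of Milvus, and uses two made-up 2-dimensional vectors to show how the three metrics relate. Once vectors are scaled to unit length, the inner product equals the cosine similarity, and the L2 distance is a function of it, so all three rank neighbors in the same order.

```python
import math


def inner_product(a, b):
    """IP metric: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(inner_product(vec, vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """COSINE metric: inner product divided by the product of the norms."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def l2_distance(a, b):
    """L2 metric: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors chosen for easy arithmetic (both have length 5).
a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

print("cosine(a, b):      ", cosine_similarity(a, b))
print("ip(a_unit, b_unit):", inner_product(a_unit, b_unit))
print("l2(a_unit, b_unit):", l2_distance(a_unit, b_unit))
```

For unit vectors, the squared L2 distance equals 2 - 2 * cosine, which is why a higher cosine score always means a smaller L2 distance.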

In the final step, we load the collection and check its status.

client.load_collection(collection_name=collection_name)

# check load status
res = client.get_load_state(
    collection_name=collection_name
)

print(res)
{'state': <LoadState: Loaded>}

We have created the collection, but it is currently empty. To confirm that our new collection was successfully created, we can list all collections in the database using the list_collections method.

client.list_collections()

['test_collection']

The next step is to insert entities (i.e. SKUs and image vectors).

Insert Entities

We can now load data into our collection. To do this, the data must be formatted as a dictionary, where the keys match the collection’s field names. We can then use a list of these dictionaries to insert multiple entities at once.
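Before inserting, it can help to check that every dictionary has the expected keys and a vector of the right length. The helper below is a hypothetical sketch, not part of pymilvus; it assumes the sku_id and image_vector field names and the 512 dimensions from the schema defined earlier.

```python
EXPECTED_DIM = 512
REQUIRED_FIELDS = ("sku_id", "image_vector")


def find_invalid_records(records, dim=EXPECTED_DIM):
    """Return (index, reason) pairs for records that do not match the schema."""
    problems = []
    for i, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append((i, f"missing fields: {missing}"))
        elif len(record["image_vector"]) != dim:
            actual = len(record["image_vector"])
            problems.append((i, f"expected {dim} dimensions, got {actual}"))
    return problems


# Made-up records: one valid, one with a short vector, one without a vector.
records = [
    {"sku_id": "SKU-1", "image_vector": [0.0] * 512},
    {"sku_id": "SKU-2", "image_vector": [0.0] * 128},
    {"sku_id": "SKU-3"},
]

for index, reason in find_invalid_records(records):
    print(index, reason)
```

Because len works on both lists and NumPy arrays, the same check applies to records produced from a DataFrame.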

I have vector data stored in a Pandas DataFrame as shown below:

We can use the to_dict method to convert this DataFrame to a list of dictionaries where each dictionary represents an entity in our collection.

df.to_dict(orient="records")

[{'sku_id': 'HBCV00009LIR5S',
  'image_vector': array([ 0.1206549 ,  0.00597879, -0.07224327,  0.02327867, -0.09490156,
          0.02150885,  0.10642719, -0.10139938,  0.03159734,  0.05613545,
         -0.07615539, -0.15523671, -0.10006154,  0.05045145,  0.07733533,
         -0.03749327, -0.02301577,  0.13337888,  0.00096778,  0.05047926,
...

Once we have all the entities as a list of dictionaries, we can use the insert method to load entities into our collection:

# convert dataframe to a list of dictionaries    
data = df.to_dict(orient="records")

# insert data into collection
res = client.insert(
    collection_name=collection_name,
    data=data
)

print(res)

{'insert_count': 10000, 'ids': ['HBCV00009LIR5S', 'HBCV00001U46VH', ...]}

We’ve just loaded 10,000 entities, but collections typically hold much more data (e.g. several million entities). Inserting millions of vectors in a single call is usually impractical, since a single request would become very large. In such cases, we can split the data into batches and insert them into the collection sequentially. For example, the for loop below iterates over the entire DataFrame and inserts 10,000 entities per batch.

batch_size = 10000

for i in range(0, len(df), batch_size):

    data = df.iloc[i:i + batch_size].to_dict(orient="records")

    res = client.insert(
        collection_name=collection_name,
        data=data
    )
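The same batching idea can be written as a small generator that works on any list of records. This is a sketch with made-up records and a tiny batch size, so it runs without a Milvus connection; iter_batches is a hypothetical helper name.

```python
def iter_batches(records, batch_size):
    """Yield consecutive slices of records, each with at most batch_size items."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# 25 made-up records with short dummy vectors, just to show the batch sizes.
records = [{"sku_id": f"SKU{i:05d}", "image_vector": [0.0] * 4} for i in range(25)]

batch_sizes = [len(batch) for batch in iter_batches(records, batch_size=10)]
print(batch_sizes)  # [10, 10, 5]
```

Each batch yielded by the generator could then be passed as the data argument of client.insert. The last batch is simply smaller when the total is not a multiple of the batch size.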

We now have a collection with sku and vector data. The next step is to search for images in this collection.

Perform the image search

To search for an image, we first need to convert it into a vector of the same size as the vectors stored in our collection.

Milvus supports several search methods, such as basic search, range search, and hybrid search. We will do a basic search, which is an Approximate Nearest Neighbor (ANN) search. Instead of comparing the query vector with every vector in the collection, it uses the index to select a subset of the stored vectors, compares the query vector with the vectors in that subset, and returns the most similar results.
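The idea of searching only a subset can be illustrated with a toy inverted-file (IVF) sketch in plain Python. This is not how Milvus is implemented internally. It assumes hand-picked centroids (a real IVF index learns them from the data), uses L2 distance for simplicity, and works on made-up 2-dimensional vectors. The nprobe argument controls how many buckets are scanned.

```python
import math


def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(vec_id)
    return buckets


def ivf_search(query, vectors, centroids, buckets, nprobe=1, limit=3):
    """Scan only the nprobe buckets closest to the query."""
    # Step 1: pick the buckets whose centroids are closest to the query.
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    probed = order[:nprobe]
    # Step 2: compare the query only with the vectors in those buckets.
    candidates = [vec_id for c in probed for vec_id in buckets[c]]
    ranked = sorted(candidates, key=lambda vec_id: l2(query, vectors[vec_id]))
    return ranked[:limit]


vectors = {
    "a": [0.1, 0.2], "b": [0.2, 0.1], "c": [0.9, 1.0],
    "d": [1.0, 0.9], "e": [0.45, 0.5],
}
centroids = [[0.0, 0.0], [1.0, 1.0]]  # hand-picked for illustration
buckets = build_ivf(vectors, centroids)

query = [0.6, 0.65]
print(ivf_search(query, vectors, centroids, buckets, nprobe=1, limit=1))  # ['c']
print(ivf_search(query, vectors, centroids, buckets, nprobe=2, limit=1))  # ['e']
```

With nprobe=1 the search misses "e", the true nearest neighbor, because it sits in a bucket that was not scanned. Scanning more buckets recovers it at the cost of more comparisons, which is the speed versus accuracy trade-off behind the word "approximate".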

The following code block reads a JPEG image, converts it into a vector using the same model we used when creating the collection, and then searches for this image vector within the collection.

collection_name = "test_collection"

model = SentenceTransformer('clip-ViT-B-32')
search_image = Image.open("image_to_search.jpeg")
search_image_vector = model.encode(search_image)

res = client.search(
    collection_name=collection_name,
    anns_field="image_vector",
    data=[search_image_vector],
    limit=3,
    search_params={"metric_type": "COSINE"}
)

The anns_field parameter specifies the vector field to be used in the search. We then pass the target image vector to the data parameter. The limit parameter tells Milvus how many results to return (e.g. 3 returns the three most similar vectors in the collection).

The output of the search method is a list of dictionaries as follows:

print(res[0])

[{'sku_id': 'HBCV0000BLJIBF', 'distance': 0.9814613848423661, 'entity': {}},
 {'sku_id': 'HBCV00003Z49OT', 'distance': 0.9504563808441162, 'entity': {}},
 {'sku_id': 'HBCV0000DCK7ML', 'distance': 0.9104360342025757, 'entity': {}}]
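With the COSINE metric, the distance value is a similarity score, so higher means more similar. For a use case like duplicate detection, one option is to keep only the hits above a cutoff. The sketch below uses made-up hits shaped like the output above; the 0.97 threshold is an arbitrary assumption that would need tuning on real data.

```python
def filter_hits(hits, min_similarity):
    """Keep hits whose COSINE score is at least min_similarity."""
    return [hit for hit in hits if hit["distance"] >= min_similarity]


# Made-up hits in the same shape as a Milvus search result.
hits = [
    {"distance": 0.98, "entity": {}},
    {"distance": 0.95, "entity": {}},
    {"distance": 0.91, "entity": {}},
]

likely_duplicates = filter_hits(hits, min_similarity=0.97)
print(len(likely_duplicates))  # 1
```

Note that the direction of the comparison depends on the metric: with L2, smaller values mean closer vectors, so the filter would need to use a maximum instead of a minimum.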

Checking the results

Image search using vectors is efficient and, for identical or near-identical images, highly accurate. If the exact same image (or a very similar one) exists in the collection, you are very likely to find it.

In the examples below, the leftmost image is the one being searched for, and the remaining images are the top search results.

Same images

In the following examples, multiple occurrences of the same image exist in the collection and vector search was able to detect them.

Similar and related images

In the examples below, we can clearly see that the retrieved images are highly similar and directly related to the target image.

Similar but unrelated images

In some cases, vector search finds images visually similar but completely unrelated in context. Here is an example of such a case:

The searched image is a bath sponge but results include plush toys that look similar.

Here are a couple more examples that include visually similar but completely unrelated images:

Image search in a vector database is useful in many cases. We can easily detect identical images as well as images with a similar visual appearance. However, as seen in the examples above, visual similarity does not always indicate that the products are the same or even related.

Another shortcoming of image search for e-commerce platforms is that entirely different products may share the exact same image. For instance, consider smartphones: many distinct models look nearly identical from the outside. Relying purely on image search without additional context, such as the product title or model name, can easily lead to misleading results.

One way to overcome these shortcomings is hybrid search, which combines image and text vectors. Hybrid search helps ensure that results are not only visually similar but also conceptually accurate.
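As a rough illustration of the idea, and not of the Milvus hybrid search API, the sketch below combines an image similarity score and a text similarity score with a weighted sum and re-ranks the candidates. The scores, the product names, and the equal weighting are all made up for the example.

```python
def weighted_scores(image_scores, text_scores, image_weight=0.5):
    """Combine per-item image and text scores and rank items by the result."""
    text_weight = 1.0 - image_weight
    all_ids = set(image_scores) | set(text_scores)
    combined = {
        item_id: image_weight * image_scores.get(item_id, 0.0)
        + text_weight * text_scores.get(item_id, 0.0)
        for item_id in all_ids
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for a "bath sponge" query: both candidates look alike,
# but only one has a matching title.
image_scores = {"bath_sponge": 0.95, "plush_toy": 0.98}
text_scores = {"bath_sponge": 0.90, "plush_toy": 0.20}

ranked = weighted_scores(image_scores, text_scores)
print([item_id for item_id, _ in ranked])  # ['bath_sponge', 'plush_toy']
```

Here the plush toy has the higher image score, but its low text score pushes it below the sponge once both signals are combined.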

In the next article, I will explain step by step how to create a Milvus collection with multiple vector fields and perform a hybrid search. We will also test this approach using the same examples from this article to see how hybrid search eliminates irrelevant search results.

Thank you for reading! Please let me know if you have any feedback.


