How to Build an Image Search Application With OpenAI CLIP & PostgreSQL in JavaScript
With over 3.2 billion images shared online daily, the demand for efficient image search capabilities has never been higher. The opportunity to create a powerful image search engine spans many fields, from e-commerce to social media, driven by this high-velocity data.
Historically, image search solutions primarily relied on keyword-based methods, where images were matched based on captions or tags. However, these methods often fell short because computers couldn't interpret the content of images beyond their associated text.
Thanks to OpenAI's CLIP model, systems can now understand both visual and textual information. In this article, we will build an image search application using the OpenAI CLIP model and a managed PostgreSQL database with Timescale in JavaScript.
Image Search Application Tools: OpenAI CLIP and PostgreSQL
OpenAI CLIP (Contrastive Language–Image Pre-training) is a neural network that learns visual concepts from natural language supervision.
It can classify images from text descriptions alone, giving it zero-shot capabilities: it can recognize visual categories without direct training on specific datasets.
CLIP key features
1. Efficiency: CLIP is trained on diverse Internet-sourced text-image pairs, reducing the need for costly, labor-intensive labeled datasets.
2. Flexibility: the model can perform various visual classification tasks by simply providing textual descriptions of the categories.
3. Robust performance: it also excels in recognizing objects in varied settings and abstract depictions, outperforming traditional models on many benchmarks.
CLIP use cases
1. Image search: it enables more accurate and flexible image retrieval by understanding natural language queries.
2. Content moderation: OpenAI CLIP helps identify inappropriate content by recognizing complex visual patterns.
3. Automated tagging: CLIP enhances the tagging accuracy for extensive image collections, which is helpful in social media and digital asset management.
4. Object recognition: the OpenAI model can be applied in diverse fields, from medical imaging to autonomous driving, improving recognition tasks without extensive retraining.
Why use PostgreSQL as a vector database
PostgreSQL is an open-source relational database system known for its robust performance, extensibility, and support for advanced data types. PostgreSQL can handle high-dimensional vector data efficiently with extensions like pgvector, pgai, and pgvectorscale. You can install these open-source extensions on your machine or easily access them on any database service in the Timescale Cloud PostgreSQL platform.
- pgvector: provides native support for vector data in PostgreSQL, allowing for efficient storage and retrieval
- pgai: integrates artificial intelligence (AI) workflows directly into PostgreSQL, enabling seamless AI and machine learning operations
- pgvectorscale: builds on pgvector to offer enhanced performance and scalability for handling large-scale vector data
Benefits:
- Efficient vector storage: Timescale Cloud’s open-source AI stack optimizes the management and querying of high-dimensional vector data.
- Seamless AI integration: it also facilitates advanced AI and machine learning operations directly within the database via pgai.
- Scalability: it supports large datasets and high-performance queries with pgvectorscale, making it ideal for expanding image search applications.
Implementing an Image Search Application in JS
Let’s start building our image search application using JavaScript. The client side will be built with React, while Node.js and Express will handle the server-side logic. But first, the setup.
Setting Up the Development Environment
We will break down the setup for client-side and server-side separately:
Prerequisites for client-side setup
Before starting with the frontend setup for the image search engine, ensure you have the following prerequisites:
1. Node.js and npm: You’ll need to have Node.js and npm installed on your machine. You can download and install them from the Node.js website.
2. React: You’ll need basic knowledge of React and familiarity with creating and managing React components.
3. Axios: We will use Axios to make HTTP requests. Ensure you have Axios installed in your project.
Step-by-step guide
Let’s install the required libraries:
1. Set up your React project: if you haven't already set up a React project, you can do so using Create React App.
2. Install Axios: install Axios to make HTTP requests.
3. Validation: start your React application to see the template by using the following command.
Once started, the application will open in the browser on `localhost`, displaying the default React template.
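The three steps above can be sketched as the following commands (the project name `image-search-client` is an assumption; pick any name you like):

```shell
# 1. Create a new React project with Create React App
npx create-react-app image-search-client
cd image-search-client

# 2. Install Axios for making HTTP requests
npm install axios

# 3. Start the development server to verify the template loads
npm start
```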
We will return to this later to create the client side of the image search app. For now, let's set up the server side.
Prerequisites for server-side setup
Before starting, ensure you have the following prerequisites in place for the server-side setup:
1. Node.js and npm: These prerequisites are required for the backend setup, but if you already installed them for the front end, you can skip this step.
2. PostgreSQL: Setting it up for the first time can be tedious and time-consuming, so we will use Timescale Cloud, which provides managed PostgreSQL database services.
3. pgvector: This extension handles vector operations in your PostgreSQL database. Once connected to Timescale Cloud, you will need to enable the pgvector extension; we will cover this later. The pgvector npm package will also be installed to provide the types required to store embeddings in PostgreSQL.
4. pg library: A PostgreSQL client for Node.js. Ensure it is installed along with the necessary PostgreSQL driver (`pg`).
5. @xenova/transformers library: This converts text and image queries into embeddings using the CLIP model.
6. Cors: Cross-Origin Resource Sharing (CORS) is an HTTP header-based mechanism that lets a server indicate origins other than its own from which a browser should permit loading resources. We need to install the cors package so that our React client's origin can call the server.
Step-by-step guide
Let’s install the required libraries:
1. Initialize your Node.js project: to set up the server, create a folder and run `npm init` to initialize the project.
2. Install required packages: install the following packages required for the application.
3. Create `index.js`: create a file named `index.js` and add a basic Express server.
The code sets up a basic Express.js server. It listens on port 3000 and responds with Hello World! when the root URL is accessed. The server uses the `express.json()` middleware to parse JSON request bodies. Ensure that the `type` field is set to `"module"` in the `package.json` file.
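A minimal sketch of that `index.js` (the exact wiring in your project may differ):

```javascript
// index.js — a basic Express server (requires "type": "module" in package.json)
import express from 'express';

const app = express();
const port = 3000;

// Parse JSON request bodies
app.use(express.json());

// Respond at the root URL
app.get('/', (req, res) => {
  res.send('Hello World!');
});

app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});
```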
4. Testing your API: start the server to test the API.
We have our basic client and server side ready for the image application. Here is the project structure and file organization:
It’s time to discuss the most important ingredient of the recipe: OpenAI’s CLIP model.
Integrating OpenAI CLIP
OpenAI’s CLIP model is open-source, so no preliminary setup is required. However, most resources are in Python rather than JavaScript. The Xenova npm library, Transformers.js, addresses this gap: it mirrors Hugging Face's Transformers Python library, letting you run the same pre-trained models with a similar API. Supported tasks include the following:
- Natural Language Processing (NLP): text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation
- Computer vision: image classification, object detection, and segmentation
- Audio: automatic speech recognition and audio classification
- Multimodal: zero-shot image classification
CLIP model in JS
Let's use the previously installed CLIP model. In this section, we will generate embeddings for a list of images using the CLIP model from `@xenova/transformers` and return them to the calling module. This process will assist in inserting the embeddings into our database. Later, we will also generate embeddings for text to aid in the application's search query.
First, let’s create a `model.js` file.
Importing required libraries
These libraries handle image preprocessing, tokenization, model inference, and PostgreSQL vector operations.
Defining processor and vision model promises
The models are defined outside the functions to load them only once and reuse them, enhancing performance.
Vision embedding function
The code below defines an asynchronous function, `visionEmbeddingGenerator`, that generates image embeddings from a given image file path. The function first waits for the `processor` and `visionModel` to be loaded. A processor prepares the image data by resizing, normalizing, and converting it into a format suitable for the model to generate embeddings.
It then reads the image using `RawImage.read(image_path)` and processes it using the `processor`. Next, it computes the embeddings by passing the processed image through the `visionModel`. If any errors occur during these steps, they are caught and logged. Finally, the function returns the image embeddings as an array of data.
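A sketch of this part of `model.js`, using the Transformers.js API; the checkpoint name `Xenova/clip-vit-base-patch32` is an assumption (it produces the 512-dimensional embeddings used later):

```javascript
// model.js — vision side of the CLIP embedding generator
import {
  AutoProcessor,
  CLIPVisionModelWithProjection,
  RawImage,
} from '@xenova/transformers';

// Defined at module level so the model is loaded once and reused
const processorPromise = AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch32');
const visionModelPromise = CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch32');

export async function visionEmbeddingGenerator(image_path) {
  try {
    const processor = await processorPromise;
    const visionModel = await visionModelPromise;

    // Read the image and preprocess it (resize, normalize, tensorize)
    const image = await RawImage.read(image_path);
    const inputs = await processor(image);

    // Run the vision model to obtain a 512-dimensional embedding
    const { image_embeds } = await visionModel(inputs);
    return Array.from(image_embeds.data);
  } catch (err) {
    console.error('Failed to generate image embedding:', err);
    throw err;
  }
}
```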
Text embedding function
First, we will define the tokenizer and the model in the following way.
The `textEmbeddingGenerator` function processes a given text string to generate its embeddings using the CLIP model. First, it initializes the tokenizer and text model using previously defined promises. A tokenizer prepares the text data by breaking it into tokens, adding padding and truncating if necessary, to ensure it is in the correct format for the model to generate embeddings.
The text is then tokenized with padding and truncation options. The tokenized inputs are injected into the text model to generate the embeddings.
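A matching sketch for the text side (same assumed checkpoint as the vision model, so both embeddings share one space):

```javascript
// model.js (continued) — text side of the CLIP embedding generator
import { AutoTokenizer, CLIPTextModelWithProjection } from '@xenova/transformers';

const tokenizerPromise = AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch32');
const textModelPromise = CLIPTextModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch32');

export async function textEmbeddingGenerator(text) {
  const tokenizer = await tokenizerPromise;
  const textModel = await textModelPromise;

  // Tokenize with padding/truncation so inputs fit the model's expected shape
  const inputs = tokenizer(text, { padding: true, truncation: true });

  // text_embeds lives in the same space as image_embeds, enabling comparison
  const { text_embeds } = await textModel(inputs);
  return Array.from(text_embeds.data);
}
```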
The section above walked you through setting up functions in `model.js` to generate image and text embeddings using the CLIP model.
Setting Up PostgreSQL as a Vector Database and Data Insertion
In this section, we will set up, create, and insert data in the PostgreSQL database hosted on Timescale.
Setting up PostgreSQL
A critical component for an image search application is a vector database, which enables querying indexed image embeddings to retrieve the most relevant results. This tutorial will use PostgreSQL and its pgvector extension, hosted on Timescale, for efficient image search.
Here are a few reasons we recommend using Timescale for this:
- Enhanced pgvector: faster and more accurate similarity search on large-scale vectors using StreamingDiskANN.
- Quick time-based vector search: automatic time-based partitioning and indexing for efficient vector searches.
- Familiar SQL interface: easy querying of vector embeddings using standard SQL.
To start, sign up for Timescale Cloud, create a new database, and follow the provided instructions. For more information, refer to the Get started with Timescale guide. Note that database creation might take a couple of minutes; make sure the service appears as available in your 🔧 Services section before attempting to connect.
After signing up, connect to the Timescale database by providing the service URI, which can be found under the service section on the dashboard. The URI will look something like this:
postgres://tsdbadmin:@.tsdb.cloud.timescale.com:/tsdb?sslmode=require
To obtain the password, go to Project settings and click on Create credentials.
This setup guide will help you configure your environment for handling vector operations. Once connected to the Timescale Cloud, we need to enable the pgvector extension, which we will cover shortly.
Connecting to the server
To talk to the database, we will use `pg`, a non-blocking PostgreSQL client for Node.js.
In `index.js`, import the following and initialize the pool. The pool ensures that connections to the database are established to create a table, insert data, and perform queries later. One benefit of using a connection pool is improved performance: existing connections are reused instead of opening and closing a new one for each request.
Now, to connect to the cloud PostgreSQL, use the credentials obtained from the previous step:
Ensure the connection with `client.connect()` in the following way:
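A sketch of the pool setup and connection check; reading the service URI from a `DATABASE_URL` environment variable is an assumption — substitute your own credentials handling:

```javascript
// index.js — connection pool for the Timescale-hosted PostgreSQL database
import pg from 'pg';
const { Pool } = pg;

export const pool = new Pool({
  // e.g., postgres://tsdbadmin:<password>@<host>:<port>/tsdb?sslmode=require
  connectionString: process.env.DATABASE_URL,
});

// Verify the connection once at startup, then return the client to the pool
const client = await pool.connect();
console.log('Connected to PostgreSQL');
client.release();
```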
After the connection, let’s create a table.
Table creation
We will define the table creation logic in a separate file named `database.js`. All database-related operations will use this file.
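A sketch of that logic, consistent with the explanation that follows (table and column names match the rest of the tutorial):

```javascript
// database.js — create Search_table if it does not already exist
export async function createTableIfNotExists(pool) {
  const client = await pool.connect();
  try {
    // Make sure the pgvector extension is available
    await client.query('CREATE EXTENSION IF NOT EXISTS vector');

    // Check whether Search_table already exists in the public schema
    const { rows } = await client.query(
      `SELECT EXISTS (
         SELECT 1 FROM information_schema.tables
         WHERE table_schema = 'public' AND table_name = 'Search_table'
       ) AS present`
    );

    if (!rows[0].present) {
      // 512 dimensions matches the CLIP embedding size used in model.js
      await client.query(
        `CREATE TABLE "Search_table" (
           id SERIAL PRIMARY KEY,
           path TEXT NOT NULL,
           embedding VECTOR(512)
         )`
      );
    }
  } catch (err) {
    console.error('Table creation failed:', err);
  } finally {
    // Return the client to the connection pool
    client.release();
  }
}
```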
Here’s the explanation of the code above:
- Function definition: It defines an asynchronous function `createTableIfNotExists` that takes a `client` as an argument.
- Connect to database: It establishes a connection to the PostgreSQL database using the provided `client`.
- Install pgvector extension: It ensures that the `pgvector` extension is installed in the database by executing `CREATE EXTENSION IF NOT EXISTS vector`.
- Check table existence: This function executes a query to check if the `Search_table` exists in the `public` schema. If it does not exist, this function creates it.
- Create table: If the table does not exist, it executes a query to create the `Search_table` with the following columns:
  - id: a primary key with auto-increment
  - path: a text field for storing the image path
  - embedding: a vector field with 512 dimensions for storing image embeddings
- Error handling: It catches and logs any errors that occur during the process.
- Release connection: The `client.release()` method returns the database client to the connection pool after it has been used.
Since this code will only be called once, we can invoke the function directly in `database.js`. We can do that when inserting the data.
Data insertion
This section will discuss the dataset used for the image application and how to insert it into our database.
Flickr30k
The Flickr30k dataset is a well-known benchmark for sentence-based image descriptions. It contains 31,783 images of people engaging in everyday activities and events. It is widely used for evaluating models that generate sentence-based portrayals of images. The dataset is available on Kaggle and can be easily downloaded. As this is an extensive image dataset, this demo is based on a sample of 100 images.
Insertion logic
The following code is a part of `database.js`:
The `insertInTable` function connects to a PostgreSQL database and iterates over a list of image file paths. For each path, it computes image embeddings using the `visionEmbeddingGenerator` function and inserts these embeddings, along with the file path, into the `Search_table` table. It handles errors that occur while processing each image and ensures that the database connection is closed properly once all insertions are complete. This approach maintains robust error handling and efficient database management throughout the insertion process.
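A sketch of that insertion logic, using the pgvector npm package to serialize embeddings for the parameterized query:

```javascript
// database.js (continued) — insert each image path with its embedding
import pgvector from 'pgvector/pg';
import { visionEmbeddingGenerator } from './model.js';

export async function insertInTable(pool, imagePaths) {
  const client = await pool.connect();
  try {
    for (const imagePath of imagePaths) {
      try {
        const embedding = await visionEmbeddingGenerator(imagePath);
        await client.query(
          'INSERT INTO "Search_table" (path, embedding) VALUES ($1, $2)',
          [imagePath, pgvector.toSql(embedding)]
        );
      } catch (err) {
        // Keep going if a single image fails to process
        console.error(`Could not insert ${imagePath}:`, err);
      }
    }
  } finally {
    client.release();
  }
}
```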
Let's include a function in `utils.js` to list the files in our dataset directory. We will use this in `database.js` to insert the images into the database. Here’s the utility function:
Now we can import it in `database.js` and execute the insertion:
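A sketch of the wiring; the `./assets` path and importing `pool` from `index.js` are assumptions — adjust to your project layout:

```javascript
// database.js (continued) — create the table, then insert the sample images
import { listFiles } from './utils.js';
import { pool } from './index.js';

const imagePaths = listFiles('./assets');

await createTableIfNotExists(pool);
await insertInTable(pool, imagePaths);
console.log(`Inserted ${imagePaths.length} images`);
```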
Note: The preceding code remains unchanged in the file.
Now that this process is complete, we have inserted the images and their embeddings in the table, which will be retrieved depending on the query.
Building the Image Search Application
Search API
In this section, we will develop a POST route, `/search`, using Express.js that accepts a textual query from the user, transforms it into embeddings, and performs a database search. CLIP, a neural network model, maps image and text embeddings into a unified output space, allowing direct comparisons between the two modalities within a single model.
The `app.post('/search')` route processes POST requests to perform an image search based on a textual query. When a request is received, the code first connects to the PostgreSQL database. It then generates embeddings for the search text using the `textEmbeddingGenerator` function.
These embeddings are converted into a format compatible with PostgreSQL using `pgvector.toSql`. The route then executes a similarity search against the `Search_table` table in the database, ordering results based on their similarity to the query embeddings using the `<->` operator. It limits the results to the top five matches. The matching image paths are returned as a JSON response. If an error occurs during this process, a 500 status code is sent, and the database connection is closed in the `finally` block.
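A sketch of the route, assuming the `app` and `pool` objects created earlier and the `searchText` query parameter used in the Postman steps below:

```javascript
// index.js — the /search route
import pgvector from 'pgvector/pg';
import { textEmbeddingGenerator } from './model.js';

app.post('/search', async (req, res) => {
  const client = await pool.connect();
  try {
    const searchText = req.query.searchText;
    const embedding = await textEmbeddingGenerator(searchText);

    // <-> is pgvector's distance operator; the closest embeddings come first
    const { rows } = await client.query(
      `SELECT path FROM "Search_table"
       ORDER BY embedding <-> $1
       LIMIT 5`,
      [pgvector.toSql(embedding)]
    );
    res.json(rows.map((row) => row.path));
  } catch (err) {
    res.status(500).json({ error: 'Search failed' });
  } finally {
    client.release();
  }
});
```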
After running the server using `node index.js`, we can test our endpoint using Postman, a platform that helps developers build and use APIs. If that seems like a hassle, we can simply use `wget` or `curl`. Here’s how we can make a POST request with `curl`:
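For example, sending the query "old man" as the `searchText` parameter (URL-encoded):

```shell
# POST to the local search endpoint; the server must be running on port 3000
curl -X POST "http://localhost:3000/search?searchText=old%20man"
```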
If you are using Postman, you will need the desktop version. After logging in and creating a workspace, let’s request our API:
1. Add a query parameter with the key `searchText` and the value `old man`.
2. Configure the request method as POST.
3. Set the URL to `http://localhost:3000/search`, where the server is listening.
Here are the paths retrieved from the database after semantic search:
Let's verify one of the images from the paths to ensure that the retrieved images match the query.
Now, our server is ready to search, given the query. Let’s complete it with our client side.
Final Touches
In this section, we will create a React application that a client will use to interact with the Search API. Here’s how you can create the client side:
The first step is to create a component file named `SearchBar.js`, which will take the user's input. Let’s write some code in it.
This React component, `SearchBar`, allows users to input a search query and retrieve image results from a server. It manages the search text, results, loading state, and any errors encountered during the search. Let’s fill in the `useEffect` hook to query the Search API.
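A sketch of that effect; the state variables (`searchText`, `results`, `error`, `clicked`) and their setters are assumed to be defined with `useState` in the component:

```javascript
// SearchBar.js — run a search whenever the user clicks the search button
useEffect(() => {
  if (!clicked) return;

  const search = async () => {
    try {
      // Send the query text as the searchText query parameter
      const response = await axios.post(
        'http://localhost:3000/search',
        null,
        { params: { searchText } }
      );
      setResults(response.data);
      setError(null);
    } catch (err) {
      setError('Search failed. Please try again.');
    } finally {
      setClicked(false); // reset so the effect does not re-fire
    }
  };

  search();
}, [clicked]);
```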
This code snippet uses React's `useState` and `useEffect` hooks to manage search results and errors. When the `clicked` state changes, `useEffect` triggers an asynchronous search function that sends a POST request to `http://localhost:3000/search` with the search text. Successful responses update the `results` state, and any errors update the `error` state. The `clicked` state is then reset to prevent repeated searches.
Now, let’s look at the complete `SearchBar` component. Please note that additional components and custom hooks have been created to handle dynamic image imports. However, due to the scope of this article, we will skip the explanation. If you want, you can explore this further in our GitHub repository.
Here’s the complete component:
The `SearchBar.js` component is also responsible for displaying images on the page. After retrieving the image paths from the database, it selects the corresponding assets and displays them. To dynamically add image imports in React, we have created the `useImage` hook and the `Image` component.
Note: An `assets` folder is created within the `src` directory, which contains the image dataset.
Let’s create a component to display the image, as you can see in `SearchBar.js`:
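A sketch of that component; the `useImage` hook's return shape (`image`, `loading`, `error`) is an assumption — see the GitHub repository for the actual implementation:

```javascript
// Image.js — resolve and render an asset by file name
import useImage from './useImage';

const Image = ({ fileName, alt }) => {
  // Dynamically resolve the asset for the given file name
  const { image, loading, error } = useImage(fileName);

  if (loading) return <p>Loading…</p>;
  if (error) return <p>Could not load {fileName}</p>;

  return <img src={image} alt={alt} />;
};

export default Image;
```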
The exported component can be imported into `App.js`, which is the main file of the React application. Here’s how to import it:
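A minimal `App.js` mounting the component:

```javascript
// App.js — render the SearchBar as the application's root content
import SearchBar from './SearchBar';

function App() {
  return (
    <div className="App">
      <SearchBar />
    </div>
  );
}

export default App;
```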
With all of the coding complete, here’s what everything should look like. Let’s see if we can find cyclists:
Hooray! 🎉 We have our image search engine ready to be used. At present, it is configured to handle 100 images efficiently. However, there is substantial room for improvement. We can upload the entire dataset by leveraging Timescale-hosted PostgreSQL. It offers robust and scalable solutions, ensuring that our search engine will perform optimally as we scale up.
Conclusion
In this blog post, we explored how to build an image search application using OpenAI CLIP and managed PostgreSQL. We covered setting up the PostgreSQL database with pgvector, creating functions to generate and store embeddings, and building a React front end to query and display search results. PostgreSQL is all you need to build your AI applications.
We encourage you to try building your own image search engine—take a look at our GitHub repository for guidance—using Timescale Cloud’s managed PostgreSQL platform and open-source AI stack (pgvector, pgai, and pgvectorscale). It unlocks quicker and more accurate similarity searches on large-scale vectors and lightning-fast time-based vector searches with much welcome SQL familiarity. Create a Timescale Cloud account and start for free today.
You can access pgai and pgvectorscale on any database service on the Timescale Cloud PostgreSQL platform or install them manually according to the instructions in the pgai and pgvectorscale GitHub repos (⭐s welcome!).