Multimodal Vector Database

The Challenge

The main challenge of this project was to develop a vector database that can store images and tables as well as text. It should be possible to take a PDF file and automatically save its elements in this database, and then to search the database efficiently once the data has been saved.

Our Goals

Our main goals were the following:

Create a multimodal database
Save text, tables and images in the database
Extract text, images and tables from a PDF file and pass them on to the database
Use a user prompt to fetch the text, tables and images relevant to the question
Use the information obtained to ask an LLM (ChatGPT) a question about the content of the document

Our Solution

To accomplish the main purpose of the project, we used ChromaDB to create the multimodal database. Although we worked with a single ChromaDB database, a database can hold multiple collections, so we created three: one for text, one for images and one for tables.
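A minimal sketch of the three-collection layout, assuming the `chromadb` Python package. The helper name `create_modal_collections` and the collection names are illustrative choices, not taken from the project. `get_or_create_collection` is the ChromaDB client call that returns a collection, creating it first if it does not yet exist.

```python
# Collection names are an assumption for illustration: one per modality.
MODALITIES = ("text", "images", "tables")


def create_modal_collections(client):
    """Return a {modality: collection} mapping for a ChromaDB client.

    `client` is expected to be a chromadb client, for example the
    in-memory chromadb.Client() or chromadb.PersistentClient(path=...).
    """
    return {
        name: client.get_or_create_collection(name=name)
        for name in MODALITIES
    }
```

With the package installed, `create_modal_collections(chromadb.Client())` would return the three collections keyed by modality.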
To test the collections, we developed a simple program that extracts text, images and tables from simple PDF files. The output of this program was then saved in the database, with each element placed in the collection for its modality.
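The routing of extracted elements into collections could be sketched as follows. The `Element` record, its fields and the `store_elements` helper are hypothetical. The sketch assumes each extracted element already carries a precomputed embedding, since the embedding models used in the project are not specified here. `collections` is assumed to map each modality to a ChromaDB collection, whose `add` method accepts `ids`, `embeddings`, `documents` and `metadatas`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Element:
    """One item extracted from a PDF (hypothetical record layout)."""
    modality: str    # "text", "images" or "tables"
    page: int        # page number the element came from
    content: str     # text, serialized table, or image path/caption
    embedding: list  # precomputed vector (embedding model is an assumption)


def store_elements(elements, collections, source):
    """Add each element to the collection that matches its modality."""
    # Group the elements so each collection receives a single batch.
    batches = defaultdict(list)
    for element in elements:
        batches[element.modality].append(element)

    for modality, batch in batches.items():
        collections[modality].add(
            ids=[f"{source}-{modality}-{i}" for i in range(len(batch))],
            embeddings=[e.embedding for e in batch],
            documents=[e.content for e in batch],
            metadatas=[{"source": source, "page": e.page} for e in batch],
        )
    # Report how many elements were stored per modality.
    return {modality: len(batch) for modality, batch in batches.items()}
```

The IDs are derived from the source name and the element's position within its modality, so storing the same file twice would produce the same IDs.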
With the information saved, we then used the user's prompt to query the database and retrieve the most relevant information from the document. This information is added to the user's question and passed to an LLM, such as ChatGPT, to obtain an answer that is grounded in the body of the document.
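A sketch of the retrieval and prompt-building step, under stated assumptions. `retrieve_context` and `build_prompt` are hypothetical helper names, the query embedding is assumed to be computed elsewhere, every stored item is assumed to have a document string, and the prompt wording is purely illustrative. The code relies on the ChromaDB `query` method, which returns a dictionary whose `documents` entry holds one list of matches per query.

```python
def retrieve_context(collections, query_embedding, k=3):
    """Fetch the top-k matches from every modality's collection."""
    context = {}
    for modality, collection in collections.items():
        result = collection.query(
            query_embeddings=[query_embedding],
            n_results=k,
        )
        # One query was sent, so the matches are in the first inner list.
        context[modality] = result["documents"][0]
    return context


def build_prompt(question, context):
    """Prepend the retrieved excerpts to the user's question."""
    lines = ["Answer the question using only the excerpts below.", ""]
    for modality, documents in context.items():
        for document in documents:
            # Label each excerpt with the modality it came from.
            lines.append(f"[{modality}] {document}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

The resulting string would then be sent to the LLM as the user message; the API call itself is omitted here.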
