1 Developer
The main challenge of this project was to develop a vector database that can store images and tables in addition to text. It should be possible to take a PDF file and automatically save its elements in this database, and then to search the database efficiently once the data has been saved.
Our main goals were the following:
To accomplish the main purpose of the project, we used ChromaDB to create the multimodal database. Although we worked with a single ChromaDB database, it can hold multiple collections, so we created three: one for text, one for images and one for tables.
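The three-collection layout described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the collection names and the helper `create_collections` are assumptions, and the only ChromaDB calls used are `chromadb.Client()` and `get_or_create_collection`.

```python
# Hypothetical collection names; the report does not say what they were called.
COLLECTION_NAMES = ("text", "images", "tables")


def create_collections(client, names=COLLECTION_NAMES):
    """Return a dict mapping each modality name to its collection.

    `client` is expected to be a ChromaDB client. get_or_create_collection
    reuses a collection if one with that name already exists.
    """
    return {name: client.get_or_create_collection(name=name) for name in names}


def demo():
    """Create the three collections in an in-memory ChromaDB instance."""
    import chromadb  # third-party dependency: pip install chromadb

    client = chromadb.Client()  # in-memory client, enough for a quick test
    collections = create_collections(client)
    print(sorted(collections))
```

Calling `demo()` requires the `chromadb` package. Passing the client in as a parameter keeps the helper easy to try with a persistent client or a test double.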
To test the creation of the collections, we developed a simple program that extracts text, images and tables from simple PDF files. The output of this program was then saved in the database, with each element stored in the collection for its modality.
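As a rough sketch of the storage step, the code below routes already-extracted elements to the collection that matches their modality. The `Element` structure, the ID scheme and the metadata fields are assumptions made for illustration; the report does not describe how the elements were represented, and in particular storing an image as a path or caption string is only one possible choice. The only ChromaDB call used is `collection.add` with `ids`, `documents` and `metadatas`.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One item pulled out of a PDF (hypothetical structure)."""
    modality: str  # "text", "images" or "tables"
    content: str   # text body, image path or caption, or a serialized table
    page: int      # page the element was found on


def group_by_modality(elements):
    """Group elements into lists keyed by their modality."""
    grouped = {}
    for element in elements:
        grouped.setdefault(element.modality, []).append(element)
    return grouped


def store_elements(collections, elements, source):
    """Add each element to the collection for its modality.

    `collections` maps modality names to ChromaDB collections.
    Returns the number of elements stored per modality.
    """
    counts = {}
    for modality, items in group_by_modality(elements).items():
        collections[modality].add(
            # IDs must be unique within a collection; this scheme is an assumption.
            ids=[f"{source}-{modality}-{i}" for i in range(len(items))],
            documents=[element.content for element in items],
            metadatas=[{"source": source, "page": element.page} for element in items],
        )
        counts[modality] = len(items)
    return counts
```

Keeping the extraction output in a small, library-independent structure means the PDF parser can be swapped without touching the storage code.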
With the information saved, we took a user's question and, with the help of an LLM such as ChatGPT, retrieved the most relevant information from the document. This information is then added to the user's question and provided to the LLM to obtain an answer that is grounded in the body of the document.
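The retrieve-then-ask flow described above might look like the sketch below. The function names and the prompt wording are assumptions, and the call to the LLM itself is left out because the report does not say which client was used. The retrieval step relies on ChromaDB's `collection.query`, which returns one list of documents per query text.

```python
def retrieve_context(collection, question, n_results=3):
    """Return the stored documents most similar to the question."""
    result = collection.query(query_texts=[question], n_results=n_results)
    # One query text was sent, so the matches are in the first inner list.
    return result["documents"][0]


def build_prompt(question, context_chunks):
    """Combine retrieved context and the user's question into one prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer_prompt_for(collection, question, n_results=3):
    """Build the prompt that would be sent to the LLM for this question."""
    chunks = retrieve_context(collection, question, n_results)
    return build_prompt(question, chunks)
```

The string returned by `answer_prompt_for` is what would be passed to the LLM. The same steps could be repeated against the image and table collections if their content is needed in the answer.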