Retrieval-Augmented Generation (RAG) With Milvus and LlamaIndex
Learn to build a RAG application with Milvus and LlamaIndex, two tools that can quickly handle big data and retrieve relevant information, especially when adopted together.
Retrieval-augmented generation (RAG) applications integrate private data with public data and improve the output of large language models (LLMs), but building one is challenging because private data can be unstructured and siloed. You'll also need a reliable and efficient way to retrieve relevant information from the knowledge base. This might seem like an uphill battle, but it's doable with tools like Milvus and LlamaIndex, which can quickly handle big data and retrieve relevant information, especially when adopted together.
What Are Milvus and LlamaIndex?
To build an RAG application that optimizes query efficiency, you need a scalable, flexible vector database and an indexing algorithm. Before showing you how to build one, we'll quickly discuss Milvus and LlamaIndex.
What Is Milvus?
Milvus is an open-source vector database for storing, processing, indexing, and retrieving vector embeddings across various environments. The platform is popular among generative AI developers because of its high scalability and its fast similarity search over massive datasets of high-dimensional vectors. Beyond scalability and performance, developers use it to power machine learning (ML) workloads, build recommendation systems, and mitigate hallucinations in LLMs.
Milvus offers three deployment options (a brief connection sketch follows this list):
- Milvus Lite is a Python library and ultra-lightweight version of Milvus that works great for small-scale local experiments.
- Milvus Standalone is a single-node deployment that uses a client-server model, comparable to a standalone MySQL server.
- Milvus Distributed is Milvus's distributed mode, which adopts a cloud-native architecture and is great for building large-scale vector database systems.
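Whichever option you pick, the choice mainly surfaces in client code as the connection URI. Here's a minimal sketch using the pymilvus MilvusClient; the local file name and server address are placeholder assumptions, not values from this tutorial:
from pymilvus import MilvusClient
# Milvus Lite: the URI is just a local file path
lite_client = MilvusClient(uri="./local_demo.db")
# Standalone or Distributed: the URI points at a running Milvus server
server_client = MilvusClient(uri="http://localhost:19530")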
What Is LlamaIndex?
LlamaIndex is an orchestration framework that simplifies building LLM applications by integrating private, domain-specific, and public data. It achieves this by ingesting external data and storing it as vectors in a vector database, where it supports knowledge generation, complex search operations, and reasoning. Besides storage and data ingestion, LlamaIndex comes in handy for indexing and querying data.
The enterprise version comprises LlamaCloud and LlamaParse. There's also an open-source package with LlamaHub (their data connectors), Python, and TypeScript packages.
What Is RAG?
Retrieval-augmented generation (RAG) is an AI technique that combines the strength of generative LLMs with traditional information retrieval systems to enhance accuracy and reliability. This is important because it exposes your LLMs to external, real-time information outside their training data, addressing issues with missing context, inaccuracy, and hallucination.
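To make the flow concrete, here is a minimal, library-agnostic sketch of the retrieve-augment-generate loop; the retriever and llm objects are hypothetical stand-ins for whatever components you actually wire in:
def rag_answer(question, retriever, llm):
    # 1. Retrieve: fetch passages semantically similar to the question
    context_chunks = retriever.search(question, top_k=3)
    # 2. Augment: ground the prompt in the retrieved context
    prompt = "Context:\n" + "\n".join(context_chunks) + "\n\nQuestion: " + question
    # 3. Generate: the LLM answers using the supplied context
    return llm.complete(prompt)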
Building a RAG System Using LlamaIndex and Milvus
We'll show you how to build a retrieval-augmented generation system using LlamaIndex and Milvus. First, we'll pull data from the LitBank repository. Then, we'll index the data using the llama_index library and Milvus Lite, processing the documents into vector representations with the OpenAI API. Finally, we'll query the data and filter it by metadata.
Prerequisites and Dependencies
To follow along with this tutorial, you will need the following:
- Python 3.9 or higher.
- Any IDE or code editor. We recommend Google Colab, but you can also use Jupyter Notebook.
- An OpenAI developer account so you can access your OpenAI API key.
Setup and Installation
Before building the RAGs, you'll need to install all your dependencies.
%pip install "pymilvus>=2.4.2"
%pip install llama-index-vector-stores-milvus
%pip install llama-index
These code snippets will install and upgrade the following:
- pymilvus: the Milvus Python SDK
- llama-index-vector-stores-milvus: provides integration between LlamaIndex and the Milvus vector store
- llama-index: the data framework for indexing and querying LLMs
Next, you need to set up your OpenAI API key to access OpenAI's advanced language models, which are trained for various natural language processing (NLP) and image-generation tasks. Before you can use the OpenAI API, you must create an OpenAI developer account.
- Visit the API keys section of your OpenAI developer dashboard.
- Click on “Create a new secret key” to generate an API key.
- Copy the key.
Then, head over to your Google Colab notebook and set the key:
import openai
# Replace the placeholder below with your actual OpenAI API key
openai.api_key = "OpenAI-API-Key"
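As an alternative that keeps the key out of the notebook itself, you can read it from an environment variable. A minimal sketch, assuming you've already exported OPENAI_API_KEY in your environment:
import os
import openai
# Assumes OPENAI_API_KEY is already set in the environment
openai.api_key = os.environ.get("OPENAI_API_KEY")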
Generating Data
For your dataset, you can use LitBank, a repository of annotated datasets of a hundred works of English-language fiction. For this project, we'll use "The Fall of the House of Usher" by Edgar Allan Poe and "Oliver Twist" by Charles Dickens. To achieve this, create a directory to retrieve and save your data.
! mkdir -p 'data/'
! wget 'https://raw.githubusercontent.com/dbamman/litbank/refs/heads/master/original/730_oliver_twist.txt' -O 'data/730_oliver_twist.txt'
! wget 'https://raw.githubusercontent.com/dbamman/litbank/refs/heads/master/original/932_the_fall_of_the_house_of_usher.txt' -O 'data/932_the_fall_of_the_house_of_usher.txt'
Then, generate a document from the novel using the SimpleDirectoryReader class from the llama_index library.
from llama_index.core import SimpleDirectoryReader
# load documents
documents = SimpleDirectoryReader(
    input_files=["data/730_oliver_twist.txt"]
).load_data()
print("Document ID:", documents[0].doc_id)
Indexing Data
Next, build an index over your document to reduce search latency and enable semantic similarity search, so you can quickly retrieve relevant passages based on meaning and context.
You can do this using the llama_index library. All you need to do is specify the storage configuration, set your vector embedding dimensionality, and point the Milvus Lite URI at a local file. Alternatively, you can run Milvus via Docker, Kubernetes, or Zilliz Cloud, Milvus's fully managed cloud solution; these alternatives are better suited to large projects.
# Create an index over the documents
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore
# dim=1536 matches the output size of OpenAI's default embedding model
vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1536, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
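One caveat: the dim value must match the output dimensionality of whatever embedding model LlamaIndex uses, so if you switch embedding models, update both together. A sketch, assuming the llama-index-embeddings-openai package is installed:
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
# text-embedding-3-small also outputs 1536 dimensions, so dim=1536 still matches
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")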
Querying Data
Now you can use the indexed documents as a knowledge base for asking questions. This gives your RAG application conversational AI capabilities: it can quickly retrieve relevant answers and maintain a contextual understanding of the conversation.
query_engine = index.as_query_engine()
res = query_engine.query("how did Oliver twist grow up?")
print(res)
Try asking more questions about the novel.
res = query_engine.query("What motivates Oliver to ask for more food in the workhouse?")
print(res)
You can run more tests, such as overwriting the previously indexed information.
from llama_index.core import Document
vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1536, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="The number that is being searched for is ten.")],
    storage_context=storage_context,
)
query_engine = index.as_query_engine()
res = query_engine.query("how did Oliver twist grow up?")
print(res)
Let's try one more test: adding additional data to an existing index.
del index, vector_store, storage_context, query_engine
vector_store = MilvusVectorStore(uri="./milvus_demo.db", overwrite=False)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
res = query_engine.query("What is the number?")
print(res)
res = query_engine.query("how did Oliver twist grow up?")
print(res)
Filtering Metadata
Metadata filtering allows you to narrow search results to those that match specific criteria. This way, you can search for documents by metadata fields such as author, date, and tag, which is particularly useful when you have a large dataset and need to find documents with certain attributes. You can load both documents using the code snippet below.
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters
# Load both documents downloaded earlier
documents_all = SimpleDirectoryReader("./data/").load_data()
vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1536, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents_all, storage_context=storage_context)
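To see which metadata fields are available to filter on, you can inspect a loaded document; SimpleDirectoryReader attaches fields such as file_name automatically:
# Inspect the auto-attached metadata; "file_name" is the field we filter on below
print(documents_all[0].metadata)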
If you only want to retrieve documents from "The Fall of the House of Usher," use the following script:
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="file_name", value="932_the_fall_of_the_house_of_usher.txt")]
)
query_engine = index.as_query_engine(filters=filters)
res = query_engine.query("What distinctive physical feature does Roderick Usher exhibit in The Fall of the House of Usher?")
print(res)
If you only want to use "Oliver Twist," you can use this script:
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="file_name", value="730_oliver_twist.txt")]
)
query_engine = index.as_query_engine(filters=filters)
res = query_engine.query("What challenges did Oliver face?")
print(res)
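To confirm the filter worked, you can inspect which chunks the response drew on. This sketch assumes the default query engine response type, which exposes the retrieved chunks as source_nodes:
# Verify that only chunks from the filtered file were retrieved
for node in res.source_nodes:
    print(node.metadata.get("file_name"), "-", node.score)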
You can explore the full project code on GitHub along with the interactive Google Colab notebook.
Conclusion
In this post, you learned how to build a RAG application with LlamaIndex and Milvus. Milvus offers capabilities such as image search, and since Milvus Lite is an open-source project, you can make your own contributions as well.