Build Retrieval-Augmented Generation (RAG) With Milvus
Learn to manage hallucinations by building RAGs with Milvus. Developers can embed similarity searches and use unstructured data for LLMs.
It's no secret that traditional large language models (LLMs) often hallucinate (generate incorrect or nonsensical information) when asked knowledge-intensive questions that require up-to-date, business, or domain knowledge. This limitation exists primarily because most LLMs are trained on publicly available information, not your organization's internal knowledge base or proprietary data. This is where retrieval-augmented generation (RAG), an approach introduced by Meta AI researchers, comes in.
RAG addresses an LLM's limitation of over-relying on pre-trained data for output generation by combining parametric memory with non-parametric memory through vector-based information retrieval techniques. Depending on the scale, this vector-based information retrieval technique often works with vector databases to enable fast, personalized, and accurate similarity searches. In this guide, you'll learn how to build a retrieval-augmented generation (RAG) with Milvus.
What Is RAG?
RAG simply means retrieval-augmented generation, a cost-effective process of optimizing the output of an LLM to generate context and responses outside its knowledge base without retraining the model.
This matters because LLMs are constrained by the cut-off date of their training data, which can lead to unpredictable, noncontextual, or inaccurate responses. RAG addresses this by integrating vector-based information retrieval so the model can draw on current information at query time.
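Conceptually, a RAG pipeline is just retrieve-then-generate. Here's a minimal sketch with stand-in functions (the toy word-overlap retriever and the generate stub are illustrative only; the real pipeline later in this guide uses embeddings and an LLM):

```python
def retrieve(question, knowledge_base, top_k=2):
    """Toy retriever: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def generate(question, context_docs):
    """Stand-in for an LLM call: just stitch the context into a prompt."""
    context = "\n".join(context_docs)
    return f"Answer '{question}' using:\n{context}"

kb = [
    "Milvus is an open-source vector database.",
    "Paris is the capital of France.",
    "Vector databases store embeddings for similarity search.",
]
docs = retrieve("What is a vector database?", kb)
print(generate("What is a vector database?", docs))
```

In the real pipeline below, the word-overlap scoring is replaced by embedding similarity search in Milvus, and the string-stitching stub is replaced by a call to a chat model.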
What Is Milvus?
Milvus is an open-source, high-performance vector database specially designed to manage and retrieve unstructured data through vector embeddings. Unlike other vector databases, Milvus is optimized for fast storage and offers users a flexible and scalable database with index support and search capabilities.
One thing that makes vector databases interesting is their vector embedding and data storage capabilities, which come with a real-time data retrieval system to help reduce hallucinations. By vector embedding, we mean the numerical representation of data that captures the semantic meaning of words and allows LLMs to find concepts positioned closely to them in a multidimensional space.
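To make "positioned closely in a multidimensional space" concrete, here's a toy cosine-similarity computation on hand-made three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings": related concepts point in similar directions
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
car = [0.0, 0.2, 0.95]

print(cosine_similarity(cat, kitten))  # close to 1.0
print(cosine_similarity(cat, car))     # much smaller
```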
Steps to Building a Retrieval-Augmented Generation (RAG) Pipeline With Milvus
TL;DR: This project focuses on building a RAG system using Milvus and OpenAI's API to efficiently answer users' questions based on the developer guide in the repositories.
- You will utilize the GitHub REST API to download the developer guides from the Milvus repository.
- Process the documents into vector representation for embedding using OpenAI's embedding model.
- Create a collection in Milvus to store embeddings to enhance information retrieval and response generation.
- Use the GPT-3.5-turbo OpenAI model to generate responses.
Prerequisites/Dependencies
To follow along with this tutorial, you'll need the following:
- Python 3.9 or higher: you can download it from the official Python website.
- An IDE or code editor of your choice: I recommend Google Colab, but you can also use Jupyter Notebook.
Setup and Installation
Before building the RAG pipeline, you'll need to install all your dependencies. Open your notebook and run:
! pip install --upgrade pymilvus openai requests tqdm
This command will install and upgrade:
- pymilvus, the Milvus Python SDK
- openai, the OpenAI Python API library
- requests, for making HTTP requests
- tqdm, for progress bars
Next, import os and set your OpenAI API key, which you can get from the OpenAI developer dashboard:
import os
os.environ["OPENAI_API_KEY"] = "sk-***********"
Preparing the Data and Embedding Model
For this project, you can use the Milvus developer guides repository as the data source for your RAG pipeline. The script below uses the GitHub REST API to retrieve every developer doc file with the .md extension and saves it in a local milvus_docs folder. Once the markdown files are downloaded, you'll gather all the text from them, split it, and store the chunks in a single list called text_lines.
import requests

api_url = "https://api.github.com/repos/milvus-io/milvus/contents/docs/developer_guides"
raw_base_url = "https://raw.githubusercontent.com/milvus-io/milvus/master/docs/developer_guides/"
docs_path = "milvus_docs"

if not os.path.exists(docs_path):
    os.makedirs(docs_path)

response = requests.get(api_url)
if response.status_code == 200:
    files = response.json()
    for file in files:
        if file['name'].endswith('.md'):  # Only select markdown files
            file_url = raw_base_url + file['name']
            # Download each markdown file
            file_response = requests.get(file_url)
            if file_response.status_code == 200:
                # Save the content to a local markdown file
                with open(os.path.join(docs_path, file['name']), "wb") as f:
                    f.write(file_response.content)
                print(f"Downloaded: {file['name']}")
            else:
                print(f"Failed to download: {file_url} (Status code: {file_response.status_code})")
else:
    print(f"Failed to fetch file list from {api_url} (Status code: {response.status_code})")
Prepare the Embedding Model With OpenAI
Embedding techniques ensure that similarity, classification, and search tasks can be performed on our text. The model will transform our text into vectors of floating-point numbers and use the distance between each vector to represent how similar the texts are.
from glob import glob

text_lines = []
for file_path in glob(os.path.join(docs_path, "*.md"), recursive=True):
    with open(file_path, "r", encoding="utf-8") as file:
        file_text = file.read()
    text_lines += file_text.split("# ")
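Splitting on "# " is a deliberately simple chunking strategy; chunk sizes vary widely and heading markers are lost. As an alternative sketch, a hypothetical sliding-window chunker (chunk_text is an illustrative helper, not part of this tutorial's pipeline) keeps chunks a fixed size and overlaps them so sentences cut at a boundary survive intact in a neighbor:

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into fixed-size character chunks with overlap,
    so content cut at a chunk boundary also appears in the next chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

sample = "word " * 300  # 1500 characters of filler text
chunks = chunk_text(sample)
print(len(chunks), len(chunks[0]))  # 4 chunks, first one 500 characters
```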
We will use the OpenAI client to make requests to the OpenAI API and interact with its embedding models. The OpenAI documentation provides more information about the available embedding models.
from openai import OpenAI

openai_client = OpenAI()
Next, you will need to write a function, emb_text, that takes a text string and returns its embedding vector.
def emb_text(text):
    return (
        openai_client.embeddings.create(input=text, model="text-embedding-3-small")
        .data[0]
        .embedding
    )
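Every chunk costs one API call, so duplicated chunks get embedded twice. One way to avoid that is a small in-memory cache wrapper; the sketch below uses hypothetical helpers (make_cached_embedder, fake_embed) with a stand-in instead of a live API call so it runs offline:

```python
from functools import lru_cache

def make_cached_embedder(embed_fn):
    """Wrap any embedding function with an in-memory cache keyed by text."""
    @lru_cache(maxsize=None)
    def cached(text):
        return tuple(embed_fn(text))  # tuples are hashable and cacheable
    return cached

# Stand-in embedding function that records how often it is actually called
calls = []
def fake_embed(text):
    calls.append(text)
    return [float(len(text)), 0.0]

embed = make_cached_embedder(fake_embed)
first = embed("hello")
second = embed("hello")
print(len(calls))  # 1: the second call was served from the cache
```

In the real pipeline, you would pass emb_text to make_cached_embedder instead of fake_embed.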
Loading and Inserting the Data Into Milvus
You can run Milvus in various ways:
- Via Milvus Lite, a lightweight version of Milvus that is great for small-scale projects.
- Via Docker or Kubernetes: you'll need a server to act as the Milvus instance and provide the URI.
- Via Zilliz Cloud, a fully managed cloud service: you'll need the URI and API key for your Zilliz Cloud account.
Since we're using Milvus Lite, make sure pymilvus, the Milvus Python SDK, is installed (the install command from earlier already covers this):
pip install -U pymilvus
Next, you'll create an instance of MilvusClient and specify a URI ("./milvus_demo.db") for storing the data. After that, define your collection. Think of a collection as a data schema that serves as a vector container; it's important for effectively organizing and indexing your data for similarity searches.
from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./milvus_demo.db")
collection_name = "my_rag_collection"

if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)
Then test the embedding function and capture the embedding dimension, which the collection will need:
test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])
Next, you create a collection. By default, Milvus generates three fields:
- an ID field for unique identification
- a vector field for storing embeddings
- a JSON field for accommodating non-schema-defined data
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Strong consistency level
)
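A note on metric_type="IP": OpenAI's embedding vectors are normalized to unit length, so ranking by inner product gives the same order as ranking by cosine similarity. The toy check below illustrates this equivalence on hand-normalized vectors:

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return inner_product(a, b) / (norm(a) * norm(b))

a_raw, b_raw = [3.0, 4.0], [4.0, 3.0]
# Normalize to unit length, as embedding models typically do
a = [x / norm(a_raw) for x in a_raw]
b = [x / norm(b_raw) for x in b_raw]

print(round(inner_product(a, b), 6))   # 0.96
print(round(cosine(a_raw, b_raw), 6))  # 0.96 -- the same value
```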
Once done, insert the data.
from tqdm import tqdm

data = []
for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    data.append({"id": i, "vector": emb_text(line), "text": line})

milvus_client.insert(collection_name=collection_name, data=data)
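For large corpora, inserting everything in one call can run into request-size limits, so inserting in batches is safer. A sketch with a hypothetical batched helper (fake_insert stands in for milvus_client.insert so the example runs offline):

```python
def batched(items, batch_size=100):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Stand-in for milvus_client.insert(collection_name=..., data=batch)
inserted = []
def fake_insert(batch):
    inserted.append(len(batch))

data = [{"id": i, "vector": [0.0], "text": str(i)} for i in range(250)]
for batch in batched(data, batch_size=100):
    fake_insert(batch)

print(inserted)  # [100, 100, 50]
```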
Building the RAG
You start by specifying a question.
question = "What are the key features of Milvus that make it suitable for handling vector databases in AI applications?"
Using milvus_client.search, you embed the question and retrieve its top-3 semantic matches from your collection.
search_res = milvus_client.search(
    collection_name=collection_name,
    data=[emb_text(question)],
    limit=3,  # Return top 3 results
    search_params={"metric_type": "IP", "params": {}},
    output_fields=["text"],  # Return the text field
)
Now, process the retrieved text and use OpenAI's GPT-3.5-turbo model to generate a response to the question.
import json

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))

context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)
SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""
response = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)
Deploying the System
You can view the full code in this GitHub repository. To deploy your Google Colab RAG application using Docker, follow these steps:
1. First, download your Google Colab files as .py and .ipynb and put them in a folder. Alternatively, you can push the files to GitHub and clone the repo:
git clone https://github.com/Bennykillua/Build_a_RAG_Milvus.git
2. Create a .env file for your environment variables:
OPENAI_API_KEY=sk-***********
MILVUS_ENDPOINT=./milvus_demo.db
COLLECTION_NAME=my_rag_collection
3. Then install your dependencies, ideally by listing them in a requirements.txt file, since the Dockerfile expects one.
4. Next, you will build and run the application inside a Docker container by creating a Dockerfile.
5. Download milvus-standalone-docker-compose.yml, add it to the folder with your .py file, and rename the downloaded file to docker-compose.yml.
If the file is missing or was downloaded incorrectly, you can redownload it with the PowerShell command below:
Invoke-WebRequest -Uri "https://github.com/milvus-io/milvus/releases/download/v2.0.2/milvus-standalone-docker-compose.yml" -OutFile "docker-compose.yml"
6. Start Milvus by running docker-compose up -d. You can learn more about Milvus Standalone with Docker Compose in the documentation.
7. In your project directory, create a Dockerfile.
FROM python:3.9-slim
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
EXPOSE 8501
# Note: for real deployments, prefer passing secrets at runtime
# (e.g., docker run -e OPENAI_API_KEY=...) rather than baking them into the image
ENV OPENAI_API_KEY=your_openai_api_key
ENV MILVUS_ENDPOINT=./milvus_demo.db
ENV COLLECTION_NAME=my_rag_collection
# Run the app
CMD ["streamlit", "run", "build_rag_with_milvus.py"]
8. Next, build and run your Docker image:
docker build -t my_rag_app .
docker run -it -p 8501:8501 my_rag_app
Build With Milvus
LLMs are great, but they come with limitations, such as hallucinations. With the right tools, these limitations can be managed. This article showed how to reduce hallucinations by building a RAG pipeline with Milvus, which makes it easy for developers to perform embedding-based similarity searches and use unstructured data with their LLMs. By using Milvus in your project, you can build accurate, informative LLM applications backed by up-to-date information. And because Milvus is an open-source vector database, its architecture is constantly being improved.
If you have read this far, I want to say thank you — I appreciate it! You can connect with me on LinkedIn or leave a comment.
Opinions expressed by DZone contributors are their own.