Using Embedded Milvus to Instantly Install and Run Milvus With Python
Milvus 2.1.0 introduces embedded Milvus, which makes it easier for Python developers to install and use the Milvus vector database.
Milvus is an open-source vector database for AI applications. It provides a variety of installation methods, including building from source code and installing with Docker Compose, Helm, APT, YUM, or Ansible. Users can choose an installation method depending on their operating system and preferences. However, many data scientists and AI engineers in the Milvus community work with Python and want a much simpler installation method than those currently available.
Therefore, we released embedded Milvus, a user-friendly Python version, along with Milvus 2.1 to empower more Python developers in our community. This article introduces what embedded Milvus is and provides instructions on how to install and use it.
An Overview of Embedded Milvus
Embedded Milvus enables you to quickly install and use Milvus with Python. It can quickly bring up a Milvus instance and lets you start and stop the Milvus service whenever you wish. All data and logs are persisted even if you stop embedded Milvus.
Embedded Milvus itself has no internal dependencies and does not require pre-installing or running any third-party dependencies such as etcd, MinIO, or Pulsar.
Everything you do with embedded Milvus and every piece of code you write for it can be safely migrated to other Milvus modes, such as standalone, cluster, and the cloud version. This reflects one of the most distinctive features of embedded Milvus:
"Write once, run anywhere."
When to Use Embedded Milvus?
Embedded Milvus and PyMilvus are built for different purposes: PyMilvus is the Python SDK used to connect to a Milvus service, whereas embedded Milvus runs the Milvus service itself from Python. You may consider choosing embedded Milvus in the following scenarios:
- You want to use Milvus without installing it through any of the installation methods listed above.
- You want to use Milvus without keeping a long-running Milvus process on your machine.
- You want to quickly use Milvus without starting a separate Milvus process and other required components like etcd, MinIO, Pulsar, etc.
You should NOT use embedded Milvus:
- In a production environment. (To use Milvus for production, consider Milvus cluster or Zilliz Cloud, a fully managed Milvus service.)
- If you have a high demand for performance. (Comparatively speaking, embedded Milvus might not provide the best performance.)
A Comparison of Different Modes of Milvus
The table below compares several modes of Milvus: standalone, cluster, embedded Milvus, and the Zilliz Cloud, a fully managed Milvus service.
| | Embedded Milvus | Milvus Standalone | Milvus Cluster | Zilliz Cloud (a fully managed Milvus service) |
| --- | --- | --- | --- | --- |
| Production ready? | Not suggested for production | Yes | Yes | Yes |
| Features | All of Milvus 2.1 features | All of Milvus 2.1 features | All of Milvus 2.1 features | All of Milvus 2.1 features |
| SDK support | Python | Python, Java, Go | Python, Java, Go | Python, Java, Go |
| Docker required? | No | Yes | Yes | Yes |
| Kubernetes required? | No | No | No, but highly suggested | Yes |
| External S3 like MinIO required? | No | Yes | Yes | Yes |
| External etcd required? | No (etcd is embedded) | Yes | Yes | Yes |
| External Pulsar/Kafka required? | No | No | Yes | Yes |
| Performance | High | Very high | Very high | Very high |
| Availability | Medium | High | High (without Kubernetes); very high (with Kubernetes) | Very high |
| Scalability | Low | Low | High | Very high |
How to Install Embedded Milvus?
Before installing embedded Milvus, you need to first ensure that you have installed Python 3.6 or later. Embedded Milvus supports the following operating systems:
- Ubuntu 18.04
- Mac x86_64 >= 10.4
- Mac M1 >= 11.0
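The requirements above can be partially checked from Python before installing. The following is a rough sketch: the function name check_requirements is hypothetical, and it only verifies the Python version and the operating system family (Linux or macOS), not the specific Ubuntu or macOS release.

```python
import platform
import sys


def check_requirements(version_info=None, system=None):
    """Return a list of problems; an empty list means the basic checks passed."""
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system
    problems = []
    # Embedded Milvus needs Python 3.6 or later.
    if tuple(version_info[:2]) < (3, 6):
        problems.append("Python 3.6 or later is required")
    # platform.system() reports "Linux" on Ubuntu and "Darwin" on macOS.
    if system not in ("Linux", "Darwin"):
        problems.append(f"unsupported operating system: {system}")
    return problems
```

Calling check_requirements() with no arguments inspects the running interpreter; the parameters exist only so the check can be exercised with other values.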
If the requirements are met, run the following command to install embedded Milvus:
$ python3 -m pip install milvus
You can also add the version in the command to install a specific version of embedded Milvus. For instance, to install version 2.1.0, run:
$ python3 -m pip install milvus==2.1.0
Later, when a new version of embedded Milvus is released, you can upgrade to the latest version by running:
$ python3 -m pip install --upgrade milvus
If you have already installed PyMilvus and want to install embedded Milvus, run:
$ python3 -m pip install --no-deps milvus
After running the installation command, you need to create a data folder for embedded Milvus under /var/bin/e-milvus by running the following commands:
sudo mkdir -p /var/bin/e-milvus
sudo chmod -R 777 /var/bin/e-milvus
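For readers who prefer to script this step, the two shell commands can be approximated from Python. This is a sketch under assumptions: the helper name prepare_data_dir is hypothetical, it changes the mode of the directory itself rather than recursing like chmod -R (which is equivalent for a newly created, empty folder), and creating /var/bin/e-milvus this way still requires running with sufficient privileges.

```python
import os
import stat


def prepare_data_dir(path):
    """Create the directory (and any parents), then open its permissions to 777."""
    os.makedirs(path, exist_ok=True)   # like: mkdir -p path
    os.chmod(path, 0o777)              # like: chmod 777 path (not recursive)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(path).st_mode)
```

The function is safe to call repeatedly, since an existing directory is left in place and only its mode is reset.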
Start and Stop Embedded Milvus
When the installation is successful, you can start the service.
If you are running embedded Milvus for the first time, you need to import Milvus and set up embedded Milvus first.
$ python3
Python 3.9.10 (main, Jan 15 2022, 11:40:53)
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import milvus
>>> milvus.before()
please do the following if you have not already done so:
1. install required dependencies: bash /var/bin/e-milvus/lib/install_deps.sh
2. export LD_PRELOAD=/SOME_PATH/embd-milvus.so
3. export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib:/usr/local/lib:/var/bin/e-milvus/lib/
>>>
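Because these variables must be exported in the shell before Python starts, it can help to confirm that they are in place. The sketch below only inspects the environment; the helper names are hypothetical, the directory list is copied from step 3 of the output above, and the LD_PRELOAD check only tests that some value is set, since the actual library path depends on your installation.

```python
import os

# Directories named in step 3 of the milvus.before() output.
REQUIRED_LIB_DIRS = ["/usr/lib", "/usr/local/lib", "/var/bin/e-milvus/lib/"]


def missing_library_dirs(env=None):
    """Return the required directories that are not on LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    # Ignore empty entries and trailing slashes when comparing paths.
    current = {p.rstrip("/") for p in entries if p}
    return [d for d in REQUIRED_LIB_DIRS if d.rstrip("/") not in current]


def ld_preload_is_set(env=None):
    """Return True if LD_PRELOAD has a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("LD_PRELOAD"))
```

An empty list from missing_library_dirs() together with True from ld_preload_is_set() suggests that steps 2 and 3 were applied in the current shell.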
If you have successfully started embedded Milvus before and come back to restart it, you can directly run milvus.start() after importing Milvus.
$ python3
Python 3.9.10 (main, Jan 15 2022, 11:40:53)
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import milvus
>>> milvus.start()
>>>
You will see the following output if you have successfully started the embedded Milvus service.
---Milvus Proxy successfully initialized and ready to serve!---
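If you start the service from a script rather than watching for this message, one option is to poll the service port until it accepts connections. The following sketch uses only the standard library; the function name wait_for_port is hypothetical, and the default port 19530 is the one used to connect later in this article. An open port indicates that something is listening, not necessarily that Milvus has finished initializing.

```python
import socket
import time


def wait_for_port(host="localhost", port=19530, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A script could call wait_for_port() before connecting with PyMilvus and stop with an error message if it returns False.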
After the service starts, you can start another terminal window and run the example code of "Hello Milvus" to play around with embedded Milvus!
# Download hello_milvus script
$ wget https://raw.githubusercontent.com/milvus-io/pymilvus/v2.1.0/examples/hello_milvus.py
# Run Hello Milvus
$ python3 hello_milvus.py
Alternatively, you can import and run the PyMilvus script immediately after running milvus.start() in the same terminal window.
$ python3
Python 3.9.10 (main, Jan 15 2022, 11:40:53)
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import milvus
--- if you are running Milvus for the first time, type milvus.before() for pre-run instructions ---
--- otherwise, type milvus.start() ---
>>>
>>> milvus.start()
---Milvus Proxy successfully initialized and ready to serve!---
>>>
>>>
>>> import random
>>> from pymilvus import (
...     connections,
...     utility,
...     FieldSchema, CollectionSchema, DataType,
...     Collection,
... )
>>> connections.connect("default", host="localhost", port="19530")
>>> has = utility.has_collection("hello_milvus")
>>> fields = [
...     FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
...     FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=8)
... ]
>>> schema = CollectionSchema(fields, "hello_milvus is the simplest demo to introduce the APIs")
>>> hello_milvus = Collection("hello_milvus", schema, consistency_level="Strong")
>>> num_entities = 3000
>>> entities = [
...     [i for i in range(num_entities)],  # provide the pk field because `auto_id` is set to False
...     [[random.random() for _ in range(8)] for _ in range(num_entities)],  # field embeddings
... ]
>>> insert_result = hello_milvus.insert(entities)
>>> index = {
...     "index_type": "IVF_FLAT",
...     "metric_type": "L2",
...     "params": {"nlist": 128},
... }
>>> hello_milvus.create_index("embeddings", index)
>>> hello_milvus.load()
>>> vectors_to_search = entities[-1][-2:]  # query with the last two inserted vectors
>>> search_params = {
...     "metric_type": "L2",
...     "params": {"nprobe": 10},
... }
>>> result = hello_milvus.search(vectors_to_search, "embeddings", search_params, limit=3)
>>> for hits in result:
...     for hit in hits:
...         print(f"hit: {hit}")
...
hit: (distance: 0.0, id: 2998)
hit: (distance: 0.1088758111000061, id: 2345)
hit: (distance: 0.12012234330177307, id: 1172)
hit: (distance: 0.0, id: 2999)
hit: (distance: 0.0297045037150383, id: 2000)
hit: (distance: 0.16927233338356018, id: 560)
>>> utility.drop_collection("hello_milvus")
>>>
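In the output above, the first hit for each query has a distance of 0.0 because the two query vectors are the last two vectors inserted (ids 2998 and 2999), so each one matches itself. To illustrate what a nearest-neighbor search with the L2 metric returns, here is a brute-force sketch in plain Python. It is not how Milvus implements search (the IVF_FLAT index used above is an approximate method), the function names are hypothetical, and the use of the squared distance is an assumption of this sketch; the ranking is the same with or without the square root.

```python
def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(query, vectors, limit=3):
    """Return (id, distance) pairs for the `limit` closest vectors, nearest first."""
    # Score every stored vector against the query, using its position as the id.
    scored = [(i, l2_squared(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:limit]
```

Searching with a vector that is itself in the list always yields that vector first with distance 0.0, which mirrors the first hit of each result group in the session.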
When you are done using embedded Milvus, we recommend stopping it gracefully and cleaning up the environment variables by running the following command or pressing Ctrl-D.
>>> milvus.stop()
if you need to clean up the environment variables, run:
export LD_PRELOAD=
export LD_LIBRARY_PATH=
>>>
>>> exit()
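When embedded Milvus is used from a script instead of an interactive session, the start and stop calls can be paired so that the service is stopped even if the code in between raises an exception. The context manager below is a sketch: the name embedded_milvus is hypothetical, and it relies only on the milvus.start() and milvus.stop() calls shown in this article.

```python
from contextlib import contextmanager


@contextmanager
def embedded_milvus(module=None):
    """Start embedded Milvus on entry and stop it on exit, even after an error."""
    if module is None:
        # Imported lazily; requires the milvus package installed as described above.
        import milvus as module
    module.start()
    try:
        yield module
    finally:
        module.stop()
```

Inside a "with embedded_milvus():" block you can connect with PyMilvus as in the session above. The optional module parameter exists only so that a stand-in object can be passed when testing.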
Published at DZone with permission of Charles Xie. See the original article here.