Install Llama-Cpp-Python With GPU Support
This article walks through installing the llama-cpp-python package with GPU capability (cuBLAS) so that models can be loaded onto the GPU easily.
If you are looking for a step-wise approach to installing the llama-cpp-python package, you are in the right place. This guide summarizes the steps required for installation.
Before we install, are you wondering why we need to install this package separately with GPU capability?
This package gives us a class (LlamaCpp) to create a model instance or object, primarily for pre-trained LLM models.
By default, even if you have an Nvidia GPU in your system with all the CUDA compilers and packages installed, this package installs with CPU-only capability.
Installing with GPU capability enabled speeds up the computation of LLMs (Large Language Models) by automatically transferring the model onto the GPU.
In this guide, detailed steps are provided to install this package using cuBLAS (GPU-accelerated library) provided by Nvidia.
Tested System Configuration
- System — Azure VM
- OS — Ubuntu 20.04
- LLM model used — Mistral-7B
Prerequisites
1. Ensure the Nvidia CUDA toolkit is installed; the minimum required version is 12.2.
- Download the required package from Nvidia's official website and install it.
- Verify the successful installation of the toolkit by running the nvidia-smi command; it should detect your GPU.
- Also, verify the installation in the /usr/local/ directory: a cuda-12.2 directory should have been created there, containing all the required files.
2. Install GCC and G++ compilers to compile and install packages
- Add the gcc repository using the below command.
- sudo add-apt-repository ppa:ubuntu-toolchain-r/test
- Install gcc and g++ compilers using the command below.
- sudo apt install gcc-11 g++-11 (minimum required version is 11 for gcc and g++ compilers)
- Update alternatives using the below command to make version 11 the default.
- sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 60 --slave /usr/bin/g++ g++ /usr/bin/g++-11
- Check the installed versions of GCC and G++ for correct installation.
- gcc --version # This should print the gcc version as 11.4.0
- g++ --version # This should print the g++ version as 11.4.0
3. Install the LangChain and cmake packages using the below command.
pip install langchain cmake
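The prerequisite checks above can be run together as a short shell session; this is a sketch assuming the paths and versions used in this guide (cuda-12.2, gcc-11), and the output will differ on other systems:

```shell
# Verify the CUDA toolkit: nvidia-smi should detect the GPU,
# and nvcc should report the toolkit version (12.2 here).
nvidia-smi
nvcc --version

# Confirm the toolkit files landed under /usr/local/cuda-12.2
ls /usr/local/cuda-12.2

# Confirm the compilers default to version 11
gcc --version
g++ --version
```

If any of these commands fails or reports an older version, revisit the corresponding prerequisite step before installing llama-cpp-python.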
Llama-CPP Installation
- By default, the llama-cpp-python build picks up the default CUDA version available on the VM. If multiple CUDA versions are installed, a specific version needs to be specified.
- Use the below command for the installation of the package.
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCUDA_PATH=/usr/local/cuda-12.2 -DCUDAToolkit_ROOT=/usr/local/cuda-12.2 -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.2/include -DCUDAToolkit_LIBRARY_DIR=/usr/local/cuda-12.2/lib64" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
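Note that the build flag names have changed across llama.cpp releases: in newer versions, LLAMA_CUBLAS was deprecated in favor of a CUDA flag (GGML_CUDA in recent releases). A hedged sketch for a recent release, assuming the same toolkit path as above:

```shell
# Newer llama-cpp-python releases renamed the cuBLAS flag;
# try GGML_CUDA=on if the LLAMA_CUBLAS flag is rejected by the build.
CMAKE_ARGS="-DGGML_CUDA=on -DCUDAToolkit_ROOT=/usr/local/cuda-12.2" \
FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```

Check the flag names against the llama-cpp-python README for the exact version you are installing.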
Verifying Installation
Verify the installation by creating an instance of the LLM model with the verbose = True parameter enabled.
from langchain.llms import LlamaCpp
model = LlamaCpp(model_path=model_path, n_gpu_layers=-1, verbose=True)
n_gpu_layers = -1 is the main parameter that transfers the available computation layers onto the GPU. Alternatively, you can set the exact number of layers you want to transfer, but -1 automatically calculates and transfers all of them.
verbose = True prints the model's details and parameters.
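To make the n_gpu_layers semantics concrete, here is a small illustrative helper. The function is our own sketch, not part of llama-cpp-python, but it mirrors how the offload count resolves:

```python
def layers_offloaded(n_gpu_layers: int, total_layers: int) -> int:
    """Illustrative sketch (not a llama-cpp-python API): how many
    transformer layers end up on the GPU for a given n_gpu_layers."""
    if n_gpu_layers < 0:
        # -1 (or any negative value) means "offload every layer"
        return total_layers
    # Otherwise offload at most the requested number of layers
    return min(n_gpu_layers, total_layers)

print(layers_offloaded(-1, 32))   # all 32 Mistral-7B layers on the GPU
print(layers_offloaded(10, 32))   # partial offload: 10 layers
```

Partial offload (a positive n_gpu_layers) is useful when the model does not fit entirely in GPU memory.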
On the terminal console, when the model is loaded, check for the following lines.
Device: <your-gpu-name> (Ex: Device 0: Tesla T4)
BLAS = 1 (indicates that the model is loaded onto the GPU)
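If you want to check the verbose output programmatically rather than by eye, a minimal sketch (assuming the BLAS = 1 line format shown above, which may vary between llama-cpp-python versions) could look like:

```python
def gpu_offload_confirmed(log_text: str) -> bool:
    """Return True if a llama.cpp verbose load log reports BLAS = 1.

    Illustrative helper; the exact log format may differ between
    llama-cpp-python versions.
    """
    return any("BLAS = 1" in line for line in log_text.splitlines())

sample_log = "Device 0: Tesla T4\nsystem_info: BLAS = 1"
print(gpu_offload_confirmed(sample_log))  # True
```

Capturing the verbose output (for example, by redirecting stderr) lets you fail fast in scripts when the model silently falls back to the CPU.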
Comparison
LlamaCPP With CPU
Time taken to load Mistral-7B model: 1 min (approx)
Time taken to generate a response to a query: 20 min (approx)
LlamaCPP With GPU
Time taken to load Mistral-7B model: 30 sec (approx)
Time taken to generate a response to a query: 30 sec (approx)
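The approximate timings above translate to the following speedups (simple arithmetic on the reported numbers; these are rough, single-run figures, not a benchmark):

```python
# Approximate timings reported above, in seconds
cpu_load, cpu_generate = 1 * 60, 20 * 60
gpu_load, gpu_generate = 30, 30

print(f"Load speedup: {cpu_load / gpu_load:.0f}x")                 # 2x
print(f"Generation speedup: {cpu_generate / gpu_generate:.0f}x")   # 40x
```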
Conclusion
Based on the load time and response generation, there is a significant performance difference when we use the llama-cpp-python package with GPU support. If you have one or more GPUs attached to your system, consider installing this package with GPU support for better performance.
Published at DZone with permission of Manish Kovelamudi. See the original article here.