Mixtral 8x7B Locally: Train Your LLM with Your Own Data
The newly established French company Mistral AI has managed to position itself as a leading player in the world of Artificial Intelligence. With its Large Language Model (LLM), Mixtral 8x7B, based on the Mixture of Experts (MoE) concept, it competes with giants like Meta and its Llama 2 70B model, as well as OpenAI and its well-known GPT-3.5. Mistral AI’s adoption of the open-source Apache 2.0 license democratizes access to this technology, allowing a broad range of users and developers to use and customize it according to their specific needs.
Let’s get hands-on with this model and see how to make the most of Mixtral 8x7B by customizing an LLM with our own data, locally, to preserve its confidentiality. This approach offers flexibility and reliability for developers and businesses seeking to integrate AI into their projects while maintaining complete control over their data.
- Understanding AI Jargon
- The Concepts Behind Mixtral 8x7B
- Installing Mixtral 8x7B
- Testing the Intrinsic Capabilities of Mixtral 8x7B
- Specializing Our Mixtral Model
- Conclusion
Understanding AI Jargon
Before diving further into our approach, it may be helpful to understand the terms we will use that are at the core of the currently popular AI models:
LLM (Large Language Model): These are AI models designed to understand and generate natural language. They are trained on vast datasets, with perhaps the most well-known being the GPT models behind OpenAI’s ChatGPT. However, there are many others like Google’s BERT, Meta’s Llama, BigScience’s BLOOM, Technology Innovation Institute’s Falcon, and the one of interest to us today, Mixtral by Mistral AI.
RAG (Retrieval-Augmented Generation): This is a means of giving an LLM access to new information, or specializing it in a specific domain, without retraining it. Relevant passages are retrieved from an external source, typically a vector database, and added to the prompt so that the LLM can provide more contextual responses.
LangChain: This is a development framework dedicated to LLMs. It allows language models to be combined with external data sources and user input components. It has become one of the most widely used open-source frameworks for building LLM applications.
Token: This represents the basic unit in language processing by AI models. It can represent a word, a character, or part of a word (a subword). It is this abstraction that LLMs manipulate, and the number of tokens a model can handle at once (its context window) bounds how much text it can analyze and generate.
Mixture-of-Experts (MoE): This is a technique where part of an AI model is divided into specialized ‘experts,’ each handling a different part of the information. For each token, a router selects the most relevant experts, so that only a fraction of the model’s parameters is used at any one time. This approach increases the capacity of the model without a proportional increase in computation.
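To make the Token entry above more concrete, here is a minimal sketch of subword tokenization using greedy longest-match against a toy vocabulary. The vocabulary and the function name are invented for illustration. Mixtral's real tokenizer is a learned subword tokenizer with a vocabulary of about 32,000 entries and works differently in its details.

```python
def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matches: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


# Toy vocabulary, purely illustrative.
toy_vocab = {"token", "ization", "iz", "ation"}
print(tokenize("tokenization", toy_vocab))  # ['token', 'ization']
```

A word the vocabulary has never seen is still representable, just with more tokens, which is why rare words cost more of the context window than common ones.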
The Concepts Behind Mixtral 8x7B
Mixtral 8x7B is a Large Language Model (LLM) of the Mixture-of-Experts (MoE) type. In each layer, a router directs every token to 2 of the 8 experts that make up that layer. The outputs from these two experts are then combined as a weighted sum to produce the result, which reduces the computation needed to generate a response.
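The routing step can be sketched in a few lines of plain Python. This is a simplified illustration, not Mixtral's implementation: the experts here are arbitrary functions, and the router scores are passed in directly, whereas in the real model they come from a small learned linear layer applied to the token.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top2_moe(token: Vector, router_logits: Sequence[float],
             experts: Sequence[Expert]) -> Vector:
    """Send a token to the 2 best-scoring experts and mix their outputs."""
    # Indices of the two highest router scores.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:2]
    # Softmax restricted to the two selected scores.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    weights = [e / sum(exps) for e in exps]
    # Weighted sum of the two expert outputs; the other 6 experts never run.
    output = [0.0] * len(token)
    for weight, index in zip(weights, top):
        expert_out = experts[index](token)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output


# Eight toy experts: expert i multiplies its input by (i + 1).
toy_experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(8)]
toy_logits = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # experts 2 and 5 tie
print(top2_moe([1.0, 2.0], toy_logits, toy_experts))  # [4.5, 9.0]
```

Only two of the eight expert functions are called per token, which is where the saving in computation comes from.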
The name 8x7B refers to 8 experts built on the Mistral 7B architecture. Because only the feed-forward blocks are replicated while the attention layers are shared, the model has about 46.7 billion parameters in total rather than 8 x 7 = 56 billion. When processing a token, Mixtral 8x7B uses only about 12.9 billion of them (roughly 2x7B), which speeds up its processing and reduces the compute required. According to Mistral AI, this allows Mixtral 8x7B to outperform larger models like Llama 2 70B (70 billion parameters) on most benchmarks, with inference about six times faster. Moreover, it equals or surpasses GPT-3.5 on most standard benchmarks.
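A back-of-the-envelope parameter count helps explain where these orders of magnitude come from. The sketch below assumes the hyperparameters published for Mixtral 8x7B (hidden size 4096, 32 layers, feed-forward size 14336, 32 attention heads of which 8 are key/value heads, vocabulary of 32,000) and a standard decoder layout. It is an estimate, not an inspection of the actual weights.

```python
# Published Mixtral 8x7B hyperparameters (assumed here, not read from a checkpoint).
DIM = 4096
N_LAYERS = 32
HIDDEN_DIM = 14336
N_HEADS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
VOCAB_SIZE = 32000
N_EXPERTS = 8
TOP_K = 2


def count_params(experts_used: int) -> int:
    """Count parameters when `experts_used` experts per layer are included."""
    # Query and output projections, plus smaller key/value projections.
    attention = 2 * DIM * (N_HEADS * HEAD_DIM) + 2 * DIM * (N_KV_HEADS * HEAD_DIM)
    expert = 3 * DIM * HIDDEN_DIM      # three matrices per feed-forward expert
    router = DIM * N_EXPERTS           # one score per expert
    norms = 2 * DIM                    # two normalization layers
    layer = attention + experts_used * expert + router + norms
    embeddings = 2 * VOCAB_SIZE * DIM  # input embedding and output head
    return N_LAYERS * layer + embeddings + DIM  # plus the final normalization


total = count_params(N_EXPERTS)
active = count_params(TOP_K)
print(f"total:  {total / 1e9:.1f}B")   # total:  46.7B
print(f"active: {active / 1e9:.1f}B")  # active: 12.9B
```

The attention layers and embeddings are counted once in both cases; only the number of feed-forward experts changes between the total and the per-token figure.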
Licensed under Apache 2.0, Mixtral 8x7B can be reused by developers, researchers, and companies, thus fostering innovation and collaboration in the field of AI. This open license allows for extensive adaptation and customization of the model, making the technology modifiable for a wide range of applications.
Installing Mixtral 8x7B
Step 1: Installing Ollama
Previously, installing and operating an AI model on one’s computer was a very complex task. However, the introduction of Ollama, an open-source tool, has significantly simplified this process. Ollama allows users to easily run advanced models such as Mixtral 8x7B directly on their own systems, paving the way for the democratization of these technologies.
To install Ollama on your computer:
- Go to the GitHub project (ollama/ollama) and follow the instructions.
- Or download the Ollama installation binary directly from https://ollama.ai/download and start the installation on your computer.
Step 2: Starting Mixtral 8x7B
To start Mixtral 8x7B, run ollama run mixtral in your terminal. A few points to keep in mind:
- During the first execution, Ollama will download the Mixtral 8x7B model, which is 26 GB in size. The download time will depend on your internet connection.
- Your system needs at least 48 GB of RAM to run Mixtral 8x7B efficiently.
- In this scenario, an Apple Silicon Mac with its unified memory presents a significant advantage, as it gives the GPU access to a large amount of memory, thereby enhancing its processing capabilities.
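The 26 GB download size and the RAM requirement can be related with a rough estimate. The sketch below assumes a total of 46.7 billion parameters (the figure published by Mistral AI) and assumes that the default Ollama build uses a 4-bit quantization costing about 4.5 bits per weight once the per-block scaling factors are included. Both numbers are assumptions for illustration; the actual file mixes several precisions.

```python
TOTAL_PARAMS = 46.7e9       # assumed: Mistral AI's published total
BITS_PER_WEIGHT_Q4 = 4.5    # assumed: 4-bit weights plus per-block scales
BITS_PER_WEIGHT_FP16 = 16   # unquantized half precision, for comparison


def weights_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


print(f"4-bit: {weights_size_gb(TOTAL_PARAMS, BITS_PER_WEIGHT_Q4):.1f} GB")    # 4-bit: 26.3 GB
print(f"fp16:  {weights_size_gb(TOTAL_PARAMS, BITS_PER_WEIGHT_FP16):.1f} GB")  # fp16:  93.4 GB
```

All experts must be resident in memory even though only two are used per token, and the operating system and the model's working memory need room as well, which is consistent with a recommendation well above the size of the weights alone.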
Testing the Intrinsic Capabilities of Mixtral 8x7B
In this first test, we will examine Mixtral’s ability to generate Java code using the Spring Boot 3.2 framework. This test will serve as a benchmark before specializing our LLM specifically for Spring Boot 3.2, thus providing a point of comparison to evaluate the improvements made by specialization.
Optional Step: Create a Python Virtual Environment
Depending on your preferences, you may create a virtual environment to isolate the test program and its dependencies. There are several tools for creating Python virtual environments, including:
venv: Integrated into Python 3.3 and later, it allows the creation of lightweight virtual environments.
virtualenv: An older, independent tool that offers additional features compared to venv.
conda: Particularly useful for managing environments that require complex dependencies, including non-Python libraries.
pipenv: Combines pip and virtualenv for an improved dependency management workflow.
poetry: Manages dependencies and virtual environments, focusing on ease of use and reproducibility.
With conda, to create a virtual environment named mixtral_ollama under Python 3.11, run conda create --name mixtral_ollama python=3.11 and then conda activate mixtral_ollama.
Step 3: Implementing the Test Application
This step involves creating two files. The first, requirements.txt, contains the necessary dependencies for the application. The second, reference_test.py, is a Python script that sends a request to Mixtral 8x7B. We will use the LlamaIndex library to facilitate the use of LLMs and, subsequently, the integration of different data sources.
- requirements.txt :
- reference_test.py :
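As a minimal sketch, reference_test.py could look like the following. It assumes that requirements.txt lists llama-index together with its Ollama integration, and that Ollama is running locally with the mixtral model already downloaded. The prompt wording and both function names are hypothetical; they are chosen to match the kind of request described in this test.

```python
def build_prompt(framework: str = "Spring Boot 3.2") -> str:
    """Hypothetical prompt asking for a REST controller in the given framework."""
    return (
        f"Create a REST controller class in Java for a {framework} application. "
        "The class should handle GET and POST requests and include "
        "security and configuration annotations."
    )


def ask_mixtral(prompt: str, model: str = "mixtral") -> str:
    """Send a prompt to the local Ollama server through LlamaIndex."""
    # Imported lazily so that build_prompt works without llama-index installed.
    try:
        from llama_index.llms.ollama import Ollama  # llama-index 0.10 and later
    except ImportError:
        from llama_index.llms import Ollama  # llama-index 0.9.x
    llm = Ollama(model=model)
    return llm.complete(prompt).text


# Usage (requires a running Ollama server):
#     print(ask_mixtral(build_prompt()))
```

Keeping the prompt in its own function makes it easy to reuse the exact same question later, once the model has been given access to the Spring Boot 3.2 documentation.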
Step 4: Querying the Model
We receive the following response from Mixtral:
As Spring Boot 3.2 isn’t out yet, I’ll create a REST controller class for the latest stable version of Spring Boot (2.6.x) using Java 8. Here’s a simple example of a REST controller class to handle GET and POST requests with security and configuration annotations:
This example assumes an ExampleService handling business requests, returning Object or List
For Spring Boot 3.2 (once available), some dependencies or annotations might change, but the general approach remains the same: mapping HTTP requests to controller class methods, using annotations to configure controller behavior.
Note: In this example, a class-level @PreAuthorize annotation requires authentication for all requests to this class’s endpoints, with the user needing “ROLE_USER” role. Modify as per your application’s needs.
Mixtral 8x7B generated code based on Spring Boot 2.6.x because this is the most recent version present in the data used to train the model. Since Spring Boot 3.2 was only recently released, it was not included in that data. This explains why the model has no knowledge of Spring Boot 3.x.
Specializing Our Mixtral Model
Now, we will specialize our model. To do this, we will provide it with PDF documents containing specific information related to the new targeted context, in our example, the Spring Boot 3.2 reference documentation.
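Before wiring in real tools, the principle can be sketched with the standard library only. In this toy version, documents are compared to the question using word counts and cosine similarity, and the best match is pasted into the prompt. A real setup replaces the word counts with learned embeddings stored in a vector database; the sample sentences and function names below are illustrative only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercase words and numbers."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def augment_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Prepend the retrieved passages to the question."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


sample_docs = [
    "Spring Boot 3.2 requires Java 17 or later.",
    "Ollama runs language models on a local machine.",
    "Qdrant is a vector database.",
]
print(augment_prompt("Which Java version does Spring Boot 3.2 need?", sample_docs))
```

The model itself is unchanged by this process: it simply receives a prompt that already contains the relevant passage.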
Step 5: Implementing the Model Specialization
- Place the PDF file of the Spring Boot reference documentation in a ./data subdirectory.
- Add the new dependencies to the requirements.txt file of our project.
- Implement the specialization script, specialized_test.py, which indexes the PDF and makes its content available to Mixtral 8x7B. The vector database, Qdrant, stores the document embeddings used for retrieval (RAG).
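A minimal sketch of specialized_test.py is shown below. It assumes llama-index 0.10 or later with the packages llama-index-llms-ollama, llama-index-vector-stores-qdrant, llama-index-embeddings-huggingface, qdrant-client, and pypdf installed. The collection name, the local Qdrant path, the embedding model, and the function names are all hypothetical choices for illustration.

```python
from pathlib import Path


def list_pdfs(data_dir: str = "./data") -> list[Path]:
    """Return the PDF files found in the data directory, sorted by name."""
    return sorted(Path(data_dir).glob("*.pdf"))


def build_query_engine(data_dir: str = "./data",
                       collection: str = "springboot",
                       model: str = "mixtral"):
    """Index the PDFs into a local Qdrant store and return a query engine."""
    if not list_pdfs(data_dir):
        raise FileNotFoundError(f"No PDF found in {data_dir}")

    # Third-party imports are kept local so list_pdfs works on its own.
    import qdrant_client
    from llama_index.core import (Settings, SimpleDirectoryReader,
                                  StorageContext, VectorStoreIndex)
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.ollama import Ollama
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    # Answers come from the local Mixtral; embeddings from a small local model.
    Settings.llm = Ollama(model=model)
    Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

    documents = SimpleDirectoryReader(data_dir).load_data()
    client = qdrant_client.QdrantClient(path="./qdrant_data")  # on-disk, no server
    vector_store = QdrantVectorStore(client=client, collection_name=collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(documents,
                                            storage_context=storage_context)
    return index.as_query_engine()


# Usage (requires a running Ollama server and a PDF in ./data):
#     engine = build_query_engine()
#     print(engine.query("Create a REST controller for Spring Boot 3.2 ..."))
```

Everything in this sketch runs on the local machine: the PDF, the embeddings, the vector store, and the model, so the documents never leave the computer.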
Step 6: Querying the New Model
We obtain a completely different response from Mixtral:
Here’s an example of a Java-based REST controller class with the specified requirements for a Spring Boot 3.2 application:
This example includes:
- A REST controller (@RestController) with two endpoints: one for GET requests (public), and another for POST requests (private).
- Annotations for Spring Security to restrict access to the /private-data endpoint.
- A custom RestTemplateBuilder configuration at the class level with a constructor dependency injection.
- The controller uses constructor-based dependency injection instead of field-based to ensure proper initialization and testability.
The specialized model now offers a more sophisticated REST controller implementation for Spring Boot 3.2. However, I haven’t verified this code or confirmed its specificity to Spring Boot 3. The aim was to test the model’s specialization capability rather than the exact accuracy of the generated code.
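One quick, partial check is possible without compiling anything. Spring Boot 3 moved from Java EE to Jakarta EE, so several javax.* packages became jakarta.*. The sketch below flags imports that would indicate code written for Spring Boot 2. It is a heuristic covering only three well-known packages, with a hypothetical function name, and a clean result does not prove the code is correct for Spring Boot 3.

```python
import re

# Packages that moved to jakarta.* with Jakarta EE 9 (partial list).
# JDK packages such as javax.sql are deliberately not included.
LEGACY_PREFIXES = ("javax.persistence", "javax.servlet", "javax.validation")


def legacy_javax_imports(java_source: str) -> list[str]:
    """Return imported names that still use pre-Jakarta package prefixes."""
    imports = re.findall(r"^\s*import\s+(?:static\s+)?([\w.]+)",
                         java_source, flags=re.MULTILINE)
    return [name for name in imports if name.startswith(LEGACY_PREFIXES)]


sample_source = """\
import javax.servlet.http.HttpServletRequest;
import jakarta.validation.Valid;
import javax.sql.DataSource;
"""
print(legacy_javax_imports(sample_source))
# ['javax.servlet.http.HttpServletRequest']
```

Running this on both the reference answer and the specialized answer gives a first, coarse signal of whether the retrieved documentation actually influenced the generated code.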
Conclusion
The combination of Mixtral 8x7B, Ollama, and LlamaIndex marks a significant advancement in customizing AI models and the development of tailor-made applications, by merging technical power with ease of use. This synergy not only enhances the protection of private data but also benefits from an open and free license, thereby encouraging collaboration and innovation. This makes artificial intelligence more accessible and adaptable to a variety of projects and users, democratizing its use in diverse contexts.