Mixtral 8x7B Locally: Train Your LLM with Your Own Data
Illustration: Mistral AI in the spotlight, according to DALL·E

The recently founded French company Mistral AI has managed to position itself as a leading player in the world of Artificial Intelligence. With its Large Language Model (LLM) Mixtral 8x7B, based on the Mixture of Experts (MoE) concept, it competes with giants like Meta and its Llama 2 70B model, as well as OpenAI and its well-known GPT-3.5, the model behind ChatGPT. By releasing the model under the open-source Apache 2.0 license, Mistral AI gives a broad range of users and developers access to this technology and lets them use and customize it according to their specific needs.

Let’s get hands-on with this model and see how to make the most of Mixtral 8x7B by customizing an LLM with our own data, locally, to keep that data confidential. This approach offers developers and businesses seeking to integrate AI into their projects a great deal of flexibility, while letting them maintain complete control over their data.



Understanding AI Jargon

Before diving further into our approach, it may be helpful to understand the terms we will use that are at the core of the currently popular AI models:

  • LLM (Large Language Model): An AI model designed to understand and generate natural language, trained on vast text datasets. The best known are probably the GPT models behind OpenAI’s ChatGPT, but there are many others, such as Google’s BERT, Meta’s Llama, BigScience’s BLOOM (coordinated by Hugging Face), the Technology Innovation Institute’s Falcon, and the one we are interested in today, Mixtral by Mistral AI.

  • RAG (Retrieval-Augmented Generation): A way of giving an LLM access to new information, or of specializing it in a specific domain, without retraining it. Documents are typically indexed in a vector database; at query time, the passages most relevant to the question are retrieved and added to the prompt, so the LLM can give more contextual responses.

  • LangChain: A development framework dedicated to LLMs. It allows language models to be combined with external data sources and user input components, and it has become one of the most widely used open-source frameworks for LLM applications.

  • Token: The basic unit of text that an AI model processes. A token can be a word, a character, or a fragment of a word (a subword). LLMs manipulate sequences of tokens rather than raw text, and limits such as the context window, the amount of text a model can take into account at once, are expressed in tokens.

  • Mixture-of-Experts (MoE): A technique in which parts of an AI model are divided into several ‘experts’, each a sub-network that handles part of the processing. A small router network selects the most relevant experts for each input, so only a fraction of the model is active at any time. This lets a model have a large total capacity while keeping the computation per request comparatively low.
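
To make the RAG idea from the list above more concrete, here is a minimal, self-contained sketch of the two steps involved: retrieving the most relevant passage, then adding it to the prompt. It is only an illustration. The `DOCUMENTS` list, the function names, and the bag-of-words "embedding" are assumptions made for this example; a real setup uses a neural embedding model and a vector database.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base standing in for real documents.
DOCUMENTS = [
    "Spring Boot 3.2 requires Java 17 or later.",
    "Qdrant is a vector database used to store embeddings.",
    "Ollama runs large language models on a local machine.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (real systems use a neural embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:top_k]

def build_prompt(question: str, documents: list[str], top_k: int = 1) -> str:
    """Prepend the retrieved passages to the question before sending it to an LLM."""
    context = "\n".join(retrieve(question, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("Which Java version does Spring Boot 3.2 require?", DOCUMENTS))
```

The prompt that finally reaches the LLM contains both the question and the retrieved passage, which is how the model can answer about material it was never trained on.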


The Concepts Behind Mixtral 8x7B

Mixtral 8x7B is a Large Language Model (LLM) of the Mixture-of-Experts (MoE) type. At each layer, a router directs every token to 2 of the model’s 8 experts. The outputs of these two experts are then combined in a weighted sum to produce the layer’s result, so only part of the model is computed for any given token.
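
The routing step can be sketched in a few lines of plain Python. This is a toy illustration of top-2 routing, not Mixtral’s actual implementation: the experts here are simple scalar functions instead of neural networks, and `EXPERTS`, `ROUTER_LOGITS`, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top2_route(router_logits):
    """Pick the two highest-scoring experts and normalize their scores into weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:2]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(x, router_logits, experts):
    """Weighted sum of the two selected experts' outputs; the other experts are never called."""
    return sum(weight * experts[i](x) for i, weight in top2_route(router_logits))

# Hypothetical stand-ins: real experts are feed-forward networks, not scalar functions.
EXPERTS = [lambda x, k=k: x * (k + 1) for k in range(8)]
# Hypothetical router scores for a single token (one score per expert).
ROUTER_LOGITS = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]

for index, weight in top2_route(ROUTER_LOGITS):
    print(f"expert {index}: weight {weight:.3f}")
print("layer output:", moe_layer(1.0, ROUTER_LOGITS, EXPERTS))
```

Only two of the eight expert functions are evaluated for this token, which is where the savings in computation come from.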

The name reflects the architecture: eight experts built on a 7B-class model. Because the experts share the attention layers, the model has about 46.7 billion parameters in total rather than 8 x 7 = 56 billion. When processing a token, Mixtral 8x7B uses only about 12.9 billion of them, which speeds up processing and reduces the computation required. According to Mistral AI, this allows Mixtral 8x7B to outperform larger models like Llama 2 70B (70 billion parameters) on most benchmarks, with inference roughly six times faster, and to match or surpass GPT-3.5 on most standard benchmarks.

Licensed under Apache 2.0, Mixtral 8x7B can be reused by developers, researchers, and companies, thus fostering innovation and collaboration in the field of AI. This open license allows for extensive adaptation and customization of the model, making the technology modifiable for a wide range of applications.


Installing Mixtral 8x7B

Step 1: Installing Ollama

Until recently, installing and running an AI model on one’s own computer was a complex task. Ollama, an open-source tool, has significantly simplified this process: it lets users run advanced models such as Mixtral 8x7B directly on their own systems.

To install Ollama on your computer:

  • Go to the GitHub project (ollama/ollama) and follow the instructions.
  • Or download the Ollama installation binary directly from https://ollama.ai/download and start the installation on your computer.
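
Once Ollama is installed and running, one way to check that it is reachable is to query its local HTTP API. The sketch below assumes the default address `http://localhost:11434` and the `/api/tags` endpoint, which lists the models already downloaded; the helper names are hypothetical, and the details are worth checking against the Ollama version you installed.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address

def model_names(payload: dict) -> list[str]:
    """Extract model names such as 'mixtral:latest' from an /api/tags response."""
    return [model["name"] for model in payload.get("models", [])]

def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask the local Ollama server which models have been downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return model_names(json.load(response))

def has_model(names: list[str], wanted: str = "mixtral") -> bool:
    """Compare on the part before the tag, so 'mixtral:latest' matches 'mixtral'."""
    return any(name.split(":")[0] == wanted for name in names)
```

Calling `has_model(list_local_models())` returns False until the model has been downloaded in the next step, and `list_local_models()` raises an error if the Ollama server is not running.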

Step 2: Starting Mixtral 8x7B

To activate the Mixtral 8x7B neural network, run this command in your terminal:

ollama run mixtral
Shell
  • During the first execution, Ollama will download the Mixtral 8x7B model, which is 26 GB in size. The download time will depend on your internet connection.
  • Your system needs at least 48 GB of RAM to run Mixtral 8x7B efficiently.
  • In this scenario, an Apple Silicon Mac with its unified memory has a significant advantage, as it gives the GPU access to a large amount of memory, thereby enhancing its processing capabilities.
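
The 26 GB download is consistent with a back-of-the-envelope estimate. The sketch below rests on two assumptions that are not stated in this article: that the model has about 46.7 billion parameters in total (the figure published by Mistral AI), and that the default Ollama build is quantized to roughly 4.5 bits per weight (4-bit values plus per-block scaling overhead).

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 46.7e9    # assumption: total parameter count published by Mistral AI
BITS_PER_WEIGHT = 4.5    # assumption: 4-bit quantization plus per-block scale overhead

print(f"4-bit build:  {quantized_size_gb(TOTAL_PARAMS, BITS_PER_WEIGHT):.1f} GB")  # 26.3 GB
print(f"16-bit build: {quantized_size_gb(TOTAL_PARAMS, 16):.1f} GB")               # 93.4 GB
```

Although only two experts are active for each token, all of the experts must be held in memory, which is why the RAM requirement is driven by the total parameter count rather than by the parameters used per token.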

Testing the Intrinsic Capabilities of Mixtral 8x7B

In this first test, we will examine Mixtral’s ability to generate Java code using the Spring Boot 3.2 framework. This test will serve as a benchmark before specializing our LLM specifically for Spring Boot 3.2, thus providing a point of comparison to evaluate the improvements made by specialization.

Optional Step: Create a Python Virtual Environment

Depending on your preferences, you may create a virtual environment to isolate the test program and its dependencies. There are several tools for creating Python virtual environments, including:

  • venv: Integrated into Python 3.3 and later, it allows the creation of lightweight virtual environments.

  • virtualenv: An older, independent tool that offers additional features compared to venv.

  • conda: Particularly useful for managing environments that require complex dependencies, including non-Python libraries.

  • pipenv: Combines pip and virtualenv for an improved dependency management workflow.

  • poetry: Manages dependencies and virtual environments, focusing on ease of use and reproducibility.

With conda, to create a virtual environment named mixtral_ollama under Python 3.11, execute the commands:

conda create --name mixtral_ollama python=3.11
conda activate mixtral_ollama
Shell
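
For readers who prefer the built-in venv module over conda, an equivalent environment can also be created from Python itself. This is a minimal sketch using the standard library’s `venv.EnvBuilder`; the `create_env` helper and the directory name are hypothetical.

```python
import venv
from pathlib import Path

def create_env(path: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at the given path and return its directory."""
    env_dir = Path(path)
    venv.EnvBuilder(with_pip=with_pip).create(str(env_dir))
    return env_dir

if __name__ == "__main__":
    # Hypothetical directory name, mirroring the conda environment above.
    print("Created", create_env("mixtral_ollama_venv", with_pip=False))
```

The environment still has to be activated from the shell afterwards, for example with `source mixtral_ollama_venv/bin/activate` on Linux or macOS. Pass `with_pip=True` if pip should be installed into the environment.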

Step 3: Implementing the Test Application

This step involves creating two files. The first, requirements.txt, contains the necessary dependencies for the application. The second, reference_test.py, is a Python script that sends a request to Mixtral 8x7B. We will use the LlamaIndex library to facilitate the use of LLMs and, subsequently, the integration of different data sources.

  • requirements.txt :
llama-index<0.10  # the imports used in this article predate the 0.10 package split
Python
  • reference_test.py :
from llama_index.llms import Ollama

llm = Ollama(model="mixtral")

prompt = (
  "Create a REST controller class in Java for a Spring Boot 3.2 application. "
  "This class should handle GET and POST requests, and include security and "
  "configuration annotations."
)

response = llm.complete(prompt)
print(response)
Python
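
The same prompt can also be sent straight to Ollama’s local HTTP API, without LlamaIndex, which can help when debugging. The sketch below assumes the `/api/generate` endpoint on the default address `http://localhost:11434`, with `stream` set to false so that the answer arrives as a single JSON object; the helper names are hypothetical.

```python
import json
import urllib.request

def build_generate_request(prompt, model="mixtral", base_url="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, timeout=300.0):
    """Send the prompt to the local model and return the generated text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]
```

With the Ollama server running and the model downloaded, `print(complete("Say hello in French."))` prints the model’s answer. The generous timeout is deliberate, since a local model of this size can take a while to respond.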

Step 4: Querying the Model

pip install -r requirements.txt
python reference_test.py
Bash

We receive the following response from Mixtral:

As Spring Boot 3.2 isn’t out yet, I’ll create a REST controller class for the latest stable version of Spring Boot (2.6.x) using Java 8. Here’s a simple example of a REST controller class to handle GET and POST requests with security and configuration annotations:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/example")
@PreAuthorize("hasRole('ROLE_USER')") // Basic security annotation
public class ExampleController {

    private final ExampleService exampleService;

    @Autowired
    public ExampleController(ExampleService exampleService) {
        this.exampleService = exampleService;
    }

    // Endpoint GET
    @GetMapping
    public Object getAll() {
        return exampleService.getAll();
    }

    // Endpoint POST
    @PostMapping
    public Object create(@RequestBody ExampleDto exampleDto) {
        return exampleService.create(exampleDto);
    }
}
Java

This example assumes an ExampleService handling business requests, returning Object or List.

For Spring Boot 3.2 (once available), some dependencies or annotations might change, but the general approach remains the same: mapping HTTP requests to controller class methods, using annotations to configure controller behavior.

Note: In this example, a class-level @PreAuthorize annotation requires authentication for all requests to this class’s endpoints, with the user needing “ROLE_USER” role. Modify as per your application’s needs.

Mixtral 8x7B generated code based on Spring Boot 2.6.x as this is the version available in the data that trained the model. Since Spring Boot version 3.2 was only recently released, it was not included in these data. This explains why the model has no knowledge of Spring Boot in version 3.x.


Specializing Our Mixtral Model

Now, we will specialize our model. To do this, we will provide it with PDF documents containing specific information related to the new target context: in our example, the Spring Boot 3.2 reference documentation.

Step 5: Implementing the Model Specialization

  • We will place the PDF file of the Spring Boot reference documentation in a ./data subdirectory:
mkdir ./data
curl -o ./data/spring-boot-reference.pdf https://docs.spring.io/spring-boot/docs/3.2.1/reference/pdf/spring-boot-reference.pdf
Bash
  • Add new dependencies to our project in the requirements.txt file:
llama-index<0.10  # the imports used in this article predate the 0.10 package split
qdrant_client
pypdf
transformers
torch
Python
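  • Implement the specialization script, specialized_test.py, which indexes the PDF and makes it available to Mixtral 8x7B. The Qdrant vector database stores the document embeddings and supplies the retrieval part of the RAG setup; the model’s weights themselves are not modified: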
import qdrant_client
from llama_index import VectorStoreIndex, ServiceContext, SimpleDirectoryReader
from llama_index.llms import Ollama
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Loading the documents from the disk
documents = SimpleDirectoryReader("./data").load_data()

# Initializing the vector store with Qdrant
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="springboot")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Initializing the Large Language Model (LLM) with Ollama
# The request_timeout may need to be adjusted depending on the system's performance capabilities
llm = Ollama(model="mixtral", request_timeout=120.0)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

# Creating the index, which includes embedding the documents into the vector store
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)

# Querying the index with a specific question
query_engine = index.as_query_engine()
prompt = (
  "Create a REST controller class in Java for a Spring Boot 3.2 application. "
  "This class should handle GET and POST requests, and include security and "
  "configuration annotations."
)
response = query_engine.query(prompt)
print(response)
Python
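
The single call to VectorStoreIndex.from_documents hides several steps: the loaded text is split into chunks, each chunk is embedded, and the resulting vectors are stored in Qdrant. The sketch below illustrates the chunking step only, in a deliberately simplified form. It is not LlamaIndex’s actual splitter; `chunk_words` and its default sizes are hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of chunk_size words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = " ".join(str(n) for n in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

The overlap keeps a sentence that straddles a chunk boundary at least partly visible in both chunks, which tends to help retrieval. At query time, only the chunks closest to the question are passed to the model along with the prompt.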

Step 6: Querying the New Model

pip install -r requirements.txt
python specialized_test.py
Bash

We obtain a completely different response from Mixtral:

Here’s an example of a Java-based REST controller class with the specified requirements for a Spring Boot 3.2 application:

import java.time.Duration;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.client.RestTemplateBuilder;

@Configuration(proxyBeanMethods = false)
@RestController
public class MyRestController {

    private final RestTemplate restTemplate;
    
    @Autowired
    public MyRestController(RestTemplateBuilder builder) {
        this.restTemplate = builder
                .setConnectTimeout(Duration.ofSeconds(5))
                .setReadTimeout(Duration.ofSeconds(2))
                .build();
    }
    
    @GetMapping("/public-data")
    public ResponseEntity<String> getPublicData() {
        return new ResponseEntity<>("Public data", HttpStatus.OK);
    }
    
    @PostMapping("/private-data")
    @PreAuthorize("hasRole('ROLE_USER')")
    public ResponseEntity<String> postPrivateData(@RequestBody String request) {
        // Process the private data here
        return new ResponseEntity<>("Private data saved", HttpStatus.CREATED);
    }
}
Java

This example includes:

  • A REST controller (@RestController) with two endpoints: one for GET requests (public), and another for POST requests (private).
  • Annotations for Spring Security to restrict access to the /private-data endpoint.
  • A custom RestTemplateBuilder configuration at the class level with a constructor dependency injection.
  • The controller uses constructor-based dependency injection instead of field-based to ensure proper initialization and testability.

The specialized model now offers a more sophisticated REST controller implementation for Spring Boot 3.2. However, I haven’t verified this code or confirmed its specificity to Spring Boot 3. The aim was to test the model’s specialization capability rather than the exact accuracy of the generated code.


Conclusion

The combination of Mixtral 8x7B, Ollama, and LlamaIndex marks a significant advancement in customizing AI models and the development of tailor-made applications, by merging technical power with ease of use. This synergy not only enhances the protection of private data but also benefits from an open and free license, thereby encouraging collaboration and innovation. This makes artificial intelligence more accessible and adaptable to a variety of projects and users, democratizing its use in diverse contexts.

Written by Jean-Jerome Levy

DevOps Consultant

A seasoned information technology professional, I bring over 20 years of experience working within major corporate IT departments. My diverse expertise has played a pivotal role in many projects involving the implementation of innovative DevOps practices.