This repo demonstrates the process of fine-tuning an existing pre-trained model (`Llama-2-7b-hf`) on a new dataset (`databricks-dolly-15k`) using three different approaches: Alpaca-style instruction fine-tuning, LoRA PEFT, and QLoRA PEFT. After training, the script showcases how to perform inference using the trained model.
Before running the code, ensure you have the necessary packages installed. You can install them using the pip command provided at the beginning of each section.
- Loading Dataset: The `databricks/databricks-dolly-15k` dataset is loaded using the `datasets` library.
- Formatting: The samples in the dataset are formatted to include the instruction, the context (if available), and the response. This makes the data consistent and easier for the model to learn from during training.
- Model & Tokenizer Initialization: The pre-trained model and its corresponding tokenizer are loaded. The padding token and padding side are adjusted to suit the model's requirements.
- Training Setup: Training arguments are defined using the `TrainingArguments` class from the `transformers` library.
- Training: The model is trained using the `SFTTrainer` class.
- Model Saving: The trained model is saved to the specified path.
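As a rough illustration, the formatting step might look like the following. This is a minimal sketch, not the repo's exact code: the field names `instruction`, `context`, and `response` follow the Dolly schema, but the prompt template itself is an assumption.

```python
def format_sample(sample: dict) -> str:
    """Format a Dolly sample into a single instruction-following string.

    Includes the context section only when the sample provides one.
    """
    if sample.get("context"):
        return (
            "### Instruction:\n" + sample["instruction"] + "\n\n"
            "### Context:\n" + sample["context"] + "\n\n"
            "### Response:\n" + sample["response"]
        )
    return (
        "### Instruction:\n" + sample["instruction"] + "\n\n"
        "### Response:\n" + sample["response"]
    )

example = {
    "instruction": "Summarize the text.",
    "context": "LLaMA is a family of large language models released by Meta.",
    "response": "LLaMA is Meta's family of LLMs.",
}
print(format_sample(example))
```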
```shell
cd llama-tldr/training/instruction-ft/
python3 train.py
```
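For orientation, the training setup inside `train.py` can be sketched roughly as below. The hyperparameters and output path are illustrative assumptions rather than the repo's exact values, and the `SFTTrainer` keyword arguments follow older `trl` releases (newer versions use `SFTConfig`):

```python
def build_trainer(model, tokenizer, dataset, formatting_func):
    """Assemble a supervised fine-tuning trainer (sketch only; values assumed)."""
    from transformers import TrainingArguments
    from trl import SFTTrainer

    args = TrainingArguments(
        output_dir="./llama-2-7b-dolly",   # assumed output path
        per_device_train_batch_size=4,     # illustrative hyperparameters
        gradient_accumulation_steps=4,
        learning_rate=2e-5,
        num_train_epochs=3,
        logging_steps=10,
    )
    return SFTTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        tokenizer=tokenizer,
        formatting_func=formatting_func,
    )
```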
This section is similar to the Alpaca format, with additional steps to integrate the PEFT (Parameter-Efficient Fine-Tuning) technique using LoRA (Low-Rank Adaptation).
- Model Modification for LoRA: The model is adjusted to include the LoRA layers, and the necessary configurations are set.
- Training Callbacks: Two callbacks are defined: `SavePeftModelCallback`, which saves the PEFT adapter at training checkpoints, and `LoadBestPeftModelCallback`, which loads the best PEFT adapter after training completes.
- Training: The model is trained using the `SFTTrainer` class with the defined callbacks.
- Model Merging & Saving: After training, the LoRA layers are merged into the base model, and the resulting single model is saved.
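Conceptually, a LoRA layer adds a trainable low-rank update on top of a frozen weight matrix: the output is W·x plus (alpha/r)·B·A·x. The toy implementation below (pure Python, not the repo's code) shows the core idea; because B is initialized to zero, the adapted layer initially behaves exactly like the frozen one:

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, W, A, B, alpha=16):
        self.W, self.A, self.B = W, A, B
        self.r = len(A)          # rank r = number of rows in A
        self.alpha = alpha

    def forward(self, x):
        base = matvec(self.W, x)                     # frozen path: W @ x
        update = matvec(self.B, matvec(self.A, x))   # low-rank path: B @ (A @ x)
        scale = self.alpha / self.r
        return [b + scale * u for b, u in zip(base, update)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight
A = [[0.5, -0.5]]              # rank-1 down-projection
B = [[0.0], [0.0]]             # up-projection, initialized to zero
layer = LoRALinear(W, A, B)
print(layer.forward([2.0, 4.0]))   # B == 0, so the output equals W @ x
```

Only A and B are trained, so the number of trainable parameters scales with the rank r rather than with the full weight matrix.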
```shell
cd llama-tldr/training/ft-lora/
python3 train.py
```
This section extends the LoRA PEFT approach by incorporating quantization (using 4-bit integers) to reduce the model's memory footprint.
- Model Quantization: The model is prepared for 4-bit quantization using `BitsAndBytesConfig`.
- Training & Model Saving: The training and model-saving steps are similar to the LoRA PEFT section but tailored to the quantized model.
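The memory saving comes from storing weights in 4 bits instead of 16 or 32. The toy absmax scheme below (pure Python, a simplification of the NF4 quantization that `bitsandbytes` actually uses) illustrates the idea: each block of weights is mapped to small integers plus one shared scale:

```python
def quantize_4bit(weights):
    """Absmax quantization: map floats to signed 4-bit ints in [-7, 7]."""
    scale = max(abs(w) for w in weights) / 7 or 1.0   # one scale per block
    return [round(w / scale) for w in weights], scale

def dequantize(quants, scale):
    """Recover approximate floats from the 4-bit ints and the shared scale."""
    return [q * scale for q in quants]

w = [0.4, -1.2, 0.05, 0.9]
q, s = quantize_4bit(w)
print(q)                 # small integers, each storable in 4 bits
print(dequantize(q, s))  # approximately the original weights
```

Quantization is lossy (here, 0.05 collapses to 0), which is why QLoRA keeps the base model quantized and frozen while training the LoRA adapters in higher precision.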
```shell
cd llama-tldr/training/qlora/
python3 train.py
```
- Installing Inference Library: The `vllm` library is installed to facilitate inference.
- Generating Texts: A list of prompts is created, and the model generates corresponding responses using the defined sampling parameters.
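A minimal inference sketch, assuming the merged model was saved to a local path: the path, prompt template, and sampling values here are placeholder assumptions, and `run_inference` needs a GPU plus the `vllm` package, so its call is left commented out.

```python
def build_prompts(instructions):
    """Wrap raw instructions in an assumed training-style prompt template."""
    return [f"### Instruction:\n{text}\n\n### Response:\n" for text in instructions]

def run_inference(model_path, prompts):
    """Generate one completion per prompt with vLLM (requires a GPU)."""
    from vllm import LLM, SamplingParams

    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
    llm = LLM(model=model_path)
    return [out.outputs[0].text for out in llm.generate(prompts, params)]

prompts = build_prompts(["Explain what LoRA is in one sentence."])
print(prompts[0])
# responses = run_inference("./llama-2-7b-dolly-merged", prompts)  # assumed path
```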
To use this code:
- Ensure you have the necessary packages installed.
- Define the `model_name` variable with the path to your pre-trained model.
- Run the code sequentially, section by section.
- After training, the inference section will generate and print text based on the provided prompts.
- The `datasets` library is used for loading datasets in a convenient format suitable for training.
- The `transformers` library provides tools and utilities for working with transformer-based models.
- The PEFT technique allows for efficient fine-tuning, saving both time and resources.
- The `vllm` library facilitates easy inference using large language models.
This documentation provides a detailed overview of the code used for fine-tuning and inference with the different techniques. Follow the steps above to train the model on your data and generate responses to your prompts.