This repo contains results, notebooks, and code related to quantizing BLIP-2 with various configs. To get an idea of the main logic, look at the diagram below:
To create the environment, run, and score:

```bash
conda env create -f environment.yml
python run.py ./configs/1.json
python score.py ./results/1.json
```
IMPORTANT: The scoring part of this pipeline relies on the `pycocoevalcap` Python submodule. To clone it along with the repo, run `git clone --recurse-submodules https://github.com/gautomdas/blip2-coco`, or if you already downloaded the repo and the `pycocoevalcap` folder is still empty, run `git submodule init && git submodule update`.
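For orientation, scoring caption predictions with `pycocoevalcap` typically looks like the sketch below. This is an illustration of the library's usage, not the exact logic in `scoring_pipeline.py`; the annotation path is a placeholder.

```python
# Hedged sketch: how pycocoevalcap is typically driven (paths are placeholders,
# and the exact plumbing in scoring_pipeline.py may differ).
import json
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO("data/coco/annotations/captions_val2017.json")  # ground-truth captions

# pycocoevalcap expects a plain list of {"image_id", "caption"} dicts,
# so pull the list out of the results file produced by run.py.
with open("results/1.json") as f:
    predictions = json.load(f)["predictions"]
coco_res = coco.loadRes(predictions)

coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params["image_id"] = coco_res.getImgIds()  # score only the predicted images
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():  # e.g. Bleu_4, METEOR, CIDEr
    print(f"{metric}: {score:.3f}")
```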
- Download the COCO dataset to the `data` folder using the following script (assumes you have the environment loaded): `python download_coco.py`
- From there you should be able to run all of `demo.ipynb`, which goes over the 3 main steps in the diagram above (a rough sketch of those steps follows this list).
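As a rough orientation, the three steps (quantize, run inference, score) map onto something like the sketch below. It uses the public HuggingFace BLIP-2 checkpoint; `quantize_model` is a hypothetical stand-in for `blip_quantizer.py`, and the checkpoint name, image path, and output layout are assumptions rather than what `run.py` actually does.

```python
# Hedged sketch of the quantize -> inference -> score flow. quantize_model() is a
# hypothetical placeholder for blip_quantizer.py; checkpoint and paths are assumptions.
import json
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

# 1) Quantize: blip_quantizer.py applies quant_functions.py to the layers
#    selected by the config (hypothetical call shown here).
# model = quantize_model(model, config)

# 2) Inference: caption an image and store it in the results/<#>.json format.
image = Image.open("data/coco/val2017/000000397133.jpg")  # assumed path
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(out[0], skip_special_tokens=True).strip()
with open("results/example.json", "w") as f:
    json.dump({"predictions": [{"image_id": 397133, "caption": caption}]}, f)

# 3) Score: python score.py results/example.json (CPU-only, uses pycocoevalcap).
```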
The files are as follows:

- `run.py`: The single entry point for quantization + inference. It takes a config such as `./configs/<#>.json` and runs it.
- `blip_quantizer.py`: The quantization class that quantizes the BLIP-2 model.
- `inference_pipeline.py`: The inference class that takes a model and tasks to produce `results/<#>.json`.
- `scoring_pipeline.py`: The scoring class used to convert results to scores based on task. This is separate from the inferencer/quantizer because it only requires the CPU to run.
- `quant_functions.py`: Functions that are `Tensor` -> `Tensor` and perform quantization (see the sketch after this list).
- `utils.py`: Additional utils used for config loading and model printing.
- `multi_sbatch.py`: Runs the `main.py` script over many GPUs and different configs.
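For a sense of what a `Tensor -> Tensor` quantization function looks like, here is a hedged sketch of simple uniform (round-to-nearest) fake quantization; it is illustrative only and not necessarily the scheme implemented in `quant_functions.py`.

```python
# Hedged sketch of a Tensor -> Tensor quantization function (uniform, symmetric,
# round-to-nearest). Illustrative only; quant_functions.py may differ.
import torch

def uniform_quantize(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Fake-quantize a weight tensor to n_bits and return it in the original dtype."""
    q_max = 2 ** (n_bits - 1) - 1                   # e.g. 127 for 8 bits
    scale = w.abs().max().clamp(min=1e-8) / q_max   # per-tensor scale
    q = torch.clamp(torch.round(w / scale), -q_max - 1, q_max)
    return q * scale                                # dequantize back to float

# Example: quantize a linear layer's weight in place.
layer = torch.nn.Linear(16, 16)
with torch.no_grad():
    layer.weight.copy_(uniform_quantize(layer.weight, n_bits=4))
```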
The notebooks are as follows:

- `demo.ipynb`: The above figure demonstrated in an ipynb.
- `blip2_analysis.ipynb`: Counting linear layers and params for the BLIP-2 model.
- `blip2_dropoff_coco.ipynb`: A look at the drop-off between different quantizations over the whole model.
- `dataset_usage.ipynb`: A simple file showing how the COCO dataset (and others) are loaded.
- `config_creator.ipynb`: Creates all combinations of configs based on the loop below (a generation sketch follows it):
```
for each bit width:
    for each model part (ViT, LLM, QFormer):
        for each of the 8 combinations of front/middle/end:
            try the 2 other model parts quantized, not quantized, 1 of each, and 1 of each the other way
```
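A hedged sketch of how that enumeration could be generated is below; the bit widths and the field names in the emitted dicts are illustrative assumptions, not the actual config schema used by `run.py`.

```python
# Hedged sketch of enumerating the config combinations described above.
# Bit widths and field names ("bit_width", "part", "sections", ...) are assumptions.
import itertools
import json

BIT_WIDTHS = [2, 4, 8]
MODEL_PARTS = ["vit", "llm", "qformer"]
SECTIONS = ["front", "middle", "end"]
OTHER_PART_STATES = [(False, False), (True, True), (True, False), (False, True)]

configs = []
for bits in BIT_WIDTHS:
    for part in MODEL_PARTS:
        # all 8 on/off combinations of front/middle/end for the target part
        for section_mask in itertools.product([False, True], repeat=len(SECTIONS)):
            for other_states in OTHER_PART_STATES:
                others = [p for p in MODEL_PARTS if p != part]
                configs.append({
                    "bit_width": bits,
                    "part": part,
                    "sections": dict(zip(SECTIONS, section_mask)),
                    "other_parts_quantized": dict(zip(others, other_states)),
                })

for i, cfg in enumerate(configs):
    with open(f"configs/{i}.json", "w") as f:
        json.dump(cfg, f, indent=2)
```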
TODO:

- Add VQAv2 dataset + test
- Migrate datasets to HF
- Look at error propagation through layers for quantizing
- Add GPTQ and AWQ
Example results file (`1082.json`):

```json
{
  "predictions": [
    {
      "image_id": 397133,
      "caption": "the new xiaomi mi box"
    },
    {
      "image_id": 37777,
      "caption": "a white and black image of a smartphone"
    },
    {
      "image_id": 252219,
      "caption": "a white and blue box with a black and white logo"
    },
    {
      "image_id": 87038,
      "caption": "a white and black table with a white and black table cloth"
    },
    {
      "image_id": 174482,
      "caption": "an image of a white table with a black and white image"
    },
    {
      "image_id": 403385,
      "caption": "an image of a white wall with a black and white image of a speaker"
    },
    {
      "image_id": 6818,
      "caption": "the new apple tv 4k"
    },
    {
      "image_id": 480985,
      "caption": "a white and black image of a computer screen"
    },
    {
      "image_id": 458054,
      "caption": "a white and black square with a white and black square"
    },
    ...
  ]
}
```