TSDS

This repository contains the implementation of the framework described in TSDS: Data Selection for Task-Specific Model Finetuning.

Prerequisites

Before running the project, ensure you have Python installed. You can download the latest version of Python from python.org.

Installation

  1. Clone the repository:

    git clone https://github.com/ZifanL/TSDS.git
    cd TSDS
  2. Install the required dependencies from the requirements.txt file:

    pip install -r requirements.txt
  3. (Optional) If you're using faiss-gpu, ensure you have the correct GPU drivers installed. Refer to the Faiss documentation for more information.

Usage

After installing the dependencies, you can run the project as follows using the toy data:

python tsds.py

After the run, the file selected_candidate_indices.npy in the output folder contains the indices of the selected candidates.
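As a minimal sketch of how the selected indices might be consumed downstream, the helper below loads the index file and picks the matching candidates. It assumes the indices are 0-based positions into the candidate list, in the same order as the rows of the candidate embedding file; the function name select_candidates and the path in the usage comment are hypothetical and not part of this repository.

```python
import numpy as np


def select_candidates(candidates, indices_path):
    """Return the candidates picked out by a saved index array.

    Assumption: each index is a 0-based position into `candidates`,
    matching the row order of the candidate embedding file.
    Repeated indices, if any, are kept as repeats.
    """
    indices = np.load(indices_path)
    return [candidates[int(i)] for i in indices]


# Hypothetical usage; adjust the path to your own output folder:
# texts = ["example 0", "example 1", "example 2"]
# selected = select_candidates(texts, "output/selected_candidate_indices.npy")
```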

To run TSDS on your customized data, two embedding files are needed:

  • An .npy file that stores the embeddings of the candidate examples. The shape of the array should be (number of candidates, embedding dimensions).
  • An .npy file that stores the embeddings of the query examples. The shape of the array should be (number of query examples, embedding dimensions).

Change the file paths in config.yaml and adjust the other parameters as needed. The implementation uses faiss.IndexIVFFlat for approximate nearest neighbor search. To use a customized index, add it to faiss_helper.py and use it in place of FaissIndexIVFFlat in tsds.py.
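The two embedding files described above could be produced along the following lines. This is only a sketch: the function name save_embedding_pair and the file names in the usage comment are hypothetical, and the cast to float32 is a precaution (Faiss operates on float32 vectors), not something this repository is known to require. The check that both arrays share the same embedding dimension is likewise an assumption that follows from comparing queries against candidates.

```python
import numpy as np


def save_embedding_pair(candidates, queries, candidate_path, query_path):
    """Validate and save candidate and query embeddings as .npy files.

    Both inputs must be 2-D: (number of examples, embedding dimensions).
    Returns the shapes of the two saved arrays.
    """
    # Cast to float32 as a precaution, since Faiss works with float32 vectors.
    candidates = np.asarray(candidates, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)

    if candidates.ndim != 2 or queries.ndim != 2:
        raise ValueError("embeddings must be 2-D: (examples, dimensions)")
    if candidates.shape[1] != queries.shape[1]:
        raise ValueError("candidate and query embedding dimensions differ")

    np.save(candidate_path, candidates)
    np.save(query_path, queries)
    return candidates.shape, queries.shape


# Hypothetical usage with random data standing in for real embeddings:
# rng = np.random.default_rng(0)
# save_embedding_pair(rng.normal(size=(1000, 64)), rng.normal(size=(10, 64)),
#                     "candidate_embeddings.npy", "query_embeddings.npy")
```

The resulting paths are the ones to enter in config.yaml.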

Citation

Please cite our paper if you find this repo helpful in your work:

@inproceedings{
	liu2024tsds,
	title={{TSDS}: Data Selection for Task-Specific Model Finetuning},
	author={Zifan Liu and Amin Karbasi and Theodoros Rekatsinas},
	booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
	year={2024},
	url={https://openreview.net/forum?id=wjbTHLUSzU}
}
