This repository provides the code for the paper "Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories".
Follow these steps to set up the environment and get started with the project.
Navigate to the root folder of this repository and execute the following commands:
```bash
conda env create -f environment.yaml
conda activate act
git clone https://github.com/sylinrl/TruthfulQA.git
mkdir activations
mkdir directions
mkdir validation
```
To evaluate with TruthfulQA's GPT-judge and GPT-info metrics, which are computed through the OpenAI API, you need to set your OpenAI API key as an environment variable. Follow the instructions provided in the TruthfulQA repository.
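For example, using the OpenAI client's standard environment variable (an assumption; defer to the TruthfulQA instructions if they name it differently):

```bash
export OPENAI_API_KEY=<your key>
```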
1. Collect activations: collect the model's activations with the following command (see the first sketch after this list):

```bash
python collect_activations.py --model_name llama_7B --device 0
```
2. Generate directions: generate a steering direction for each question with the following command (see the second sketch after this list):

```bash
python generate_directions_q_wise.py --model_name llama_7B
```
3. Validation: evaluate ACT on TruthfulQA with two-fold cross-validation using the following command (see the third sketch after this list):

```bash
python valid_2_fold.py --model_name llama_7B --num_heads 24 --alpha 12 --n_clusters 3 --probe_base_weight 0 --judge_name <your GPT-judge name> --info_name <your GPT-info name>
```
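For orientation, here is a minimal sketch of what step 1 might look like internally: capturing per-layer activations with standard PyTorch forward hooks. Everything in it (the checkpoint id, capturing the last token's attention output, the dictionary layout) is an illustrative assumption, not the repository's actual implementation.

```python
# Illustrative sketch only: collect last-token attention outputs per layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # assumption: any LLaMA-7B checkpoint
device = "cuda:0" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

captured = {}

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output[0]: attention output, shape (batch, seq_len, hidden_dim);
        # keep only the final token's activation for this layer.
        captured[layer_idx] = output[0][:, -1, :].detach().cpu()
    return hook

handles = [
    layer.self_attn.register_forward_hook(make_hook(i))
    for i, layer in enumerate(model.model.layers)
]

prompt = "Q: What happens if you crack your knuckles a lot?\nA:"
with torch.no_grad():
    model(**tokenizer(prompt, return_tensors="pt").to(device))

for h in handles:
    h.remove()
# captured[i] now holds layer i's last-token activation; the real script
# would save tensors like these under activations/.
```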
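Step 2 plausibly builds on those activations. A common construction in activation-steering work is to take, for each question, the difference between the mean activation of its truthful answers and the mean activation of its untruthful ones; the sketch below shows that construction with hypothetical array names, without claiming it matches generate_directions_q_wise.py line for line.

```python
# Illustrative sketch only: one steering direction per question.
import numpy as np

def question_direction(true_acts: np.ndarray, false_acts: np.ndarray) -> np.ndarray:
    """Difference of mean activations (truthful minus untruthful answers),
    normalized to unit length. Input shapes: (num_answers, hidden_dim)."""
    d = true_acts.mean(axis=0) - false_acts.mean(axis=0)
    return d / (np.linalg.norm(d) + 1e-8)

# The real script would loop over every TruthfulQA question, load the
# activations saved in step 1, and write the results under directions/.
```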
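Step 3 applies steering during generation and scores the outputs with the GPT-judge and GPT-info models. The sketch below shows the core mechanic in its simplest form: adding a scaled direction to a layer's attention output via a forward hook. The repository's actual method is adaptive (the --num_heads, --alpha, and --n_clusters flags select steered heads, set the steering strength, and cluster the per-question directions); this simplified layer-level version is an assumption for illustration.

```python
# Illustrative sketch only: add a scaled steering vector during the forward pass.
import torch

def steering_hook(direction: torch.Tensor, alpha: float):
    vec = alpha * direction  # direction: (hidden_dim,), e.g. from step 2
    def hook(module, inputs, output):
        hidden = output[0] + vec.to(output[0].dtype).to(output[0].device)
        return (hidden,) + tuple(output[1:])  # replace the attention output
    return hook

# Example: steer layer 12 with strength 12 (mirroring --alpha 12 above).
# handle = model.model.layers[12].self_attn.register_forward_hook(
#     steering_hook(direction, alpha=12.0))
# ... generate text ...; call handle.remove() when done.
```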