📄 Abstract
This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task for compliance with data privacy regulations such as GDPR. Unlike prior methods that rely on static hyperparameters or on the starting model's outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model's outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also improves the model's ability to approximate the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit's superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit's robustness across various scenarios, highlighting its practical applicability and effectiveness for machine unlearning.
Figure 1: Overview of self-distillation unlearning in Unilogit. Starting with the output logits of the LLM, the target logit is diminished, so that after softmax, the target token in the modified distribution has uniform probability. Soft labels are derived from the current model (θ) outputs. Reverse KL divergence is the distillation objective.
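The target-logit adjustment described in the caption has a closed form: requiring softmax(z')_t = 1/V while keeping all other logits fixed gives z'_t = log(S / (V - 1)), where S is the sum of exponentials of the non-target logits and V is the vocabulary size. The sketch below is a minimal pure-Python illustration of this idea, not the authors' implementation; the KL direction shown follows Figure 1's reverse-KL convention (current model relative to the soft targets), and the exact form used in the paper may differ.

```python
import math

def unilogit_soft_targets(logits, target_idx):
    """Lower the target token's logit so that, after softmax, it receives
    uniform probability 1/V; return the resulting soft-label distribution."""
    V = len(logits)
    # log of S = sum_{i != t} exp(z_i), computed stably in log-space
    m = max(z for i, z in enumerate(logits) if i != target_idx)
    log_S = m + math.log(sum(math.exp(z - m)
                             for i, z in enumerate(logits) if i != target_idx))
    # Solving softmax(z')_t = 1/V for the target logit gives z'_t = log(S / (V - 1))
    adjusted = list(logits)
    adjusted[target_idx] = log_S - math.log(V - 1)
    # Softmax over the modified logits yields the soft labels
    mx = max(adjusted)
    exps = [math.exp(z - mx) for z in adjusted]
    Z = sum(exps)
    return [e / Z for e in exps]

def reverse_kl(model_probs, target_probs):
    """Reverse KL divergence KL(model || targets), the distillation objective
    direction suggested by Figure 1 (assumed here for illustration)."""
    return sum(q * math.log(q / p) for q, p in zip(model_probs, target_probs))

probs = unilogit_soft_targets([2.0, 0.5, -1.0, 0.3], target_idx=0)
# With V = 4, the target token's soft probability is exactly 1/V = 0.25,
# regardless of the original logit values.
```

Working in log-space (via the log-sum-exp trick) keeps the adjustment numerically stable for the large logit vectors of a real vocabulary.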
To get started, install the required packages and navigate into the LLaMA-Factory directory:
pip install -r requirements.txt
cd LLaMA-Factory
Unlearning experiments are run through the bash scripts in the scripts/custom directory. You can tune the hyperparameters inside the scripts as you like; the current values are the configurations used for the experiments in the paper.
To evaluate the baseline Meta-Llama-3-8B-Instruct on MUSE-News data:
bash scripts/custom/run_baseline_muse.sh
To run Gradient Ascent on Meta-Llama-3-8B-Instruct on MUSE-News data:
bash scripts/custom/run_ga_rt_muse.sh
To run Unilogit on Meta-Llama-3-8B-Instruct on MUSE-News data:
bash scripts/custom/run_unilogit_rt_muse.sh
To evaluate the baseline Meta-Llama-3-8B-Instruct on RWKU data:
bash scripts/custom/run_baseline_rwku.sh
To run Gradient Ascent on Meta-Llama-3-8B-Instruct on RWKU data:
bash scripts/custom/run_ga_rwku.sh
To run Unilogit on Meta-Llama-3-8B-Instruct on RWKU data:
bash scripts/custom/run_unilogit_rwku.sh
Figure 2: Results on the MUSE-News benchmark for different unlearning methods across multiple hyperparameter settings. The x-axis shows retain performance and the y-axis forgetting performance, both for the QA task.
Figure 3: Results on the RWKU-News benchmark for different unlearning methods across multiple hyperparameter settings. The x-axis shows retain performance and the y-axis forgetting performance.
Figure 4: Comparison of unlearning methods on listings from three different sellers across three forget set sizes in our e-commerce dataset. Forget Completion and Neighbors Completion are evaluated using ROUGE-recall scores. Marker sizes and number annotations indicate MMLU scores, reflecting general model abilities.
Figure 5: Left: Average KL divergence between the retrained model outputs and the soft labels of UnDIAL and Unilogit on the forget set. Center: KL divergence progression between soft targets and retrained model outputs for both methods over unlearning epochs. Right: Average KL divergence between unlearned model outputs and the retrained model for NPO, UnDIAL, and Unilogit. Lower values indicate better performance in all cases; the baseline represents the average KL between the outputs of the starting model and the retrained model.
All unlearning experiments were run on 4 NVIDIA A100 GPUs with 80GB of memory.
This repo contains a fork of RWKU, which itself is a fork of LLaMA-Factory. Special thanks to the authors of these repos for their work and contribution to the open-source community!
This project is licensed under the terms of the Apache 2.0 License.
If you would like to reach out, please contact us: Stefan Vasilev (personal e-mail), Stefan Vasilev (work e-mail), GitHub: stefanvasilev