eBay/unilogit-acl-2025

Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation

📄 Abstract

This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task in compliance with data privacy regulations like GDPR. Unlike prior methods that rely on static hyperparameters or starting model outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model's outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also enhances the model's ability to approximate the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit's superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit's robustness across various scenarios, highlighting its practical applicability and effectiveness for machine unlearning.

Method

Figure 1: Overview of self-distillation unlearning in Unilogit. Starting from the output logits of the LLM, the target-token logit is lowered so that, after softmax, the target token has uniform probability in the modified distribution. Soft labels are derived from the current model's (θ) outputs, and reverse KL divergence is used as the distillation objective.
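The construction in Figure 1 can be sketched as follows. This is a minimal illustration of the idea, not the repo's implementation: function names and shapes are assumptions, and the paper's exact loss may differ in details. For a vocabulary of size V, the target-token logit is replaced so that its softmax probability becomes exactly 1/V while the other logits stay fixed; the resulting distribution serves as the soft label for a reverse-KL distillation objective.

```python
import math
import torch
import torch.nn.functional as F

def unilogit_soft_labels(logits, target_ids):
    """Sketch: lower each target-token logit so that, after softmax,
    the target token has uniform probability 1/V (V = vocab size).

    Solving exp(z_t') / (exp(z_t') + S) = 1/V, where S is the sum of
    exp(z_j) over non-target logits, gives z_t' = log(S) - log(V - 1).
    """
    V = logits.size(-1)
    idx = target_ids.unsqueeze(-1)
    # log-sum-exp over the non-target logits (target masked to -inf)
    masked = logits.scatter(-1, idx, float("-inf"))
    lse_other = torch.logsumexp(masked, dim=-1)
    new_target = lse_other - math.log(V - 1)
    soft = logits.scatter(-1, idx, new_target.unsqueeze(-1))
    return F.softmax(soft, dim=-1)

def reverse_kl_loss(student_logits, soft_labels, eps=1e-12):
    """Reverse KL(student || soft labels), averaged over positions."""
    log_p = F.log_softmax(student_logits, dim=-1)
    return (log_p.exp() * (log_p - soft_labels.clamp_min(eps).log())).sum(-1).mean()
```

Because the soft labels are recomputed from the current model's logits at every step, the distillation target tracks the model as it unlearns, which is what removes the need for an extra smoothing hyperparameter.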

Prerequisites

To get started, install the required packages and navigate into the LLaMA-Factory directory:

pip install -r requirements.txt
cd LLaMA-Factory

🧠 Unlearning

Unlearning experiments are run through the bash scripts in the scripts/custom directory. You can tune the hyperparameters inside the scripts; the current values are the configurations used for the experiments in the paper.

MUSE

To run an evaluation of the baseline Meta-Llama-3-8B-Instruct on MUSE-News data:

bash scripts/custom/run_baseline_muse.sh

To run Gradient Ascent on Meta-Llama-3-8B-Instruct:

bash scripts/custom/run_ga_rt_muse.sh

To run Unilogit on Meta-Llama-3-8B-Instruct:

bash scripts/custom/run_unilogit_rt_muse.sh

RWKU

To run an evaluation of the baseline Meta-Llama-3-8B-Instruct on RWKU data:

bash scripts/custom/run_baseline_rwku.sh

To run Gradient Ascent on Meta-Llama-3-8B-Instruct on RWKU data:

bash scripts/custom/run_ga_rwku.sh

To run Unilogit on Meta-Llama-3-8B-Instruct on RWKU data:

bash scripts/custom/run_unilogit_rwku.sh

📊 Results

Results on MUSE-News

Figure 2: Results on the MUSE-News benchmark for different unlearning methods across multiple hyperparameter settings. The x-axis shows retain performance and the y-axis forgetting performance, both for the QA task.

Results on RWKU

Figure 3: Results on the RWKU benchmark for different unlearning methods across multiple hyperparameter settings. The x-axis shows retain performance and the y-axis forgetting performance.

Results on internal e-commerce task

Figure 4: Comparison of unlearning methods on listings from three different sellers across three forget set sizes in our e-commerce dataset. Forget Completion and Neighbors Completion are evaluated using ROUGE-recall scores. Marker sizes and number annotations indicate MMLU scores, reflecting general model abilities.

Ablations

Figure 5: Left: average KL divergence between the retrained model's outputs and the soft labels of UnDIAL and Unilogit on the forget set. Center: progression of the KL divergence between soft targets and retrained-model outputs for both methods over unlearning epochs. Right: average KL divergence between the unlearned model's outputs and the retrained model for NPO, UnDIAL, and Unilogit. Lower values indicate better performance in all cases; the baseline is the average KL between the outputs of the starting model and the retrained model.
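The metric behind Figure 5, average KL divergence between two models' output distributions over a set of token positions, can be sketched as follows. This is a minimal illustration; the function name is an assumption and the paper's exact evaluation protocol (e.g. which positions are averaged) may differ.

```python
import torch
import torch.nn.functional as F

def avg_kl(logits_p, logits_q):
    """Average forward KL(P || Q) over a batch of token positions,
    where P and Q are softmax distributions over the vocabulary
    computed from two models' logits (e.g. retrained vs. unlearned)."""
    log_p = F.log_softmax(logits_p, dim=-1)
    log_q = F.log_softmax(logits_q, dim=-1)
    return (log_p.exp() * (log_p - log_q)).sum(-1).mean()
```

A value of zero means the two models produce identical next-token distributions on the evaluated positions, which is why lower is better when comparing an unlearned model against the retrained reference.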

Hardware Used

All unlearning experiments were run on 4 NVIDIA A100 GPUs with 80GB of memory.

🙏 Acknowledgements

This repo contains a fork of RWKU, which itself is a fork of LLaMA-Factory. Special thanks to the authors of these repos for their work and contributions to the open-source community!

License

This project is licensed under the terms of the Apache 2.0 License.

Contact

If you would like to reach out, please contact us: Stefan Vasilev (personal e-mail), Stefan Vasilev (work e-mail), GitHub: stefanvasilev

About

ACL 2025 submission repository, containing only the general benchmarks code
