📄 Abstract
This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task for compliance with data privacy regulations such as GDPR. Unlike prior methods that rely on static hyperparameters or on the starting model's outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model's outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also improves the model's ability to approximate the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit's superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit's robustness across various scenarios, highlighting its practical applicability and effectiveness for machine unlearning.
Figure 1: Overview of self-distillation unlearning in Unilogit. Starting with the output logits of the LLM, the target logit is diminished, so that after softmax, the target token in the modified distribution has uniform probability. Soft labels are derived from the current model (θ) outputs. Reverse KL divergence is the distillation objective.
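The target-logit adjustment described in the caption has a closed form: requiring softmax(z')_t = 1/V while keeping all other logits fixed gives z'_t = log(S / (V - 1)), where S is the sum of exponentials of the non-target logits and V is the vocabulary size. The sketch below is a minimal pure-Python illustration of this idea, not the authors' implementation; the KL direction shown follows Figure 1's reverse-KL convention (current model relative to the soft targets), and the exact form used in the paper may differ.

```python
import math

def unilogit_soft_targets(logits, target_idx):
    """Lower the target token's logit so that, after softmax, it receives
    uniform probability 1/V; return the resulting soft-label distribution."""
    V = len(logits)
    # log of S = sum_{i != t} exp(z_i), computed stably in log-space
    m = max(z for i, z in enumerate(logits) if i != target_idx)
    log_S = m + math.log(sum(math.exp(z - m)
                             for i, z in enumerate(logits) if i != target_idx))
    # Solving softmax(z')_t = 1/V for the target logit gives z'_t = log(S / (V - 1))
    adjusted = list(logits)
    adjusted[target_idx] = log_S - math.log(V - 1)
    # Softmax over the modified logits yields the soft labels
    mx = max(adjusted)
    exps = [math.exp(z - mx) for z in adjusted]
    Z = sum(exps)
    return [e / Z for e in exps]

def reverse_kl(model_probs, target_probs):
    """Reverse KL divergence KL(model || targets), the distillation objective
    direction suggested by Figure 1 (assumed here for illustration)."""
    return sum(q * math.log(q / p) for q, p in zip(model_probs, target_probs))

probs = unilogit_soft_targets([2.0, 0.5, -1.0, 0.3], target_idx=0)
# With V = 4, the target token's soft probability is exactly 1/V = 0.25,
# regardless of the original logit values.
```

Working in log-space (via the log-sum-exp trick) keeps the adjustment numerically stable for the large logit vectors of a real vocabulary.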
To get started, install the required packages and navigate into the LLaMA-Factory directory:
pip install -r requirements.txt
cd LLaMA-Factory
Unlearning experiments are run through the bash scripts in the scripts/custom directory. You can tune the hyperparameters inside the scripts as you like; the current values are the configurations used for the experiments in the paper.
To evaluate the baseline Meta-Llama-3-8B-Instruct on MUSE-News data:
bash scripts/custom/run_baseline_muse.sh
To run Gradient Ascent on Meta-Llama-3-8B-Instruct on MUSE-News data:
bash scripts/custom/run_ga_rt_muse.sh
To run Unilogit on Meta-Llama-3-8B-Instruct on MUSE-News data:
bash scripts/custom/run_unilogit_rt_muse.sh
To evaluate the baseline Meta-Llama-3-8B-Instruct on RWKU data:
bash scripts/custom/run_baseline_rwku.sh
To run Gradient Ascent on Meta-Llama-3-8B-Instruct on RWKU data:
bash scripts/custom/run_ga_rwku.sh
To run Unilogit on Meta-Llama-3-8B-Instruct on RWKU data:
bash scripts/custom/run_unilogit_rwku.sh
Figure 2: Results on the MUSE-News benchmark for different unlearning methods across multiple hyperparameter settings. The x-axis shows retain performance and the y-axis forgetting performance, both for the QA task.
Figure 3: Results on the RWKU-News benchmark for different unlearning methods across multiple hyperparameter settings. The x-axis shows retain performance and the y-axis forgetting performance.
Figure 4: Comparison of unlearning methods on listings from three different sellers across three forget set sizes in our e-commerce dataset. Forget Completion and Neighbors Completion are evaluated using ROUGE-recall scores. Marker sizes and number annotations indicate MMLU scores, reflecting general model abilities.
Figure 5: Left: Average KL divergence between the retrained model outputs and the soft labels of UnDIAL and Unilogit on the forget set. Center: KL divergence progression between soft targets and retrained model outputs for both methods over unlearning epochs. Right: Average KL divergence between unlearned model outputs and the retrained model for NPO, UnDIAL, and Unilogit. Lower values indicate better performance in all cases; the baseline represents the average KL between the outputs of the starting model and the retrained model.
All unlearning experiments were run on 4 NVIDIA A100 GPUs with 80GB of memory.
This repo contains a fork of RWKU, which itself is a fork of LLaMA-Factory. Special thanks to the authors of these repos for their work and contribution to the open-source community!
This project is licensed under the terms of the Apache 2.0 License.
If you would like to reach out, please contact us: Stefan Vasilev (personal e-mail), Stefan Vasilev (work e-mail), GitHub: stefanvasilev