Researchers have introduced RULER, a set of representation-level metrics that expose hidden memorization in machine unlearning systems that otherwise appear to work correctly. Current verification methods check only whether a model's outputs still reflect the target data; RULER detects when supposedly forgotten information persists in the model's internal representations. It found significant residuals in 10 of 12 tested unlearning conditions, particularly in sensitive applications like face recognition.
Why it matters: As machine unlearning becomes critical for privacy compliance and model governance, this work reveals that existing validation methods are inadequate and may allow models that still encode the forgotten data to pass audits. That raises urgent questions about the trustworthiness of current unlearning deployments.