Conference · 2025

DF-P2E: Deepfake — Prediction to Explanation

ACM Multimedia 2025

Saakshi Gupta et al.

// ABSTRACT

DF-P2E is a multimodal, explainable, and interactive deepfake detection framework for non-expert users. It combines visual saliency maps, image captioning, and narrative explanations to make the detector's decisions transparent, turning a probability score into a forensic argument an investigator can read.

Contribution

Contributed to a multimodal, explainable, and interactive deepfake detection framework for non-expert users (forensic investigators, journalists, and fact-checkers) who need to understand why the model flags a clip as fake rather than simply trusting the verdict.

Approach

The system layers four explanation modalities on top of an ensemble detection backbone:

Visual saliency — SHAP and LIME heatmaps highlight the regions driving the prediction
Image captioning — CLIP + attention generates natural-language descriptions of suspicious regions
Narrative explanation — an LLM (Llama 3.1 via Ollama) integrates the detection result, saliency, and captions into a forensic narrative
Interactive UI — investigators can probe the model's reasoning at each layer, not just accept the verdict
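
As a rough illustration of how the first three layers could feed the narrative step, the sketch below averages per-model scores and assembles a text prompt from the most salient regions and their captions. It is a minimal sketch under stated assumptions, not the paper's implementation: `RegionEvidence`, `ensemble_score`, `build_narrative_prompt`, the simple-mean ensemble, the 0.5 threshold, and the prompt wording are all hypothetical, and the saliency values and captions are hand-written stand-ins for what SHAP/LIME and the captioning model would produce. No call to SHAP, LIME, CLIP, or Ollama is made here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RegionEvidence:
    """One salient image region and its caption (hypothetical structure)."""
    label: str        # human-readable region name, e.g. "left eye"
    saliency: float   # normalized attribution weight in [0, 1]
    caption: str      # natural-language description of the region


def ensemble_score(scores: list[float]) -> float:
    """Combine per-model fake probabilities with a simple mean (an assumption)."""
    if not scores:
        raise ValueError("need at least one detector score")
    return mean(scores)


def build_narrative_prompt(
    scores: list[float],
    regions: list[RegionEvidence],
    top_k: int = 3,
    threshold: float = 0.5,  # assumed decision threshold
) -> str:
    """Merge verdict, saliency ranking, and captions into one LLM prompt."""
    p_fake = ensemble_score(scores)
    verdict = "FAKE" if p_fake >= threshold else "REAL"
    # Keep only the regions that contributed most to the prediction.
    ranked = sorted(regions, key=lambda r: r.saliency, reverse=True)[:top_k]
    lines = [
        f"Detector verdict: {verdict} (fake probability {p_fake:.2f}).",
        "Most influential regions:",
    ]
    for r in ranked:
        lines.append(f"- {r.label} (saliency {r.saliency:.2f}): {r.caption}")
    lines.append(
        "Explain, for a non-expert investigator, "
        "why this evidence supports the verdict."
    )
    return "\n".join(lines)


# Illustrative inputs only; real values would come from the detection,
# saliency, and captioning layers.
example_regions = [
    RegionEvidence("mouth", 0.20, "lip edges look slightly blurred"),
    RegionEvidence("left eye", 0.90, "reflection does not match the right eye"),
    RegionEvidence("hairline", 0.50, "boundary with background is unusually soft"),
]
print(build_narrative_prompt([0.9, 0.8, 0.7], example_regions, top_k=2))
```

The resulting string would then be sent to the language model. Keeping prompt assembly separate from the model call makes it possible to show an investigator exactly which evidence the narrative was built from.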

Why it matters

Court rulings and editorial decisions can't rest on "the AI said so." The paper's central argument is that the explanation, not the prediction, is the product.

// CITATION

BibTeX

@inproceedings{gupta2025dfp2e,
  title={DF-P2E: Deepfake — Prediction to Explanation},
  author={Gupta, Saakshi and others},
  booktitle={Proceedings of the ACM International Conference on Multimedia},
  year={2025},
  publisher={ACM}
}