// COMPUTER VISION

DF-P2E: Deepfake Detection

Multimodal, explainable deepfake analysis — CSIRO Data61

PyTorch · Vision Transformers · CNNs · SHAP · LIME · CLIP · Llama 3.1 · Ollama · Hugging Face

Overview

DF-P2E ("Prediction to Explanation") is a deepfake detection framework built at CSIRO's Data61 with a forensics-first mindset. A prediction alone isn't enough: investigators need to understand why the model reached it.

The pipeline

Detection — a CNN + Vision Transformer ensemble classifies frames
Saliency — SHAP and LIME produce visual heatmaps showing which regions drove the decision
Captioning — CLIP + attention generates descriptive captions of suspicious regions
Contextual explanation — Llama 3.1 (via Ollama) takes the detection result + captions + saliency and produces a forensic narrative explanation
REST API — the whole pipeline is exposed as endpoints so it can plug into existing forensic tooling
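
To make the hand-off between stages concrete, here is a minimal sketch of how the detection score, saliency regions, and captions might be packaged into a prompt for the language model. Everything in it is an assumption for illustration: the `FrameEvidence` schema, the soft-voting rule in `ensemble_probability`, the 0.5 threshold, and the prompt wording are hypothetical and not taken from the DF-P2E code. The call to the language model itself is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameEvidence:
    """What the explanation stage receives for one frame (hypothetical schema)."""
    fake_probability: float      # ensemble score in [0, 1]
    salient_regions: List[str]   # region names read off the SHAP/LIME heatmaps
    captions: List[str]          # captions describing the suspicious regions


def ensemble_probability(cnn_prob: float, vit_prob: float, cnn_weight: float = 0.5) -> float:
    """Weighted average of the two detectors' fake probabilities.

    Assumption: simple soft voting. The source does not state how the
    CNN and Vision Transformer outputs are combined.
    """
    if not 0.0 <= cnn_weight <= 1.0:
        raise ValueError("cnn_weight must be in [0, 1]")
    return cnn_weight * cnn_prob + (1.0 - cnn_weight) * vit_prob


def build_prompt(evidence: FrameEvidence, threshold: float = 0.5) -> str:
    """Assemble the text prompt that would be sent to the language model."""
    if evidence.fake_probability >= threshold:
        verdict = "likely manipulated"
    else:
        verdict = "likely authentic"
    lines = [
        "You are assisting a forensic analyst.",
        f"Detector verdict: {verdict} (fake probability {evidence.fake_probability:.2f}).",
        "Salient regions: " + (", ".join(evidence.salient_regions) or "none"),
        "Region captions:",
    ]
    lines += [f"- {caption}" for caption in evidence.captions]
    lines.append("Explain in plain language why the detector reached this verdict.")
    return "\n".join(lines)


# Example: two detector scores combined, then turned into a prompt.
score = ensemble_probability(cnn_prob=0.9, vit_prob=0.7)
evidence = FrameEvidence(score, ["mouth", "left eye"], ["blurred lip boundary"])
print(build_prompt(evidence))
```

Keeping the evidence in one structured object means the same record can feed both the language model prompt and a REST response, so the textual explanation can always be traced back to the score and regions it was built from.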

Why explainability matters here

Courts don't accept "the AI said so." Investigators need to articulate the evidence in human terms. The contextual explanation layer is what turns a probability score into an admissible argument.

Outputs

An interactive web app for exploring detections (Hugging Face Space)
REST API for batch integration
Visual + textual explanations for every prediction
Published as part of DF-P2E at ACM Multimedia 2025

// SCREENSHOTS