// COMPUTER VISION
DF-P2E: Deepfake Detection
Multimodal, explainable deepfake analysis — CSIRO Data61
PyTorch, Vision Transformers, CNNs, SHAP, LIME, CLIP, Llama 3.1, Ollama, Hugging Face
Overview
DF-P2E ("Prediction to Explanation") is a deepfake detection framework built at CSIRO's Data61 with a forensics-first mindset. A verdict from the model isn't enough on its own: investigators need to understand why the model reached it.
The pipeline
- Detection — a CNN + Vision Transformer ensemble classifies frames
- Saliency — SHAP and LIME produce visual heatmaps showing which regions drove the decision
- Captioning — CLIP + attention generates descriptive captions of suspicious regions
- Contextual explanation — Llama 3.1 (via Ollama) takes the detection result + captions + saliency and produces a forensic narrative explanation
- REST API — the whole pipeline is exposed as endpoints so it can plug into existing forensic tooling
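The detection step combines a CNN and a Vision Transformer, but the fusion rule isn't specified here. The sketch below shows one common choice, assuming weighted soft voting: each model emits P(fake) per frame, the two probabilities are blended, and frame scores are averaged into a video-level verdict. The equal weights, mean aggregation, 0.5 threshold and function names are all hypothetical illustrations, not DF-P2E's code.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def ensemble_frame_score(cnn_prob: float, vit_prob: float, cnn_weight: float = 0.5) -> float:
    """Weighted soft vote: blend the CNN's and the ViT's P(fake) for one frame."""
    if not 0.0 <= cnn_weight <= 1.0:
        raise ValueError("cnn_weight must be in [0, 1]")
    return cnn_weight * cnn_prob + (1.0 - cnn_weight) * vit_prob


def classify_video(
    frame_probs: Sequence[Tuple[float, float]],
    threshold: float = 0.5,
    cnn_weight: float = 0.5,
) -> Tuple[str, float, List[float]]:
    """Score each frame with the ensemble, then average frames into one video-level score."""
    if not frame_probs:
        raise ValueError("need at least one frame")
    frame_scores = [ensemble_frame_score(c, v, cnn_weight) for c, v in frame_probs]
    video_score = mean(frame_scores)
    label = "fake" if video_score >= threshold else "real"
    return label, video_score, frame_scores


# Hypothetical per-frame outputs: (CNN P(fake), ViT P(fake)).
label, score, per_frame = classify_video([(0.9, 0.7), (0.6, 0.8), (0.2, 0.4)])
print(label, round(score, 2))
```

Soft voting keeps each model's confidence rather than reducing it to a yes/no vote. Returning the per-frame scores as well as the verdict gives the saliency and explanation stages a way to point at the specific frames that drove the result.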
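The saliency step relies on SHAP, whose attributions are Shapley values: a region's score is its average marginal effect on the model output across every combination of other regions. The shap library approximates these values because exact computation is exponential. The sketch below computes them exactly by brute force over a few masked image regions, so the idea is visible. The masking scheme, the three-region split and `toy_detector` are hypothetical. LIME, the other method named above, fits a local linear surrogate instead and is not shown.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(n_regions: int, value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley value of each image region.

    value(S) is the detector's P(fake) when only the regions in S are left visible
    and every other region is masked. The loop visits every subset, so the cost
    grows as 2**n_regions: fine for a handful of superpixels, not for raw pixels.
    """
    phis = []
    for i in range(n_regions):
        others = [j for j in range(n_regions) if j != i]
        phi = 0.0
        for size in range(n_regions):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude i
            weight = factorial(size) * factorial(n_regions - size - 1) / factorial(n_regions)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of region i when added to coalition S
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical toy detector over three regions: 0 = mouth, 1 = eyes, 2 = background.
def toy_detector(visible: FrozenSet[int]) -> float:
    score = 0.1                  # P(fake) with every region masked
    if 0 in visible:
        score += 0.4             # mouth artifacts
    if 1 in visible:
        score += 0.2             # eye artifacts
    if 0 in visible and 1 in visible:
        score += 0.1             # mouth/eye inconsistency, only detectable together
    return score


phis = shapley_values(3, toy_detector)
print([round(p, 3) for p in phis])  # prints [0.45, 0.25, 0.0]
```

The 0.1 interaction bonus is split evenly between mouth and eyes, and the background gets zero. The attributions sum to the full-image score minus the fully masked score (0.8 − 0.1), which is the property that lets a heatmap be read as "how much of the verdict each region accounts for."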
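The contextual explanation step can be sketched with Ollama's HTTP API. A running Ollama server listens on `localhost:11434` by default. A `POST /api/generate` request with `"stream": false` returns one JSON object, and its `"response"` field holds the generated text. The prompt wording, the function names (`build_prompt`, `build_request`, `explain`) and the sample captions and scores below are hypothetical illustrations, not DF-P2E's actual prompt.

```python
import json
import urllib.request
from typing import Dict, List

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(label: str, score: float, captions: List[str], region_scores: Dict[str, float]) -> str:
    """Fold the detector verdict, region captions and saliency scores into one prompt."""
    ranked = sorted(region_scores.items(), key=lambda kv: kv[1], reverse=True)
    saliency = "\n".join(f"- {name}: attribution {value:+.2f}" for name, value in ranked)
    caption_text = "\n".join(f"- {c}" for c in captions)
    return (
        "You are assisting a digital forensics investigator.\n"
        f"Detector verdict: {label} (ensemble P(fake) = {score:.2f}).\n"
        f"Salient regions, most influential first:\n{saliency}\n"
        f"Captions of suspicious regions:\n{caption_text}\n"
        "Explain in plain language which visual evidence supports the verdict "
        "and where the evidence is weak or uncertain."
    )


def build_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build the POST for /api/generate; stream=False asks for a single JSON reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explain(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Hypothetical inputs from the earlier stages.
prompt = build_prompt(
    "fake",
    0.87,
    ["blurred boundary along the jawline", "lip motion out of sync with nearby frames"],
    {"mouth": 0.45, "eyes": 0.25, "background": 0.0},
)
req = build_request(prompt)
# narrative = explain(req)  # needs `ollama serve` running and `ollama pull llama3.1` done
```

Keeping prompt construction separate from the network call means the exact prompt can be logged next to the evidence it summarizes. That is plausibly useful when a narrative later has to be justified.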
Why explainability matters here
Courts don't accept "the AI said so." Investigators need to articulate the evidence in human terms. The contextual explanation layer is what turns a probability score into an admissible argument.
Outputs
- An interactive web app for exploring detections (Hugging Face Space)
- REST API for batch integration
- Visual + textual explanations for every prediction
- Published at ACM Multimedia 2025