// COMPUTER VISION

DF-P2E: Deepfake Detection

Multimodal, explainable deepfake analysis — CSIRO Data61

PyTorch, Vision Transformers, CNNs, SHAP, LIME, CLIP, Llama 3.1, Ollama, Hugging Face

Overview

DF-P2E ("Prediction to Explanation") is a deepfake detection framework built at CSIRO's Data61 with a forensics-first mindset. A prediction alone isn't enough: investigators also need to understand why the model produced it.

The pipeline

  1. Detection — a CNN + Vision Transformer ensemble classifies frames
  2. Saliency — SHAP and LIME produce visual heatmaps showing which regions drove the decision
  3. Captioning — CLIP + attention generates descriptive captions of suspicious regions
  4. Contextual explanation — Llama 3.1 (via Ollama) takes the detection result, the saliency maps, and the captions, and writes a narrative forensic explanation
  5. REST API — the whole pipeline is exposed as endpoints so it can plug into existing forensic tooling
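
As a rough illustration of how stages 1, 2 and 4 hand data to one another, here is a minimal Python sketch. It is not the DF-P2E implementation: the weighted soft-voting rule, the grid-cell heatmap, the `Finding` container, the 0.5 threshold and the prompt wording are all assumptions made for this example, and the real model calls (CNN, Vision Transformer, SHAP/LIME, CLIP, Llama 3.1) are replaced by plain values.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Finding:
    """Hypothetical container for what the explanation stage receives per frame."""
    fake_probability: float      # ensemble score in [0, 1]
    salient_regions: List[str]   # labels of the most salient heatmap cells
    captions: List[str]          # captions describing the suspicious regions


def ensemble_probability(cnn_prob: float, vit_prob: float,
                         cnn_weight: float = 0.5) -> float:
    """Weighted soft voting over two detectors (an assumed combination rule)."""
    if not 0.0 <= cnn_weight <= 1.0:
        raise ValueError("cnn_weight must be in [0, 1]")
    return cnn_weight * cnn_prob + (1.0 - cnn_weight) * vit_prob


def top_regions(heatmap: Sequence[Sequence[float]], k: int = 2) -> List[str]:
    """Return labels for the k grid cells with the highest saliency values."""
    cells = [(value, r, c)
             for r, row in enumerate(heatmap)
             for c, value in enumerate(row)]
    cells.sort(key=lambda cell: cell[0], reverse=True)
    return [f"row {r}, col {c}" for _, r, c in cells[:k]]


def build_prompt(finding: Finding, threshold: float = 0.5) -> str:
    """Assemble the text that would be sent to the language model."""
    verdict = ("likely manipulated"
               if finding.fake_probability >= threshold
               else "likely authentic")
    lines = [
        "You are assisting a forensic analyst.",
        f"Detector verdict: {verdict} "
        f"(fake probability {finding.fake_probability:.2f}).",
        "Most salient regions: " + "; ".join(finding.salient_regions) + ".",
        "Region captions: " + "; ".join(finding.captions) + ".",
        "Explain in plain language why the detector reached this verdict.",
    ]
    return "\n".join(lines)


# Stand-in values: in a real pipeline these would come from the models.
heatmap = [[0.1, 0.9], [0.3, 0.2]]
finding = Finding(
    fake_probability=ensemble_probability(0.8, 0.6),
    salient_regions=top_regions(heatmap, k=1),
    captions=["blurred boundary around the jawline"],
)
print(build_prompt(finding))
```

Keeping the prompt assembly separate from the model calls means the text handed to the language model can be logged and reviewed alongside the score and heatmaps it was built from, which suits the forensic use case described here.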

Why explainability matters here

Courts don't accept "the AI said so." Investigators need to articulate the evidence in human terms. The contextual explanation layer is what turns a probability score into an admissible argument.

Outputs

  • An interactive web app for exploring detections (Hugging Face Space)
  • REST API for batch integration
  • Visual + textual explanations for every prediction
  • Published as part of DF-P2E at ACM Multimedia 2025
