What Are Deepfakes?
Deepfakes are synthetic media — images, videos, or audio — created or manipulated using artificial intelligence to depict people saying or doing things they never actually did. The term combines "deep learning" (the AI technique used) with "fake" (the result).
What makes deepfakes particularly concerning is their increasing quality. Early deepfakes were easy to spot with visible artifacts and unnatural movements. Modern deepfakes, powered by advances in AI, can be nearly indistinguishable from authentic media to the untrained eye. This technology has moved from research labs to consumer applications, making it accessible to anyone with a computer and an internet connection.
The implications span fraud, misinformation, harassment, and political manipulation. Understanding how deepfakes are created and how to detect them is becoming an essential digital literacy skill.
How Deepfakes Are Created
Generative Adversarial Networks (GANs)
The original deepfake technology uses GANs — a system of two neural networks working against each other. The generator creates fake images, while the discriminator tries to distinguish them from real ones. Through this adversarial process, the generator becomes increasingly skilled at producing convincing fakes.
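The adversarial loop can be sketched on a toy problem. In this illustrative reduction (my construction, not production deepfake code), the "real data" are single numbers clustered around 4.0, the generator is a one-parameter affine map over noise, the discriminator is a logistic scorer, and both take plain gradient steps on the standard GAN objectives:

```python
import math
import random

random.seed(0)

def sigmoid(u):
    u = max(-60.0, min(60.0, u))   # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-u))

# "Real data": numbers clustered around 4.0, standing in for real images.
def real_batch(n):
    return [random.gauss(4.0, 0.5) for _ in range(n)]

w, b = 1.0, 0.0        # generator G(z) = w*z + b
a, c = 0.1, 0.0        # discriminator D(x) = sigmoid(a*x + c)
lr_d, lr_g, steps, batch = 0.1, 0.01, 5000, 16

for _ in range(steps):
    # Discriminator step: raise D on real samples, lower it on fakes.
    reals = real_batch(batch)
    fakes = [w * random.uniform(-1, 1) + b for _ in range(batch)]
    ga = gc = 0.0
    for x in reals:
        d = sigmoid(a * x + c)
        ga += (1 - d) * x
        gc += (1 - d)
    for x in fakes:
        d = sigmoid(a * x + c)
        ga -= d * x
        gc -= d
    a += lr_d * ga / batch
    c += lr_d * gc / batch

    # Generator step: move fakes toward where D currently says "real".
    gw = gb = 0.0
    for _ in range(batch):
        z = random.uniform(-1, 1)
        x = w * z + b
        d = sigmoid(a * x + c)
        gw += (1 - d) * a * z
        gb += (1 - d) * a
    w += lr_g * gw / batch
    b += lr_g * gb / batch

# After training, generated samples should cluster near the real data.
fake_mean = sum(w * random.uniform(-1, 1) + b for _ in range(1000)) / 1000
```

Real deepfake systems play the same game with deep convolutional networks over millions of pixels, but the dynamic is identical: the discriminator's feedback is what teaches the generator to fake convincingly.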
Face-swapping deepfakes train on hundreds or thousands of images of two people. The AI learns to map one person's facial expressions, movements, and angles onto the other person's face, creating video where one person appears to be someone else.
Diffusion Models
Newer deepfake techniques use diffusion models — the same technology behind AI image generators like Stable Diffusion and DALL-E. These models can generate highly realistic images from text descriptions, modify existing photos to change a person's appearance, or create entirely fictional people who do not exist.
Diffusion models have dramatically lowered the barrier to creating convincing fake images. A single reference photo can be enough to generate multiple realistic images of a person in different scenarios.
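The forward half of the process — progressively corrupting a clean value with noise — is simple to sketch; a trained model generates by learning to reverse it. The snippet below assumes the linear beta schedule from the original DDPM formulation and uses a single number as a stand-in for an image:

```python
import math
import random

random.seed(0)

# Linear beta schedule (the noise added at each step), as in DDPM.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar_t = product of (1 - beta_s) for s <= t: the fraction of the
# original signal that survives after t noising steps.
alpha_bar = []
p = 1.0
for beta in betas:
    p *= (1.0 - beta)
    alpha_bar.append(p)

def noisy_sample(x0, t):
    """q(x_t | x_0): scale the clean value down, mix in Gaussian noise."""
    eps = random.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1.0 - alpha_bar[t]) * eps

# Early steps barely change the input; the final step is nearly pure noise.
early = noisy_sample(1.0, 10)
late = noisy_sample(1.0, T - 1)
```

Generation runs this corruption in reverse: starting from pure noise, the model repeatedly predicts and removes the noise, and conditioning that denoising on a text prompt or a reference photo is what steers the output.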
Voice Cloning
Audio deepfakes use AI to clone a person's voice from sample recordings. Modern voice cloning systems need as little as three seconds of audio to create a convincing voice replica that can say anything the operator types. The cloned voice captures not just the tone and pitch but also speech patterns, accent, and emotional inflection.
Voice cloning has been used in social engineering attacks where a cloned executive's voice authorizes fraudulent wire transfers over the phone. In one documented case, criminals used AI voice cloning to impersonate a company CEO and successfully direct a $243,000 transfer.
Visual Telltale Signs
Despite their improving quality, many deepfakes still exhibit artifacts that careful observers can detect.
Eye and Gaze Anomalies
Eyes are one of the hardest features for AI to replicate convincingly. Look for irregular pupil shapes, mismatched pupil size between the two eyes, unnatural light reflections in the irises (authentic reflections are consistent across both eyes), a fixed or unnatural gaze that does not track naturally with head movements, and inconsistent blinking patterns (early deepfakes rarely blinked, though this has improved).
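Blink analysis can be made concrete with the eye aspect ratio (EAR): the eye's vertical extent divided by its horizontal extent, which dips sharply during a blink. The sketch below assumes six (x, y) eye landmarks per frame from some external face-landmark detector (ordered as in the common 68-point convention); the threshold values are illustrative, not calibrated:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def eye_aspect_ratio(lm):
    """lm: six (x, y) eye landmarks — outer corner, two upper-lid points,
    inner corner, two lower-lid points."""
    vertical = dist(lm[1], lm[5]) + dist(lm[2], lm[4])
    horizontal = dist(lm[0], lm[3])
    return vertical / (2.0 * horizontal)

def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count dips of the EAR below threshold lasting >= min_frames."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:
        blinks += 1
    return blinks
```

A subject who never blinks across minutes of footage, or who blinks at an implausibly metronomic rate, is a flag worth investigating further.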
Edge and Boundary Issues
The boundary between the generated face and the surrounding image is a common failure point. Watch for a visible seam or color mismatch where the face meets the hair, neck, or ears. Other indicators include skin texture that changes abruptly at the jawline or forehead; earrings, glasses, or hair that flicker or distort during movement; and teeth that appear blurred, merged, or inconsistent between frames.
Lighting and Shadow Inconsistencies
AI often struggles with consistent lighting across a scene. The face might be lit from a different angle than the background, shadows may fall in contradictory directions, or skin reflectivity may not match the lighting environment.
Temporal Artifacts in Video
In video deepfakes, watch for momentary glitches between frames — brief distortions when the subject turns their head quickly, expressions that appear to jump unnaturally between frames, or background elements that warp or shimmer near the subject's face.
Audio-Visual Sync
In video deepfakes with audio, lip movements may not perfectly match the spoken words. This is particularly noticeable with consonant sounds like "b," "m," and "p" that require specific lip positions. Slight delays or misalignments between lip movements and audio are strong indicators.
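One rough way to quantify such misalignment is to cross-correlate a mouth-openness track extracted from the video against the audio loudness envelope and look for a consistent offset. The sketch below is a simplified stand-in: both tracks are synthetic (a smoothed random signal and a copy of it delayed by three frames), and circular cross-correlation recovers the injected delay:

```python
import random

random.seed(1)

def smooth_random_signal(n, window=5):
    """A smooth, aperiodic stand-in for a per-frame measurement track."""
    raw = [random.uniform(-1, 1) for _ in range(n)]
    return [sum(raw[(i + k) % n] for k in range(window)) / window
            for i in range(n)]

def circular_lag(a, b, max_lag):
    """Lag (in frames) at which b best aligns with a, via circular
    cross-correlation; positive lag means b trails a."""
    n = len(a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[i] * b[(i + lag) % n] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

fps = 25
mouth = smooth_random_signal(250)                    # "mouth openness" track
audio = [mouth[(i - 3) % 250] for i in range(250)]   # audio delayed 3 frames
lag = circular_lag(mouth, audio, max_lag=12)
offset_ms = lag * 1000.0 / fps
```

A stable nonzero offset across a clip is more suspicious than scattered noise: dubbing and lip-sync generation tend to produce a systematic lag rather than random jitter.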
Deepfake Detection Tools
Several tools and services have been developed to help identify deepfake content.
Microsoft Video Authenticator analyzes photos and videos and provides a confidence score indicating the likelihood that the media has been artificially manipulated. It detects blending boundaries and subtle grayscale elements invisible to the human eye.
Intel FakeCatcher uses a different approach, looking for biological signals in real video — specifically, subtle changes in facial blood flow (photoplethysmography) that are present in authentic video but absent in deepfakes.
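The underlying signal is straightforward to illustrate: average the green channel over the face region in each frame, then look for a dominant frequency in a plausible heart-rate band. The sketch below substitutes a synthetic pulse for real video and uses a direct DFT; the 0.7-4 Hz band and the green channel's sensitivity to blood volume are standard remote-photoplethysmography assumptions, not details of Intel's proprietary system:

```python
import math

fps, seconds = 30, 10
n = fps * seconds
heart_hz = 1.2  # a 72 bpm pulse faintly modulating skin color

# Mean green-channel value of the face region in each frame (synthetic:
# a faint pulse riding on a constant skin tone).
green = [120.0 + 0.5 * math.sin(2 * math.pi * heart_hz * i / fps)
         for i in range(n)]

# Remove the DC component, then find the dominant frequency in the
# plausible heart-rate band with a direct DFT.
mean = sum(green) / n
signal = [g - mean for g in green]

def dft_power(s, k):
    re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(s))
    im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(s))
    return re * re + im * im

band = [k for k in range(1, n // 2) if 0.7 <= k * fps / n <= 4.0]
peak_k = max(band, key=lambda k: dft_power(signal, k))
bpm = peak_k * fps / n * 60.0
```

In authentic video this band shows a clear, physiologically plausible peak; in a generated face the pulse signal is typically absent or incoherent across skin regions.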
Deepware Scanner is a mobile app that analyzes video for deepfake indicators, making detection accessible to non-technical users.
Content provenance standards like C2PA (Coalition for Content Provenance and Authenticity) embed cryptographic metadata in media files at the point of capture, creating an unbroken chain of authenticity from camera to publication. This approach verifies that content is real rather than trying to prove that content is fake.
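The tamper-evidence idea behind provenance can be sketched with a signed content hash. This is a deliberate simplification: real C2PA manifests carry structured assertions signed with X.509 certificate chains (COSE signatures), not a shared-secret HMAC, but the effect — any post-capture edit invalidates the tag — is the same:

```python
import hashlib
import hmac

# Hypothetical capture-device key for this demo only. C2PA proper uses
# public-key certificates, so verifiers never hold a signing secret.
DEVICE_KEY = b"demo-device-key"

def sign_media(media_bytes):
    """Issue a provenance tag: hash the content, sign the hash at capture."""
    digest = hashlib.sha256(media_bytes).digest()
    return hmac.new(DEVICE_KEY, digest, hashlib.sha256).hexdigest()

def verify_media(media_bytes, tag):
    """Recompute and compare in constant time; any edit breaks the tag."""
    return hmac.compare_digest(sign_media(media_bytes), tag)

photo = b"\x89PNG...original pixels..."   # placeholder media bytes
tag = sign_media(photo)
```

Editing tools that participate in the standard re-sign each transformation, so the final file carries a verifiable history from camera to publication rather than a single pass/fail bit.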
Social Engineering With Deepfakes
Deepfakes are increasingly used as social engineering weapons. Beyond voice cloning for financial fraud, attackers use deepfake video in live video calls to impersonate executives, colleagues, or authority figures. Deepfake images create convincing fake social media profiles for romance scams or business impersonation. Fabricated compromising images are used for extortion and blackmail.
These attacks exploit the human tendency to trust what we see and hear. When a video call appears to show your CEO asking for an urgent fund transfer, the natural instinct is to comply — especially if the voice and appearance are convincing.
Protection Strategies
Establish verification procedures for sensitive requests. Never authorize financial transfers, credential changes, or data access based solely on a phone call or video call, regardless of who appears to be calling. Use a separate communication channel to verify.
Be skeptical of viral media. Before sharing or reacting to shocking videos or images, especially those involving public figures, check whether reputable news sources have verified the content.
Examine media critically. When something seems suspicious, slow down the video, zoom in on faces, check for the visual artifacts described above, and look for the original source of the content.
Support provenance standards. As C2PA and similar standards gain adoption, prefer media from sources that include provenance metadata proving the content's origin and integrity.
The arms race between deepfake creation and detection will continue to escalate. While detection tools will improve alongside the generation technology, developing a critical eye and maintaining verification habits remain your most reliable defenses against AI-generated deception.
Raimundo Coelho
Cybersecurity specialist and technology professor with over 20 years of experience in IT. Graduated from Universidade Estácio de Sá. Writing practical guides to help you protect your data and stay safe in the digital world.