Algorithm A set of rules or instructions a computer follows to solve a problem or complete a task. In AI detection, algorithms analyse patterns in video, audio, images and text to determine whether a human or an AI system created the content. AI Aware can also identify when content is a hybrid of the two.
Artificial Intelligence (AI) Technology that allows machines to carry out tasks that would normally require human intelligence. Examples include understanding language, recognising faces, or generating a realistic video. AI Aware builds detection tools that identify when AI has created or manipulated content, helping professionals distinguish the real from the synthetic.
AI Audio Detection The process of analysing a sound recording to determine whether artificial intelligence generated or manipulated it. AI audio detection tools look for spectral artefacts, unnatural breath patterns, and voice synthesis signatures that indicate a recording is not genuine. AI Aware’s audio deepfake detector identifies AI-generated speech, voice clones, and other synthetic acoustics.
AI Content Detection A category of software tools that analyse text, images, video, or audio to assess whether artificial intelligence produced or altered the content. AI content detection is used across legal, education, publishing, and security contexts. AI Aware offers detection tools covering all four content types.
AI-Generated Content Any text, image, video, or audio that an AI system has produced, either fully or in part. AI-generated content ranges from a chatbot writing an email to a generative model synthesising a photorealistic video of a person who does not exist.
AI Image Detection This process determines whether an AI image generator, such as Midjourney, DALL-E, or Stable Diffusion, created an image. Detection models look for characteristic patterns, pixel-level artefacts, and generative signatures that reveal synthetic origin. AI Aware’s image detector analyses still images for signs of AI generation.
AI Text Detection The process of analysing written content to identify whether a human or an AI language model wrote it. Text detectors look for statistical patterns, sentence structure, and linguistic fingerprints associated with AI writing tools such as ChatGPT. AI Aware’s text detector is used by educators, publishers, and legal professionals to verify the origin of written content.
AI Video Detection The process of examining video footage to determine whether AI generated or manipulated it. Detection systems analyse visual artefacts, temporal inconsistencies, and generative model fingerprints frame by frame. See also: Deepfake Detection.
Artefact (AI) A visual, audio, or data anomaly left behind when an AI system generates or edits content. In AI-generated video, artefacts might appear as unnatural blurring around the hairline, flickering at the jaw edge, or inconsistent lighting between frames. Detection tools use these artefacts as evidence of synthetic origin.
B
Bias (in AI) When an AI system produces systematically skewed or unfair outputs because it was trained on unrepresentative or imbalanced data. Bias in AI detection models can cause them to flag certain types of content more readily than others. This is why AI Aware uses diverse training datasets, tests detection performance across demographic groups, and publishes confidence scores rather than simple yes/no results.
C
Chatbot An AI tool that holds text-based conversations with users. Chatbots such as ChatGPT are also capable of generating large volumes of written content. This has created demand for AI text detection tools in education, publishing, financial services and legal contexts.
Cloned Voice An AI-generated replica of a specific person’s voice, produced by training a model on recordings of that individual speaking. Voice cloning tools can produce convincing imitations with only a few seconds of sample audio. AI Aware’s audio deepfake detector identifies cloned voices in recordings.
Computer Vision A field of AI that enables machines to interpret and analyse visual information from images and video. Computer vision underpins AI image and video detection, allowing systems to identify the subtle visual inconsistencies that reveal synthetic content.
Confidence Score A numerical measure, usually expressed as a percentage, indicating how certain an AI detection model is about its result. Rather than returning a binary yes/no answer, AI Aware provides confidence scores. This means that users can understand the strength of a finding and make informed decisions.
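The idea behind a confidence score can be shown in a few lines. This is a minimal illustrative sketch, not AI Aware's actual scoring logic: a model's raw probability is reported as a graded percentage rather than collapsed into a binary verdict.

```python
def confidence_score(probability: float) -> str:
    """Turn a model's raw AI-likelihood probability into a graded result.

    Rather than thresholding to a bare yes/no, the score preserves how
    certain the model is. (Illustrative only; a real detector computes
    the probability from learned features of the media itself.)
    """
    pct = round(probability * 100, 1)
    if probability >= 0.5:
        return f"Likely AI-generated ({pct}% confidence)"
    return f"Likely human-created ({100 - pct}% confidence)"

print(confidence_score(0.93))  # a strong AI-generated signal
print(confidence_score(0.12))  # a strong human-created signal
```

The same underlying probability supports both outputs; only the presentation changes, which is what lets users weigh the strength of a finding.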
Content Authenticity The quality of being verifiably genuine. In other words, assurance that a piece of media was created by the person or in the circumstances claimed, and has not been manipulated. Establishing content authenticity is a growing challenge as AI generation tools become more accessible. AI Aware's detection suite supports content authenticity verification for video, audio, image, and text.
D
Data Labelling The process of tagging training data with accurate descriptive information so an AI model can learn from it. For example, labelling images as “AI-generated” or “human-created” enables a detection model to learn the difference between the two.
Dataset A structured collection of data used to train, test, or evaluate an AI model. The breadth and diversity of a dataset directly affects how well a model generalises to new content. This is a key consideration in AI detection. Models must handle content produced by a constantly evolving range of generation tools.
Deep Learning An advanced form of machine learning that uses multi-layered neural networks to learn complex patterns from large amounts of data. Deep learning is the technology behind most modern AI generation tools, and it also underpins the detection systems built to identify their output.
Deepfake A piece of video, audio, or image content in which AI has fabricated or manipulated the likeness or voice of a real person, typically without their consent. The word combines “deep learning” and “fake.” Deepfakes can be used for fraud, disinformation, identity theft, and non-consensual content. See: Deepfake Detection.
Deepfake Detection The process of analysing media to determine whether AI created or manipulated it, specifically to fabricate a person’s appearance, voice, or actions. Detection systems examine visual artefacts, temporal consistency, audio forensics, and generative model signatures. AI Aware’s deepfake detector identifies AI-generated video and manipulated audio across a wide range of generation platforms, including Synthesia, DeepSwapFace, and HeyGen.
Deepfake Video A video in which AI has generated or manipulated a person’s face, body, or voice, typically to make them appear to say or do something they never did. The same techniques can also create a convincing synthetic person from scratch. Deepfake videos are increasingly used for fraud, political disinformation, and targeted harassment. See: AI Aware’s deepfake detector.
Disinformation False or misleading information deliberately spread to deceive. AI-generated content has significantly lowered the cost and effort of producing disinformation at scale. This makes detection tools increasingly important for media organisations, governments, and platforms.
E
Ensemble Model A detection approach that combines multiple AI models to achieve higher accuracy than any single model alone. AI Aware uses a proprietary ensemble framework to improve detection rates across different content types and generation tools. This also allows us to detect hybrid content: material created by a human and partially modified by AI, or generated by AI and rewritten by a person.
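The simplest form of ensembling is a weighted average of the individual models' scores. The sketch below is purely illustrative (the model names and weights are hypothetical, and production ensembles use learned combination rules rather than fixed weights), but it shows why several weak signals can outvote one misleading one.

```python
def ensemble_score(model_scores, weights=None):
    """Combine per-model AI-likelihood scores into one ensemble score.

    Uses a weighted average, with equal weights by default.
    (A simplified sketch; real ensembles learn how to combine models.)
    """
    if weights is None:
        weights = {name: 1.0 for name in model_scores}
    total_weight = sum(weights[name] for name in model_scores)
    return sum(score * weights[name]
               for name, score in model_scores.items()) / total_weight

# Hypothetical per-model scores for one video: two visual models are
# confident, while the metadata model is uncertain.
scores = {"pixel_artefacts": 0.91, "gan_signature": 0.84, "metadata": 0.40}
print(round(ensemble_score(scores), 2))
```

Even with one uncertain model, the combined score remains clearly on the AI-generated side, which is the practical benefit of an ensemble.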
F
Face Swap A technique in which AI replaces one person’s face in a video with another’s, either in real time or in post-production. Face swapping is one of the most common methods used to create deepfake videos.
Forensic Analysis (Media) The application of scientific and technical methods to examine media files for evidence of manipulation or synthetic origin. AI Aware’s detection tools perform forensic-grade analysis of video, audio, and images. In addition to our self-service model, our experts can provide structured results suitable for use in professional and legal investigations.
G
GAN (Generative Adversarial Network) An AI architecture in which two neural networks (a generator and a discriminator) compete against each other. The generator tries to produce convincing fake content; the discriminator tries to identify fakes. Over time, this competition produces extremely realistic synthetic media. GANs are responsible for many of the most convincing deepfakes. AI detection tools specifically target the characteristic “GAN signatures” left in generated content.
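The adversarial dynamic can be caricatured with plain numbers. In this deliberately toy sketch (real GANs are neural networks trained by gradient descent on images or audio, not single numbers), the "generator" produces a value, a simple "discriminator" rejects anything far from the real data, and each rejection nudges the generator closer, so the fakes improve over time.

```python
def train_toy_gan(rounds: int = 100) -> float:
    """A toy illustration of the GAN idea with single numbers.

    'Real' samples sit around 5.0. The discriminator labels a sample
    real only if it lands close to that value; each time the generator
    is caught, it nudges its output toward the real data.
    (Purely illustrative -- not how actual GANs are implemented.)
    """
    generator_output = 0.0   # the generator starts far from the real data
    real_mean = 5.0
    for _ in range(rounds):
        fooled = abs(generator_output - real_mean) < 0.1
        if not fooled:
            # Caught by the discriminator: move toward the real data.
            generator_output += 0.1 * (real_mean - generator_output)
    return generator_output

print(round(train_toy_gan(), 2))  # the generator ends up close to 5.0
```

The competition stops improving the generator only once the discriminator can no longer tell its output from the real data, which is exactly the property that makes GAN-generated media hard to spot by eye.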
Generative AI AI that creates new content (text, images, video, audio, or code) rather than simply classifying or analysing existing data. Generative AI tools include ChatGPT (text), Midjourney (images), Synthesia (video), and ElevenLabs (audio). The rise of generative AI is the primary driver of demand for AI detection tools.
H
Hallucination (AI) When an AI model generates false or fabricated information and presents it as fact. In AI-written text, hallucinations can include invented legal citations, non-existent research papers, or incorrect statistics. AI Aware’s text detector helps legal and publishing professionals identify AI-written documents that may contain hallucinated content.
Human-in-the-Loop An AI system design in which a human reviews or validates AI outputs before they are acted upon. AI Aware recommends a human-in-the-loop approach for high-stakes detection decisions. We provide a forensics service that incorporates this alongside in-depth statistical analysis from our scientific experts; it is recommended for high-stakes cases. Please contact us for more information.
K
KYC (Know Your Customer) A regulatory process used by financial institutions and other organisations to verify the identity of their customers, typically using documents and video-based identity checks. KYC processes are an increasingly common target for deepfake attacks, in which fraudsters use AI-generated video to impersonate real individuals. AI detection tools can add a layer of synthetic media verification to video KYC workflows.
L
Large Language Model (LLM) An AI system trained on vast quantities of text to understand and generate human language. LLMs such as GPT-4 power tools like ChatGPT and are responsible for much of the AI-generated written content that organisations now need to detect. See: AI Aware’s text detector.
Lip Sync (AI) A technique in which AI alters the mouth movements in a video to match a different audio track. This makes it appear that a person is saying something they never said. Lip sync manipulation is one of the most common forms of deepfake video used in political disinformation.
M
Machine Learning (ML) A branch of AI in which systems learn from data and improve over time without being explicitly programmed. Machine learning is the foundation of both AI content generation and AI content detection. The models that create deepfakes and the models that detect them are both products of machine learning, although at AI Aware our detection techniques also incorporate a wide range of additional statistical approaches.
Manipulation (Media) Any alteration to a video, image, or audio file that changes its meaning or misrepresents what was originally recorded. AI manipulation tools can alter facial expressions, voices, backgrounds, and speech in ways that are invisible to the human eye but detectable by forensic analysis tools.
Metadata Data that describes other data. An example is information embedded in a video file about when it was recorded, on what device, and with what software. Metadata analysis is one of several methods AI detection systems use to identify synthetic or manipulated media.
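A metadata check can be as simple as inspecting a file's descriptive fields for tell-tale signs. In the sketch below, the field names and the list of generator signatures are hypothetical examples, and because metadata can be stripped or forged, a check like this is supporting evidence only, never conclusive on its own.

```python
def check_metadata(metadata: dict) -> list:
    """Flag metadata fields that suggest synthetic or manipulated media.

    (Illustrative only: the field names and generator list here are
    hypothetical, and absent or clean metadata proves nothing, since
    metadata is easy to strip or forge.)
    """
    known_generators = ("stable diffusion", "midjourney", "dall-e")
    flags = []
    software = metadata.get("software", "").lower()
    if any(gen in software for gen in known_generators):
        flags.append(f"software field names an AI generator: {metadata['software']}")
    if "camera_model" not in metadata:
        flags.append("no camera model recorded (common in generated files)")
    return flags

sample = {"software": "Stable Diffusion 2.1", "created": "2024-03-01"}
for flag in check_metadata(sample):
    print(flag)
```

This is why metadata analysis is only one of several methods used alongside pixel-level and statistical analysis.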
Model In AI, a model is a trained system that takes input data and produces outputs such as predictions, classifications, or generated content. AI Aware’s detection models are trained on large datasets of both human-created and AI-generated content. This enables us to identify synthetic media with high accuracy.
Multi-Modal Detection The ability to detect AI-generated content across multiple media types (video, audio, image, and text) within a single platform. AI Aware offers multi-modal detection across all four content types, making it suitable for organisations that need to verify diverse forms of media.
N
Neural Network A computational system modelled loosely on the human brain, consisting of layers of interconnected nodes that process information. Neural networks power both the generation of synthetic media and the detection systems designed to identify it.
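A single node of such a network is simple to write down. This is the textbook sketch, with hand-picked rather than learned weights: each input is weighted, the weighted sum plus a bias is squashed through an activation function, and real networks stack thousands of these nodes into layers.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial node: weight each input, sum with a bias, then
    squash through a sigmoid so the output lies between 0 and 1.
    (Textbook sketch; the weights here are illustrative, not learned.)
    """
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid activation

# Two input features with hand-picked weights.
output = neuron([0.8, 0.2], [1.5, -0.5], bias=-0.4)
print(round(output, 3))
```

Training a network means adjusting those weights and biases automatically from data, which is what the Machine Learning and Deep Learning entries describe.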
NLP (Natural Language Processing) The field of AI concerned with enabling computers to understand, interpret, and generate human language. NLP underpins AI text generation tools and the detection systems built to identify AI-written content.
O
Open Source AI AI tools whose underlying code is freely available for anyone to use, modify, and distribute. Open source AI has significantly lowered the barrier to creating synthetic media, as tools like Stable Diffusion can be downloaded and run locally without cost or technical restriction.
Out-of-Distribution Content Content that differs significantly from the data an AI model was trained on. An example is a video produced by a newly released AI generator that a detection model has not yet encountered. Out-of-distribution robustness is a key challenge in AI detection. AI Aware’s ensemble model and wide-ranging statistical approach are specifically designed to improve performance on out-of-distribution and hybrid content.
Overfitting When a model learns the training data too precisely and performs poorly on new, unseen data. Avoiding overfitting is a core challenge in building reliable AI detection tools, as the model must generalise to AI-generated content it has never seen before.
P
Prompt The instruction or input given to a generative AI system to produce an output. A user might prompt an AI video generator to create a video of a named public figure delivering a statement (and so produce a deepfake). Understanding prompt-based generation is important context for anyone assessing AI-generated content.
Provenance (Media) The verified origin and chain of custody of a piece of media. Examples are who created it, when, with what tools, and whether it has been altered. Establishing media provenance is central to content authentication and is particularly important for legal evidence, journalism, and publishing.
R
Real-Time Detection The ability to analyse and flag synthetic media as it is being streamed or transmitted, rather than after the fact. Real-time detection is particularly important for live video call verification and broadcast monitoring.
Reinforcement Learning A machine learning method in which an AI system learns by receiving rewards for desired outcomes and penalties for undesired ones. Reinforcement learning is used in the development of increasingly sophisticated generative models. This has contributed to an ongoing “arms race” between content generation and content detection.
S
Supervised Learning A machine learning approach in which a model trains on labelled data (i.e. examples where the correct answer is already known). Data scientists train AI detection models partly through supervised learning, using large datasets of labelled human-created and AI-generated content.
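A minimal supervised learner can be built from averages alone. This teaching sketch (the feature names are invented, and real detection models are deep neural networks trained on far richer features) averages the feature vectors for each label, then classifies new inputs by the nearest average.

```python
def train_centroid_classifier(examples, labels):
    """Train a minimal supervised classifier from labelled examples.

    For each label, average its feature vectors into a 'centroid';
    new inputs are assigned the label of the nearest centroid.
    (A teaching sketch, not a production detection model.)
    """
    centroids = {}
    for label in set(labels):
        rows = [x for x, y in zip(examples, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(centroids, key=lambda label: dist(centroids[label]))
    return predict

# Invented toy features: [artefact score, noise irregularity].
features = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = ["ai-generated", "ai-generated", "human", "human"]
predict = train_centroid_classifier(features, labels)
print(predict([0.85, 0.75]))  # lands near the AI-generated examples
```

The essential ingredient is the labels: the model only learns the difference between the two classes because each training example already carries the correct answer.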
Synthetic Data Artificially generated data, used as a substitute for real-world data in AI training. Synthetic data is useful for building detection models, allowing them to train on diverse examples of AI-generated content without requiring equally large quantities of real media.
Synthetic Identity Fraud A form of fraud in which a criminal uses AI-generated or manipulated media (such as deepfake video or cloned audio) to impersonate a real or fictitious person. Synthetic identity fraud is one of the fastest-growing enterprise threats, particularly in remote hiring and financial services.
Synthetic Media Any media content that an AI system has generated or significantly altered. Synthetic media includes deepfake videos, AI-generated images, voice-cloned audio, and AI-written articles. Detecting synthetic media is the core purpose of AI Aware’s detection platform.
T
Temporal Consistency In video analysis, the degree to which movement, lighting, and visual elements remain coherent between frames over time. AI-generated video often exhibits subtle temporal inconsistencies such as flickering artefacts, unnatural motion, or discontinuities. Detection systems use these to identify synthetic content.
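One crude but concrete temporal signal is flicker: how much a quantity jumps between consecutive frames. The sketch below reduces each frame to a single invented brightness number; real systems analyse many per-pixel temporal features, but the contrast between smooth and jumpy sequences is the same idea.

```python
def flicker_score(frame_brightness):
    """Average absolute change in brightness between consecutive frames.

    Genuine footage usually changes smoothly from frame to frame;
    abrupt jumps can indicate generation artefacts. (A single-feature
    sketch; real detectors track many per-pixel temporal signals.)
    """
    changes = [abs(b - a)
               for a, b in zip(frame_brightness, frame_brightness[1:])]
    return sum(changes) / len(changes)

smooth = [0.50, 0.51, 0.52, 0.51, 0.50]    # steady, camera-like footage
flickery = [0.50, 0.70, 0.45, 0.72, 0.48]  # jumpy, a possible artefact

print(round(flicker_score(smooth), 3))
print(round(flicker_score(flickery), 3))
```

The flickery sequence scores an order of magnitude higher, which is the kind of frame-to-frame discontinuity a detection system flags.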
Training Data The data used to teach an AI model how to perform its task. For AI detection models, training data includes large quantities of both genuine human-created content and AI-generated content, allowing the model to learn the differences between the two.
U
Unsupervised Learning A machine learning approach in which a model identifies patterns in data without labelled examples. Unsupervised learning is used in some detection approaches to identify anomalies and outliers that may indicate synthetic origin.
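Anomaly detection of this kind needs no labels at all: anything that sits far from the bulk of the data is flagged. The sketch below uses a simple standard-deviation rule on invented per-segment scores; real systems apply similar ideas to much richer media features.

```python
def find_anomalies(values, threshold=2.0):
    """Flag indices of values lying more than `threshold` standard
    deviations from the mean -- no labelled examples required.

    (A minimal sketch of unsupervised anomaly detection.)
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    return [i for i, v in enumerate(values)
            if abs(v - mean) > threshold * std]

# Invented per-segment scores from a recording; one segment stands out.
scores = [0.30, 0.31, 0.29, 0.32, 0.95, 0.30, 0.28]
print(find_anomalies(scores))  # index 4 is flagged as an outlier
```

Because no labels are needed, this style of analysis can surface suspicious content even from generators the model has never been trained on.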
V
Voice Cloning The use of AI to create a synthetic replica of a specific person’s voice, trained on recordings of that individual. Voice cloning tools can produce convincing imitations from very short audio samples and are increasingly used in fraud, social engineering attacks, and non-consensual content. AI Aware’s audio deepfake detector detects cloned voices in audio and video recordings.
Voice Assistant An AI system that understands and responds to spoken commands. Examples include Siri, Alexa, and Google Assistant. Voice assistants rely on speech synthesis technology that shares characteristics with voice cloning tools. This makes AI audio detection an important capability for verifying whether a voice is human or synthetic.
