29 terms

A

ai.txt

A proposed standard (similar to robots.txt) specifically for communicating AI-related permissions. Allows site owners to express preferences about AI training, retrieval, and generation uses of their content.

API

A programmatic interface for requesting data or services. In this catalog, APIs can become tollgates, licensed delivery channels, or enforcement points for AI-relevant data access.

B

Bot Authentication

Methods for verifying the identity of an automated client and conveying who operates it. In AI access control, stronger bot identity can make rate limits, differentiated permissions, and enforcement more reliable.

C

Certification

A third-party review or badge signaling that a model, dataset, or vendor meets a stated standard, such as consented training-data sourcing.

D

Data Market Platform

A platform that helps rights holders or data owners offer licensed access to content or datasets for AI use. These systems often handle discovery, permissions, pricing, delivery, and reporting for training or retrieval use cases.

Data Provenance

The documented history of a piece of data: where it came from, how it was processed, and what permissions apply. Critical for compliance and attribution in AI systems.
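As a minimal sketch, the three parts of that definition (origin, processing, permissions) could be carried in a small record like the one below. The class and field names are hypothetical and are not taken from any provenance standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance record; field names are illustrative only."""
    source_url: str   # where the data came from
    license: str      # what permissions apply
    processing_steps: List[str] = field(default_factory=list)  # how it was processed

    def add_step(self, description: str) -> None:
        # Append in order so the processing history reads chronologically.
        self.processing_steps.append(description)


record = ProvenanceRecord(
    source_url="https://example.com/article",  # placeholder URL
    license="CC-BY-4.0",
)
record.add_step("html-to-text")
record.add_step("deduplicated")
```

Keeping the processing history as an ordered list is one simple choice; a real system would likely also record timestamps and tool versions.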

F

Fairly Trained

A certification program that verifies AI models were trained with proper consent and licensing. Provides third-party validation that a model meets ethical training criteria.

I

IETF AIPref

A proposed Internet Engineering Task Force standard for AI preference signaling. Aims to create a standardized, extensible format for expressing AI-related permissions across the internet.

L

Licensing Collective

An organization that manages rights on behalf of multiple creators, negotiating licenses and distributing payments. Examples include ASCAP for music and newer publisher or creator collectives negotiating AI-related uses.

O

Opt-in

A consent model where explicit permission is required before content can be used. More protective than opt-out but harder to scale. Some licensing collectives operate on an opt-in basis.

P

Pipeline Stage

A phase in the AI data lifecycle. Common stages include collection or scraping, training, fine-tuning, retrieval, and user-facing output generation.

Privacy-Preserving Computation

Ways of training, evaluating, or collaborating on AI systems without freely exposing the underlying raw data. In this catalog, it points to approaches where data holders keep more custody and control while still enabling model development.

R

RAG (Retrieval-Augmented Generation)

A technique where AI models retrieve relevant documents at inference time to augment their responses. Raises different licensing questions than training since content is accessed dynamically.
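The retrieve-then-generate flow can be sketched as follows. This is a toy illustration: keyword overlap stands in for the embedding similarity a real system would likely use, the function names are hypothetical, and the final model call is omitted.

```python
def tokenize(text):
    # Lowercased whitespace tokens; a stand-in for a real tokenizer or embedder.
    return set(text.lower().split())


def retrieve(query, documents, k=1):
    # Rank documents by how many query tokens they share, highest first.
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    # Retrieved text is placed in the prompt at inference time.
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


DOCS = [
    "robots.txt tells crawlers which paths they may fetch",
    "token buckets limit request rates",
]
prompt = build_prompt("how do crawlers use robots.txt", DOCS)
```

Because the documents are fetched when the question is asked, the content is read at inference time rather than baked into model weights, which is why the licensing questions differ from those for training.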

RSL (Really Simple Licensing)

A machine-readable licensing format that allows content creators to specify terms for AI use of their work. Designed to be easily parsed by automated systems for compliance checking.

S

Status: Live

The initiative is publicly live or deployable today, even if adoption is still limited or emerging.

T

TDMRep

Text and Data Mining Reservation Protocol. A W3C Community Group effort for expressing TDM policies, typically via a machine-readable `/.well-known/tdmrep.json` file designed to support EU DSM Directive opt-outs.
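A hedged sketch of how a crawler might read such a file. The property names (`location`, `tdm-reservation`, `tdm-policy`) reflect one reading of the TDMRep report and should be checked against the current specification; the plain prefix matching is a simplifying assumption, and the paths and policy URL are placeholders.

```python
import json

# Example file content; paths and policy URL are placeholders.
TDMREP_JSON = """
[
  {"location": "/articles/", "tdm-reservation": 1,
   "tdm-policy": "https://example.com/policies/tdm.json"},
  {"location": "/press/", "tdm-reservation": 0}
]
"""


def tdm_reserved(path, rules):
    """Return True/False from the first matching rule, or None if none match.

    Assumption: simple prefix matching; the real pattern syntax may differ.
    """
    for rule in rules:
        if path.startswith(rule["location"]):
            return rule.get("tdm-reservation") == 1
    return None


rules = json.loads(TDMREP_JSON)
```

Here `tdm_reserved("/articles/2024/story.html", rules)` evaluates to `True` and `tdm_reserved("/press/release", rules)` to `False`; a path with no matching rule yields `None`, leaving the fallback policy to the caller.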

Technical Control

Measures that detect, limit, or block unwanted automated access to content. Includes rate limiting, CAPTCHAs, bot detection, and fingerprinting. Unlike preference signals, these are enforced rather than advisory.
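Rate limiting, one of the measures listed above, is often implemented with a token bucket. The sketch below is a generic illustration, not a description of any particular product; the class name and parameters are hypothetical, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now                # time of the previous check

    def allow(self, now):
        # Add tokens for the time elapsed since the last check, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token per allowed request
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_second=1.0)
# Three requests at t=0, then one at t=1 second.
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)]
# decisions == [True, True, False, True]
```

The third request is refused because the burst allowance is spent, which is the sense in which a technical control is enforced rather than advisory.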

Tollgate

A technical mechanism that gates automated access to content behind payment, authentication, or rate limits. Allows site owners to monetize bot traffic while keeping control over access.
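One possible decision policy for such a gate, sketched with standard HTTP status codes. The function name, its boolean inputs, and the order of the checks are assumptions made for illustration; an actual tollgate may apply only one of these conditions.

```python
from http import HTTPStatus


def tollgate(is_authenticated, has_paid, within_rate_limit):
    """Return the HTTP status a hypothetical tollgate would send to a bot."""
    if not is_authenticated:
        return HTTPStatus.UNAUTHORIZED        # 401: identify yourself first
    if not has_paid:
        return HTTPStatus.PAYMENT_REQUIRED    # 402: access is priced
    if not within_rate_limit:
        return HTTPStatus.TOO_MANY_REQUESTS   # 429: slow down
    return HTTPStatus.OK                      # 200: serve the content


status = tollgate(is_authenticated=True, has_paid=False, within_rate_limit=True)
# status == HTTPStatus.PAYMENT_REQUIRED (402)
```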

Training Data

The collection of text, images, code, or other content used to train machine learning models. The provenance, licensing, and consent status of training data are central issues in AI governance.

W

W3C

The World Wide Web Consortium, a standards body that incubates web specifications and community group proposals relevant to machine-readable signals and protocols.

X