Catalog

Browse initiatives shaping AI data access, licensing, and enforcement. Search by name or evidence, filter by status, and use the approach sections below when you want the table view.

Updated May 14, 2026

Status
More filters

Search and status are always visible. Open this panel when you want to sort, include archived entries, or narrow the catalog by approach type and data type.

Catalog scope

Archived entries stay on the site for historical context, but they are hidden by default because no newer public activity was found during review.

Approach types

Approach types describe the main mechanism in play, such as licenses, registries, marketplaces, tollgates, blocking, or certification.

Data types

Data types help you focus on the kinds of inputs an initiative covers, like text, images, audio, video, or code.

54 initiatives

Showing current initiatives. Archived entries are hidden by default.

Browse by primary approach

Use these jump links when you want to scan the table section by section.

Primary approach type

Preference signal

Signals that express whether AI systems may crawl, train on, or reuse content, usually through metadata, headers, or other machine-readable notices.

Initiative Website Latest update Approach type

Creative Commons framework for communicating expectations and building governance infrastructure around AI use of shared knowledge.

creativecommons.org/cc-signals May 13, 2026 Creative Commons outlines signals-to-infrastructure plan Preference signal Also uses New infrastructure

The Internet Engineering Task Force (IETF) is working on a standardized preference signal for AI agents and crawlers, described as "building blocks that allow for the expression of preferences about how content is collected and processed for Artificial Intelligence (AI) model development, deployment, and use."

datatracker.ietf.org/wg/aipref/about Apr 27, 2026 Vocabulary draft 06 updated Preference signal

Embedded image and video metadata fields for expressing whether assets may be used in data-mining and generative-AI training datasets.

ns.useplus.org/LDF/ldf-XMPSpecification Mar 30, 2026 IPTC publishes version 2.0 of AI opt-out guidelines Preference signal Pipeline: Collect -> Train -> Fine-tune
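
To illustrate how an embedded metadata opt-out can be read, here is a minimal sketch that pulls a `plus:DataMining` value out of an XMP packet with Python's standard XML parser. The namespace URI, the property name, and the `DMI-PROHIBITED*` value prefix are assumptions based on our reading of the PLUS LDF vocabulary; confirm them against the specification before relying on this.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PLUS_NS = "http://ns.useplus.org/ldf/xmp/1.0/"  # assumed PLUS namespace URI

# Hypothetical XMP packet carrying a data-mining opt-out.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description xmlns:plus="http://ns.useplus.org/ldf/xmp/1.0/"
   plus:DataMining="http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-AIMLTRAINING"/>
 </rdf:RDF>
</x:xmpmeta>"""


def data_mining_value(xmp_packet: str) -> Optional[str]:
    """Return the plus:DataMining value from an XMP packet, or None if absent."""
    root = ET.fromstring(xmp_packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP may serialize the property as an attribute on rdf:Description...
        attr = desc.get(f"{{{PLUS_NS}}}DataMining")
        if attr:
            return attr
        # ...or as a child element holding an rdf:resource URI or plain text.
        elem = desc.find(f"{{{PLUS_NS}}}DataMining")
        if elem is not None:
            return elem.get(f"{{{RDF_NS}}}resource") or (elem.text or "").strip() or None
    return None


def training_prohibited(value: Optional[str]) -> bool:
    # Simplification: treat any DMI-PROHIBITED* value as an opt-out.
    return value is not None and "DMI-PROHIBITED" in value


print(training_prohibited(data_mining_value(SAMPLE_XMP)))
```

A dataset builder would run a check like this on each image or video file before adding it to a training set; the substring test is deliberately coarse and ignores the finer distinctions the vocabulary may draw between kinds of AI use.
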
TDM·AI In progress

Asset-level protocol for binding machine-readable TDM and AI-training preferences to digital works.

Data types
Multimodal
Uses
IETF AI Preferences (AIPref)
tdmai.org Nov 04, 2025 Usage vocabulary updated Preference signal Also uses New infrastructure Pipeline: Collect -> Train -> Fine-tune

W3C Community Group specification for expressing text and data mining permissions via a well-known JSON file, designed for EU DSM Directive compliance.

w3.org/community/tdmrep Oct 01, 2025 Community group notes outline 2025 alignment work with AI-Pref Preference signal Also uses Formal license
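
As an illustration of the well-known-file approach, the sketch below evaluates a hypothetical `/.well-known/tdmrep.json` for a given path. The field names (`location`, `tdm-reservation`, `tdm-policy`) follow our understanding of TDMRep; the first-match-wins ordering and the use of `fnmatch` globbing in place of the spec's own pattern rules are simplifying assumptions.

```python
import fnmatch
import json
from typing import Optional

# Hypothetical contents of https://example.com/.well-known/tdmrep.json
TDMREP_JSON = """
[
  {"location": "/images/*", "tdm-reservation": 1,
   "tdm-policy": "https://example.com/tdm-policy.json"},
  {"location": "/*", "tdm-reservation": 0}
]
"""


def tdm_rule_for(path: str, rules: list) -> Optional[dict]:
    """Return the first rule whose location pattern matches the path (assumed ordering)."""
    for rule in rules:
        # Simplification: fnmatch globbing stands in for the spec's pattern syntax.
        if fnmatch.fnmatchcase(path, rule["location"]):
            return rule
    return None


def tdm_reserved(path: str, rules: list) -> bool:
    """True if the matching rule reserves TDM rights (tdm-reservation == 1)."""
    rule = tdm_rule_for(path, rules)
    # Simplification: no matching rule is treated as no reservation expressed.
    return bool(rule and rule.get("tdm-reservation") == 1)


rules = json.loads(TDMREP_JSON)
for path in ("/images/cat.jpg", "/articles/post.html"):
    print(path, tdm_reserved(path, rules))
```

When a path is reserved, the optional `tdm-policy` URL is where a miner would look for the terms under which a license might be obtained.
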

robots.txt can be used to express AI crawler access preferences; see OpenAI's bots documentation for an example. Additionally, the X-Robots-Tag header lets servers send crawler directives in HTTP responses.

Data types
Web content
developers.openai.com/api/docs/bots Sep 17, 2025 OpenAI bots documentation Preference signal
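
A minimal sketch of how a crawler operator might honor a robots.txt rule aimed at an AI crawler, using Python's standard `urllib.robotparser`. The robots.txt contents and URLs are hypothetical; `GPTBot` is the user-agent token OpenAI documents for the crawler that collects content that may be used for training.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot site-wide,
# let every other crawler in except under /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

for agent in ("GPTBot", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, "https://example.com/articles/1")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Note that robots.txt is advisory: it only works for crawlers that choose to fetch and respect it, which is why the catalog groups it with preference signals rather than technical blocking.
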

Content Credentials-based preference system for signaling that generative AI should not train on or use a creator's files.

helpx.adobe.com/creative-cloud/apps/adobe-content-authenticity/generative-ai-training-preferences.html Sep 02, 2025 Generative AI training and usage preference documentation Preference signal Pipeline: Train -> Generate
Spawning ai.txt In progress

A proposed convention for AI-specific crawler directives via an `ai.txt` file.

Data types
Web content
site.spawning.ai/spawning-ai-txt Aug 28, 2025 Improved crawler-control post published Preference signal
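
To show what checking such a file could look like, here is a sketch that parses a hypothetical `ai.txt`. It assumes a robots.txt-style syntax in which `Disallow:` lines carry file-type wildcards such as `*.jpg`; that syntax is our assumption about the convention, not a quoted spec.

```python
import fnmatch

# Hypothetical ai.txt (robots.txt-style syntax assumed).
AI_TXT = """\
# Block AI use of images and audio
User-Agent: *
Disallow: *.jpg
Disallow: *.png
Disallow: *.mp3
Allow: /
"""


def parse_ai_txt(text: str) -> list:
    """Return (directive, pattern) pairs that apply to the wildcard user agent."""
    rules, applies = [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            applies = value == "*"
        elif applies and key in ("allow", "disallow"):
            rules.append((key, value))
    return rules


def ai_use_allowed(path: str, rules: list) -> bool:
    # Simplification: Allow rules are parsed but ignored; any matching
    # Disallow pattern blocks the path.
    for directive, pattern in rules:
        if directive == "disallow" and fnmatch.fnmatchcase(path, pattern):
            return False
    return True


rules = parse_ai_txt(AI_TXT)
print(ai_use_allowed("/gallery/cat.jpg", rules), ai_use_allowed("/blog/post.html", rules))
```

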
User Intents In progress

Proposed AT Protocol mechanism for users to declare data-reuse preferences such as generative-AI training.

Uses
IETF AI Preferences (AIPref)
demo.user-intents.org Mar 08, 2025 Proposal discussion opened Preference signal Pipeline: Collect -> Train -> Retrieve
trust.txt Live

Publisher-oriented trust file that can also declare whether AI training is allowed through a machine-readable `datatrainingallowed=` field.

Data types
Web content
Signals
Users about 3,000 participating publishers
journallist.net Feb 21, 2025 Browser extension launch described the network at about 3,000 publishers Preference signal Also uses New infrastructure Pipeline: Train
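
A minimal sketch of reading the training flag from a trust.txt file, assuming simple `key=value` lines with `#` comments and `yes`/`no` values for `datatrainingallowed`. The other field names in the sample and the last-value-wins rule are illustrative assumptions.

```python
from typing import Optional

# Hypothetical trust.txt served at https://example-news.com/trust.txt
TRUST_TXT = """\
# trust.txt for example-news.com
belongto=https://example-association.org/
contact=https://example-news.com/contact
datatrainingallowed=no
"""


def parse_trust_txt(text: str) -> dict:
    """Collect key=value lines into lists, since keys may repeat."""
    fields: dict = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


def training_allowed(fields: dict) -> Optional[bool]:
    """True/False if the file declares a value, None if it is silent."""
    values = fields.get("datatrainingallowed")
    if not values:
        return None
    return values[-1].lower() == "yes"  # assumed: last declaration wins


print(training_allowed(parse_trust_txt(TRUST_TXT)))
```

Returning `None` rather than a default keeps "no declaration" distinct from an explicit `yes`, which matters if a crawler applies its own policy when publishers are silent.
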

Platform-level HTML and HTTP directives that tell external AI datasets and models not to use artists' work unless they opt in.

deviantart.com/team/journal/UPDATE-All-Deviations-Are-OptedOut-of-AI-Datasets-934500371 Apr 17, 2024 New DeviantArt Studio supports NoAI label presets Preference signal Pipeline: Collect -> Train
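
The HTML and HTTP directives above can be checked together. This sketch looks for opt-out tokens in both the `X-Robots-Tag` response header and `<meta name="robots">` tags, using the `noai` and `noimageai` tokens as we understand DeviantArt to have introduced them; other sites may use different tokens. It ignores user-agent-scoped header values, a simplification.

```python
from html.parser import HTMLParser

OPT_OUT_TOKENS = {"noai", "noimageai"}  # DeviantArt-style tokens


def split_directives(value: str) -> set:
    """Split a comma-separated directive list into lowercase tokens."""
    return {tok.strip().lower() for tok in value.split(",") if tok.strip()}


class RobotsMetaParser(HTMLParser):
    """Collect directive tokens from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.tokens: set = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.tokens |= split_directives(attrs.get("content") or "")


def opted_out(html: str, headers: dict) -> bool:
    """True if the header or a robots meta tag carries an opt-out token."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    tokens = split_directives(lowered.get("x-robots-tag", ""))
    meta = RobotsMetaParser()
    meta.feed(html)
    tokens |= meta.tokens
    return bool(tokens & OPT_OUT_TOKENS)


page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(opted_out(page, {}))
```

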
TK Labels Live

Local Contexts labels that let Indigenous communities express culturally specific conditions for access and reuse of knowledge and data.

localcontexts.org/labels/traditional-knowledge-labels Jun 17, 2022 Oriana TV case study shows TK Labels in use Preference signal Pipeline: Collect -> Retrieve -> Train

Primary approach type

Formal license

Formal legal terms or license language that grant, restrict, or condition AI-related reuse of content, datasets, or model inputs.

Initiative Website Latest update Approach type

A machine-readable licensing schema for clearly signaling reuse permissions and conditions (including payment or use restrictions).

Data types
Web content
Signals
Users 1,500+ organizations Data billions of web pages
rslstandard.org Dec 10, 2025 Technical standards released Formal license Also uses Preference signal

Machine-readable licensing layer that lets websites declare AI usage terms and pricing.

Data types
Web content
copyright.sh Oct 28, 2025 WordPress plugin launched Formal license
AI-Ready Licenses In progress

Research-backed proposal for modular standard data licenses tailored to AI data sharing.

mlcommons.org/2025/03/unlocking-data-collab Mar 17, 2025 Research findings published Formal license Pipeline: Collect -> Train -> Fine-tune -> Retrieve

Primary approach type

Licensing collective

Shared bargaining, aggregation, or rights-management structures that let many publishers or creators negotiate AI access together.

Initiative Website Latest update Approach type

Publisher coalition building shared standards and licensing frameworks for responsible AI use of journalism.

Data types
Text
Signals
Users 6 founding publisher members
spurcoalition.org May 11, 2026 Mediahuis joins SPUR as a founding member Licensing collective Also uses New infrastructure Pipeline: Train -> Retrieve

Coalition pursuing licensing, compensation, and enforcement for publisher content used by AI systems.

publishersrights.org May 02, 2026 Business model page describes enforcement and future AI licensing Licensing collective Pipeline: Train -> Retrieve

Voluntary collective licensing program covering internal AI reuse, external AI training, and transactional AI uses for copyrighted works.

copyright.com/solutions-rightsholders-ai Mar 03, 2026 Four AI licensing options announced Licensing collective Also uses Formal license Pipeline: Train -> Fine-tune -> Retrieve -> Generate

UK collective licensing offer for AI model training, fine-tuning, and RAG over published text content.

cla.co.uk/ai-and-copyright Mar 01, 2026 PLS launches first opt-in stage for collective AI licensing Licensing collective Also uses Formal license Pipeline: Train -> Fine-tune -> Retrieve

Trade alliance of dataset licensors pushing for legal clarity, ethical sourcing, and scalable licensing markets for AI training data.

Signals
Users 12 announced members
thedpa.ai Dec 10, 2024 DPA welcomed five new members Licensing collective

Primary approach type

Marketplace

Commercial platforms or brokers that package, list, or sell access to datasets, content libraries, or licensing opportunities for AI use.

Initiative Website Latest update Approach type

Licensed access to Stack Overflow's developer knowledge corpus for AI training, fine-tuning, RAG, and agentic use cases.

Data types
Text, Code
Signals
Data 83M+ human-verified questions and answers
stackoverflow.co/data-licensing May 14, 2026 Current product page cites 83M+ questions and answers Marketplace Also uses Formal license Pipeline: Train -> Fine-tune -> Retrieve

Opt-in music dataset licensing program that pays rights holders for AI training use of tracks and catalogs.

Data types
Music
Signals
Users 3,000+ music catalogs Data 14M+ opted-in songs Payments nearly $10M annual revenue from eight contracts
sourceaudio.com/blog/2025/06/05/a-new-chapter-in-music-licensing May 06, 2026 April recap covers AI training data economy panel Marketplace Pipeline: Train -> Fine-tune

Rights-cleared book licensing platform for AI training, reference, and transformative use.

Data types
Text
Signals
Users 100+ bestselling authors
createdbyhumans.ai May 02, 2026 About page describes mission and company story Marketplace Pipeline: Train -> Retrieve
Protege Live

AI training data platform for compliant exchange of proprietary, real-world datasets across sectors.

withprotege.ai Feb 12, 2026 HC1 partnership adds large de-identified lab data repository Marketplace Also uses New infrastructure Pipeline: Train -> Fine-tune

Paid marketplace routing premium publisher content into Microsoft Copilot, MSN, and Discover experiences.

Data types
Text
Signals
Users 7 launch publisher partners
about.ads.microsoft.com/en/blog/post/february-2026/building-toward-a-sustainable-content-economy-for-the-agentic-web Feb 03, 2026 Building Toward a Sustainable Content Economy for the Agentic Web Marketplace Pipeline: Retrieve -> Generate
Defined.ai Live

Marketplace for ethically sourced, annotated datasets used to train and fine-tune AI systems.

Data types
Multimodal
Signals
Payments several partners generate $1M+/year
defined.ai Jan 27, 2026 Defined.ai reports 2025 marketplace growth Marketplace Pipeline: Train -> Fine-tune

AI data marketplace for licensed multimedia datasets, now being integrated into Cloudflare's AI crawl and content-access stack.

Data types
Multimodal
humannative.ai Jan 15, 2026 Cloudflare acquisition and integration announcement Marketplace Also uses New infrastructure Pipeline: Train

Rights-cleared image dataset marketplace that uses Zedge and GuruShots creator networks to supply AI training data.

Data types
Images, Video
Signals
Data approximately 30 million rights-cleared images
dataseeds.ai Oct 20, 2025 Enterprise customer base growth update Marketplace Pipeline: Train -> Fine-tune

Contributor compensation and licensed visual training-data program tied to Bria's commercially safe generative AI stack.

Data types
Images
Signals
Users 30+ data partners
bria.ai/artist-program Sep 18, 2025 Platform release highlights rights-clear models Marketplace Pipeline: Train

A 50/50 revenue-share platform connecting publishers with AI companies, with 700+ publishers signed up, including major news outlets.

Data types
Text
Signals
Users 700+ publishers
prorata.ai Sep 05, 2025 Gist Answers launched Marketplace Also uses Licensing collective / Tollgate
Dappier Live

Rights-cleared content marketplace and monetization layer for RAG, assistants, and other AI applications.

Data types
Text
dappier.com/marketplace Aug 18, 2025 Licensing program launch announced Marketplace Also uses Tollgate Pipeline: Retrieve -> Generate

Music licensing platform for rights-cleared AI training data and audio assets.

Data types
Music
Signals
Data 4.4M+ hours of audio / 32B metadata text pairs / 3PB music data
gcx.co May 15, 2025 Rightsify says Hydra grew out of work on the GCX dataset service Marketplace Pipeline: Train -> Fine-tune

Licensed-content marketplace that supplies faith-ecosystem content to AI assistants and search experiences, seeded with a pooled guarantee.

Signals
Payments $5M pooled guarantee
docs.gloo.com/product-guides/licensing Feb 20, 2025 Gloo launches AI Licensing with pooled guarantee Marketplace Pipeline: Retrieve -> Generate

Primary approach type

Tollgate

Access layers that require payment, metering, or authenticated entry before content can be fetched, queried, or reused for AI workflows.

Initiative Website Latest update Approach type

A set of tools to block or charge for scraping, including an AI Audit dashboard, managed robots.txt, and a pay-per-crawl marketplace.

Data types
Web content
Signals
Users 3.8M+ domains on managed robots.txt Data 1B+ HTTP 402 responses/day
blog.cloudflare.com/control-content-use-for-ai-training Apr 17, 2026 Redirects for AI Training launched Tollgate Also uses Technical blocking / Preference signal
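
The "charge for scraping" model rests on HTTP 402 Payment Required. Below is a minimal server-side sketch of that decision: serve the page if the crawler has pre-agreed to pay at least the configured price, otherwise answer 402 with a price quote. The header names are modeled on our reading of Cloudflare's pay-per-crawl documentation and, like the pricing logic, should be treated as assumptions; a real deployment would also authenticate the crawler before honoring its offer.

```python
from dataclasses import dataclass, field


@dataclass
class TollgateResponse:
    status: int
    headers: dict = field(default_factory=dict)


def tollgate(price_usd: float, request_headers: dict) -> TollgateResponse:
    """Serve and charge, or quote a price with 402 Payment Required."""
    h = {k.lower(): v for k, v in request_headers.items()}
    offered = h.get("crawler-max-price")  # assumed header name
    if offered is not None and float(offered) >= price_usd:
        # The crawler's ceiling covers the price: serve the content and record the charge.
        return TollgateResponse(200, {"crawler-charged": f"{price_usd:.2f}"})
    # No offer, or too low an offer: refuse with the price attached.
    return TollgateResponse(402, {"crawler-price": f"{price_usd:.2f}"})


print(tollgate(0.01, {}))
print(tollgate(0.01, {"Crawler-Max-Price": "0.05"}))
```

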
TollBit Live

Uses dedicated subdomains to make content accessible to AI, with blocking and monetization controls.

Data types
Text
Signals
Users 4,000+ premium publishers
tollbit.com Dec 16, 2025 Imperva integration announced Tollgate Also uses Marketplace

Primary approach type

Technical blocking

Technical controls that deny, rate-limit, or otherwise constrain crawling, downloading, or automated collection unless a requester meets specific conditions.
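
Rate limiting is one of the simpler controls in this category. The sketch below shows a per-client token bucket, a common general-purpose technique rather than the method of any product listed here. The client key (an IP address or a verified bot identity) and the capacity and refill parameters are illustrative choices; the clock is injectable so the behavior can be tested deterministically.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CrawlerLimiter:
    """One bucket per client key, created on first request."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = self.buckets[client] = TokenBucket(self.capacity, self.rate, self.clock)
        return bucket.allow()


limiter = CrawlerLimiter(capacity=2, rate=1.0)
print([limiter.allow("203.0.113.7") for _ in range(3)])
```

A request that returns `False` would typically be answered with HTTP 429 (Too Many Requests) rather than served.
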

Initiative Website Latest update Approach type

Edge bot-management layer for detecting and blocking AI crawlers and fetchers that scrape website content.

fastly.com/products/fastly-ai-bot-management Apr 16, 2026 Threat Insights post covers bot traffic and AI-bot access Technical blocking Pipeline: Collect -> Retrieve

Enterprise anti-scraping product that detects and blocks persistent content scrapers, now positioned as part of broader AI and LLM bot management.

akamai.com/products/content-protector Apr 08, 2026 Publishing-focused AI bot report released Technical blocking Pipeline: Collect -> Retrieve

A simple anti-scraping tool intended to protect datasets from basic crawlers and scrapers.

Uses
Cloudflare AI Crawl Control
github.com/Responsible-Dataset-Sharing/easy-dataset-share Jan 09, 2026 easy-dataset-share paper published Technical blocking Pipeline: Collect

Hugging Face Hub access-control feature that requires users to request approval before downloading a dataset.

huggingface.co/docs/hub/datasets-gated Aug 18, 2025 Hugging Face docs updated with EU-specific gated-dataset guidance Technical blocking Also uses New infrastructure Pipeline: Collect -> Train -> Fine-tune

Primary approach type

New infrastructure

New registries, protocols, hosting patterns, or coordination layers that make governed data access, compliance, or contribution easier to operate.

Initiative Website Latest update Approach type
IETF Web Bot Auth In progress

Working group standardizing cryptographic authentication for bots and AI agents on the web.

Data types
Web content
datatracker.ietf.org/wg/webbotauth/about Apr 01, 2026 Use cases draft updated New infrastructure Pipeline: Collect -> Retrieve
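
To make "cryptographic authentication for bots" concrete, here is a simplified sketch of an HTTP Message Signatures (RFC 9421) style exchange: the bot builds a signature base covering the `@authority` component, signs it, and sends `Signature-Input` and `Signature` headers that the site can verify. The parameter set (`created`, `expires`, `keyid`, `tag="web-bot-auth"`) reflects our reading of the drafts, the key and key ID are hypothetical, and HMAC-SHA256 is a plainly swapped-in stand-in for the Ed25519 public-key signatures Web Bot Auth uses.

```python
import base64
import hashlib
import hmac
import re
import time
from typing import Optional


def signature_params(components, created, expires, keyid, tag="web-bot-auth"):
    """Serialize the @signature-params value: covered components plus parameters."""
    inner = " ".join(f'"{c}"' for c in components)
    return f'({inner});created={created};expires={expires};keyid="{keyid}";tag="{tag}"'


def signature_base(authority: str, params: str) -> str:
    # One line per covered component, then @signature-params; no trailing newline.
    return f'"@authority": {authority}\n"@signature-params": {params}'


def _mac(key: bytes, base: str) -> str:
    # Stand-in: HMAC-SHA256 instead of Ed25519.
    return base64.b64encode(hmac.new(key, base.encode(), hashlib.sha256).digest()).decode()


def sign_request(authority: str, key: bytes, keyid: str, now: Optional[float] = None, ttl: int = 60) -> dict:
    created = int(time.time() if now is None else now)
    params = signature_params(["@authority"], created, created + ttl, keyid)
    return {"Signature-Input": f"sig1={params}",
            "Signature": f"sig1=:{_mac(key, signature_base(authority, params))}:"}


def verify_request(authority: str, headers: dict, key: bytes, now: Optional[float] = None) -> bool:
    params = headers["Signature-Input"].split("=", 1)[1]
    expires = re.search(r";expires=(\d+)", params)
    current = int(time.time() if now is None else now)
    if expires is None or current > int(expires.group(1)):
        return False  # missing or stale signature
    expected = f"sig1=:{_mac(key, signature_base(authority, params))}:"
    return hmac.compare_digest(headers["Signature"], expected)


headers = sign_request("example.com", b"demo-secret", "hypothetical-key-id")
print(verify_request("example.com", headers, b"demo-secret"))
```

Binding the signature to `@authority` means a header captured on one site cannot be replayed against another, and the short `expires` window limits replay on the same site.

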
IAB Tech Lab CoMP In progress

Standards initiative for machine-readable commercial agreements, access policies, and monetization workflows that apply before AI crawling or content use.

Data types
Web content, Text
iabtechlab.com/standards/comp-content-monetization-protocols-initiative Mar 10, 2026 CoMP v1.0 opened for public comment New infrastructure Also uses Tollgate / Marketplace Pipeline: Collect -> Train -> Retrieve
CommonsDB Live

Registry for public-domain and openly licensed works using verifiable rights declarations and content-derived identifiers.

Signals
Data 300,000+ declarations
commonsdb.org Jan 20, 2026 Feasibility study part 2 published New infrastructure Pipeline: Collect -> Train -> Retrieve
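
A sketch of the two ideas behind such a registry: an identifier derived from the content itself, and a rights declaration that can be checked for tampering. Here a SHA-256 digest stands in for the content-derived identifier and an HMAC stands in for a public-key signature; both are simplifications we chose for illustration. A production registry might instead use similarity-preserving codes (such as ISCC) so re-encoded copies still match, and public-key signatures so anyone can verify a declaration.

```python
import hashlib
import hmac
import json


def content_id(data: bytes) -> str:
    """Content-derived identifier: a plain SHA-256 digest of the bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def _canonical(claim: dict) -> bytes:
    # Sorted keys give a stable byte string to sign.
    return json.dumps(claim, sort_keys=True).encode()


def make_declaration(data: bytes, license_url: str, declarant: str, key: bytes) -> dict:
    claim = {"content_id": content_id(data), "license": license_url, "declarant": declarant}
    # Stand-in for a public-key signature: an HMAC over the canonical JSON.
    return {**claim, "proof": hmac.new(key, _canonical(claim), hashlib.sha256).hexdigest()}


def verify_declaration(decl: dict, data: bytes, key: bytes) -> bool:
    claim = {k: v for k, v in decl.items() if k != "proof"}
    if claim.get("content_id") != content_id(data):
        return False  # the declaration describes different content
    expected = hmac.new(key, _canonical(claim), hashlib.sha256).hexdigest()
    return hmac.compare_digest(decl.get("proof", ""), expected)


work = b"scanned public-domain page"
decl = make_declaration(work, "https://creativecommons.org/publicdomain/zero/1.0/", "Example Library", b"demo-key")
print(verify_declaration(decl, work, b"demo-key"))
```
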

Enterprise-grade APIs and structured dumps for Wikipedia and sister projects, designed for large-scale reuse in AI, search, and knowledge graphs.

Data types
Text, Structured data
Signals
Users 10+ announced partners Data 920+ datasets / 300M+ unique project pages
enterprise.wikimedia.com Jan 15, 2026 New enterprise partners announced New infrastructure Pipeline: Retrieve -> Train
Amlet Live

AI content registry for publishers and authors that links ownership proof, TDM registration, and licensing rules for AI reuse.

amlet.ai Dec 15, 2025 TDM registry case made for AI licensing workflows New infrastructure Also uses Formal license Pipeline: Collect -> Train -> Retrieve

Community-centered dataset platform for sharing AI-relevant data under contributor-controlled licenses, access rules, and governance terms.

mozilladatacollective.com Nov 25, 2025 Exclusive-hosting FAQ describes dataset protections and management controls New infrastructure Also uses Technical blocking Pipeline: Train -> Fine-tune

Proposal for a commons-based infrastructure for large-scale access to digitized European books with conditional commercial access.

Data types
Text
openfuture.eu/publication/outline-for-a-european-books-data-commons Nov 20, 2025 Outline paper published New infrastructure Pipeline: Collect -> Train
SyftBox Live

Open-source protocol for privacy-preserving AI and analytics across distributed datasets without centralizing the underlying data.

openmined.org/syftbox Nov 12, 2025 syft-flwr release demonstrates active federated learning workflows on SyftBox New infrastructure Pipeline: Train -> Retrieve
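
The federated-learning workflows mentioned above rest on a simple idea: each data holder trains locally and only model parameters are shared and averaged. The sketch below implements plain federated averaging (FedAvg) on a toy linear-regression model with two hypothetical sites. It is a generic illustration of the technique, not SyftBox's or Flower's API.

```python
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]  # (features, target)


def local_update(weights: List[float], data: Sequence[Example], lr: float = 0.1) -> List[float]:
    """One gradient step of least-squares regression on a site's private data."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(data)
    return [w - lr * g for w, g in zip(weights, grad)]


def fed_avg(updates: Sequence[List[float]], sizes: Sequence[int]) -> List[float]:
    """Average site models, weighted by local dataset size."""
    total = sum(sizes)
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total for i in range(len(updates[0]))]


# Two hypothetical sites whose rows follow y = 1 + 2x (first feature is a bias term).
site_a = [([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
site_b = [([1.0, 3.0], 7.0)]

weights = [0.0, 0.0]
for _ in range(1000):
    # Only weights leave each site; the rows never do.
    updates = [local_update(weights, site) for site in (site_a, site_b)]
    weights = fed_avg(updates, [len(site_a), len(site_b)])
print([round(w, 3) for w in weights])
```

With one local step per round and size-weighted averaging, each round equals a full-batch gradient step over the pooled data, so the model converges to the same fit a centralized trainer would find. Real systems add more local steps, secure aggregation, or differential privacy on top.
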

OpenMined architecture for permissioned data contribution and attribution-based access in AI systems.

openmined.org/attribution-based-control Oct 06, 2025 OpenMined explains attribution-based control New infrastructure

Registry and opt-out workflow for marking works that should not be used in future AI training datasets.

Data types
Images
haveibeentrained.com Sep 15, 2025 Face Reveal launched for Have I Been Trained New infrastructure Also uses Preference signal Pipeline: Collect -> Train
FlexOlmo In progress

Distributed language-model training approach that lets data owners contribute experts without sharing raw data or giving up opt-out control.

allenai.org/blog/flexolmo Jul 09, 2025 Ai2 introduces FlexOlmo and invites organizations with sensitive data to participate New infrastructure Pipeline: Train

Participatory governance framework for communities to define conditions for data reuse, including AI training.

blog.thegovlab.org/reimagining-data-governance-for-ai-operationalizing-social-licensing-for-data-reuse May 13, 2025 Operationalization report released New infrastructure Pipeline: Collect -> Train -> Fine-tune -> Retrieve
Credtent Live

Independent creative registry for opting out of AI use, licensing content, and certifying human-created work.

Data types
Multimodal
Signals
Users thousands of creators
credtent.org Mar 31, 2025 Credtent marketplace and opt-out announcement New infrastructure Also uses Marketplace / Certification Pipeline: Train

Python package and API helpers for checking whether works are opted out before model training.

github.com/Spawning-Inc/datadiligence Oct 09, 2024 PyPI release 0.1.7 published New infrastructure Pipeline: Collect -> Train -> Fine-tune
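
The core of a pre-training opt-out check is a filter between crawling and training. The sketch below drops any crawled item whose URL or exact content hash appears in a snapshot of an opt-out registry. It does not use datadiligence's API; the registry sets, the function names, and the exact-hash matching are hypothetical illustrations.

```python
import hashlib
from typing import Iterable, Iterator, Set, Tuple


def fingerprint(data: bytes) -> str:
    """Exact-match content fingerprint (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


def filter_opted_out(
    items: Iterable[Tuple[str, bytes]],
    opted_out_urls: Set[str],
    opted_out_hashes: Set[str],
) -> Iterator[Tuple[str, bytes]]:
    """Yield only (url, content) pairs matched neither by URL nor by content hash."""
    for url, data in items:
        if url in opted_out_urls or fingerprint(data) in opted_out_hashes:
            continue  # opted out: keep it out of the training set
        yield url, data


# Hypothetical registry snapshot and crawl results.
registry_urls = {"https://example.com/art/1.png"}
registry_hashes = {fingerprint(b"opted-out photo bytes")}
crawl = [
    ("https://example.com/art/1.png", b"a"),
    ("https://mirror.example/copy.png", b"opted-out photo bytes"),
    ("https://example.org/ok.png", b"fine to use"),
]
kept = list(filter_opted_out(crawl, registry_urls, registry_hashes))
print([url for url, _ in kept])
```

Matching on content as well as URL catches copies of an opted-out work re-hosted elsewhere, though an exact hash misses re-encoded or resized copies.
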

Primary approach type

Certification

Third-party review, badges, or verification programs that signal whether a model, company, or dataset follows stated sourcing or licensing requirements.

Initiative Website Latest update Approach type

Certification program for AI models and companies that meet stated consent and licensing criteria for training data.

Signals
Users 16+ announced certified entities
fairlytrained.org Aug 01, 2024 Individual model certification updated Certification