Mozilla Data Collective

Live

Community-centered dataset platform for sharing AI-relevant data under contributor-controlled licenses, access rules, and governance terms.

Website
mozilladatacollective.com
Latest update
Nov 25, 2025: Exclusive-hosting FAQ describes dataset protections and management controls
Primary approach
New infrastructure
Also uses
Technical blocking
Pipeline
Train / Fine-tune

What it is

Mozilla Data Collective is Mozilla Foundation’s platform for hosting and distributing datasets on terms defined by the people or organizations that steward them. Providers can publish under existing licenses or custom terms, limit access to certain downloader types, and layer conditions such as recognition, exchange, or compensation onto reuse.
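The provider-defined terms described above can be pictured as a small data structure plus an access check. The sketch below is illustrative only and is not the platform's API or data model: the `DownloaderType` categories, the `DatasetTerms` fields, and the `may_download` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum


class DownloaderType(Enum):
    # Hypothetical categories; the platform's actual downloader types
    # are not listed in this entry.
    INDIVIDUAL = "individual"
    NONPROFIT = "nonprofit"
    COMMERCIAL = "commercial"


@dataclass(frozen=True)
class DatasetTerms:
    """Illustrative model of terms a provider might attach to a dataset."""
    license_name: str               # an existing license or a custom-terms label
    allowed_downloaders: frozenset  # which downloader types may access the data
    conditions: tuple = ()          # e.g. "recognition", "exchange", "compensation"


def may_download(terms, downloader, accepted_conditions):
    """Return True only if the downloader type is allowed and every
    condition attached to the dataset has been accepted."""
    if downloader not in terms.allowed_downloaders:
        return False
    return set(terms.conditions) <= set(accepted_conditions)


# Example: custom terms that exclude commercial downloaders and
# require recognition as a condition of reuse.
terms = DatasetTerms(
    license_name="custom-community-terms",
    allowed_downloaders=frozenset(
        {DownloaderType.INDIVIDUAL, DownloaderType.NONPROFIT}
    ),
    conditions=("recognition",),
)
```

Under these assumed terms, a nonprofit that accepts the recognition condition would pass the check, while a commercial downloader would be refused regardless of which conditions it accepts.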

For this catalog, its main relevance is as a community-governed alternative to uncontrolled scraping and one-size-fits-all open release. The platform combines hosted dataset delivery, authenticated access, API-based downloads, and stewardship-oriented protections, so communities can share AI-relevant data while keeping control over how it circulates.

Limitations

Mozilla Data Collective is still an emerging platform. Its public materials describe governance and access design in detail but do not provide stable public counts of participating providers or transaction volume.

Evidence trail

Adoption signals

Data volume

470+ datasets

Count taken from the live homepage inventory as observed on April 3, 2026. The dated announcement and FAQ posts explain the platform model but do not publish a current catalog count.