What it is
Mozilla Data Collective is Mozilla Foundation’s platform for hosting and distributing datasets on terms defined by the people or organizations that steward them. Providers can publish under existing licenses or custom terms, limit access to certain downloader types, and layer conditions such as recognition, exchange, or compensation onto reuse.
For this catalog, it is most important as a community-governed alternative to uncontrolled scraping or one-size-fits-all open release. The platform combines hosted dataset delivery, authenticated access, API-based downloads, and stewardship-oriented protections so communities can share AI-relevant data without giving up control over how it circulates.
Limitations
Mozilla Data Collective is still an emerging platform, and its public materials are stronger on governance and access design than on stable public counts of participating providers or transaction volume.
Evidence trail
Adoption signals
Data volume
470+ datasets
Based on the live homepage inventory snapshot observed on April 3, 2026; dated announcement and FAQ posts explain the platform model but do not publish the current catalog count.
- Homepage lists 470+ datasets Apr 03, 2026