Best AI Datasets 2026
The #1 AI dataset in 2026 is csv, with a Nerq Trust Score of 83/100 (A-), based on Nerq's independent analysis of 50 AI datasets across 5 trust dimensions. Rankings update daily; last updated: 2026-05-06.
According to Nerq's analysis, the top 5 AI datasets by trust score are: 1. csv (83/100), 2. @wordpress/fields (82/100), 3. @wordpress/dataviews (82/100), 4. datasets (81/100), 5. huggingface-hub (81/100). Nerq Trust Scores range from 65 to 83 among the top 50. Scores are based on 5 independent trust dimensions, including security, maintenance, and community adoption.
| # | Name | Trust | Grade |
|---|---|---|---|
| 1 | csv | 83 | A- |
| 2 | @wordpress/fields | 82 | A- |
| 3 | @wordpress/dataviews | 82 | A- |
| 4 | datasets | 81 | A- |
| 5 | huggingface-hub | 81 | A- |
| 6 | @sqlrooms/data-table | 76 | B+ |
| 7 | @humanspeak/svelte-virtual-list | 74 | B |
| 8 | @llm-tools/embedjs | 74 | B |
| 9 | @edgeandnode/amp | 74 | B |
| 10 | @friendliai/ai-provider | 72 | B |
Top 50 AI Datasets by Nerq Trust Score
| # | Name | Trust | Grade | Stars | Description |
|---|---|---|---|---|---|
| 1 | csv | 83 | A- | 1552.1k | A mature CSV toolset with simple api, full of options and tested against large datasets. |
| 2 | @wordpress/fields | 82 | A- | 27.3k | DataViews is a component that provides an API to render datasets using different types of layouts (t... |
| 3 | @wordpress/dataviews | 82 | A- | 47.6k | DataViews is a component that provides an API to render datasets using different types of layouts (t... |
| 4 | datasets | 81 | A- | 16324.3k | HuggingFace community-driven open-source library of datasets |
| 5 | huggingface-hub | 81 | A- | 47675.5k | Client library to download and publish models, datasets and other repos on the huggingface.co hub |
| 6 | @sqlrooms/data-table | 76 | B+ | 4.1k | A high-performance data table component library for SQLRooms applications. This package provides fle... |
| 7 | @humanspeak/svelte-virtual-list | 74 | B | 2.6k | A lightweight, high-performance virtual list component for Svelte 5 that renders large datasets with... |
| 8 | @llm-tools/embedjs | 74 | B | 174 | A NodeJS RAG framework to easily work with LLMs and custom datasets |
| 9 | @edgeandnode/amp | 74 | B | 176 | Build and manage blockchain datasets. |
| 10 | @friendliai/ai-provider | 72 | B | 94 | - |
| 11 | autoviz | 72 | B | 3.4k | Automatically Visualize any dataset, any size with a single line of code |
| 12 | @datawheel/vizbuilder | 72 | B | 11 | A React component that generates multiple kinds of charts from a tesseract-olap dataset. |
| 13 | vaex | 72 | B | 4.9k | Out-of-Core DataFrames to visualize and explore big tabular datasets |
| 14 | hapi-csv | 72 | B | 441 | Hapi plugin for converting a Joi response schema and dataset to csv |
| 15 | @lovrabet/dataset-mcp-server | 72 | B | 309 | MCP server for Lovrabet Dataset access |
| 16 | vue-dataset | 71 | B | 584 | A vue component to display datasets with filtering, paging and sorting capabilities! |
| 17 | process-versions | 71 | B | 130 | A dataset showing the compiled process version dependencies of different Node.js versions |
| 18 | abses | 70 | B | 116 | ABSESpy makes it easier to build artificial Social-ecological systems with real GeoSpatial datasets ... |
| 19 | @donedeal0/superdiff | 70 | B | 8.4k | Superdiff provides a rich and readable diff for arrays, objects, texts and coordinates. It supports ... |
| 20 | cellxgene-schema | 70 | B- | 267 | Tool for applying and validating cellxgene integration schema to single cell datasets |
| 21 | azureml-opendatasets | 70 | B- | 8.4k | Provides a set of APIs to consume Azure Open Datasets. |
| 22 | ancpbids | 69 | B- | 3.7k | Read/write/validate/query BIDS datasets |
| 23 | @ldo/jsonld-dataset-proxy | 69 | B- | 754 | Edit RDFJS Dataset just like regular JavaScript Object Literals. |
| 24 | @muze-nl/simplystore | 69 | B- | 1 | SimplyStore is a radically simpler backend storage server. It does not have a database, certainly no... |
| 25 | node-dataset | 69 | B- | 100 | A Node.js module for working with data sets created in code, loaded from files, or retrieved from a ... |
| 26 | data_magic | 68 | B- | 14364.0k | Provides datasets to application stored in YAML files |
| 27 | cellxgene | 68 | B- | 868 | Web application for exploration of large scale scRNA-seq datasets |
| 28 | cemba-data | 68 | B- | 4 | Pipelines for single nucleus methylome and multi-omic dataset. |
| 29 | act-atmos | 68 | B- | 1.2k | Package for working with atmospheric time series datasets |
| 30 | @vespermcp/mcp-server | 67 | B- | 244 | AI-powered dataset discovery, quality analysis, and preparation MCP server with multimodal support (... |
| 31 | arcana | 67 | B- | 618 | Abstraction of Repository-Centric ANAlysis (Arcana): A framework for analysing on file-based dataset... |
| 32 | azureml-contrib-dataset | 67 | B- | 996 | Contains experimental Dataset features for the azureml-core package. |
| 33 | azureml-datadrift | 67 | B- | 220 | Contains functionality for data drift detection for various datasets used in machine learning. |
| 34 | mnemospark | 67 | B- | 544 | mnemospark is an OpenClaw plugin that gives agentic systems instant, secure access to cloud storage,... |
| 35 | @cherrystudio/embedjs | 67 | B- | 541 | A NodeJS RAG framework to easily work with LLMs and custom datasets |
| 36 | baran | 67 | B- | 1189.5k | Text Splitter for Large Language Model Datasets. |
| 37 | devise-pwned_password | 67 | B- | 2970.8k | Devise extension that checks user passwords against the PwnedPasswords dataset https://haveibeenpwne... |
| 38 | sequel_pg | 67 | B- | 6766.5k | sequel_pg overwrites the inner loop of the Sequel postgres adapter row fetching code with a C versio... |
| 39 | gruff | 67 | B- | 3776.5k | Beautiful graphs for one or multiple datasets. Can be used on websites or in documents. |
| 40 | rgeo-shapefile | 67 | B- | 3620.0k | RGeo is a geospatial data library for Ruby. RGeo::Shapefile is an optional RGeo module for reading t... |
| 41 | vesper-wizard | 66 | B- | 966 | Zero-friction setup wizard for Vesper — local MCP server, unified dataset API, and agent auto-config... |
| 42 | @data_wise/hyper-markdown | 66 | B- | 3 | A powerful Vue 3 Markdown editor with rich features including ECharts, D3.js, Mermaid, KaTeX, and da... |
| 43 | anemoi-datasets | 66 | B- | - | A package to hold various functions to support training of ML models on ECMWF data. |
| 44 | @quicknode/hypercore-cli | 66 | B- | 30 | Developer-friendly CLI for streaming and backfilling HyperCore datasets from Quicknode |
| 45 | @opengis/mapdataset | 66 | B- | 5 | A Map Dataset Component displays geospatial vector data with the ability to filter features and colo... |
| 46 | @stdlib/datasets-standard-card-deck | 65 | B- | 2 | A list of two or three letter abbreviations for each card in a standard 52-card deck. |
| 47 | @stdlib/datasets-female-first-names-en | 65 | B- | 87 | A list of common female first names in English speaking countries. |
| 48 | @stdlib/datasets-spache-revised | 65 | B- | 4 | A list of simple American-English words (revised Spache). |
| 49 | @stdlib/datasets-male-first-names-en | 65 | B- | 99 | A list of common male first names in English speaking countries. |
| 50 | ANAC XML Bandi di Gara | 65 | B- | 600 | Software for managing calls for tender (Bandi di Gara) and generating XML datasets for ANAC (formerly AVCP, Legge 190/20... |
How We Rank AI Datasets
These AI datasets are ranked by Nerq Trust Score, which evaluates security, maintenance, community adoption, and transparency across multiple data points. Only entities with a trust score of 30 or above are included. Scores are updated continuously as new data becomes available.
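The inclusion rule and ordering described above can be sketched in a few lines of Python. This is an illustration only, not Nerq's implementation: the `Entity` type, the `rank_entities` helper, and the low-scoring sample entry are hypothetical. Only the threshold of 30 and the three real sample scores come from this page.

```python
from dataclasses import dataclass

MIN_TRUST_SCORE = 30  # inclusion threshold stated on this page


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one ranked package."""
    name: str
    trust_score: int  # overall Trust Score, 0-100


def rank_entities(entities, min_score=MIN_TRUST_SCORE):
    """Drop entities below the threshold, then sort by score, highest first."""
    eligible = [e for e in entities if e.trust_score >= min_score]
    # sorted() is stable, so entities with equal scores keep their input order.
    return sorted(eligible, key=lambda e: e.trust_score, reverse=True)


sample = [
    Entity("datasets", 81),
    Entity("example-low-score-package", 12),  # hypothetical, not from the ranking
    Entity("csv", 83),
    Entity("vaex", 72),
]

for position, entity in enumerate(rank_entities(sample), start=1):
    print(position, entity.name, entity.trust_score)
```

How ties are broken in the published ranking is not stated; the stable sort here simply preserves input order for equal scores.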
FAQ
What are the best AI datasets in 2026?
Based on Nerq Trust Scores, the top-ranked AI datasets are listed above, scored on security, activity, documentation, and community metrics.
How are AI datasets ranked?
Nerq ranks tools using Trust Score v2, which combines security analysis, maintenance activity, documentation quality, and community adoption signals.
Are these AI datasets safe to use?
Each tool has an individual safety report. On the Nerq site, click any tool name to see its detailed trust analysis.
What does a Nerq Trust Score of A mean?
An A grade (80-89) means the entity has strong signals across security, maintenance, documentation, and community adoption. In the ranking above, scores of 81 to 83 are shown as A-. A+ (90-100) is the highest possible rating.
How does Nerq evaluate AI datasets?
Nerq analyzes AI datasets across multiple dimensions including security vulnerabilities, license compliance, maintenance activity, documentation quality, and community adoption. Each dimension is scored independently and combined into an overall Trust Score (0-100).
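As a rough illustration of how independently scored dimensions could be combined into one 0-100 score, here is a minimal Python sketch. The page does not say how Nerq weights the dimensions, so the equal weighting, the `combine_trust_score` function, the dimension keys, and the example values are all assumptions made for this example.

```python
# Dimension keys are hypothetical names for the five dimensions listed above.
DIMENSIONS = (
    "security",
    "license_compliance",
    "maintenance",
    "documentation",
    "community_adoption",
)


def combine_trust_score(dimension_scores, weights=None):
    """Weighted mean of per-dimension scores, each on a 0-100 scale.

    Equal weights are an assumption; the real weighting is not published here.
    """
    if weights is None:
        weights = {name: 1.0 for name in DIMENSIONS}
    missing = [name for name in DIMENSIONS if name not in dimension_scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for name in DIMENSIONS:
        if not 0 <= dimension_scores[name] <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    total_weight = sum(weights[name] for name in DIMENSIONS)
    weighted_sum = sum(dimension_scores[name] * weights[name] for name in DIMENSIONS)
    return round(weighted_sum / total_weight)


# Made-up dimension scores for illustration only.
example = {
    "security": 90,
    "license_compliance": 85,
    "maintenance": 80,
    "documentation": 75,
    "community_adoption": 85,
}

print(combine_trust_score(example))  # prints 83 (415 / 5)
```

Passing a `weights` dictionary lets you see how the overall score shifts if one dimension, such as security, counts for more than the others.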