Benchmark Tools

The simplest way to find the best AI tools!

VMLU
Web
benchmark evaluation

A benchmark suite for evaluating large language models on Vietnamese language understanding.

HEAR Benchmark
Web
audio evaluation

A benchmark for evaluating audio representations across diverse tasks in speech, music, and environmental sound.

DataComp
Web
data optimization

A machine learning benchmark focusing on optimizing data selection for model training.

WMDP Benchmark
Web
risk evaluation

A benchmark for measuring hazardous knowledge in LLMs and for evaluating unlearning methods that reduce the risk of malicious use.

BEIR
Free
Web
model evaluation

A heterogeneous benchmark for evaluating information retrieval models across diverse datasets.

FedScale
Web
federated learning

A scalable and extensible federated learning engine and benchmark.

PHYRE
Web
physical reasoning

A benchmark for physical reasoning with 2D puzzles.

Cosine AI - Genie
Web
software engineering
Hiring (3 jobs)

Cosine's Genie is a cutting-edge AI software engineering model with exceptional coding abilities.