This subforum is dedicated to measurement and benchmarking. Discussions may include datasets, benchmarks, evaluation methodologies, reproducibility, comparisons between models or systems, and critiques of existing metrics.
Related topics
| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the Training category | 0 | 4 | February 5, 2026 |
| About the Model Releases category | 0 | 15 | September 17, 2025 |
| About the Inference category | 0 | 3 | February 5, 2026 |
| About the Prompting category | 0 | 2 | February 5, 2026 |
| About the Theory category | 0 | 2 | February 5, 2026 |
| About the Architecture category | 0 | 6 | February 5, 2026 |
| About the Compute Contributions category | 0 | 9 | February 5, 2026 |
| Nous Research presents Hermes 4, our latest line of open-source models | 0 | 67 | September 22, 2025 |