This subforum is for discussions about model structure and representation. Topics include attention mechanisms, architectural innovations, scaling patterns, inductive biases, and new ways of organizing computation inside neural networks.
Related topics
| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the Training category | 0 | 4 | February 5, 2026 |
| About the Prompting category | 0 | 2 | February 5, 2026 |
| About the Theory category | 0 | 2 | February 5, 2026 |
| About the Inference category | 0 | 3 | February 5, 2026 |
| About the Evaluation category | 0 | 4 | February 5, 2026 |
| DSA attention (DeepSeek Sparse Attention) | 1 | 58 | January 17, 2026 |
| About the Model Releases category | 0 | 15 | September 17, 2025 |
| Nous Research presents Hermes 4, our latest line of open-source models | 0 | 67 | September 22, 2025 |