‘Hey! Nous Research here - we’re the authors of an LLM RL environments repo called Atropos, which is designed to provide rollouts for multi-environment runs. Each individual env can be single-turn, multi-turn, or multi-agent, can be R1-zero style, or can have a custom chat template. Furthermore, environments can define token-level advantages, so they are not necessarily tied to the same RL training algorithm.’
Hey, wanted to reach out because I’m getting close to being done with the bounty, but I have a couple of questions about getting it cleaned up.
Currently it seems like the generate endpoint with raw tokens on verl’s custom vLLM instance isn’t hooked up, so I’m doing something a bit hacky: passing the string response from the model and tokenizing on both sides. This relies on the same tokenizer being used on both sides (it works, but you CAN get the raw tokens and work with those directly, which would be a lot better). I just wanted to check whether there are any issues with that approach. (You can see the TODO at around line ~450 in vllm_async_server.py in verl.)
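To illustrate why the string round-trip is fragile: with a subword vocabulary, two different token sequences can decode to the same text, so re-tokenizing the decoded string may not recover the tokens the model actually sampled. This is a toy greedy tokenizer sketch, not the real vLLM/verl tokenizer:

```python
# Toy greedy longest-match tokenizer (hypothetical sketch, not the real
# vLLM/verl tokenizer) showing that decode -> re-encode is lossy.

VOCAB = ["ab", "a", "b"]  # subword-style vocabulary

def encode(text):
    """Greedy longest-match tokenization over VOCAB."""
    tokens, i = [], 0
    while i < len(text):
        for tok in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
        else:
            raise ValueError(f"untokenizable input at position {i}")
    return tokens

def decode(tokens):
    return "".join(tokens)

# Suppose the model actually sampled ["a", "b"]:
model_tokens = ["a", "b"]
text = decode(model_tokens)         # "ab"
retokenized = encode(text)          # ["ab"] -- a different sequence
assert decode(retokenized) == text  # the text round-trips...
assert retokenized != model_tokens  # ...but the token sequence does not
```

With real BPE vocabularies the same thing happens around merges and whitespace, which is why working with the raw sampled token ids would be strictly safer than matching tokenizers on both sides.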
I implemented token masking, but do you want a working example of a model trained on the token mask? I can set one up, but I’m not sure whether you want that on the recipe’s end or in another repo for cleanliness (or if you have one available, I could just leverage it in the atropos repo).
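For concreteness, here’s a minimal sketch of what I mean by the token mask: a per-token 0/1 vector that excludes tokens (e.g. prompt or tool-output tokens) from the policy-gradient loss. The function name and inputs are hypothetical, not verl’s actual trainer API:

```python
# Hypothetical sketch of applying a per-token loss mask, assuming the
# trainer receives per-token logprobs, advantages, and a 0/1 mask.

def masked_pg_loss(logprobs, advantages, mask):
    """Policy-gradient loss averaged over unmasked tokens only.

    mask[i] == 1: token i contributes to the loss (model-generated).
    mask[i] == 0: token i is ignored (e.g. prompt or tool-output tokens).
    """
    num = sum(-lp * adv * m for lp, adv, m in zip(logprobs, advantages, mask))
    denom = max(sum(mask), 1)  # avoid div-by-zero on fully masked sequences
    return num / denom

# Two prompt tokens masked out (0), two generated tokens kept (1):
loss = masked_pg_loss(
    logprobs=[-0.5, -1.0, -0.2, -0.8],
    advantages=[0.0, 0.0, 1.0, 1.0],
    mask=[0, 0, 1, 1],
)
```

A real training example would do the same thing with tensor ops over a batch, but this is the semantics the mask is meant to carry.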
Link to the current state of things (I still have to clean up some code, so this is 100% not cleaned up, but feel free to message me in the Discord or here if you have any critiques):