Improving mathematical reasoning with process supervision
We've trained a model to achieve a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning ("process supervision") instead of simply rewarding the correct final answer ("outcome supervision"). In addition to boosting performance relative to outcome supervisio...
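To make the distinction concrete, here is a minimal toy sketch of the two reward signals. It is not the training setup described in the article: the function names, the per-step correctness labels, and the example solution are all hypothetical, chosen only to show that outcome supervision yields one score per solution while process supervision yields one score per reasoning step.

```python
from typing import List, Optional


def outcome_reward(final_answer: str, reference_answer: str) -> float:
    """Outcome supervision (toy version): a single reward for the whole
    solution, based only on whether the final answer matches."""
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0


def process_rewards(step_labels: List[bool]) -> List[float]:
    """Process supervision (toy version): one reward per reasoning step,
    derived from hypothetical per-step correctness labels."""
    return [1.0 if is_correct else 0.0 for is_correct in step_labels]


# Hypothetical three-step solution: step 2 is flawed, yet the final
# answer still happens to match the reference.
step_labels = [True, False, True]
final_answer = "42"
reference_answer = "42"

outcome = outcome_reward(final_answer, reference_answer)
per_step = process_rewards(step_labels)

# Index of the first step judged incorrect, or None if every step passed.
first_error: Optional[int] = next(
    (i for i, reward in enumerate(per_step) if reward == 0.0), None
)
```

In this toy case the outcome signal alone cannot tell the flawed solution from a sound one, whereas the per-step signal marks where the reasoning went wrong.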
Isabella News