Deliberative alignment: reasoning enables safer language models
Introducing our new alignment strategy for o1 models, which are directly taught safety specifications and how to reason over them.
Isabella News