Transformers with Chain of Thought: Extending Computational Power
Transformers with "chain of thought" improve reasoning power, but extent depends on length of intermediate generation: logarithmic steps only slightly extend standard transformers, while linear steps enable recognition of all regular languages.
This is a Plain English Papers summary of a research paper called The Expressive Power of Transformers with Chain of Thought. If you like these kinds of analyses, you should subscribe to the AImodels.fyi newsletter or follow me on Twitter.

Overview

Researchers have found that standard transformer models, which produce their answer immediately, are limited in their ability to solve certain simple reasoning problems. However, transformers can improve their reasoning by generating and conditioning on a sequence of intermediate tokens before answering, known as a "chain of thought" or "scratchpad".
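To build intuition for the regular-languages result, here is a minimal sketch (not the paper's actual transformer construction, and all names are illustrative): if each intermediate token simply records the current state of a finite automaton, then each generation step is a constant-size lookup conditioned on the previous token, and a chain of thought of length equal to the input suffices to decide any regular language. The example language, binary strings with an even number of 1s, is assumed for illustration.

```python
# Sketch only: a linear chain of thought tracking a DFA's state,
# one intermediate "scratchpad" token per input symbol.
# Language: binary strings with an even number of 1s (a regular language).

def chain_of_thought_dfa(input_tokens):
    """Emit one state token per input symbol, then a final answer."""
    # DFA transition table: states {"even", "odd"}, start state "even".
    transition = {
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd",   ("odd", "1"): "even",
    }
    state = "even"
    scratchpad = []  # the chain of thought: one token per step
    for tok in input_tokens:
        # Each step depends only on the previous scratchpad token and
        # the next input symbol -- a constant-size computation.
        state = transition[(state, tok)]
        scratchpad.append(state)
    answer = "accept" if state == "even" else "reject"
    return scratchpad, answer

pad, answer = chain_of_thought_dfa(list("1101"))
print(pad, answer)  # ['odd', 'even', 'even', 'odd'] reject
```

The point of the sketch is that the scratchpad grows linearly with the input while each step stays trivially simple; this mirrors why linear intermediate steps are enough for all regular languages, whereas an immediate-answer model has no place to accumulate that running state.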