Education scientists have embraced the shift of focus from a front-loaded teaching format to the learning process of pupils and students. Computer scientists have adopted this strategy, moving from mere knowledge databases and predictions of the likely next word in a sentence or paragraph to learning models. DeepSeek has surprised the makers of most large language models with its successful strategy of focusing on learning and reasoning. So-called reinforcement learning is key to the programming of this next generation of AI models. Reasoning in most cases builds on multi-step sequences for answering a more complex question. The model then returns the answer together with the steps (the reasoning) it applied. There is a debate about whether summaries or translations of texts need the reasoning function of AI models at all. Most of the time reasoning might not be necessary, or it may even be counterproductive, for example if a translation were to "correct" an obviously faulty piece of reasoning in the source text.
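To make the idea of "returning the answer and the steps" a little more concrete, here is a minimal Python sketch of how such an output could be split into its reasoning part and its final answer. The <think> and <answer> tags, the helper function and the example question are illustrative assumptions and are not taken from the text above.

```python
import re

# Hypothetical raw output of a reasoning model: the chain of steps is
# wrapped in <think> tags, the final result in <answer> tags.
raw_output = """
<think>
Step 1: The train covers 150 km in 2 hours.
Step 2: Speed = distance / time = 150 / 2 = 75 km/h.
</think>
<answer>75 km/h</answer>
"""

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning steps, final answer) from a tagged model output."""
    steps = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        steps.group(1).strip() if steps else "",
        answer.group(1).strip() if answer else "",
    )

reasoning, answer = split_reasoning(raw_output)
print("Reasoning:\n", reasoning)
print("Answer:", answer)
```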
Imagine, moreover, that an ordinary LLM were to translate a text containing fake news. A correction loop that cross-checks against reliable external sources, such as an encyclopedia or Wikipedia, would complicate the answering procedure for any text. Yet this is roughly how reinforcement learning from human feedback (RLHF) works. Reinforcement learning applies a form of accuracy reward, which guides the learning or answering process with checks against mathematical or programming correctness; just think of the basic logic that has to be respected in the answer. Similarly, a formal check of the answer's structure ensures the model returns a text with a normal sentence structure, numbered reasoning steps, and an introductory and a concluding phrase, much as we were all asked to do at school or university. The amount of correction needed from humans is reduced considerably, and the computing resources required are only a fraction of those of previous LLMs, which retrieve answers from enormous databases or gigantic data factories that consume a lot of energy when processing requests. Remember the film about Kasparov, the world chess champion, who was beaten by IBM's Deep Blue, a computer that not only had a huge stock of previous games and tournaments, but could also judge positions and identify promising strategies to pursue. Don't be surprised if a DeepSeek answer is superior to what our own mind and reasoning are capable of. Reinforcement learning is a learning tool that we may apply ourselves, if we deem it appropriate, or simply as one way of arriving at an answer. (Useful reference: Sebastian Raschka, Build a Large Language Model (From Scratch), Manning.)
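As a rough illustration of the accuracy and format checks described above, here is a toy Python sketch of a rule-based reward. The function names, keywords and weights are assumptions chosen for the example; they are not DeepSeek's actual reward code.

```python
# Minimal sketch of two rule-based reward signals: an accuracy reward
# (is the final answer correct?) and a format reward (are the reasoning
# steps and the conclusion presented in the expected structure?).
# All names, keywords and weights are illustrative assumptions.

def accuracy_reward(predicted_answer: str, reference_answer: str) -> float:
    """1.0 if the model's final answer matches the reference, else 0.0."""
    return 1.0 if predicted_answer.strip() == reference_answer.strip() else 0.0

def format_reward(output: str) -> float:
    """Reward outputs that show numbered steps and a closing answer line."""
    has_steps = "Step 1" in output and "Step 2" in output
    has_conclusion = "Answer:" in output
    return 0.5 * has_steps + 0.5 * has_conclusion

def total_reward(output: str, predicted_answer: str, reference_answer: str) -> float:
    """Combine both signals; the equal weighting is a free design choice."""
    return accuracy_reward(predicted_answer, reference_answer) + format_reward(output)

sample = "Step 1: 150 / 2 = 75.\nStep 2: So the speed is 75 km/h.\nAnswer: 75 km/h"
print(total_reward(sample, "75 km/h", "75 km/h"))  # 2.0 for a correct, well-formatted answer
```

A learning loop would then prefer model outputs that collect a higher total reward, which is how the structure and correctness of the answers are shaped without a human correcting every single response.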
(Image: ChatGPT, two humanoid robots thinking and discussing how to repair a notebook sitting on a workbench.)