Post by account_disabled on Sept 14, 2023 10:56:24 GMT
Reinforcement learning is much better at producing accurate results because it is a goal-pursuing process: it iterates repeatedly toward a desired goal and converges on the answer closest to that goal. “LLMs, on the other hand, are not designed to be iterative or goal-driven,” says Lodge. “They are designed to give ‘good enough’ one-shot or few-shot answers.”
A 'one-shot' answer is the first answer the model generates by predicting a sequence of words from the prompt. The 'few-shot' approach supplies additional examples or hints in the prompt to help the model make better predictions. LLMs can also give different answers to the same question, because they accept a certain degree of randomness in sampling to increase the likelihood of producing a better response.
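The effect of that sampling randomness can be sketched with a toy example (an illustration only, not any real model's decoder): a hypothetical "next word" distribution is sharpened or softened by a temperature parameter, so the same prompt can yield different completions across runs.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample one index from a softmax over logits scaled by temperature.

    Low temperature -> nearly always the top choice (deterministic);
    high temperature -> more varied, more random completions.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Toy "next word" scores for the prompt "Wayne Gretzky likes ice ..."
words = ["hockey", "cream", "skating", "fishing", "wine"]
logits = [2.0, 1.2, 1.0, 0.3, 0.1]

rng = random.Random(0)
samples = [words[sample_with_temperature(logits, 1.0, rng)] for _ in range(5)]
print(samples)  # repeated sampling can produce different completions
```

At temperature near zero the sampler collapses to the single most likely word, which is why the same model can be made more or less deterministic without retraining.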
This does not mean the LLM camp ignores reinforcement learning. GPT-4 incorporates Reinforcement Learning from Human Feedback (RLHF): human operators train the core model to prefer some answers over others, though this does not fundamentally change the answers the model generates in the first place. For example, Lodge said, an LLM could produce the following completions for the prompt “Wayne Gretzky likes ice”:
1. Wayne Gretzky loves ice cream.
2. Wayne Gretzky loves ice hockey.
3. Wayne Gretzky loves ice fishing.
4. Wayne Gretzky loves ice skating.
5. Wayne Gretzky likes ice wine.
Here, a human operator could rank these answers, reasoning that because Wayne Gretzky is a legendary Canadian ice hockey player, he is most likely to like ice hockey (or ice skating). Rankings like these, collected from many human operators, are used to train the model. Note that GPT-4 does not claim to know Wayne Gretzky's preferences exactly; it simply provides the most likely completion of the given prompt. Ultimately, LLMs are not designed to be highly accurate or consistent, and Lodge argues this is why reinforcement learning outperforms generative AI when it comes to applying AI at scale.
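The ranking step can be illustrated with a toy Bradley-Terry-style sketch (a hypothetical illustration, not OpenAI's actual RLHF pipeline): the human ranking is expanded into pairwise preferences, and a scalar reward per answer is nudged until preferred answers score higher.

```python
import math

answers = [
    "Wayne Gretzky loves ice hockey.",
    "Wayne Gretzky loves ice skating.",
    "Wayne Gretzky loves ice cream.",
    "Wayne Gretzky loves ice fishing.",
    "Wayne Gretzky likes ice wine.",
]  # human ranking, best answer first

rewards = {a: 0.0 for a in answers}  # toy scalar reward per answer
lr = 0.1  # learning rate for the reward updates

for _ in range(200):
    # every (winner, loser) pair implied by the ranking
    for i, winner in enumerate(answers):
        for loser in answers[i + 1:]:
            # probability the current rewards already prefer the winner
            p = 1.0 / (1.0 + math.exp(rewards[loser] - rewards[winner]))
            # push the winner's reward up and the loser's down
            rewards[winner] += lr * (1.0 - p)
            rewards[loser] -= lr * (1.0 - p)

ranked = sorted(answers, key=rewards.get, reverse=True)
print(ranked[0])  # the hockey answer now scores highest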
Applying Reinforcement Learning to Software
What about software development? Many developers are experiencing increased productivity by using generative AI-based tools such as GitHub's Copilot and Amazon's CodeWhisperer. These tools predict what code is likely to come next based on the code before and after the point of code insertion in the integrated development environment.
In fact , David Ramel of Visual Studio Magazine said that the latest version of CoPilot already generates 61% of Java code. For those who worry that the software developer profession will disappear, these tools require human oversight to check the completed code and edit it to ensure it compiles and runs properly. In fact, autocompletion has been a representative feature of IDEs since the early days of IDEs, and code generators including Copilot have greatly increased the usefulness of that feature. However, this is not the case with the large-scale autonomous coding required to write 61% of Java code.
Lodge said reinforcement learning can accurately perform large-scale unsupervised coding. Of course, there is a reason why Lodge said this. DeepBlue launched 'Cover', a commercial reinforcement learning-based unit test writing tool in 2019. Cover lets you automate complex and error-prone tasks at scale by writing entire unit tests without human intervention.
Considering these facts, wouldn't it be possible to say that Lodge's argument is biased? Of course it is. But Lodge also has a wealth of experience to back up the argument that reinforcement learning can outperform generative AI in software development. Currently, DeepBlue uses reinforcement learning to explore all possible test methods, automatically writes test code for each method, and selects the most appropriate test among these tests. The average time it takes for this tool to generate a test for each method is 1 second.
Lodge says that if your goal is to automate writing 10,000 unit tests for a program that no one understands, reinforcement learning is the only realistic solution. “The LLM is no competition. “There is no way for humans to effectively oversee and modify code at this scale, and making the model larger and more complex does not solve the problem.”
The conclusion is this: The strongest advantage of LLM is that it deals with general language. They can also perform language tasks that have not been explicitly learned. In other words, it is useful for many tasks, including content creation (copywriting). “But that doesn’t mean that an LLM can replace reinforcement learning-based AI models,” Lodge said. “Reinforcement learning is more accurate, more consistent, and works at scale.
A 'one-shot' answer is the first answer the model Phone Number List generates by predicting a series of words from the prompt. The 'few-shot' approach provides additional samples or hints to help the model make better predictions. Additionally, LLM allows for different answers to the same question because it accepts a certain degree of randomness to increase the likelihood of a better response.
This does not mean that the LLM camp ignores reinforcement learning. GPT-4 accommodates “Reinforcement Learning with Human Feedback (RLHF)”. That is, the core model is trained by human operators to prefer some answers over others, but does not fundamentally change the answers it generates from scratch. Lodge said that, for example, an LLM could produce the following answer to complete the sentence “Wayne Gretzky likes ice.”
1. Wayne Gretzky loves ice cream.
2. Wayne Gretzky loves ice hockey.
3. Wayne Gretzky loves ice fishing.
4. Wayne Gretzky loves ice skating.
5. Wayne Gretzky likes ice wine.
Here, a human operator could rank the answers by thinking that because Wayne Gretzky is a legendary Canadian ice hockey player, he is likely to like ice hockey (or ice skating). Rankings from human operators and responses from more people are used to train the model. One thing to note is that GPT-4 doesn't pretend to know Wayne Gretzky's preferences exactly, it simply provides the most likely answer to complete the given prompt. Ultimately, LLM is not designed to be highly accurate or consistent. Lodge noted that all this means is that reinforcement learning outperforms generative AI when it comes to applying AI at scale.
Applying Reinforcement Learning to Software
What about software development? Many developers are experiencing increased productivity by using generative AI-based tools such as GitHub's Copilot and Amazon's CodeWhisperer. These tools predict what code is likely to come next based on the code before and after the point of code insertion in the integrated development environment.
In fact , David Ramel of Visual Studio Magazine said that the latest version of CoPilot already generates 61% of Java code. For those who worry that the software developer profession will disappear, these tools require human oversight to check the completed code and edit it to ensure it compiles and runs properly. In fact, autocompletion has been a representative feature of IDEs since the early days of IDEs, and code generators including Copilot have greatly increased the usefulness of that feature. However, this is not the case with the large-scale autonomous coding required to write 61% of Java code.
Lodge said reinforcement learning can accurately perform large-scale unsupervised coding. Of course, there is a reason why Lodge said this. DeepBlue launched 'Cover', a commercial reinforcement learning-based unit test writing tool in 2019. Cover lets you automate complex and error-prone tasks at scale by writing entire unit tests without human intervention.
Considering these facts, wouldn't it be possible to say that Lodge's argument is biased? Of course it is. But Lodge also has a wealth of experience to back up the argument that reinforcement learning can outperform generative AI in software development. Currently, DeepBlue uses reinforcement learning to explore all possible test methods, automatically writes test code for each method, and selects the most appropriate test among these tests. The average time it takes for this tool to generate a test for each method is 1 second.
Lodge says that if your goal is to automate writing 10,000 unit tests for a program that no one understands, reinforcement learning is the only realistic solution. “The LLM is no competition. “There is no way for humans to effectively oversee and modify code at this scale, and making the model larger and more complex does not solve the problem.”
The conclusion is this: The strongest advantage of LLM is that it deals with general language. They can also perform language tasks that have not been explicitly learned. In other words, it is useful for many tasks, including content creation (copywriting). “But that doesn’t mean that an LLM can replace reinforcement learning-based AI models,” Lodge said. “Reinforcement learning is more accurate, more consistent, and works at scale.