An LLM is just a stochastic machine for predicting the next word - I have heard this statement so many times, and each time I wondered how unaware the people who say it are. It has always been clear to me that we don't fully understand how Large Language Models work, and that they remain a black box for us. In fact, new studies are starting to show that AI doesn't work the way we thought it did.
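To be fair, here is what that popular picture describes: at every step the model produces a probability distribution over its vocabulary and a single token is sampled from it. The minimal sketch below (using GPT-2 via the Hugging Face transformers library, purely as an illustrative assumption) shows that loop; the rest of this post is about why this mechanical description is not the whole story.

```python
# Minimal sketch of the "stochastic next-token machine" view of an LLM.
# Model choice (GPT-2) and pure sampling are illustrative assumptions only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Large language models are", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                                    # generate 20 tokens
        logits = model(input_ids).logits[:, -1, :]         # scores for the next token only
        probs = torch.softmax(logits, dim=-1)              # scores -> probability distribution
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token at random
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```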

I highly recommend that you pay attention to "Tracing the thoughts of a large language model" by Anthropic. The most interesting finding in the whole document, in my opinion, is that the LLM doesn't just predict the next token; it actually plans ahead. This means that when it outputs a particular token, it already has an "idea" of what's coming later. This contradicts our previous understanding of LLMs as simply selecting the most probable next token.

If that already sounds a bit unsettling, things are about to get even more intriguing. In early April, Anthropic released a paper titled "Reasoning Models Don't Always Say What They Think", and it's definitely worth pondering. Until now, we believed that by following the Chain of Thought of these so-called thinking/reasoning models, we could ensure safety. That's not true at all. It turns out that large language models don't always use CoT the way we expect. The chain of thought often reflects what we humans want to hear, not necessarily the model's true reasoning process.

Moreover, the model is capable of hiding its true intentions and excluding them from the Chain of Thought. Anthropic's research indicates that a model deliberately using so-called reward hacking revealed this in its "internal" Chain of Thought in only 2% of cases. In the remaining cases, it simply concealed it from humans.

At this stage, it's clear that we can't ensure AI safety just by examining the output of a large neural network, not even through its "internal" CoT. The situation certainly won't be improved by the ideas already circulating to hide the reasoning process from humans entirely. Recently, a paper titled "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach" came out. It proposes moving the reasoning process into the model's latent space, before the LLM begins generating any tokens. From a performance standpoint, this might seem beneficial, but it would make the AI black box even harder to control.

Let's not forget that last year there were already papers showing that AI exhibits a strong self-preservation drive and will, given the chance, try to escape in order to avoid being shut down. Moreover, it has been demonstrated that AI will try to manipulate us to achieve its goals. This is indicated by several studies, including "Frontier Models are Capable of In-context Scheming". At the time, this was detected by analyzing the CoT. Today, we know that was just the tip of the iceberg.

Back in the early days of Generative AI, there were many discussions about the safety and ethical issues surrounding artificial intelligence. But now, it feels like those conversations have died down significantly, even as AI's capabilities and autonomy have skyrocketed. At that time, it was clearly stated that AI was not such a significant threat as long as we did not give it access to the internet and operated it in a highly controlled manner. Nowadays, in what's being called the Year of AI Agents, this seems completely forgotten, and no one appears concerned anymore.

We are increasingly granting artificial intelligence access to various tools, including full control over computers - even in cloud environments. We're seeing a growing integration of artificial intelligence with our SAP systems. AI is steadily being integrated into business processes in our companies and will soon likely gain some decision-making capabilities.

Don't get me wrong, I am an AI enthusiast myself and work on projects in this area. Perhaps you know my SAP GUI AI Agent. I am also finishing work on an agent that has access to a Linux system and is capable of installing SAP HANA. But that is all the more reason to heed a warning from someone who is not an AI skeptic.

I'm not writing this to scare you but to raise awareness. Therefore, my only proposal is that you incorporate security and ethics checks into all your AI projects. This was a big topic when AI was just starting out, but now that the technology is far more powerful, people don't really talk about it much anymore. It's getting easier and easier for us to integrate AI into everything, often without thinking about the consequences.
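What such a check looks like in practice is, of course, project-specific. The sketch below is only a hypothetical illustration of the idea: a small guardrail function that reviews an agent's proposed tool call before it is executed and either blocks it or routes it to a human. All names and rules in it are assumptions made for the example, not part of any real framework.

```python
# Hypothetical guardrail in front of an agent's tool calls.
# Rule list, tool names and return values are illustrative assumptions;
# a real project would use its own policy engine, audit log and approval flow.
import re

BLOCKED_PATTERNS = [
    r"rm\s+-rf\s+/",      # destructive filesystem commands
    r"DROP\s+TABLE",      # destructive SQL
    r"curl\s+.*\|\s*sh",  # piping downloads straight into a shell
]

HIGH_RISK_TOOLS = {"shell", "database_admin", "payments"}

def review_action(tool: str, command: str) -> str:
    """Return 'allow', 'deny' or 'needs_human_approval' for a proposed agent action."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return "deny"
    if tool in HIGH_RISK_TOOLS:
        return "needs_human_approval"  # keep a human in the loop for risky tools
    return "allow"

# Example: the agent proposes an action, the guardrail decides before execution.
print(review_action("shell", "rm -rf / --no-preserve-root"))  # -> "deny"
```

The point is not this particular rule set, but that the decision to execute happens outside the model, where it can be logged, audited and overridden by a human.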

Try to see this from a broader perspective rather than just focusing on the here and now. 
