AI Memory And Context: Open Source, DeepSeek, Meta, And Model Research



Ask ten people how large language models work, and you’ll get ten answers. Or maybe you’ll only really get two or three: the rest will offer some version of “don’t know, don’t care,” or lament that so few people understand the “guts” of these pervasive systems.

It’s rare to encounter a technology this ubiquitous that most people can’t explain. To be fair, though, the tech world had already set the stage for this kind of knowledge gap. Not many people felt like explaining what the cloud was 10 or 15 years ago, and when it comes to crypto, most people are still reluctant to describe the blockchain in any great detail.

With AI, though, it’s different. The stakes are different, and so is the impact on our society and our personal lives. So it helps to know a little more about how AI agents, LLMs, and neural nets make decisions and process what’s around them.

Below, I’ll go into a few basic ideas that keep coming up in these conversations, using specific quotes from renowned computer scientist Yann LeCun that I gathered at a recent event. LeCun had some very important things to say about the real state of AI development, and what it means for our world.

A Collection of Computers

Here’s the first major idea that people close to the industry have been presenting to audiences for the last few years.

In a sense, it’s a disclaimer about the rapid emergence of AI as a sort of human surrogate. It’s a qualifier for our general sense that there are no roadblocks to AI getting more and more human over time.

If you look at the results of big new models like OpenAI o1, or the newest one from DeepSeek, you might think that AI can leapfrog over the bar of human intelligence just by adding computing power and tokens.

But according to some of the best theorists, that would be wrong, because rather than being linear, true intelligence is a collection of systems working together to produce a very precise and robust result.

In Marvin Minsky’s “The Society of Mind,” the famous theoretician makes a broad claim about intelligence that applies to artificial systems as well as to the human mind. The human mind, he suggests, is not one computer but many small, interconnected agents that together constitute a “society of mind,” with individual components all working toward a greater whole.

Think about that: people in a society working together toward a coherent end. That’s the model Minsky puts forth. It isn’t one very powerful supercomputer becoming godlike in its intellect, but a “village” of interconnected parts or modules, like cells in an organism.

Yann LeCun echoes this when commenting on the development of LLMs.

“We are nowhere near being able to reproduce the kind of intelligence that we can observe in not (just) humans, but (even) animals,” he says. “Intelligence is not just a linear thing, where, you know, when you cross the barrier, you have human intelligence, superintelligence. It's not like that at all. It's a collection of skills, and an ability to acquire new skills extremely quickly, or even to solve problems without actually learning anything.”

When you start to think about AI like this, you change the way you frame it, and it makes a big difference.

Memory and Context

LeCun also suggests that these systems need a “persistent memory” to be powerful. And that’s something that experts have been telling us again and again as they work on newer AI models.

You can talk about this in terms of memory, or in terms of context – for example, in the context windows that the systems use to perceive what they’re working on.

To unpack this, let’s look at two forms of memory: parametric memory and working memory.

You can roughly map these two types of memory onto the two kinds of human memory that make up our overall cognitive operations: long-term memory and short-term memory. Parametric memory would be the long-term memory, meaning the knowledge the model absorbed during training and stored in its weights. This is the historical context and knowledge base that undergirds the machine’s long-term thinking. Working memory would be the short-term memory. It basically corresponds to the context window: the prompt and context clues the machine uses to reason in real time. (The Association of Data Scientists has a resource that goes into more detail.)
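To make the distinction concrete, here is a deliberately simplified Python analogy. It is not a description of how any real LLM works. It assumes a hypothetical `ToyAssistant` class whose “parametric memory” is a fixed dictionary set at construction time (standing in for knowledge baked into weights during training), and whose “working memory” is a bounded context window, modeled with `collections.deque` and a hypothetical `context_size` limit, that silently drops the oldest tokens once it is full. Real models spread knowledge across billions of weights rather than keeping a lookup table, and they tokenize text far more cleverly than by splitting on spaces.

```python
from __future__ import annotations

from collections import deque


class ToyAssistant:
    """Hypothetical toy analogy: real LLMs do not store facts in a lookup table."""

    def __init__(self, knowledge: dict[str, str], context_size: int) -> None:
        # "Parametric memory": fixed at construction, like weights learned in training.
        self._knowledge = dict(knowledge)
        # "Working memory": a bounded context window; once full, the oldest tokens drop out.
        self.context: deque[str] = deque(maxlen=context_size)

    def read(self, text: str) -> None:
        # Whitespace splitting stands in for a real tokenizer.
        for token in text.split():
            self.context.append(token)

    def recall(self, key: str) -> str | None:
        # Long-term recall does not depend on what is currently in the window.
        return self._knowledge.get(key)

    def remembers(self, token: str) -> bool:
        # Short-term recall: only tokens still inside the window are available.
        return token in self.context


if __name__ == "__main__":
    bot = ToyAssistant({"capital_of_france": "Paris"}, context_size=4)
    bot.read("my name is Ada and I like chess")
    print(bot.remembers("Ada"))    # False: pushed out of the 4-token window
    print(bot.remembers("chess"))  # True
    print(bot.recall("capital_of_france"))  # Paris
```

With a four-token window, reading “my name is Ada and I like chess” pushes “Ada” out of working memory, while the fact held in parametric memory stays available. That is roughly why a chatbot can lose track of something you said early in a long conversation yet still know general facts.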

Both of these have their own importance in the AI system, just like they do in the human brain. Neurologists will talk about the difference in function between long-term and short-term human memory – and theoreticians should be talking about the difference between parametric and working memory in AI.

A New Framework

If, as Minsky and LeCun point out, the human mind is more elaborate than a single supercomputer, what’s the takeaway?

“We need a completely new architecture,” LeCun says. “It's not going to happen with LLMs. It's not going to happen with anything we've done so far. We need … to have common sense, … if you give a standard puzzle to an LLM, it will just regurgitate the answer.”

One thing he suggests is the emergence of “world models,” in which AI builds the context it needs to progress and to become more cognitively like a human being.

“What happens if you have some idea of the state of the world, and you imagine an action it might take?” he asks. “Can you predict the next state of the world that will result from this action? … If you have such a system, then you might be able to predict what the sequence of action will produce, figure out, have an objective that figures out whether your task is being fulfilled, and then, by optimization, figure out a sequence of actions that satisfies that task. They call this objective-driven AI. So that's what we're working on… and maybe we'll have some tangible results that can be advertised in the wider world within two to three years, and maybe we'll be on a path toward (human-level AI) within five (years), but it's almost certainly harder than we think.”
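LeCun’s description of objective-driven AI boils down to a loop: imagine the consequences of candidate actions with a world model, score each imagined outcome with an objective, and pick the action sequence that scores best. The Python sketch below illustrates only that loop, under heavy assumptions. The “world” is a single integer position, the hypothetical `world_model` function is hand-written rather than learned, and brute-force search over every short action sequence stands in for the optimization step. It is not Meta’s architecture or LeCun’s actual method.

```python
from __future__ import annotations

from itertools import product

# Hypothetical action set for a 1-D toy world: step left, stay put, step right.
ACTIONS = (-1, 0, 1)


def world_model(state: int, action: int) -> int:
    # Hand-written stand-in for a *learned* predictor of the next world state.
    return state + action


def objective(state: int, goal: int) -> int:
    # Cost that says how far the task is from being fulfilled (0 means done).
    return abs(goal - state)


def rollout(state: int, actions: tuple[int, ...]) -> int:
    # "Imagine" the result of an action sequence without acting in the world.
    for action in actions:
        state = world_model(state, action)
    return state


def plan(state: int, goal: int, horizon: int) -> tuple[int, ...]:
    # Brute-force search over every action sequence of length `horizon`,
    # standing in for the optimization step. Ties on the objective are broken
    # by preferring less total movement.
    return min(
        product(ACTIONS, repeat=horizon),
        key=lambda seq: (objective(rollout(state, seq), goal), sum(abs(a) for a in seq)),
    )


if __name__ == "__main__":
    print(plan(0, 2, 3))  # (0, 1, 1)
```

Planning from position 0 toward goal 2 with a three-step horizon returns `(0, 1, 1)`: the imagined rollout lands exactly on the goal, and the tie-breaker avoids wasted moves. Brute force only works here because there are 3^3 = 27 candidate sequences; the search space grows exponentially with the horizon, which hints at why planning with real world models is hard.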

The Primacy of Open Source Models

I can’t end this without referring to LeCun’s recent remarks on open source models.

The business world (and the investing world) is having a fit over DeepSeek’s announcement of a top-tier open source AI model, which upends the plans of U.S. companies like Meta. But instead of seeing it as “U.S. vs. China,” LeCun urges us to see it as proprietary vs. open source. That framing should also help ground our expectations of what will happen this year and beyond.

Let’s apply these frameworks to our study of AI as we move forward.
