Anyone familiar with how we ourselves learn knows that this won’t lead very far in terms of intelligence. Mothers narrate things to us as children, giving descriptions of how they work; children’s books and TV shows act as supplementary material. Then we go out and play with these things, discovering the meaning of the descriptions for ourselves. Over thousands of iterations we recognize that a specific word keeps popping up in similar contexts, and we learn to tie that word to our own internal abstraction of what those contexts share. Our very first meanings relate our actions to their effects on the environment; LLMs know little of this.
This can be bootstrapped. Through the sheer largesse of their training data, current models know some things, and can create plausible simulations of uncomplicated physical events. A model has seen enough gamedev projects to write a simple bouncing-ball simulation. It can thus create the thing to be narrated and supply the narration itself. A child model, trained on the simulation’s output rather than its code, learns what the narration means in a real sense. Once the child has mastered this, it teaches the narrator, since it now knows the domain more meaningfully, and the narrator applies this imperceptibly improved knowledge in other domains. And so forth.
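As a rough sketch of what the first link in this chain might look like, the following (hypothetical) Python snippet pairs the observable states of a bouncing-ball simulation with templated narration. The function names, the narration phrasing, and the dataset format are all illustrative assumptions, not a prescribed pipeline; the only point is that the child sees states and words, never the simulation code.

```python
import random

def simulate_bouncing_ball(steps=200, dt=0.02, gravity=-9.8, restitution=0.8):
    """Drop a ball from a random height and return its (height, velocity) trajectory."""
    y = random.uniform(1.0, 5.0)   # initial height in metres (assumed range)
    vy = 0.0
    trajectory = []
    for _ in range(steps):
        vy += gravity * dt
        y += vy * dt
        if y <= 0.0:                # ball hits the ground
            y = 0.0
            vy = -vy * restitution  # lose some energy on each bounce
        trajectory.append((y, vy))
    return trajectory

def narrate(trajectory):
    """Turn raw states into the kind of description a narrator model might give."""
    lines = []
    for t, (y, vy) in enumerate(trajectory):
        if y == 0.0 and vy > 0.0:
            lines.append(f"step {t}: the ball bounces off the ground")
        elif vy < 0.0:
            lines.append(f"step {t}: the ball is falling, {y:.2f} m above the ground")
        else:
            lines.append(f"step {t}: the ball is rising, {y:.2f} m above the ground")
    return lines

# Each training example pairs what the child observes (states) with the narration;
# the simulation code itself never appears in the data.
dataset = []
for _ in range(1000):
    traj = simulate_bouncing_ball()
    dataset.append({"observations": traj, "narration": narrate(traj)})
```

The asymmetry is the whole trick: the narrator only knows the code it wrote, while the child comes to know what the words describe.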