As I have said elsewhere, scaling compute alone is not enough to deliver inflexion points in technological progress towards AGI.
The brain has a great variety of neuronal and glial cell types. Combining those types in Hebbian/neuroplastic neural circuits and assemblies of neurons makes for even more heterogeneity of structure, and therefore of function. LLM adaptors and transformers alter the architecture of ANN/LLM machine learning systems in such a way as to make the structure/function coupling of such systems more heterogeneous, but what is likely most required is new 'neuron' types (entirely new ML models and architectures) and new ways of emulating neuroplasticity.
In a real and powerful sense, adaptors which add significant new functions, and new ways of performing functions, are analogous to new neuron types. That's a big hint: the approach of using new 'neuronal' architecture to deliver new functionality must be pursued further.
The good news is that multiple realisability is evidently at least partly true. ML/ANNs don't have to work exactly like human neurology to produce superior results to the human brain, just as an F-16 doesn't have to produce flight like a sparrow does to leave the sparrow in the dust. However, they likely need more ways to emulate neuronal structure-function heterogeneity and neuroplastic combination thereof than they currently have.
WHERE WOULD I START TO GET THE NEXT MAJOR INFLEXION POINT TOWARDS AGI?
What is the big, obvious hint?
[W]hatever the human brain does to provide reasoning and the ability to hypothesise: (further) emulate that.
Mimicking nature within the permissible limits of our technology and materials seems to be a winning strategy. It worked for machine learning. What Geoffrey Hinton and his colleagues did in the 1980s-90s didn't initially look like it was working, but now we have ChatGPT 4.5 and generative video which makes entire production studio departments completely redundant (and let's be honest: most visual artists are largely out of a job).
So, whatever the human brain does to provide reasoning and the ability to hypothesise: emulate that.
How? Emulate Neurological Structural-Functional Heterogeneity, and Neuroplasticity
1. Emulating neuronal and neurological structural-functional heterogeneity including emulating neuron types and brain modules, at least in terms of functional roles (since exact emulation of function is perhaps not as straightforward as hopeful multiple-realisability might suggest.)
2. Better emulating Hebbian neuronal and neurological neuroplasticity.
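The two requirements above can be made concrete with a toy sketch. It is an illustration under stated assumptions, not a proposal for a real architecture: the 'neuron types' here are nothing more than different activation functions, and plasticity is modelled with Oja's rule, a normalised variant of Hebbian learning chosen for numerical stability (and originally formulated for linear units, so applying it to nonlinear units is itself an assumption). The names `NEURON_TYPES` and `HeterogeneousLayer` are hypothetical and do not come from any library.

```python
import math
import random

# Hypothetical "neuron types": each is just a different activation function.
NEURON_TYPES = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}


class HeterogeneousLayer:
    """A layer whose units differ in type and whose weights adapt locally."""

    def __init__(self, n_inputs, unit_types, seed=0):
        rng = random.Random(seed)
        self.unit_types = list(unit_types)
        # One weight vector per unit, initialised to small random values.
        self.weights = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in self.unit_types
        ]

    def forward(self, x):
        """Apply each unit's own activation to its weighted input sum."""
        outputs = []
        for w, kind in zip(self.weights, self.unit_types):
            pre_activation = sum(wi * xi for wi, xi in zip(w, x))
            outputs.append(NEURON_TYPES[kind](pre_activation))
        return outputs

    def oja_update(self, x, lr=0.05):
        """Oja's rule: dw = lr * y * (x - y * w), Hebbian growth with decay."""
        y = self.forward(x)
        for w, yi in zip(self.weights, y):
            for j, xj in enumerate(x):
                w[j] += lr * yi * (xj - yi * w[j])
        return y


# Usage: three units of three different types, adapted by one local update.
layer = HeterogeneousLayer(n_inputs=3, unit_types=["relu", "tanh", "sigmoid"])
activations = layer.oja_update([1.0, 0.5, -0.5])
```

The point of the sketch is only that structural heterogeneity (mixed unit types) and local plasticity (weights changing from pre- and post-synaptic activity, without backpropagation) can coexist in a single component.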
How Would I Start (Very Specifically)?
Move #1: Generative imagery plus language processing
The first (obvious) thing that occurs to me is that generative image ML systems and LLMs for text are doing two different jobs, and each job maps fairly coherently onto one of two large categories of brain function which are combined in human cognition: mental imagery and language processing.
- ANN generative image systems do something akin to mental imagery (the lateral geniculate nucleus, striate cortex, PFC, and central executive of the brain, among other components), albeit using very different physical apparatus.
- ANN LLMs do something very like what the language centres (phonological loop and central executive) of the brain do. 
The traditional and still current basic model of the function of the brain in combining these looks like this:
Now, except for people with aphantasia (the inability to do mental imagery), all of us should be able to agree that we often think and reason using both language and mental imagery. Recent experiments in neuropsychology support this.
It occurs to me that combining text prompting and image generation is a hint, but not quite close enough. What is required is something more like producing text in images based upon prompts. In other words: the system we're looking for to get started is a generative image system that can accurately produce properly spelled phrases and text in images.
The overarching objective is to emulate whatever human neurology does to 'seamlessly' combine mental imagery and language processing in conceptualisation and reasoning. (Although the existence of aphantasic humans might indicate this is not necessary.)
This has been a difficult ask for a while, but it is evidently not impossible. I would approach the people at Luma Labs and ask them how their Photon generative image system does this very accurately (>95% accuracy) for phrases like "The Chronicles of Xeo Woolfe." It's that system which needs to be considered as a primary candidate for producing a new model based upon mental imagery plus something like mentalese. The idea is to get the architecture of the system to combine mental imagery and propositional/sentential statistical inference. The point is not to make an image from a prompt, but to make an accurate image of a sentence.
While the exact architectural details of the Luma Photon models are proprietary to Luma Labs, it's clear they are leveraging their own advancements in machine learning to power their generative image system. 
They emphasize a novel architecture, efficiency, quality, and features tailored for creative workflows.
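How an accuracy figure for rendered text would be measured is not specified here, so the following is only one possible reading: a character-level score based on edit distance between the requested phrase and the text actually read back from the image. The sketch assumes the rendered text has already been transcribed (by hand or by some OCR step, which is not shown), and the function names `levenshtein` and `text_render_accuracy` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def text_render_accuracy(target: str, rendered: str) -> float:
    """Character-level accuracy in [0, 1]; 1.0 means an exact match."""
    if not target and not rendered:
        return 1.0
    distance = levenshtein(target, rendered)
    return 1.0 - distance / max(len(target), len(rendered))


# Usage: the transcription below is an invented example with one dropped letter.
target_phrase = "The Chronicles of Xeo Woolfe"
transcribed = "The Chronicles of Xeo Wolfe"
score = text_render_accuracy(target_phrase, transcribed)
```

A score like this only checks spelling. It says nothing about whether the system has combined imagery and sentence-level inference in the sense intended above, so it is at best a first filter.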
Generative image systems already do what we could fairly call mental imagery: producing many very good visual representations of something requested linguistically and propositionally by synthesising 'remembered' statistical 'representations'.
Move #2: Mental imagery using representations which are not just weighted statistical inferences.
Ideally, the mental imagery component should probably involve more than one conception and implementation of representations: not just a statistical inferential approximation of representation based upon weighted mid-tier nodes.
What's needed next? Concepts. Internal Representation for Full Conceptualisation.
To begin to do emulation of cognitive conceptualisation and concepts it's likely that what's needed is to emulate what human neurology is doing in Baddeley's model of working memory:
There are no generative image systems currently which can fulfil the following kind of prompt:
"Make a very simple and sparse psychological/neurological scientific diagram of Baddeley's model of working memory including only the episodic buffer, visuospatial sketchpad, and phonological loop."
If there were, it would suggest that this requirement had already been partly answered.
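A prompt of this kind can be turned into a simple pass/fail check. The sketch below assumes the labels appearing in a generated diagram have already been read off (by eye or by an OCR step, not shown). It tests only the "including only" constraint: every required component is present and nothing else is. The names `REQUIRED_LABELS` and `check_diagram_labels` are hypothetical.

```python
# The three components the prompt asks for, and no others.
REQUIRED_LABELS = frozenset({
    "episodic buffer",
    "visuospatial sketchpad",
    "phonological loop",
})


def normalise(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def check_diagram_labels(found_labels, required=REQUIRED_LABELS):
    """Report which required labels are missing and which labels are extra."""
    found = {normalise(label) for label in found_labels}
    missing = sorted(required - found)
    extra = sorted(found - required)
    return {
        "missing": missing,
        "extra": extra,
        "passes": not missing and not extra,
    }


# Usage: an invented transcription of a diagram that adds an unrequested box.
report = check_diagram_labels(
    ["Episodic Buffer", "Phonological  Loop", "Central Executive"]
)
```

This checks labels only, not whether the boxes and arrows are arranged correctly, so a passing result would still need a human look at the layout.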
Gemini Flash 2.0 came closest in my experiment:
And Luma couldn't do it:
And neither could ImageFX:
The Difficult Ask: Representation and Integration for Semantic Memory, Procedural Memory, and Episodic Memory.
Semantic and procedural memory emulation will likely be necessary to provide fuller concepts and conceptualisation. This will likely require emulating the human neurological integration of mental imagery with language and concepts, along with new ways of representing concepts. So, again, different approaches to representation in ANNs are the likely requirement. This is the hardest ask, but again I would approach Luma Labs first, or someone with similar models.
The Next Step: Proprioception, Interoception, and What it Feels Like
The next frontier is already being approached by numerous manufacturers of AI-based general-purpose robots: the integration of environment navigation and body-image emulation. However, it may well be necessary for some form of limbic and emotional cognitive processing to be emulated too. I am not sure how necessary this latter addition is for reasoning. I suspect it could be left out of the equation without harm, just as the F-16 doesn't need to flap its wings, but this is far from clear. We may end up faced with Hume's challenge that human beings are emotional decision makers, and not reasoning/rational decision makers.
As I have suggested, the first step is to emulate the way human neurology generates mental imagery and concepts, but more importantly to emulate the way it uses these in conjunction with language processing (from the phonological loop) to do conceptualisation and reasoning.