8 Comments
Nir Ronen

Really enjoyed this post -- especially the analogy to computer architecture.

Coming out of the 90s, I saw a similar pattern: a lot of experimentation at the ISA level, followed by consolidation around x86 and ARM once they became “good enough.” After that, innovation shifted up the stack (multicore, parallelism, accelerators), and even academic research started aligning with what incumbents were building and funding.

I’m curious how far you think robotics is along that same curve.

In conversations I’ve had, a recurring theme is the long tail of exceptions in real-world deployments—cases where systems don’t fail cleanly, or don’t even realize they’ve failed. That feels quite different from compute, where abstraction boundaries are tighter and reliability is more predictable.

Also, in compute, convergence was accelerated by massive scale—hundreds of millions of CPUs shipped, which exposed edge cases and forced the ecosystem to mature quickly. Robotics seems to still be far from that level of deployment.

Do you think robotics platforms are actually close to being “good enough” for commoditization to take hold, or does the combination of long-tail reliability challenges and lower deployment volumes slow down (or fundamentally change) that convergence?

Avik De

Thanks, and great points.

- On maturity vs. commoditization: while it's true that maturity is very low (especially compared to computer engineering), our point was that a platform built from scratch in a lab is even less mature. That gap is probably going to keep getting wider, even as deployed robots mature.

- Your point about "academic research started aligning with what incumbents were building and funding" is really interesting. Chris had some further observations about this that we removed from this article, but we might cover them in a follow-on article. The gist is similar to what you said: when labs get used to fine-tuning commercial models (or building commercial hardware), the research gets biased toward the peculiarities of those platforms. A related effect is that, because there is commercial hype around humanoids right now, a lot of research is moving toward the humanoid form factor.

Chinmay Adhvaryu

“domain-specific diversification if the largest companies with the largest datasets corner the end-to-end behavior cloning approach.”

Would you say this is already happening when we look at aerial robotics, AVs and other forms of robotics?

Avik De

That sounds interesting! Do you mind expanding a bit more on what you're noticing in those other research areas? My personal familiarity with cutting edge research in aerial robotics and AVs is more limited than other form factors.

Chinmay Adhvaryu

First, really enjoyed this post. I took some robotics courses in undergrad about 20 years ago and have been trying to catch up on where the field has gone. This was a great entry point back in.

Also I'm not a robotics researcher, just someone curious about the field and trying to build my understanding. So take this observation with that context!

You frame domain-specific diversification as something that may happen if the largest companies corner end-to-end behavior cloning. I may be getting the framing wrong, and perhaps you meant this for humanoid or more general-purpose robotics. What I was noticing is that this pattern might already exist in domains like AVs and aerial robotics.

Waymo and Tesla have converged on very different architectures, and it seems like a direct consequence of building moats around fundamentally different proprietary datasets and sensor suites. Autonomous drones have their own models, and I haven't seen research on using VLMs for autonomous drones. The physics, weight constraints, and sensor profiles seem to have forced their own model diversity from early on. All of these look more like up-the-stack innovation, imo.

My (possibly naive) read is that dataset and sensor constraints naturally push toward architectural divergence as you point out, and these domains might be early proof that the pattern you're predicting is real. Does that suggest the diversification in general-purpose robotics is already beginning at the margins? Or do you see it as still waiting on a clearer inflection point like humanoids hitting a wall with end-to-end approaches?

Either way, really appreciate you writing this and would love to hear your views on what I just shared.

Avik De

In terms of form factor, there's definitely more movement toward humanoid form factors since it isn't "solved" (as flying robots might be seen to be) and there's a lot of promise about them being general purpose (i.e. could hypothetically do everything). So, at the moment, the form factor is still kind of converging. I think it'll probably take some time to see if the diversification pans out. There are small tendrils, though. Unitree just today announced a leg-less model (https://www.linkedin.com/posts/unitreerobotics_unitree-humanoidrobot-embodiedai-activity-7455580649131802624-aWNA). There are companies like Cobot as well, but they are relatively rare.

Similarly, in terms of algorithms, I think end-to-end behavior cloning (even if indirectly via "world model" approaches) is probably going to be the most common approach for the near future. There are obviously many other research approaches, but they are less common, and we'll have to wait to see if there's an inflection point like you say.

Roy Xing

On the topic of IL/BC/VLAs dominating manipulation and locomotion: I might be wrong, but I was under the impression that, despite recent advances, these two application fields are still separate in their approaches? For example, I haven't seen any VLAs fully control a whole loco-manipulation task, giving action chunks for both the arms and legs. Vice versa, the Omni-* related lines of work (OmniXtreme, OmniRetarget, similar works not named Omni-...) use IL/BC for locomotion and even gross loco-manipulation tasks, but haven't been able to cross over into the fine-grained manipulation that VLAs dominate.

Maybe it's just cope on my end, but I think (or hope) that there's valuable research to be done in domains in which we don't have ample data (and might never have), such as control policies for non-humanoid morphologies. Perhaps, despite the hype around "one policy to control them all," there will be specialization in morphology control for applications? Maybe that's wishful thinking lol.

Looking forward to more of these articles!

Avik De

You're correct that VLAs have been applied more toward manipulation applications than whole body coordination and control, and that there are other dedicated ways to solve locomotion control. However, in my perception at least, these seem to be moving toward a unified strategy of behavior cloning + end-to-end models, even if the machinery doesn't quite allow for everything to work together with a single model yet.

Your point about the humanoid morphologies is good too -- the current approaches also appear to be increasingly targeted towards those, tied up with commercial hype. That was part of the observation in the last part of the article. If/when there is more non-humanoid research again, we could see innovation in the overall data & method.