Lessons from AVs on safety in end-to-end pipelines
Recent developments in autonomous vehicles on recognizing and handling distribution shift
This short post covers a couple of recent updates from the autonomous vehicle (AV) industry with connections to broader and more general safety in robotics.
Recognizing performance deterioration
This Verge article from March 19 reports that there could be an impending recall of Tesla’s Full Self-Driving (FSD) service. I’m not interested in making any judgments about self-driving capability, but rather in whether the root cause offers lessons for broader robotics.
The issue appears to be either that the system didn’t know when it wasn’t working well (causing the issues described in the NHTSA filing), or that it knew and failed to notify the driver. The latter seems unlikely, so we’ll assume the former.

This phenomenon isn’t isolated to AVs. The latest article in my Vision-Language-Action (VLA) robotics pipeline series took a hands-on look at debugging one, and while we found some techniques that can aid developers, they didn’t directly help at inference time. Item 1 in Rui Xu’s candid post-mortem of K-Scale Labs mentions the pitfalls of trusting a “large model” vs. dedicated safety features. Recent papers on VLAs note their fragility when moving away from the training distribution (e.g. Fang et al Jun 2025, Hu et al Jan 2026).
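One common framing of "the system didn't know it wasn't working" is a runtime monitor that watches some uncertainty signal and flags when it drifts from its recent baseline. Here is a minimal sketch of that idea; the class name, windowed z-score rule, and thresholds are my own illustrative choices, not any particular vendor's mechanism, and `signal` stands in for whatever proxy a stack exposes (action-head entropy, ensemble disagreement, reconstruction error):

```python
from collections import deque
from statistics import mean, stdev

class DeteriorationMonitor:
    """Flags when a policy's uncertainty signal drifts above its recent baseline.

    Hypothetical interface for illustration: `signal` is any scalar that
    tends to rise when the policy leaves its training distribution.
    """

    def __init__(self, window: int = 100, z_threshold: float = 3.0):
        self.history = deque(maxlen=window)  # rolling baseline of recent signals
        self.z_threshold = z_threshold

    def update(self, signal: float) -> bool:
        """Return True if the new signal is anomalously high vs. the window."""
        alert = False
        if len(self.history) >= 10:  # wait for a minimal baseline
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and (signal - mu) / sigma > self.z_threshold:
                alert = True
        self.history.append(signal)
        return alert

# In-distribution signals hover near 0.1; a sudden spike trips the monitor.
monitor = DeteriorationMonitor()
for i in range(50):
    monitor.update(0.1 + 0.001 * (i % 5))
spike_detected = monitor.update(5.0)
```

The point isn’t the statistics — it’s that the check runs at inference time and produces a yes/no the rest of the stack (or the driver notification path) can act on.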
Potential solutions: redundancy, confidence, architecture
NVIDIA recently announced their new Alpamayo model and accompanying AV stack as a reference open model and toolchain. During the CES 2026 keynote, Jensen Huang said something intriguing about safety:

This parallel or hybrid architecture, with a classical stack and a policy arbitrator, was also covered in this CounterPoint research article. Interestingly, I can’t find references from NVIDIA themselves about this parallel system other than Jensen’s keynote — it may simply be early in development.
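To make the arbitrator idea concrete, here is a toy sketch under my own assumptions (the `Trajectory` type, the confidence threshold, and the "stay near the classical plan" envelope check are all illustrative stand-ins for real collision and kinematics checks, not NVIDIA's design):

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    waypoints: list    # planned path, here just lateral offsets in meters
    confidence: float  # self-reported score in [0, 1]

def within_envelope(candidate: Trajectory, reference: Trajectory,
                    max_deviation: float = 2.0) -> bool:
    """Toy safety envelope: candidate waypoints stay within `max_deviation`
    meters of the classical reference path."""
    return all(abs(c - r) <= max_deviation
               for c, r in zip(candidate.waypoints, reference.waypoints))

def arbitrate(learned: Trajectory, classical: Trajectory,
              min_confidence: float = 0.8) -> Trajectory:
    """Prefer the learned plan only when it is confident AND inside the
    classical planner's envelope; otherwise fall back to the classical plan."""
    if learned.confidence >= min_confidence and within_envelope(learned, classical):
        return learned
    return classical
```

The key property is that the learned policy can only ever *refine* behavior inside a region the classical stack has already vetted — a low-confidence or out-of-envelope plan degrades gracefully to the fallback.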
A related approach is to have the VLA output some kind of confidence (vs. a separate “policy arbitrator”). Zollo et al (Dec 2025) formalizes the problem of confidence calibration for VLA policies, describes how to extract confidence estimates from contemporary VLA architectures, and notes that current VLAs lack a reliable mechanism for quantifying the uncertainty of their chosen action sequences. It also introduces two potential remedies: prompt ensembles and action-wise Platt scaling.
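Platt scaling is a classic calibration recipe, so it’s worth sketching what "action-wise Platt scaling" could look like in spirit — this is a minimal stand-in fit by plain gradient descent on held-out (score, success) pairs, not the paper’s exact procedure:

```python
import math

def fit_platt(scores, successes, lr=0.1, steps=2000):
    """Fit p(success) = sigmoid(a * score + b) by logistic-loss gradient
    descent on held-out (raw_score, success) pairs."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, successes):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s / n  # dloss/da for logistic loss
            grad_b += (p - y) / n      # dloss/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def calibrated_confidence(score, a, b):
    """Map a raw model score to a calibrated success probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

# Toy held-out data: actions with raw score above ~0.5 tended to succeed.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
successes = [0, 0, 0, 0, 1, 1, 1, 1]
a, b = fit_platt(scores, successes)
```

After fitting, `calibrated_confidence` turns the model’s raw score into a probability a downstream arbitrator or driver-alert threshold can actually be set against — which is the practical payoff of calibration.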
Lastly, inserting debuggable interfaces into end-to-end pipelines can facilitate inspection and safety — lower-level controllers can apply dedicated safety constraints to the commands passed down from a higher-level controller. This still appears possible in most of today’s successful humanoid robotics demonstrations, due to a combination of factors. Keeping that architectural feature around may have lasting benefits, based on current events in the AV industry!
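A simple sketch of that split: the low-level controller enforces hard limits no matter what the high-level (possibly learned) policy commands. The specific limits below are made-up illustrative numbers, not real robot parameters:

```python
def safe_low_level_step(commanded_velocity: float, current_velocity: float,
                        max_speed: float = 1.5, max_accel: float = 0.5,
                        dt: float = 0.02) -> float:
    """Low-level velocity controller that constrains whatever the
    high-level policy commands. Illustrative limits, not real numbers.

    Returns the velocity actually applied this control tick.
    """
    # Hard clamp to the platform's absolute speed envelope.
    target = max(-max_speed, min(max_speed, commanded_velocity))
    # Rate-limit so a single bad command can't demand an instant jump.
    max_delta = max_accel * dt
    delta = max(-max_delta, min(max_delta, target - current_velocity))
    return current_velocity + delta

# Even an absurd command (10 m/s from standstill) yields only one
# acceleration-limited step — and every intermediate value is inspectable.
applied = safe_low_level_step(commanded_velocity=10.0, current_velocity=0.0)
```

Because the interface between the two layers is an explicit, typed quantity (a velocity) rather than an opaque latent, both debugging and safety constraints get something concrete to hold onto.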
Thanks for reading! I have been working on the next part of the end-to-end pipeline series, with a deep dive into the action head and closed-loop behavior. If you liked this post, please share and subscribe.


